diff mbox series

[RFC,v2,1/2] x86/fpu: Fix state corruption in __fpu__restore_sig()

Message ID b69df1e42d1235996682178013f61d4120b3b361.1622351443.git.luto@kernel.org
State New
Headers show
Series [RFC,v2,1/2] x86/fpu: Fix state corruption in __fpu__restore_sig() | expand

Commit Message

Andy Lutomirski May 30, 2021, 5:12 a.m. UTC
Prior to adding XSAVES support, some of fpu__clear()'s callers
required that fpu__clear() successfully reset current's FPU state
even if current's XSAVE header was corrupt.  commit b860eb8dce59
("x86/fpu/xstate: Define new functions for clearing fpregs and
xstates") split fpu__clear() into two different functions, neither
of which was capable of correctly handling a corrupt header.

As a practical matter, trying to specify what
fpu__clear_user_states() is supposed to do if current's XSAVE area
is corrupt is difficult -- it's supposed to preverve supervisor
states, but the entire XSAVES buffer may be unusable due to corrupt
XFEATURES and/or XCOMP_BV.

Improve the situation by completely separating fpu__clear_all() and
fpu__clear_user_states().  The former is, once again, a full reset.
The latter requires an intact header and resets only user states.
Update and document __fpu__restore_sig() accordingly.

Cc: stable@vger.kernel.org
Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/fpu/core.c   | 62 ++++++++++++++++++++++++------------
 arch/x86/kernel/fpu/signal.c | 50 +++++++++++++++++++++++++----
 2 files changed, 84 insertions(+), 28 deletions(-)

Comments

Andy Lutomirski May 30, 2021, 11:41 p.m. UTC | #1
On 5/30/21 3:02 PM, Thomas Gleixner wrote:
> Andy,
> 
> On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote:
>>
>> Cc: stable@vger.kernel.org
>> Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
>> Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
> 
> Debugged-by ...
> 
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> 
> ...
> 
>>  /*
>> - * Clear the FPU state back to init state.
>> - *
>> - * Called by sys_execve(), by the signal handler code and by various
>> - * error paths.
>> + * Reset current's user FPU states to the init states.  The caller promises
>> + * that current's supervisor states (in memory or CPU regs as appropriate)
>> + * as well as the XSAVE header in memory are intact.

^^^ The caller promises this

>>   */
>> -static void fpu__clear(struct fpu *fpu, bool user_only)
>> +void fpu__clear_user_states(struct fpu *fpu)
>>  {
>>  	WARN_ON_FPU(fpu != &current->thread.fpu);
>>  

...

>> +	/*
>> +	 * Reset user states in registers.
>> +	 */
>> +	copy_init_fpstate_to_fpregs(xfeatures_mask_user());
>> +
>> +	/*
>> +	 * Now all FPU registers have their desired values.  Inform the
>> +	 * FPU state machine that current's FPU registers are in the
>> +	 * hardware registers.
>> +	 */
>>  	fpregs_mark_activate();
>> +
>>  	fpregs_unlock();
> 
> This is as wrong as before. The corrupted task->fpu.state still
> survives.

See above.  This function is not intended to fix the header.

> 
> For f*cks sake, I gave you a reproducer and a working patch and I
> explained it in great length what's broken and what needs to be fixed.

This patch fixes your reproducer and my (to-be-sent) reproducer.  I
tested it on a machine that physically has the XRSTORS instruction but I
disabled it using virt.  Are you still seeing failures with this patch
applied?  I can try to test on a different CPU.

> 
> And of course you kept the bug which was in the offending commit,
> i.e. not wiping the task->fpu.state corruption which causes the next
> XRSTOR to fail:
> 
> [   34.095020] Bad FPU state detected at copy_kernel_to_fpregs+0x28/0x40, reinitializing FPU registers.
> [   34.095052] WARNING: CPU: 0 PID: 1364 at arch/x86/mm/extable.c:65 ex_handler_fprestore+0x5f/0x70
> ...
> [   34.153472]  switch_fpu_return+0x40/0xb0
> [   34.154196]  exit_to_user_mode_prepare+0x8f/0x180
> [   34.155060]  syscall_exit_to_user_mode+0x23/0x50
> [   34.155912]  do_syscall_64+0x4d/0xb0

I don't think that the code path that calls fpu__clear_user_states() is
subject to header corruption.  If it is, then both your patch and my
patch have issues -- trying to fish the supervisor state out of a
corrupt XSTATE buffer is asking for trouble.

> 
> IOW, this is exactly the same shit as we had before. So what is decent
> about this? Define decent...
> 
> Why the heck do you think I wasted a couple of days to:
> 
>  - Analyze the root cause
> 
>  - Destill a trivial C reproducer
> 
>  - Come up with a fully working and completely correct fix
> 
> Just because, right?
> 
> I'm fine with splitting up clear_all() and clear_user(), but what you
> provided is as much of a clusterfuck as the commit it pretends to fix.
> 
> Your's seriously grumpy
> 
>        Thomas
>
Thomas Gleixner May 31, 2021, 9:03 a.m. UTC | #2
On Sun, May 30 2021 at 16:41, Andy Lutomirski wrote:
> On 5/30/21 3:02 PM, Thomas Gleixner wrote:

>>>  /*

>>> - * Clear the FPU state back to init state.

>>> - *

>>> - * Called by sys_execve(), by the signal handler code and by various

>>> - * error paths.

>>> + * Reset current's user FPU states to the init states.  The caller promises

>>> + * that current's supervisor states (in memory or CPU regs as appropriate)

>>> + * as well as the XSAVE header in memory are intact.

>

> ^^^ The caller promises this


Yes, I misread this, but it's more than non-obvious.

> This patch fixes your reproducer and my (to-be-sent) reproducer.  I

> tested it on a machine that physically has the XRSTORS instruction but I

> disabled it using virt.  Are you still seeing failures with this patch

> applied?  I can try to test on a different CPU.


Seems I applied the patch, built it and then failed to actually boot
that kernel. I retested with brain awake and it indeed works.

Sorry for the rant!

Thanks,

        tglx
Thomas Gleixner May 31, 2021, 10:01 a.m. UTC | #3
On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote:
>  /*

> - * Clear the FPU state back to init state.

> - *

> - * Called by sys_execve(), by the signal handler code and by various

> - * error paths.

> + * Reset current's user FPU states to the init states.  The caller promises

> + * that current's supervisor states (in memory or CPU regs as appropriate)

> + * as well as the XSAVE header in memory are intact.

>   */

> -static void fpu__clear(struct fpu *fpu, bool user_only)

> +void fpu__clear_user_states(struct fpu *fpu)

>  {

>  	WARN_ON_FPU(fpu != &current->thread.fpu);

>  

>  	if (!static_cpu_has(X86_FEATURE_FPU)) {


This can only be safely called if XSAVES is available. So this check is
bogus as it actually should check for !XSAVES. And if at all it should
be:

   if (WARN_ON_ONCE(!XSAVES))
      ....

This is exactly the stuff which causes subtle problems down the road.

I have no idea why you are insisting on having this conditional at the
call site. It's just an invitation for trouble because someone finds
this function and calls it unconditionally. And he will miss the
'promise' part in the comment as I did.

Something like the below.

Thanks,

        tglx
---
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -32,7 +32,7 @@ extern void fpu__save(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
 extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
-extern void fpu__clear_user_states(struct fpu *fpu);
+extern void fpu__handle_sig_restore_fail(struct fpu *fpu);
 extern void fpu__clear_all(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -354,45 +354,72 @@ static inline void copy_init_fpstate_to_
 }
 
 /*
- * Clear the FPU state back to init state.
- *
- * Called by sys_execve(), by the signal handler code and by various
- * error paths.
+ * Reset current's user FPU states to the init states after a restore from
+ * user space supplied state failed in __fpu_restore_sig().
  */
-static void fpu__clear(struct fpu *fpu, bool user_only)
+void fpu__handle_sig_restore_fail(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	if (!static_cpu_has(X86_FEATURE_FPU)) {
-		fpu__drop(fpu);
-		fpu__initialize(fpu);
+	/*
+	 * With XSAVES this can just reset the user space state in the
+	 * registers. The content of task->fpu.state.xsave will be
+	 * overwritten by the next XSAVES, so it does not matter.
+	 *
+	 * For !XSAVES systems task->fpu.state.xsave must be
+	 * reinitialized. As these systems do not have supervisor states,
+	 * start over from a clean state.
+	 */
+	if (!static_cpu_has(X86_FEATURE_XSAVES)) {
+		fpu__clear_all(fpu);
 		return;
 	}
 
 	fpregs_lock();
 
-	if (user_only) {
-		if (!fpregs_state_valid(fpu, smp_processor_id()) &&
-		    xfeatures_mask_supervisor())
-			copy_kernel_to_xregs(&fpu->state.xsave,
-					     xfeatures_mask_supervisor());
-		copy_init_fpstate_to_fpregs(xfeatures_mask_user());
-	} else {
-		copy_init_fpstate_to_fpregs(xfeatures_mask_all);
+	/*
+	 * Ensure that current's supervisor states are loaded into
+	 * their corresponding registers.
+	 */
+	if (!fpregs_state_valid(fpu, smp_processor_id()) &&
+	    xfeatures_mask_supervisor()) {
+		copy_kernel_to_xregs(&fpu->state.xsave,
+				     xfeatures_mask_supervisor());
 	}
 
+	/* Reset user states in registers. */
+	copy_init_fpstate_to_fpregs(xfeatures_mask_user());
+
+	/*
+	 * Now all FPU registers have their desired values.  Inform the
+	 * FPU state machine that current's FPU registers are in the
+	 * hardware registers.
+	 */
 	fpregs_mark_activate();
+
 	fpregs_unlock();
 }
 
-void fpu__clear_user_states(struct fpu *fpu)
-{
-	fpu__clear(fpu, true);
-}
 
+/*
+ * Reset current's FPU registers (user and supervisor) to their INIT values.
+ * This function does not trust the in-memory XSAVE image to be valid at all;
+ * it is safe to call even if the contents of fpu->state may be entirely
+ * malicious, including the header.
+ *
+ * This means that it must not use XSAVE, as XSAVE reads the header from
+ * memory.
+ *
+ * This does not change the actual hardware registers; when fpu__clear_all()
+ * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode
+ * will reload the hardware registers from memory.
+ */
 void fpu__clear_all(struct fpu *fpu)
 {
-	fpu__clear(fpu, false);
+	fpregs_lock();
+	fpu__drop(fpu);
+	fpu__initialize(fpu);
+	fpregs_unlock();
 }
 
 /*
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -345,6 +345,7 @@ static int __fpu__restore_sig(void __use
 		 */
 		fpregs_lock();
 		pagefault_disable();
+
 		ret = copy_user_to_fpregs_zeroing(buf_fx, user_xfeatures, fx_only);
 		pagefault_enable();
 		if (!ret) {
@@ -382,10 +383,9 @@ static int __fpu__restore_sig(void __use
 	}
 
 	/*
-	 * By setting TIF_NEED_FPU_LOAD it is ensured that our xstate is
-	 * not modified on context switch and that the xstate is considered
-	 * to be loaded again on return to userland (overriding last_cpu avoids
-	 * the optimisation).
+	 * By setting TIF_NEED_FPU_LOAD, we ensure that any context switches
+	 * or kernel_fpu_begin() operations (due to scheduling, page faults,
+	 * softirq, etc) do not access fpu->state.
 	 */
 	fpregs_lock();
 
@@ -406,10 +406,18 @@ static int __fpu__restore_sig(void __use
 		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
 
 		if (using_compacted_format()) {
+			/*
+			 * copy_user_to_xstate() may write complete garbage
+			 * to the user states, but it will not corrupt the
+			 * XSAVE header, nor will it corrupt any supervisor
+			 * states.
+			 */
 			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
 		} else {
 			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
-
+			/*
+			 * Careful, the XSAVE header may be corrupt now.
+			 */
 			if (!ret && state_size > offsetof(struct xregs_state, header))
 				ret = validate_user_xstate_header(&fpu->state.xsave.header);
 		}
@@ -457,6 +465,15 @@ static int __fpu__restore_sig(void __use
 		fpregs_lock();
 		ret = copy_kernel_to_fregs_err(&fpu->state.fsave);
 	}
+
+	/*
+	 * At this point, we have modified the hardware FPU regs.  We must
+	 * either mark them active for current or invalidate them for
+	 * any other task that may have owned them.
+	 *
+	 * Yes, the nonsensically named fpregs_deactivate() function
+	 * ignores its parameter and marks this CPU's regs invalid.
+	 */
 	if (!ret)
 		fpregs_mark_activate();
 	else
@@ -465,7 +482,7 @@ static int __fpu__restore_sig(void __use
 
 err_out:
 	if (ret)
-		fpu__clear_user_states(fpu);
+		fpu__handle_sig_restore_fail(fpu);
 	return ret;
 }
Thomas Gleixner May 31, 2021, 6:56 p.m. UTC | #4
On Mon, May 31 2021 at 12:01, Thomas Gleixner wrote:

> On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote:

>>  /*

>> - * Clear the FPU state back to init state.

>> - *

>> - * Called by sys_execve(), by the signal handler code and by various

>> - * error paths.

>> + * Reset current's user FPU states to the init states.  The caller promises

>> + * that current's supervisor states (in memory or CPU regs as appropriate)

>> + * as well as the XSAVE header in memory are intact.

>>   */

>> -static void fpu__clear(struct fpu *fpu, bool user_only)

>> +void fpu__clear_user_states(struct fpu *fpu)

>>  {

>>  	WARN_ON_FPU(fpu != &current->thread.fpu);

>>  

>>  	if (!static_cpu_has(X86_FEATURE_FPU)) {

>

> This can only be safely called if XSAVES is available. So this check is

> bogus as it actually should check for !XSAVES. And if at all it should

> be:

>

>    if (WARN_ON_ONCE(!XSAVES))

>       ....

>

> This is exactly the stuff which causes subtle problems down the road.

>

> I have no idea why you are insisting on having this conditional at the

> call site. It's just an invitation for trouble because someone finds

> this function and calls it unconditionally. And he will miss the

> 'promise' part in the comment as I did.


And of course there is:

__fpu__restore_sig()

	if (!buf) {
                fpu__clear_user_states(fpu);
                return 0;
        }

and

handle_signal()

   if (!failed)
      fpu__clear_user_states(fpu);

which invoke that function unconditionally.

Thanks,

        tglx
Andy Lutomirski May 31, 2021, 7:30 p.m. UTC | #5
On Mon, May 31, 2021, at 11:56 AM, Thomas Gleixner wrote:
> On Mon, May 31 2021 at 12:01, Thomas Gleixner wrote:

> 

> > On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote:

> >>  /*

> >> - * Clear the FPU state back to init state.

> >> - *

> >> - * Called by sys_execve(), by the signal handler code and by various

> >> - * error paths.

> >> + * Reset current's user FPU states to the init states.  The caller promises

> >> + * that current's supervisor states (in memory or CPU regs as appropriate)

> >> + * as well as the XSAVE header in memory are intact.

> >>   */

> >> -static void fpu__clear(struct fpu *fpu, bool user_only)

> >> +void fpu__clear_user_states(struct fpu *fpu)

> >>  {

> >>  	WARN_ON_FPU(fpu != &current->thread.fpu);

> >>  

> >>  	if (!static_cpu_has(X86_FEATURE_FPU)) {

> >

> > This can only be safely called if XSAVES is available. So this check is

> > bogus as it actually should check for !XSAVES. And if at all it should

> > be:

> >

> >    if (WARN_ON_ONCE(!XSAVES))

> >       ....

> >

> > This is exactly the stuff which causes subtle problems down the road.

> >

> > I have no idea why you are insisting on having this conditional at the

> > call site. It's just an invitation for trouble because someone finds

> > this function and calls it unconditionally. And he will miss the

> > 'promise' part in the comment as I did.

> 

> And of course there is:

> 

> __fpu__restore_sig()

> 

> 	if (!buf) {

>                 fpu__clear_user_states(fpu);

>                 return 0;

>         }

> 

> and

> 

> handle_signal()

> 

>    if (!failed)

>       fpu__clear_user_states(fpu);


This looks okay.

Really there are two callers of fpu__clear_all() that are special:

execve:  Just in case some part of the xstate buffer mode that’s supposed to be invariant got corrupted or in case there is some side channel that can leak the INIT-but-not-zeroed contents of a state to user code, we should really wipe the memory completely across privilege boundaries.

__fpu__restore_sig: the utterly daft copy from user space needs special recovery.

Maybe the right solution is to rename it. Instead of fpu__clear_all(), how about fpu__wipe_and_reset()?

> 

> which invoke that function unconditionally.

> 

> Thanks,

> 

>         tglx

>
Thomas Gleixner May 31, 2021, 7:48 p.m. UTC | #6
On Mon, May 31 2021 at 20:56, Thomas Gleixner wrote:
> On Mon, May 31 2021 at 12:01, Thomas Gleixner wrote:

> __fpu__restore_sig()

>

> 	if (!buf) {

>                 fpu__clear_user_states(fpu);

>                 return 0;

>         }

>

> and

>

> handle_signal()

>

>    if (!failed)

>       fpu__clear_user_states(fpu);

>

> which invoke that function unconditionally.


So we cannot warn there.

This is all wrong and everything should use copy_kernel_to_xstate()
after copying the buffer from user space. But of course allocating
memory there is daft.

There is also xstateregs_set() which invokes fpstate_init() on fail
which means it blows away _ALL_ state including supervisor state.

Even without supervisor state this function is bonkers. If the ptracer
provides a bogus data set then this just invalidates the target tasks
FPU state for no real good reason.

This should just use a kernel buffer.  If the copy from user fails, the
caller gets the EFAULT. If the header is bogus, then
copy_kernel_to_xstate() returns -EINVAL and that's handed back to the
caller. No reason to invalidate anything.

Thanks,

        tglx
Thomas Gleixner May 31, 2021, 10:46 p.m. UTC | #7
On Mon, May 31 2021 at 12:30, Andy Lutomirski wrote:
> On Mon, May 31, 2021, at 11:56 AM, Thomas Gleixner wrote:

>> And of course there is:

>> 

>> __fpu__restore_sig()

>> 

>> 	if (!buf) {

>>                 fpu__clear_user_states(fpu);

>>                 return 0;

>>         }

>> 

>> and

>> 

>> handle_signal()

>> 

>>    if (!failed)

>>       fpu__clear_user_states(fpu);

>

> This looks okay.


Looks okay is meh... That other stuff obviously looked okay as well...

That FPU code is an unpenetrable mess.

> Really there are two callers of fpu__clear_all() that are special:

>

> execve: Just in case some part of the xstate buffer mode that’s

> supposed to be invariant got corrupted or in case there is some side

> channel that can leak the INIT-but-not-zeroed contents of a state to

> user code, we should really wipe the memory completely across

> privilege boundaries.

>

> __fpu__restore_sig: the utterly daft copy from user space needs

> special recovery.

>

> Maybe the right solution is to rename it. Instead of fpu__clear_all(),

> how about fpu__wipe_and_reset()?


The right solution is to just use copy_user_to_xstate() unconditionally.

That fixes the issue even without cleaning up that fpu_clear() mess
which we want to do nevertheless.

I have a similar fix for the related xstateregs_set() trainwreck, but
that really needs to allocate a buffer because it's operating on a
different task contrary to signal handling.

I'm too tired now to test the xstateregs_set() muck, but if you want to
have a look:

     https://tglx.de/~tglx/patches.tar

Thanks,

        tglx
---
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr
 void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
 void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);
 
-
-/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
-int validate_user_xstate_header(const struct xstate_header *hdr);
-
 #endif
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __use
 	if (use_xsave() && !fx_only) {
 		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
 
-		if (using_compacted_format()) {
-			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
-		} else {
-			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
-
-			if (!ret && state_size > offsetof(struct xregs_state, header))
-				ret = validate_user_xstate_header(&fpu->state.xsave.header);
-		}
+		ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
 		if (ret)
 			goto err_out;
 
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -515,7 +515,7 @@ int using_compacted_format(void)
 }
 
 /* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
-int validate_user_xstate_header(const struct xstate_header *hdr)
+static int validate_user_xstate_header(const struct xstate_header *hdr)
 {
 	/* No unknown or supervisor features may be set */
 	if (hdr->xfeatures & ~xfeatures_mask_user())
Andy Lutomirski June 1, 2021, 4:58 a.m. UTC | #8
On 5/31/21 3:46 PM, Thomas Gleixner wrote:
> On Mon, May 31 2021 at 12:30, Andy Lutomirski wrote:

>> On Mon, May 31, 2021, at 11:56 AM, Thomas Gleixner wrote:

>>> And of course there is:

>>>

>>> __fpu__restore_sig()

>>>

>>> 	if (!buf) {

>>>                 fpu__clear_user_states(fpu);

>>>                 return 0;

>>>         }

>>>

>>> and

>>>

>>> handle_signal()

>>>

>>>    if (!failed)

>>>       fpu__clear_user_states(fpu);

>>

>> This looks okay.

> 

> Looks okay is meh... That other stuff obviously looked okay as well...

> 

> That FPU code is an unpenetrable mess.

> 

>> Really there are two callers of fpu__clear_all() that are special:

>>

>> execve: Just in case some part of the xstate buffer mode that’s

>> supposed to be invariant got corrupted or in case there is some side

>> channel that can leak the INIT-but-not-zeroed contents of a state to

>> user code, we should really wipe the memory completely across

>> privilege boundaries.

>>

>> __fpu__restore_sig: the utterly daft copy from user space needs

>> special recovery.

>>

>> Maybe the right solution is to rename it. Instead of fpu__clear_all(),

>> how about fpu__wipe_and_reset()?

> 

> The right solution is to just use copy_user_to_xstate() unconditionally.

> 

> That fixes the issue even without cleaning up that fpu_clear() mess

> which we want to do nevertheless.

> 

> I have a similar fix for the related xstateregs_set() trainwreck, but

> that really needs to allocate a buffer because it's operating on a

> different task contrary to signal handling.

> 

> I'm too tired now to test the xstateregs_set() muck, but if you want to

> have a look:

> 

>      https://tglx.de/~tglx/patches.tar

> 

> Thanks,

> 

>         tglx

> ---

> --- a/arch/x86/include/asm/fpu/xstate.h

> +++ b/arch/x86/include/asm/fpu/xstate.h

> @@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr

>  void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);

>  void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);

>  

> -

> -/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */

> -int validate_user_xstate_header(const struct xstate_header *hdr);

> -

>  #endif

> --- a/arch/x86/kernel/fpu/signal.c

> +++ b/arch/x86/kernel/fpu/signal.c

> @@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __use

>  	if (use_xsave() && !fx_only) {

>  		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;

>  

> -		if (using_compacted_format()) {

> -			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);

> -		} else {

> -			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);

> -

> -			if (!ret && state_size > offsetof(struct xregs_state, header))

> -				ret = validate_user_xstate_header(&fpu->state.xsave.header);

> -		}

> +		ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);

>  		if (ret)

>  			goto err_out;

>  

> --- a/arch/x86/kernel/fpu/xstate.c

> +++ b/arch/x86/kernel/fpu/xstate.c

> @@ -515,7 +515,7 @@ int using_compacted_format(void)

>  }

>  

>  /* Validate an xstate header supplied by userspace (ptrace or sigreturn) */

> -int validate_user_xstate_header(const struct xstate_header *hdr)

> +static int validate_user_xstate_header(const struct xstate_header *hdr)

>  {

>  	/* No unknown or supervisor features may be set */

>  	if (hdr->xfeatures & ~xfeatures_mask_user())

> 


I like it.

Here's my current patch pile.  Not even really compile tested -- it's
bedtime.  Your patch is copied in verbatim.
Dave Hansen June 1, 2021, 2:48 p.m. UTC | #9
> +static void sigusr1(int sig, siginfo_t *info, void *uc_void)

> +{

> +       ucontext_t *uc = uc_void;

> +       uint8_t *fpstate = (uint8_t *)uc->uc_mcontext.fpregs;

> +       uint64_t *xfeatures = (uint64_t *)(fpstate + 512);

> +

> +       printf("\tOriginal XFEATURES = 0x%lx\n", *xfeatures);

> +       *(xfeatures+2) = 0xfffffff;

> +       //*xfeatures = ~(uint64_t)0;

> +}


I was trying to think of something which might be more durable than
this, like if a new feature started using bytes 16->23 for something useful.

How about a memset() of the entire 64-byte header to 0xff?

On to

	[PATCH v3 2/5] x86/fpu: Fix state corruption in ...

> Instead of trying to give sensible semantics to the fpu clear functions

> in the presence of XSAVE header corruption, simply avoid corrupting the

> header in the first place by using copy_user_to_xstate().


Should we mention that the verbatim copying via __copy_from_user()
copies the (bad) reserved bit contents and how copy_user_to_xstate()
avoids it?

Maybe:

__copy_from_user() copies the entire user buffer into the kernel buffer,
verbatim.  This means that the kernel buffer may now contain entirely
invalid state on which XRSTOR will #GP.  validate_user_xstate_header()
can detect some of that corruption, but that leaves the onus on callers
to clear the buffer.

Avoid corruption of the kernel XSAVE buffer.  Use copy_user_to_xstate()
which validates the XSAVE header contents *BEFORE* copying the buffer
into the kernel.

Note: copy_user_to_xstate() was previously only called for
compacted-format kernel buffers.  It functions for both compacted and
non-compacted forms.  Using it for the non-compacted form will be slower
because of multiple __copy_from_user() operations in the
compacted-format code, but that cost is less important than simpler code
in an already slow path.

> --- a/arch/x86/kernel/fpu/signal.c

> +++ b/arch/x86/kernel/fpu/signal.c

> @@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)

>         if (use_xsave() && !fx_only) {

>                 u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;

>  

> -               if (using_compacted_format()) {

> -                       ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);

> -               } else {

> -                       ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);

> -

> -                       if (!ret && state_size > offsetof(struct xregs_state, header))

> -                               ret = validate_user_xstate_header(&fpu->state.xsave.header);

> -               }

> +               ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);

>                 if (ret)

>                         goto err_out;

>
Dave Hansen June 1, 2021, 6:06 p.m. UTC | #10
On to patch 3:

> fpu__clear_all() and fpu__clear_user_states() share an implementation,

> and the resulting code is almost unreadable.  Clean it up.


Could we flesh this changelog out a bit?  I basically write these in the
process of understanding what the patch does, so this is less of "your
changelog needs help" and more of, "here's some more detail if you want it":

fpu__clear() currently resets both register state and kernel XSAVE
buffer state.  It has two modes: one for all state (supervisor and user)
and another for user state only.  fpu__clear_all() uses the "all state"
(user_only=0) mode, while a number of signal paths use the user_only=1 mode.

Make fpu__clear() work only for user state (user_only=1) and remove the
"all state" (user_only=0) code.  Rename it to match so it can be used by
the signal paths.

Replace the "all state" (user_only=0) fpu__clear() functionality.  Use
the TIF_NEED_FPU_LOAD functionality instead of making any actual
hardware registers changes in this path.

>  arch/x86/kernel/fpu/core.c | 63 +++++++++++++++++++++++++-------------

>  1 file changed, 42 insertions(+), 21 deletions(-)

> 

> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c

> index 571220ac8bea..a01cbb6a08bb 100644

> --- a/arch/x86/kernel/fpu/core.c

> +++ b/arch/x86/kernel/fpu/core.c

> @@ -354,45 +354,66 @@ static inline void copy_init_fpstate_to_fpregs(u64 features_mask)

>  }

>  

>  /*

> - * Clear the FPU state back to init state.

> - *

> - * Called by sys_execve(), by the signal handler code and by various

> - * error paths.

> + * Reset current's user FPU states to the init states.  current's supervisor

> + * states, if any, are not modified by this function.  The XSTATE header

> + * in memory is assumed to be intact when this is called.

>   */

> -static void fpu__clear(struct fpu *fpu, bool user_only)

> +void fpu__clear_user_states(struct fpu *fpu)

>  {

>         WARN_ON_FPU(fpu != &current->thread.fpu);

>  

>         if (!static_cpu_has(X86_FEATURE_FPU)) {

> -               fpu__drop(fpu);

> -               fpu__initialize(fpu);

> +               fpu__clear_all(fpu);

>                 return;

>         }

>  

>         fpregs_lock();

>  

> -       if (user_only) {

> -               if (!fpregs_state_valid(fpu, smp_processor_id()) &&

> -                   xfeatures_mask_supervisor())

> -                       copy_kernel_to_xregs(&fpu->state.xsave,

> -                                            xfeatures_mask_supervisor());

> -               copy_init_fpstate_to_fpregs(xfeatures_mask_user());

> -       } else {

> -               copy_init_fpstate_to_fpregs(xfeatures_mask_all);

> -       }

> +       /*

> +        * Ensure that current's supervisor states are loaded into

> +        * their corresponding registers.

> +        */

> +       if (!fpregs_state_valid(fpu, smp_processor_id()) &&

> +           xfeatures_mask_supervisor())

> +               copy_kernel_to_xregs(&fpu->state.xsave,

> +                                    xfeatures_mask_supervisor());

>  

> +       /*

> +        * Reset user states in registers.

> +        */

> +       copy_init_fpstate_to_fpregs(xfeatures_mask_user());

> +

> +       /*

> +        * Now all FPU registers have their desired values.  Inform the

> +        * FPU state machine that current's FPU registers are in the

> +        * hardware registers.

> +        */

>         fpregs_mark_activate();

> +       

>         fpregs_unlock();

>  }

>  

> -void fpu__clear_user_states(struct fpu *fpu)

> -{

> -       fpu__clear(fpu, true);

> -}

> +/*

> + * Reset current's FPU registers (user and supervisor) to their INIT values.

> + * This is used by execve(); out of an abundance of caution, it completely

> + * wipes and resets the XSTATE buffer in memory.

> + *

> + * Note that XSAVE (unlike XSAVES) expects the XSTATE buffer in memory to

> + * be valid, so there are certain forms of corruption of the XSTATE buffer

> + * in memory that would survive initializing the FPU registers and XSAVEing

> + * them to memory.

> + *

> + * This does not change the actual hardware registers; when fpu__clear_all()

> + * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode

> + * will reload the hardware registers from memory.

> + */

>  void fpu__clear_all(struct fpu *fpu)

>  {

> -       fpu__clear(fpu, false);

> +       fpregs_lock();

> +       fpu__drop(fpu);

> +       fpu__initialize(fpu);

> +       fpregs_unlock();

>  }


Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD right
next to the fpu__initialize() call?  It would make it painfully obvious
which call is responsible.  The naming isn't super helpful here.
Andy Lutomirski June 1, 2021, 6:14 p.m. UTC | #11
On Tue, Jun 1, 2021, at 11:06 AM, Dave Hansen wrote:
> On to patch 3:

> 

> > fpu__clear_all() and fpu__clear_user_states() share an implementation,

> > and the resulting code is almost unreadable.  Clean it up.

> 

> Could we flesh this changelog out a bit?  I basically write these in the

> process of understanding what the patch does, so this is less of "your

> changelog needs help" and more of, "here's some more detail if you want it":

> 

> fpu__clear() currently resets both register state and kernel XSAVE

> buffer state.  It has two modes: one for all state (supervisor and user)

> and another for user state only.  fpu__clear_all() uses the "all state"

> (user_only=0) mode, while a number of signal paths use the user_only=1 mode.

> 

> Make fpu__clear() work only for user state (user_only=1) and remove the

> "all state" (user_only=0) code.  Rename it to match so it can be used by

> the signal paths.

> 

> Replace the "all state" (user_only=0) fpu__clear() functionality.  Use

> the TIF_NEED_FPU_LOAD functionality instead of making any actual

> hardware registers changes in this path.


I’ll steal some of this.

> 

> >  arch/x86/kernel/fpu/core.c | 63 +++++++++++++++++++++++++-------------

> >  1 file changed, 42 insertions(+), 21 deletions(-)

> > 

> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c

> > index 571220ac8bea..a01cbb6a08bb 100644

> > --- a/arch/x86/kernel/fpu/core.c

> > +++ b/arch/x86/kernel/fpu/core.c

> > @@ -354,45 +354,66 @@ static inline void copy_init_fpstate_to_fpregs(u64 features_mask)

> >  }

> >  

> >  /*

> > - * Clear the FPU state back to init state.

> > - *

> > - * Called by sys_execve(), by the signal handler code and by various

> > - * error paths.

> > + * Reset current's user FPU states to the init states.  current's supervisor

> > + * states, if any, are not modified by this function.  The XSTATE header

> > + * in memory is assumed to be intact when this is called.

> >   */

> > -static void fpu__clear(struct fpu *fpu, bool user_only)

> > +void fpu__clear_user_states(struct fpu *fpu)

> >  {

> >         WARN_ON_FPU(fpu != &current->thread.fpu);

> >  

> >         if (!static_cpu_has(X86_FEATURE_FPU)) {

> > -               fpu__drop(fpu);

> > -               fpu__initialize(fpu);

> > +               fpu__clear_all(fpu);

> >                 return;

> >         }

> >  

> >         fpregs_lock();

> >  

> > -       if (user_only) {

> > -               if (!fpregs_state_valid(fpu, smp_processor_id()) &&

> > -                   xfeatures_mask_supervisor())

> > -                       copy_kernel_to_xregs(&fpu->state.xsave,

> > -                                            xfeatures_mask_supervisor());

> > -               copy_init_fpstate_to_fpregs(xfeatures_mask_user());

> > -       } else {

> > -               copy_init_fpstate_to_fpregs(xfeatures_mask_all);

> > -       }

> > +       /*

> > +        * Ensure that current's supervisor states are loaded into

> > +        * their corresponding registers.

> > +        */

> > +       if (!fpregs_state_valid(fpu, smp_processor_id()) &&

> > +           xfeatures_mask_supervisor())

> > +               copy_kernel_to_xregs(&fpu->state.xsave,

> > +                                    xfeatures_mask_supervisor());

> >  

> > +       /*

> > +        * Reset user states in registers.

> > +        */

> > +       copy_init_fpstate_to_fpregs(xfeatures_mask_user());

> > +

> > +       /*

> > +        * Now all FPU registers have their desired values.  Inform the

> > +        * FPU state machine that current's FPU registers are in the

> > +        * hardware registers.

> > +        */

> >         fpregs_mark_activate();

> > +       

> >         fpregs_unlock();

> >  }

> >  

> > -void fpu__clear_user_states(struct fpu *fpu)

> > -{

> > -       fpu__clear(fpu, true);

> > -}

> > +/*

> > + * Reset current's FPU registers (user and supervisor) to their INIT values.

> > + * This is used by execve(); out of an abundance of caution, it completely

> > + * wipes and resets the XSTATE buffer in memory.

> > + *

> > + * Note that XSAVE (unlike XSAVES) expects the XSTATE buffer in memory to

> > + * be valid, so there are certain forms of corruption of the XSTATE buffer

> > + * in memory that would survive initializing the FPU registers and XSAVEing

> > + * them to memory.

> > + *

> > + * This does not change the actual hardware registers; when fpu__clear_all()

> > + * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode

> > + * will reload the hardware registers from memory.

> > + */

> >  void fpu__clear_all(struct fpu *fpu)

> >  {

> > -       fpu__clear(fpu, false);

> > +       fpregs_lock();

> > +       fpu__drop(fpu);

> > +       fpu__initialize(fpu);

> > +       fpregs_unlock();

> >  }

> 

> Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD right

> next to the fpu__initialize() call?  It would make it painfully obvious

> which call is responsible.  The naming isn't super helpful here.

> 


Hmm. I was actually thinking of open-coding it all.  fpu__initialize() and fpu__drop() have very few callers, and I’m not even convinced that the other callers are calling them for a valid reason.
Thomas Gleixner June 1, 2021, 6:25 p.m. UTC | #12
On Tue, Jun 01 2021 at 11:06, Dave Hansen wrote:
> On to patch 3:

>

> Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD right

> next to the fpu__initialize() call?  It would make it painfully obvious

> which call is responsible.  The naming isn't super helpful here.


I've done that and picked up your changelog improvements. I did some
other tweaks myself during the day when I had a few cycles.

Here is my current pile which is based on Andy's.

     https://tglx.de/~tglx/patches.tar

Need to vanish for an hour. Can work on it afterwards again.

Thanks,

        Thomas
Dave Hansen June 1, 2021, 6:35 p.m. UTC | #13
On 6/1/21 11:14 AM, Andy Lutomirski wrote:
>> Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD

>> right next to the fpu__initialize() call?  It would make it

>> painfully obvious which call is responsible.  The naming isn't

>> super helpful here.

>> 

> Hmm. I was actually thinking of open-coding it all.

> fpu__initialize() and fpu__drop() have very few callers, and I’m not

> even convinced that the other callers are calling them for a valid

> reason.


Fine by me.  We all know what the TIF flag does, but its connection to
this code is totally opaque.  It's a place where removing the
abstraction might actually make sense.
Andy Lutomirski June 1, 2021, 10:44 p.m. UTC | #14
On 6/1/21 11:35 AM, Dave Hansen wrote:
> On 6/1/21 11:14 AM, Andy Lutomirski wrote:

>>> Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD

>>> right next to the fpu__initialize() call?  It would make it

>>> painfully obvious which call is responsible.  The naming isn't

>>> super helpful here.

>>>

>> Hmm. I was actually thinking of open-coding it all.

>> fpu__initialize() and fpu__drop() have very few callers, and I’m not

>> even convinced that the other callers are calling them for a valid

>> reason.

> 

> Fine by me.  We all know what the TIF flag does, but its connection to

> this code is totally opaque.  It's a place where removing the

> abstraction might actually make sense.

> 


After mulling this over for a while, I think the TIF flag's existence is
fine, but the APIs are all awful.  I think we should give the states in
the little FPU state machine explicit names:

ACTIVE: If the current CPU's FPU regs are ACTIVE, then those regs are
the current task's registers and current->fpu does not necessarily
contain the current task's registers.  (This is !TIF_NEED_FPU_LOAD.)

INACTIVE: If the current CPU's FPU regs are INACTIVE, then those regs
belong to the kernel and do not belong to any particular task.  The
current task's FPU registers are in current->fpu.

PRESERVED: If the current CPU's FPU regs are PRESERVED for a given task,
then they are guaranteed to match that task's task->fpu.  It is possible
to tell whether a given task's FPU regs are PRESERVED on a given CPU,
but it is not possible to tell whether a given CPU's regs are PRESERVED
or INACTIVE, as it is possible that fpu_fpregs_owner_ctx points to
memory that has been reused for a different purpose, so dereferencing
->last_fpu is not safe.


If a non-current task is running, then its FPU state may not be accessed
(obviously).  Similarly, in interrupt context, neither FPU regs nor
current->fpu may be accessed, as interrupts can nest inside
fpregs_lock()ed regions and the state may be indeterminate.

If a non-current task is *stopped*, then its FPU registers may be read.
If its FPU registers are invalidated, then they may also be written.
fpu__prepare_read() and fpu__prepare_write() may be used for these
purposes.  (But fpu__prepare_read() should be rewritten.  It currently
copies the regs to memory but leaves the state ACTIVE, which is just
silly, especially the absurd way it manages the FSAVE clobber
workaround.  It should just do switch_fpu_prepare().)



And we add some APIs that have locking assertions:

fpu__current_regs_active(): returns true if the current task's regs are
ACTIVE.  This replaces most checks for TIF_NEED_FPU_LOAD.  Asserts
!preemptible().

and more to come, but not today.
Thomas Gleixner June 1, 2021, 11:17 p.m. UTC | #15
On Tue, Jun 01 2021 at 00:46, Thomas Gleixner wrote:
> I'm too tired now to test the xstateregs_set() muck, but if you want to

> have a look:

>

>      https://tglx.de/~tglx/patches.tar


And of course the last patch in that series was f*cked up by me. Working
set with all updates included is available at the URL above.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 571220ac8bea..55792ef6ba6d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -354,45 +354,65 @@  static inline void copy_init_fpstate_to_fpregs(u64 features_mask)
 }
 
 /*
- * Clear the FPU state back to init state.
- *
- * Called by sys_execve(), by the signal handler code and by various
- * error paths.
+ * Reset current's user FPU states to the init states.  The caller promises
+ * that current's supervisor states (in memory or CPU regs as appropriate)
+ * as well as the XSAVE header in memory are intact.
  */
-static void fpu__clear(struct fpu *fpu, bool user_only)
+void fpu__clear_user_states(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
 	if (!static_cpu_has(X86_FEATURE_FPU)) {
-		fpu__drop(fpu);
-		fpu__initialize(fpu);
+		fpu__clear_all(fpu);
 		return;
 	}
 
 	fpregs_lock();
 
-	if (user_only) {
-		if (!fpregs_state_valid(fpu, smp_processor_id()) &&
-		    xfeatures_mask_supervisor())
-			copy_kernel_to_xregs(&fpu->state.xsave,
-					     xfeatures_mask_supervisor());
-		copy_init_fpstate_to_fpregs(xfeatures_mask_user());
-	} else {
-		copy_init_fpstate_to_fpregs(xfeatures_mask_all);
-	}
+	/*
+	 * Ensure that current's supervisor states are loaded into
+	 * their corresponding registers.
+	 */
+	if (!fpregs_state_valid(fpu, smp_processor_id()) &&
+	    xfeatures_mask_supervisor())
+		copy_kernel_to_xregs(&fpu->state.xsave,
+				     xfeatures_mask_supervisor());
 
+	/*
+	 * Reset user states in registers.
+	 */
+	copy_init_fpstate_to_fpregs(xfeatures_mask_user());
+
+	/*
+	 * Now all FPU registers have their desired values.  Inform the
+	 * FPU state machine that current's FPU registers are in the
+	 * hardware registers.
+	 */
 	fpregs_mark_activate();
+
 	fpregs_unlock();
 }
 
-void fpu__clear_user_states(struct fpu *fpu)
-{
-	fpu__clear(fpu, true);
-}
 
+/*
+ * Reset current's FPU registers (user and supervisor) to their INIT values.
+ * This function does not trust the in-memory XSAVE image to be valid at all;
+ * it is safe to call even if the contents of fpu->state may be entirely
+ * malicious, including the header.
+ *
+ * This means that it must not use XSAVE, as XSAVE reads the header from
+ * memory.
+ *
+ * This does not change the actual hardware registers; when fpu__clear_all()
+ * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode
+ * will reload the hardware registers from memory.
+ */
 void fpu__clear_all(struct fpu *fpu)
 {
-	fpu__clear(fpu, false);
+	fpregs_lock();
+	fpu__drop(fpu);
+	fpu__initialize(fpu);
+	fpregs_unlock();
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a4ec65317a7f..3ff7e75c7b8f 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -345,6 +345,7 @@  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 */
 		fpregs_lock();
 		pagefault_disable();
+
 		ret = copy_user_to_fpregs_zeroing(buf_fx, user_xfeatures, fx_only);
 		pagefault_enable();
 		if (!ret) {
@@ -382,10 +383,9 @@  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	}
 
 	/*
-	 * By setting TIF_NEED_FPU_LOAD it is ensured that our xstate is
-	 * not modified on context switch and that the xstate is considered
-	 * to be loaded again on return to userland (overriding last_cpu avoids
-	 * the optimisation).
+	 * By setting TIF_NEED_FPU_LOAD, we ensure that any context switches
+	 * or kernel_fpu_begin() operations (due to scheduling, page faults,
+	 * softirq, etc) do not access fpu->state.
 	 */
 	fpregs_lock();
 
@@ -406,10 +406,18 @@  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
 
 		if (using_compacted_format()) {
+			/*
+			 * copy_user_to_xstate() may write complete garbage
+			 * to the user states, but it will not corrupt the
+			 * XSAVE header, nor will it corrupt any supervisor
+			 * states.
+			 */
 			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
 		} else {
 			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
-
+			/*
+			 * Careful, the XSAVE header may be corrupt now.
+			 */
 			if (!ret && state_size > offsetof(struct xregs_state, header))
 				ret = validate_user_xstate_header(&fpu->state.xsave.header);
 		}
@@ -457,6 +465,15 @@  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		fpregs_lock();
 		ret = copy_kernel_to_fregs_err(&fpu->state.fsave);
 	}
+
+	/*
+	 * At this point, we have modified the hardware FPU regs.  We must
+	 * either mark them active for current or invalidate them for
+	 * any other task that may have owned them.
+	 *
+	 * Yes, the nonsensically named fpregs_deactivate() function
+	 * ignores its parameter and marks this CPU's regs invalid.
+	 */
 	if (!ret)
 		fpregs_mark_activate();
 	else
@@ -464,8 +481,27 @@  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	fpregs_unlock();
 
 err_out:
-	if (ret)
-		fpu__clear_user_states(fpu);
+	if (ret) {
+		if (using_compacted_format()) {
+			/*
+			 * Our failed attempt to load the new state is
+			 * guaranteed to have preserved the XSAVE header
+			 * and all supervisor state.
+			 */
+			fpu__clear_user_states(fpu);
+		} else {
+			/*
+			 * If an error above caused garbage to be written
+			 * to XFEATURES, then XSAVE would preserve that
+			 * garbage in memory, and a subsequent XRSTOR would
+			 * fail or worse.  XSAVES does not have this problem.
+			 *
+			 * Wipe out the FPU state and start over from a clean
+			 * slate.
+			 */
+			fpu__clear_all(fpu);
+		}
+	}
 	return ret;
 }