arm64: restore get_current() optimisation

Message ID 1483468021-8237-1-git-send-email-mark.rutland@arm.com
State Accepted
Commit 9d84fb27fa135c99c9fe3de33628774a336a70a8

Commit Message

Mark Rutland Jan. 3, 2017, 6:27 p.m. UTC
Hi Catalin,

My THREAD_INFO_IN_TASK series had an unintended performance regression in
get_current() / current_thread_info(). Could you please take the below as a
fix for the next rc?

Thanks,
Mark.

---->8----
Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
inverted the relationship between get_current() and
current_thread_info(), with sp_el0 now holding the current task_struct
rather than the current thread_info. The new implementation of
get_current() prevents the compiler from being able to optimize repeated
calls to either, resulting in a noticeable penalty in some
microbenchmarks.

This patch restores the previous optimisation by implementing
get_current() in the same way as our old current_thread_info(), using a
non-volatile asm statement.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Davidlohr Bueso <dbueso@suse.de>
---
 arch/arm64/include/asm/current.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
1.9.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Comments

Will Deacon Jan. 4, 2017, 3:23 p.m. UTC | #1
On Tue, Jan 03, 2017 at 06:27:01PM +0000, Mark Rutland wrote:
> Hi Catalin,
>
> My THREAD_INFO_IN_TASK series had an unintended performance regression in
> get_current() / current_thread_info(). Could you please take the below as a
> fix for the next rc?
>
> Thanks,
> Mark.
>
> ---->8----
> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> inverted the relationship between get_current() and
> current_thread_info(), with sp_el0 now holding the current task_struct
> rather than the current thread_info. The new implementation of
> get_current() prevents the compiler from being able to optimize repeated
> calls to either, resulting in a noticeable penalty in some
> microbenchmarks.
>
> This patch restores the previous optimisation by implementing
> get_current() in the same way as our old current_thread_info(), using a
> non-volatile asm statement.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  arch/arm64/include/asm/current.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)


Acked-by: Will Deacon <will.deacon@arm.com>


Thanks for putting this back like it was!

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff mbox

Patch

diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h
index f2bcbe2..86c4041 100644
--- a/arch/arm64/include/asm/current.h
+++ b/arch/arm64/include/asm/current.h
@@ -9,9 +9,17 @@ 
 
 struct task_struct;
 
+/*
+ * We don't use read_sysreg() as we want the compiler to cache the value where
+ * possible.
+ */
 static __always_inline struct task_struct *get_current(void)
 {
-	return (struct task_struct *)read_sysreg(sp_el0);
+	unsigned long sp_el0;
+
+	asm ("mrs %0, sp_el0" : "=r" (sp_el0));
+
+	return (struct task_struct *)sp_el0;
 }
 
 #define current get_current()
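
For readers unfamiliar with the distinction the commit message relies on, the
following is a minimal userspace sketch, not kernel code. It assumes GCC or
Clang extended asm. The names demo_task, get_current_volatile(),
get_current_cached(), pid_sum_volatile(), pid_sum_cached() and check_demo()
are hypothetical and exist only for illustration. An empty asm template with
a matching "0" input constraint stands in for "mrs %0, sp_el0", so the sketch
builds on any architecture. Unlike the real mrs, it takes an input operand.

```c
#include <assert.h>
#include <stdint.h>

struct task_struct {
	int pid;
};

/* Hypothetical stand-in for the task that sp_el0 would point at. */
static struct task_struct demo_task = { .pid = 42 };

/*
 * Volatile asm, as read_sysreg() uses: the compiler must emit the
 * statement every time it is reached and may not merge two copies.
 */
static inline struct task_struct *get_current_volatile(void)
{
	uintptr_t val;

	/* Empty template: val simply receives the input operand. */
	__asm__ __volatile__("" : "=r" (val) : "0" ((uintptr_t)&demo_task));

	return (struct task_struct *)val;
}

/*
 * Non-volatile asm, as in the patched get_current(): the compiler may
 * treat the statement as a pure function of its inputs, so identical
 * statements are candidates for merging and unused ones for removal.
 */
static inline struct task_struct *get_current_cached(void)
{
	uintptr_t val;

	__asm__("" : "=r" (val) : "0" ((uintptr_t)&demo_task));

	return (struct task_struct *)val;
}

/* Two back-to-back lookups through the volatile variant. */
int pid_sum_volatile(void)
{
	return get_current_volatile()->pid + get_current_volatile()->pid;
}

/* Two back-to-back lookups through the non-volatile variant. */
int pid_sum_cached(void)
{
	return get_current_cached()->pid + get_current_cached()->pid;
}

/* Both variants return the same pointer; only code generation differs. */
void check_demo(void)
{
	assert(get_current_volatile() == get_current_cached());
	assert(pid_sum_volatile() == pid_sum_cached());
}
```

Both variants compute the same result, so any difference shows up only in the
generated code. Building with "gcc -O2 -S" and comparing pid_sum_volatile()
against pid_sum_cached() is one way to see whether the compiler merged the
repeated lookup. Whether it does depends on the compiler and its version.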