
[v2] arm64: cacheflush: Fix KGDB trap detection

Message ID 20200504170518.2959478-1-daniel.thompson@linaro.org
State Accepted
Commit ab8ad279ceac4fc78ae4dcf1a26326e05695e537
Series [v2] arm64: cacheflush: Fix KGDB trap detection

Commit Message

Daniel Thompson May 4, 2020, 5:05 p.m. UTC
flush_icache_range() contains a bodge to avoid issuing IPIs when the kgdb
trap handler is running because issuing IPIs is unsafe (and not needed)
in this execution context. However the current test, based on
kgdb_connected, is flawed: it both over-matches and under-matches.

The over-match occurs because kgdb_connected is set when gdb attaches
to the stub and remains set during normal running. This is relatively
harmless because in almost all cases irqs_disabled() will be false.

The under-match is more serious. When kdb is used instead of kgdb to access
the debugger then kgdb_connected is not set in all the places where the
debug core updates sw breakpoints (and hence flushes the icache). This
can lead to deadlock.

Fix by replacing the ad-hoc check with the proper kgdb macro. This also
allows us to drop the #ifdef wrapper.

Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

Reviewed-by: Douglas Anderson <dianders@chromium.org>

---

Notes:
    v2: Improve the commit message based on feedback from Doug
        Anderson

 arch/arm64/include/asm/cacheflush.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c
--
2.25.1

Comments

Will Deacon May 4, 2020, 8:48 p.m. UTC | #1
On Mon, May 04, 2020 at 06:05:18PM +0100, Daniel Thompson wrote:
> flush_icache_range() contains a bodge to avoid issuing IPIs when the kgdb
> trap handler is running because issuing IPIs is unsafe (and not needed)
> in this execution context. However the current test, based on
> kgdb_connected, is flawed: it both over-matches and under-matches.
>
> The over-match occurs because kgdb_connected is set when gdb attaches
> to the stub and remains set during normal running. This is relatively
> harmless because in almost all cases irqs_disabled() will be false.
>
> The under-match is more serious. When kdb is used instead of kgdb to access
> the debugger then kgdb_connected is not set in all the places where the
> debug core updates sw breakpoints (and hence flushes the icache). This
> can lead to deadlock.
>
> Fix by replacing the ad-hoc check with the proper kgdb macro. This also
> allows us to drop the #ifdef wrapper.
>
> Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Notes:
>     v2: Improve the commit message based on feedback from Doug
>         Anderson
>
>  arch/arm64/include/asm/cacheflush.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index e6cca3d4acf7..ce50c1f1f1ea 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -79,7 +79,7 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  	 * IPI all online CPUs so that they undergo a context synchronization
>  	 * event and are forced to refetch the new instructions.
>  	 */
> -#ifdef CONFIG_KGDB
> +
>  	/*
>  	 * KGDB performs cache maintenance with interrupts disabled, so we
>  	 * will deadlock trying to IPI the secondary CPUs. In theory, we can
> @@ -89,9 +89,9 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  	 * the patching operation, so we don't need extra IPIs here anyway.
>  	 * In which case, add a KGDB-specific bodge and return early.
>  	 */
> -	if (kgdb_connected && irqs_disabled())
> +	if (in_dbg_master())

Does this imply that irqs are disabled?

Will
Daniel Thompson May 5, 2020, 2:15 p.m. UTC | #2
On Mon, May 04, 2020 at 09:48:04PM +0100, Will Deacon wrote:
> On Mon, May 04, 2020 at 06:05:18PM +0100, Daniel Thompson wrote:
> > -	if (kgdb_connected && irqs_disabled())
> > +	if (in_dbg_master())
>
> Does this imply that irqs are disabled?

Yes.

Assuming CONFIG_KGDB is enabled then in_dbg_master() expands to:

    (raw_smp_processor_id() == atomic_read(&kgdb_active))

kgdb_active is written to from exactly four locations in the kernel,
all of them within a single function, albeit a very big function with
control flow that could politely be called "quirky". I try not to think
about what it might be impolitely called.

kgdb_active is only ever set to a value other than -1 when we are
executing the kgdb exception handler and interrupts have been
explicitly disabled.


Daniel.
Will Deacon May 5, 2020, 3:09 p.m. UTC | #3
On Tue, May 05, 2020 at 03:15:29PM +0100, Daniel Thompson wrote:
> On Mon, May 04, 2020 at 06:05:18PM +0100, Daniel Thompson wrote:
> > > -	if (kgdb_connected && irqs_disabled())
> > > +	if (in_dbg_master())
> >
> > Does this imply that irqs are disabled?
>
> Yes.
>
> Assuming CONFIG_KGDB is enabled then in_dbg_master() expands to:
>
>     (raw_smp_processor_id() == atomic_read(&kgdb_active))

Aha, so this can drop the raw_ prefix and call smp_processor_id() instead?
I can queue the arm64 patch regardless.

Cheers,

Will
Daniel Thompson May 6, 2020, 3:06 p.m. UTC | #4
On Tue, May 05, 2020 at 04:09:16PM +0100, Will Deacon wrote:
> On Tue, May 05, 2020 at 03:15:29PM +0100, Daniel Thompson wrote:
> > On Mon, May 04, 2020 at 09:48:04PM +0100, Will Deacon wrote:
> > > > -	if (kgdb_connected && irqs_disabled())
> > > > +	if (in_dbg_master())
> > >
> > > Does this imply that irqs are disabled?
> >
> > Yes.

Except for bugs...

> > Assuming CONFIG_KGDB is enabled then in_dbg_master() expands to:
> >
> >     (raw_smp_processor_id() == atomic_read(&kgdb_active))
>
> Aha, so this can drop the raw_ prefix and call smp_processor_id() instead?


We need to allow in_dbg_master() to be called from preemptible contexts
(because its job is to disclose information about our execution
context), but given that irqs are always disabled when we are
in_dbg_master() I think we can make this change and rely on
short-circuit evaluation to avoid DEBUG_PREEMPT errors:

    (irqs_disabled() && (smp_processor_id() == atomic_read(&kgdb_active)))


> I can queue the arm64 patch regardless.


I don't want to hide anything... when I looked closer I realized
that the above change also eliminates a small window where the original
macro could spuriously evaluate to true.

Specifically, if we migrate to a new core after reading the processor
id, and the previous core then takes a breakpoint, we would evaluate
true if we read kgdb_active before receiving the IPI that brings us to
a halt.

Sorry for overlooking this in my reply yesterday! I'll have a patch out
for this shortly.


Daniel.

Patch

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index e6cca3d4acf7..ce50c1f1f1ea 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -79,7 +79,7 @@  static inline void flush_icache_range(unsigned long start, unsigned long end)
 	 * IPI all online CPUs so that they undergo a context synchronization
 	 * event and are forced to refetch the new instructions.
 	 */
-#ifdef CONFIG_KGDB
+
 	/*
 	 * KGDB performs cache maintenance with interrupts disabled, so we
 	 * will deadlock trying to IPI the secondary CPUs. In theory, we can
@@ -89,9 +89,9 @@  static inline void flush_icache_range(unsigned long start, unsigned long end)
 	 * the patching operation, so we don't need extra IPIs here anyway.
 	 * In which case, add a KGDB-specific bodge and return early.
 	 */
-	if (kgdb_connected && irqs_disabled())
+	if (in_dbg_master())
 		return;
-#endif
+
 	kick_all_cpus_sync();
 }