diff mbox series

[printk,10/18] kgbd: Pretend that console list walk is safe

Message ID 20220924000454.3319186-11-john.ogness@linutronix.de
State New
Headers show
Series preparation for threaded/atomic printing | expand

Commit Message

John Ogness Sept. 24, 2022, 12:04 a.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

Provide a special list iterator macro for KGDB to allow unprotected list
walks and add a few comments to explain the hope based approach.

Preperatory change for changing the console list to hlist and adding
lockdep asserts to regular list walks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/tty/serial/kgdboc.c |  5 ++++-
 include/linux/console.h     | 10 ++++++++++
 kernel/debug/kdb/kdb_io.c   |  7 ++++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

Comments

Doug Anderson Sept. 28, 2022, 11:32 p.m. UTC | #1
Hi,

On Fri, Sep 23, 2022 at 5:05 PM John Ogness <john.ogness@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Provide a special list iterator macro for KGDB to allow unprotected list
> walks and add a few comments to explain the hope based approach.
>
> Preperatory change for changing the console list to hlist and adding

s/Preperatory/Preparatory

> lockdep asserts to regular list walks.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  drivers/tty/serial/kgdboc.c |  5 ++++-
>  include/linux/console.h     | 10 ++++++++++
>  kernel/debug/kdb/kdb_io.c   |  7 ++++++-
>  3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index af2aa76bae15..57a5fd27dffe 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -462,10 +462,13 @@ static void kgdboc_earlycon_pre_exp_handler(void)
>          * we have no other choice so we keep using it.  Since not all
>          * serial drivers might be OK with this, print a warning once per
>          * boot if we detect this case.
> +        *
> +        * Pretend that walking the console list is safe...

To be fair, this is not quite as unsafe as your comment makes it
sound. kgdb is a "stop the world" debugger and when this function is
executing then all of the other CPUs in the system should have been
rounded up and idle (or, perhaps, busy looping). Essentially as long
as console list manipulation is always made in a way that each
instruction keeps the list in a reasonable state then what kgdb is
doing is actually "safe". Said another way: we could drop into the
debugger at any point when a task is manipulating the console list,
but once we're in the debugger and are executing the "pre_exp_handler"
then all the other CPUs have been frozen in time.


>          */
> -       for_each_console(con)
> +       for_each_console_kgdb(con) {
>                 if (con == kgdboc_earlycon_io_ops.cons)
>                         return;
> +       }
>
>         already_warned = true;
>         pr_warn("kgdboc_earlycon is still using bootconsole\n");
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 24344f9b0bc1..86a6125512b9 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -187,6 +187,16 @@ extern void console_list_unlock(void) __releases(console_mutex);
>  #define for_each_console(con)                                          \
>         for (con = console_drivers; con != NULL; con = con->next)
>
> +/**
> + * for_each_console_kgdb() - Iterator over registered consoles for KGDB
> + * @con:       struct console pointer used as loop cursor
> + *
> + * Has no serialization requirements and KGDB pretends that this is safe.
> + * Don't use outside of the KGDB fairy tale land!
> + */
> +#define for_each_console_kgdb(con)                                     \
> +       for (con = console_drivers; con != NULL; con = con->next)
> +
>  extern int console_set_on_cmdline;
>  extern struct console *early_console;
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index 67d3c48a1522..fb3775e61a3b 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -558,7 +558,12 @@ static void kdb_msg_write(const char *msg, int msg_len)
>                 cp++;
>         }
>
> -       for_each_console(c) {
> +       /*
> +        * This is a completely unprotected list walk designed by the
> +        * wishful thinking department. See the oops_in_progress comment
> +        * below - especially the encourage section...

The reality is also a little less dire here than the comment suggests.
IMO this is actually not the same as the "oops_in_progress" case that
the comment refers to.

Specifically, the "oops_in_progress" is referring to the fact that
it's not uncommon to drop into the debugger when a serial driver (the
same one you're using for kgdb) is holding its lock. Possibly it's
printing something to the tty running on the UART dumping stuff out
from the kernel's console. That's not great and I won't pretend that
the kgdb design is amazing here, but...

Just like above, I don't feel like iterating through the console list
here without holding the lock is necessarily unsafe. Just like above,
all the rest of the CPUs in the system are in a holding pattern and
aren't actively executing any code. While we may have interrupted them
at any given instruction, they won't execute any more instruction
until we leave kgdb and resume running.
Petr Mladek Sept. 30, 2022, 8:39 a.m. UTC | #2
On Wed 2022-09-28 16:32:15, Doug Anderson wrote:
> Hi,
> 
> On Fri, Sep 23, 2022 at 5:05 PM John Ogness <john.ogness@linutronix.de> wrote:
> >
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > Provide a special list iterator macro for KGDB to allow unprotected list
> > walks and add a few comments to explain the hope based approach.
> >
> > Preperatory change for changing the console list to hlist and adding
> 
> s/Preperatory/Preparatory
> 
> > lockdep asserts to regular list walks.
> >
> > diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> > index af2aa76bae15..57a5fd27dffe 100644
> > --- a/drivers/tty/serial/kgdboc.c
> > +++ b/drivers/tty/serial/kgdboc.c
> > @@ -462,10 +462,13 @@ static void kgdboc_earlycon_pre_exp_handler(void)
> >          * we have no other choice so we keep using it.  Since not all
> >          * serial drivers might be OK with this, print a warning once per
> >          * boot if we detect this case.
> > +        *
> > +        * Pretend that walking the console list is safe...
> 
> To be fair, this is not quite as unsafe as your comment makes it
> sound. kgdb is a "stop the world" debugger and when this function is
> executing then all of the other CPUs in the system should have been
> rounded up and idle (or, perhaps, busy looping). Essentially as long
> as console list manipulation is always made in a way that each
> instruction keeps the list in a reasonable state then what kgdb is
> doing is actually "safe". Said another way: we could drop into the
> debugger at any point when a task is manipulating the console list,
> but once we're in the debugger and are executing the "pre_exp_handler"
> then all the other CPUs have been frozen in time.

The code in register_console()/unregister_console() seems to
manipulate the list in the right order. But the correctness
is not guaranteed because there are neither compiler nor
memory barriers.

That said, later patches add for_each_console_srcu(). IMHO,
the SRCU walk should be safe here.

> 
> >          */
> > -       for_each_console(con)
> > +       for_each_console_kgdb(con) {
> >                 if (con == kgdboc_earlycon_io_ops.cons)
> >                         return;
> > +       }
> >
> >         already_warned = true;
> >         pr_warn("kgdboc_earlycon is still using bootconsole\n");
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -558,7 +558,12 @@ static void kdb_msg_write(const char *msg, int msg_len)
> >                 cp++;
> >         }
> >
> > -       for_each_console(c) {
> > +       /*
> > +        * This is a completely unprotected list walk designed by the
> > +        * wishful thinking department. See the oops_in_progress comment
> > +        * below - especially the encourage section...
> 
> The reality is also a little less dire here than the comment suggests.
> IMO this is actually not the same as the "oops_in_progress" case that
> the comment refers to.
>
> Specifically, the "oops_in_progress" is referring to the fact that
> it's not uncommon to drop into the debugger when a serial driver (the
> same one you're using for kgdb) is holding its lock. Possibly it's
> printing something to the tty running on the UART dumping stuff out
> from the kernel's console. That's not great and I won't pretend that
> the kgdb design is amazing here, but...
>
> Just like above, I don't feel like iterating through the console list
> here without holding the lock is necessarily unsafe. Just like above,
> all the rest of the CPUs in the system are in a holding pattern and
> aren't actively executing any code. While we may have interrupted them
> at any given instruction, they won't execute any more instruction
> until we leave kgdb and resume running.

The atomic consoles might improve the situation. Well, the hand shake
will not really work because the current owner might be stopped.
But we will at least know that the port is not in a safe state.

Anyway, what about using the later added SRCU walk here?
After all, this is exactly what RCU is for, isn't it?

Best Regards,
Petr
John Ogness Sept. 30, 2022, 1:44 p.m. UTC | #3
On 2022-09-30, Petr Mladek <pmladek@suse.com> wrote:
> Anyway, what about using the later added SRCU walk here?
> After all, this is exactly what RCU is for, isn't it?

So I think a lot of the problems with this series is that SRCU is
introduced too late. We are debating things in patch 6 that are
irrelevant by patch 12.

I will rework the series so that the changes come in the following
order:

1. provide an atomic console_is_enabled()

2. convert the list to SRCU

3. move all iterators from console_lock()/console_trylock() to SRCU

Step 3 may result in console_lock()/console_trylock() calls disappearing
or relocating to where they are needed for non-list-synchronization
purposes.

John
Petr Mladek Sept. 30, 2022, 5:27 p.m. UTC | #4
On Fri 2022-09-30 15:50:56, John Ogness wrote:
> On 2022-09-30, Petr Mladek <pmladek@suse.com> wrote:
> > Anyway, what about using the later added SRCU walk here?
> > After all, this is exactly what RCU is for, isn't it?
> 
> So I think a lot of the problems with this series is that SRCU is
> introduced too late. We are debating things in patch 6 that are
> irrelevant by patch 12.

> I will rework the series so that the changes come in the following
> order:
> 
> 1. provide an atomic console_is_enabled()
>
> 2. convert the list to SRCU
> 
> 3. move all iterators from console_lock()/console_trylock() to SRCU
> 
> Step 3 may result in console_lock()/console_trylock() calls disappearing
> or relocating to where they are needed for non-list-synchronization
> purposes.

I agree that introding SRCU as early as possible would
help. The current patchset converts the same code several times...

Best Regards,
Petr
diff mbox series

Patch

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index af2aa76bae15..57a5fd27dffe 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -462,10 +462,13 @@  static void kgdboc_earlycon_pre_exp_handler(void)
 	 * we have no other choice so we keep using it.  Since not all
 	 * serial drivers might be OK with this, print a warning once per
 	 * boot if we detect this case.
+	 *
+	 * Pretend that walking the console list is safe...
 	 */
-	for_each_console(con)
+	for_each_console_kgdb(con) {
 		if (con == kgdboc_earlycon_io_ops.cons)
 			return;
+	}
 
 	already_warned = true;
 	pr_warn("kgdboc_earlycon is still using bootconsole\n");
diff --git a/include/linux/console.h b/include/linux/console.h
index 24344f9b0bc1..86a6125512b9 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,16 @@  extern void console_list_unlock(void) __releases(console_mutex);
 #define for_each_console(con)						\
 	for (con = console_drivers; con != NULL; con = con->next)
 
+/**
+ * for_each_console_kgdb() - Iterator over registered consoles for KGDB
+ * @con:	struct console pointer used as loop cursor
+ *
+ * Has no serialization requirements and KGDB pretends that this is safe.
+ * Don't use outside of the KGDB fairy tale land!
+ */
+#define for_each_console_kgdb(con)					\
+	for (con = console_drivers; con != NULL; con = con->next)
+
 extern int console_set_on_cmdline;
 extern struct console *early_console;
 
diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 67d3c48a1522..fb3775e61a3b 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -558,7 +558,12 @@  static void kdb_msg_write(const char *msg, int msg_len)
 		cp++;
 	}
 
-	for_each_console(c) {
+	/*
+	 * This is a completely unprotected list walk designed by the
+	 * wishful thinking department. See the oops_in_progress comment
+	 * below - especially the encourage section...
+	 */
+	for_each_console_kgdb(c) {
 		if (!(c->flags & CON_ENABLED))
 			continue;
 		if (c == dbg_io_ops->cons)