diff mbox series

[v3,3/3] kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

Message ID 20200914130143.1322802-4-daniel.thompson@linaro.org
State Superseded
Headers show
Series kgdb: Honour the kprobe blocklist when setting breakpoints | expand

Commit Message

Daniel Thompson Sept. 14, 2020, 1:01 p.m. UTC
During debug trap execution we expect dbg_deactivate_sw_breakpoints()
to be paired with an dbg_activate_sw_breakpoint(). Currently although
the calls are paired correctly they are needlessly smeared across three
different functions. Worse this also results in code to drive polled I/O
being called with breakpoints activated which, in turn, needlessly
increases the set of functions that will recursively trap if breakpointed.

Fix this by moving the activation of breakpoints into the debug core.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

---
 kernel/debug/debug_core.c       | 2 ++
 kernel/debug/gdbstub.c          | 1 -
 kernel/debug/kdb/kdb_debugger.c | 2 --
 3 files changed, 2 insertions(+), 3 deletions(-)

-- 
2.25.4

Comments

Doug Anderson Sept. 15, 2020, 12:13 a.m. UTC | #1
Hi,

On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>

> During debug trap execution we expect dbg_deactivate_sw_breakpoints()

> to be paired with an dbg_activate_sw_breakpoint(). Currently although

> the calls are paired correctly they are needlessly smeared across three

> different functions. Worse this also results in code to drive polled I/O

> being called with breakpoints activated which, in turn, needlessly

> increases the set of functions that will recursively trap if breakpointed.

>

> Fix this by moving the activation of breakpoints into the debug core.

>

> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

> ---

>  kernel/debug/debug_core.c       | 2 ++

>  kernel/debug/gdbstub.c          | 1 -

>  kernel/debug/kdb/kdb_debugger.c | 2 --

>  3 files changed, 2 insertions(+), 3 deletions(-)


I like the idea, but previously the kgdb_arch_handle_exception() was
always called after the SW breakpoints were activated.  Are you sure
it's OK to swap those two orders across all architectures?

-Doug
Daniel Thompson Sept. 15, 2020, 1:45 p.m. UTC | #2
On Mon, Sep 14, 2020 at 05:13:28PM -0700, Doug Anderson wrote:
> Hi,

> 

> On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson

> <daniel.thompson@linaro.org> wrote:

> >

> > During debug trap execution we expect dbg_deactivate_sw_breakpoints()

> > to be paired with an dbg_activate_sw_breakpoint(). Currently although

> > the calls are paired correctly they are needlessly smeared across three

> > different functions. Worse this also results in code to drive polled I/O

> > being called with breakpoints activated which, in turn, needlessly

> > increases the set of functions that will recursively trap if breakpointed.

> >

> > Fix this by moving the activation of breakpoints into the debug core.

> >

> > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

> > ---

> >  kernel/debug/debug_core.c       | 2 ++

> >  kernel/debug/gdbstub.c          | 1 -

> >  kernel/debug/kdb/kdb_debugger.c | 2 --

> >  3 files changed, 2 insertions(+), 3 deletions(-)

> 

> I like the idea, but previously the kgdb_arch_handle_exception() was

> always called after the SW breakpoints were activated.  Are you sure

> it's OK to swap those two orders across all architectures?


Pretty sure, yes.

However, given the poor attention to detail I demonstrated in patch 2/3,
I figure I'd better write out the full chain of reasoning if I want
you to trust me ;-) .

kgdb_arch_handle_exception() is already called frequently with
breakpoints disabled since it is basically a fallback that is called
whenever the architecture neutral parts of the gdb packet processing
cannot cope.

So your question becomes more specific: is it OK to swap orders when the
architecture code is handling a (c)ontinue or (s)tep packet[1]?

The reason the architecture neutral part cannot cope is because it
because *if* gdb has been instructed that the PC must be changed before
resuming then the architecture neutral code does not know how to effect
this. In other words the call to kgdb_arch_handle_exception() will
boil down to doing whatever the architecture equivalent of writing to
linux_regs->pc actually is (or return an error).

Updating the PC is safe whether or not breakpoints are deactivated
and activating the breakpoints if the architecture code actually
censors the resume is a bug (which we just fixed).


Daniel.


[1]
   The fallthroughs aren't a whole lot of fun to read but
   if gdb_cmd_exception_pass() provokes a fallthrough then it will
   have rewritten the packet as a (c)ontinue.
Doug Anderson Sept. 16, 2020, 11:34 p.m. UTC | #3
Hi,

On Tue, Sep 15, 2020 at 6:45 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>

> On Mon, Sep 14, 2020 at 05:13:28PM -0700, Doug Anderson wrote:

> > Hi,

> >

> > On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson

> > <daniel.thompson@linaro.org> wrote:

> > >

> > > During debug trap execution we expect dbg_deactivate_sw_breakpoints()

> > > to be paired with an dbg_activate_sw_breakpoint(). Currently although

> > > the calls are paired correctly they are needlessly smeared across three

> > > different functions. Worse this also results in code to drive polled I/O

> > > being called with breakpoints activated which, in turn, needlessly

> > > increases the set of functions that will recursively trap if breakpointed.

> > >

> > > Fix this by moving the activation of breakpoints into the debug core.

> > >

> > > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

> > > ---

> > >  kernel/debug/debug_core.c       | 2 ++

> > >  kernel/debug/gdbstub.c          | 1 -

> > >  kernel/debug/kdb/kdb_debugger.c | 2 --

> > >  3 files changed, 2 insertions(+), 3 deletions(-)

> >

> > I like the idea, but previously the kgdb_arch_handle_exception() was

> > always called after the SW breakpoints were activated.  Are you sure

> > it's OK to swap those two orders across all architectures?

>

> Pretty sure, yes.

>

> However, given the poor attention to detail I demonstrated in patch 2/3,

> I figure I'd better write out the full chain of reasoning if I want

> you to trust me ;-) .

>

> kgdb_arch_handle_exception() is already called frequently with

> breakpoints disabled since it is basically a fallback that is called

> whenever the architecture neutral parts of the gdb packet processing

> cannot cope.

>

> So your question becomes more specific: is it OK to swap orders when the

> architecture code is handling a (c)ontinue or (s)tep packet[1]?

>

> The reason the architecture neutral part cannot cope is because it

> because *if* gdb has been instructed that the PC must be changed before

> resuming then the architecture neutral code does not know how to effect

> this. In other words the call to kgdb_arch_handle_exception() will

> boil down to doing whatever the architecture equivalent of writing to

> linux_regs->pc actually is (or return an error).

>

> Updating the PC is safe whether or not breakpoints are deactivated

> and activating the breakpoints if the architecture code actually

> censors the resume is a bug (which we just fixed).


OK, fair enough.  Without digging into all the arch code, I was
assuming that some arch somewhere could be playing tricks on us.
After all, the arch code is told about all the breakpoints
(kgdb_arch_set_breakpoint) so, in theory, it could be doing something
funky (not applying the breakpoint until it sees the "continue" or
something?).

I guess the one thing that seems the most plausible that an
architecture might be doing is doing something special if it is told
to "continue" or "single step" an address that had a breakpoint on it.
I guess I could imagine that on some architectures this might require
special handling (could it be somehow illegal to run the CPU in step
mode over a breakpoint instruction?).  ...or maybe if it was using
hardware breakpoints those don't trigger properly if you're continuing
to the address of the breakpoint?  I'm making stuff up here and
presumably none of this is true, but it's what I was worried about.

From a quick glance, I don't _think_ anyone is doing this.  Presumably
today they'd either need a) a funky implementation for
kgdb_arch_set_breakpoint() or they'd need b) code somewhere which read
memory and looked for "gdb_bpt_instr".

a) Looking at kgdb_arch_set_breakpoint(), I don't see anything too
funky.  They all look like nearly the same code of writing the
breakpoint to memory, perhaps taking into account locked .text space
by using a special patch routine.

b) Looking at users of "gdb_bpt_instr" I do see some funkiness in
"h8300" and "microblaze", but nothing that looks like it fits the bill
of what we're looking for.

In any case, even if someone was doing something funky, it would be
possible to adapt the code to the new way of the world.  You'd just
check the PC when applying breakpoints rather than checking the
breakpoints when continuing.  So I'm going to go ahead and say:

Reviewed-by: Douglas Anderson <dianders@chromium.org>


...and I guess if it really truly causes problems for someone we'll
figure it out.


Feel free to ignore the below.  I wrote it up but realized it probably
wasn't useful but couldn't bear to delete it.  :-P

One somewhat plausible example (I don't think this happens, so feel
free to let your eyes glaze over and ignore).  Assume that the kernel
has a mix of ARM and Thumb code.  When told to set a breakpoint at an
address the kernel wouldn't know whether the address refers to ARM or
Thumb code.  In the past I've solved this type of problem by using an
instruction as a breakpoint that is an illegal instruction when
interpreted any which way (the instruction is illegal as an ARM
instruction and both halves illegal as a Thumb instruction).  Right
when we're about to continue, though, we actually have extra
information about the PC we're going to continue to.  If we know we're
about to continue in Thumb mode and the address we're going to
continue on is the 2nd half of a breakpoint instruction we suddenly
know that the breakpoint should have been a Thumb-mode instruction and
we could fix it up.

AKA:

1. kernel is asked to set a breakpoint at 0x12340000.  We don't know
if this is arm or thumb so we set a 32-bit (ARM) breakpoint at
0x12340000

2. We're told to continue in thumb mode at address 0x12340002

3. We suddenly know that the breakpoint at 0x12340000 should have been
a 16-bit (Thumb) breakpoint, not a 32-bit (ARM) breakpoint, so we
could fix it up before continuing.

OK, that probably was just confusing, and, like I said, I don't think
kdb in Linux does that.  ;-)

-Doug
diff mbox series

Patch

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 9618c1e2faf6..74d42d3f3180 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -763,6 +763,8 @@  static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 		}
 	}
 
+	dbg_activate_sw_breakpoints();
+
 	/* Call the I/O driver's post_exception routine */
 	if (dbg_io_ops->post_exception)
 		dbg_io_ops->post_exception();
diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index cc3c43dfec44..e9a9c3097089 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -1061,7 +1061,6 @@  int gdb_serial_stub(struct kgdb_state *ks)
 				error_packet(remcom_out_buffer, -EINVAL);
 				break;
 			}
-			dbg_activate_sw_breakpoints();
 			fallthrough;	/* to default processing */
 		default:
 default_handle:
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 53a0df6e4d92..0220afda3200 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -147,7 +147,6 @@  int kdb_stub(struct kgdb_state *ks)
 		return DBG_PASS_EVENT;
 	}
 	kdb_bp_install(ks->linux_regs);
-	dbg_activate_sw_breakpoints();
 	/* Set the exit state to a single step or a continue */
 	if (KDB_STATE(DOING_SS))
 		gdbstub_state(ks, "s");
@@ -167,7 +166,6 @@  int kdb_stub(struct kgdb_state *ks)
 		 * differently vs the gdbstub
 		 */
 		kgdb_single_step = 0;
-		dbg_deactivate_sw_breakpoints();
 		return DBG_SWITCH_CPU_EVENT;
 	}
 	return kgdb_info[ks->cpu].ret_state;