diff mbox series

[09/12] trace-cmd analyze: Add wake up latency timings

Message ID 20220324025726.1727204-10-rostedt@goodmis.org
State New
Headers show
Series trace-cmd: Add trace-cmd analyze command | expand

Commit Message

Steven Rostedt March 24, 2022, 2:57 a.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

If the sched_waking (or sched_wakeup) event is found, then use it to time
the wake up latency for each task.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 tracecmd/trace-analyze.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comments

John Keeping March 25, 2022, 7:34 p.m. UTC | #1
On Wed, Mar 23, 2022 at 10:57:23PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> If the sched_waking (or sched_wakeup) event is found, then use it to time
> the wake up latency for each task.
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> +static void process_wakeup(struct analysis_data *data,
> +			   struct tep_handle *tep,
> +			   struct tep_record *record)
> +{
> +	struct cpu_data *cpu_data = &data->cpu_data[record->cpu];
> +	struct task_cpu_item *cpu_task;
> +	struct task_item *task;
> +	unsigned long long val;
> +	int pid;
> +
> +	tep_read_number_field(data->wakeup_pid, record->data, &val);
> +	pid = val;
> +	cpu_task = get_cpu_task(cpu_data, pid);

Should this use get_task() instead of get_cpu_task()?

I was trying this out and it provides a great overview of a trace but I
found it strange that tasks affine to CPU N were also appearing in the
list for CPU M.

> +	task = cpu_task->task;
> +	task->wakeup.last = record->ts;
> +	task->woken = true;
> +}
Steven Rostedt March 25, 2022, 8:13 p.m. UTC | #2
On Fri, 25 Mar 2022 19:34:20 +0000
John Keeping <john@metanate.com> wrote:

> > +static void process_wakeup(struct analysis_data *data,
> > +			   struct tep_handle *tep,
> > +			   struct tep_record *record)
> > +{
> > +	struct cpu_data *cpu_data = &data->cpu_data[record->cpu];
> > +	struct task_cpu_item *cpu_task;
> > +	struct task_item *task;
> > +	unsigned long long val;
> > +	int pid;
> > +
> > +	tep_read_number_field(data->wakeup_pid, record->data, &val);
> > +	pid = val;
> > +	cpu_task = get_cpu_task(cpu_data, pid);  
> 
> Should this use get_task() instead of get_cpu_task()?
> 
> I was trying this out and it provides a great overview of a trace but I
> found it strange that tasks affine to CPU N were also appearing in the
> list for CPU M.

Yes, you are correct, because I noticed the same issue :-D

I'll fix it in v2.

Thanks,

-- Steve
Steven Rostedt March 25, 2022, 8:31 p.m. UTC | #3
On Fri, 25 Mar 2022 16:13:33 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I'll fix it in v2.

BTW, I plan on rewriting a lot of this to move it into libtracecmd. That
way other tools can easily extract data from it.

For example, instead of hard coding "page faults" and "wake ups" I plan on
adding a way to register events to trigger on. Something like:

 tracecmd_analyze_register_event_timing(handle, name, start_event,
		 start_field, start_record, end_event, end_field,
		 end_record);

Where "start_record" and "end_record" is if the field is a task to record.

enum {
	TRACECMD_ANALYZE_FL_NONE,
	TRACECMD_ANALYZE_FL_PID,
	TRACECMD_ANALYZE_FL_PID_CPU,
};

Thus, we could have:

	tracecmd_analyze_register_event_timing(handle,
			"Wakeup", sched_waking,
			pid_field, TRACECMD_ANALYZE_FL_PID,
			sched_switch, next_pid,
			TRACECMD_ANALYZE_FL_PID_CPU);

Or something like the above, that will end up turning into the
process_wakeup() function (and properly record the waking pid globally).

One tool I want to use this with, is my ktrace shell (development has
stalled, but I do plan on starting it again). 

  https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/ktrace.git/

And then I could have commands like:

ktrace> open trace.dat
ktrace> analyze register timing sched.sched_switch pid global \
  sched.sched_switch next_pid cpu
ktrace> analyze
ktrace> analyze show cpu 1

And it only shows cpu 1 data.

ktrace> analyze show pid 512

to see just pid 512

etc.

-- Steve
John Keeping March 26, 2022, 11:14 a.m. UTC | #4
On Fri, Mar 25, 2022 at 04:31:48PM -0400, Steven Rostedt wrote:
> On Fri, 25 Mar 2022 16:13:33 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > I'll fix it in v2.
> 
> BTW, I plan on rewriting a lot of this to move it into libtracecmd. That
> way other tools can easily extract data from it.
> 
> For example, instead of hard coding "page faults" and "wake ups" I plan on
> adding a way to register events to trigger on. Something like:
> 
>  tracecmd_analyze_register_event_timing(handle, name, start_event,
> 		 start_field, start_record, end_event, end_field,
> 		 end_record);
> 
> Where "start_record" and "end_record" is if the field is a task to record.
> 
> enum {
> 	TRACECMD_ANALYZE_FL_NONE,
> 	TRACECMD_ANALYZE_FL_PID,
> 	TRACECMD_ANALYZE_FL_PID_CPU,
> };
> 
> Thus, we could have:
> 
> 	tracecmd_analyze_register_event_timing(handle,
> 			"Wakeup", sched_waking,
> 			pid_field, TRACECMD_ANALYZE_FL_PID,
> 			sched_switch, next_pid,
> 			TRACECMD_ANALYZE_FL_PID_CPU);
> 
> Or something like the above, that will end up turning into the
> process_wakeup() function (and properly record the waking pid globally).

This sounds really neat.

I started having a look at whether it's possible to add an overview of
interrupts from irq:irq_handler_entry & irq:irq_handler_exit and I guess
that would fit in TRACECMD_ANALYZE_FL_NONE above, but it would be good
to have per-cpu stats for irqs as well so does that mean an additional
record type is needed?
Steven Rostedt March 26, 2022, 3:32 p.m. UTC | #5
On Sat, 26 Mar 2022 11:14:06 +0000
John Keeping <john@metanate.com> wrote:

> I started having a look at whether it's possible to add an overview of
> interrupts from irq:irq_handler_entry & irq:irq_handler_exit and I guess
> that would fit in TRACECMD_ANALYZE_FL_NONE above, but it would be good
> to have per-cpu stats for irqs as well so does that mean an additional
> record type is needed?

I actually created that enum for exactly that ;-) To be able to add
different metrics in the future. Which is why it states "PID". I
actually did think about adding things like

  TRACECMD_ANALYZE_FL_IRQ

To say this is to add irq interruptions. Perhaps even add:

  TRACECMD_ANALYZE_FL_STOLEN

for guest stolen times.

I haven't decided on that API yet. Perhaps we could add new custom
timings too, and include them somehow. I have to think about a good API
to use.

-- Steve
diff mbox series

Patch

diff --git a/tracecmd/trace-analyze.c b/tracecmd/trace-analyze.c
index 4db93367727b..7cf63923b962 100644
--- a/tracecmd/trace-analyze.c
+++ b/tracecmd/trace-analyze.c
@@ -33,10 +33,12 @@  struct analysis_data {
 	unsigned long long	start_ts;
 	unsigned long long	last_ts;
 	struct tep_event	*switch_event;
+	struct tep_event	*wakeup_event;
 	struct tep_format_field	*prev_comm;
 	struct tep_format_field	*prev_state;
 	struct tep_format_field	*next_comm;
 	struct tep_format_field	*next_pid;
+	struct tep_format_field	*wakeup_pid;
 	struct cpu_data		*cpu_data;
 	struct trace_hash	tasks;
 	int			nr_tasks;
@@ -60,11 +62,13 @@  struct task_item {
 	struct sched_timings	sleep;
 	struct sched_timings	blocked;
 	struct sched_timings	other;
+	struct sched_timings	wakeup;
 	char			*comm;
 	struct trace_hash_item	hash;
 	int			pid;
 	int			last_cpu;
 	int			last_state;
+	bool			woken;
 };
 
 struct task_cpu_item {
@@ -420,9 +424,31 @@  static void process_switch(struct analysis_data *data,
 			comm = (char *)(record->data + data->next_comm->offset);
 			task->comm = strdup(comm);
 		}
+
+		if (task->woken)
+			update_sched_timings(&task->wakeup, record->ts);
+		task->woken = false;
 	}
 }
 
+static void process_wakeup(struct analysis_data *data,
+			   struct tep_handle *tep,
+			   struct tep_record *record)
+{
+	struct cpu_data *cpu_data = &data->cpu_data[record->cpu];
+	struct task_cpu_item *cpu_task;
+	struct task_item *task;
+	unsigned long long val;
+	int pid;
+
+	tep_read_number_field(data->wakeup_pid, record->data, &val);
+	pid = val;
+	cpu_task = get_cpu_task(cpu_data, pid);
+	task = cpu_task->task;
+	task->wakeup.last = record->ts;
+	task->woken = true;
+}
+
 static bool match_type(int type, struct tep_event *event)
 {
 	return event && type == event->id;
@@ -446,6 +472,9 @@  static void process_cpu(struct analysis_data *data,
 	type = tep_data_type(tep, record);
 	if (match_type(type, data->switch_event))
 		process_switch(data, tep, pid, record);
+
+	else if (match_type(type, data->wakeup_event))
+		process_wakeup(data, tep, record);
 }
 
 static int cmp_tasks(const void *A, const void *B)
@@ -637,6 +666,7 @@  static void print_task(struct tep_handle *tep, struct task_item *task)
 	if (task->migrated)
 		printf("Migrated:\t%llu\n", task->migrated);
 	print_timings_title("Type");
+	print_sched_timings("Wakeup", &task->wakeup);
 	print_sched_timings("Preempted", &task->preempt);
 	print_sched_timings("Blocked", &task->blocked);
 	print_sched_timings("Sleeping", &task->sleep);
@@ -788,6 +818,9 @@  static void do_trace_analyze(struct tracecmd_input *handle)
 	trace_hash_init(&data.tasks, 128);
 
 	data.switch_event = tep_find_event_by_name(tep, "sched", "sched_switch");
+	data.wakeup_event = tep_find_event_by_name(tep, "sched", "sched_waking");
+	if (!data.wakeup_event)
+		data.wakeup_event = tep_find_event_by_name(tep, "sched", "sched_wakeup");
 
 	/* Set to a very large number */
 	data.start_ts = -1ULL;
@@ -799,6 +832,12 @@  static void do_trace_analyze(struct tracecmd_input *handle)
 		data.prev_state = tep_find_field(data.switch_event, "prev_state");
 	}
 
+	if (data.wakeup_event) {
+		data.wakeup_pid = tep_find_field(data.wakeup_event, "pid");
+		if (!data.wakeup_pid)
+			data.wakeup_event = NULL;
+	}
+
 	do {
 		record = tracecmd_read_next_data(handle, NULL);
 		if (record)