Message ID | 20220324025726.1727204-10-rostedt@goodmis.org |
---|---|
State | New |
Headers | show |
Series | trace-cmd: Add trace-cmd analyze command | expand |
On Wed, Mar 23, 2022 at 10:57:23PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > If the sched_waking (or sched_wakeup) event is found, then use it to time > the wake up latency for each task. > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > +static void process_wakeup(struct analysis_data *data, > + struct tep_handle *tep, > + struct tep_record *record) > +{ > + struct cpu_data *cpu_data = &data->cpu_data[record->cpu]; > + struct task_cpu_item *cpu_task; > + struct task_item *task; > + unsigned long long val; > + int pid; > + > + tep_read_number_field(data->wakeup_pid, record->data, &val); > + pid = val; > + cpu_task = get_cpu_task(cpu_data, pid); Should this use get_task() instead of get_cpu_task()? I was trying this out and it provides a great overview of a trace but I found it strange that tasks affine to CPU N were also appearing in the list for CPU M. > + task = cpu_task->task; > + task->wakeup.last = record->ts; > + task->woken = true; > +}
On Fri, 25 Mar 2022 19:34:20 +0000 John Keeping <john@metanate.com> wrote: > > +static void process_wakeup(struct analysis_data *data, > > + struct tep_handle *tep, > > + struct tep_record *record) > > +{ > > + struct cpu_data *cpu_data = &data->cpu_data[record->cpu]; > > + struct task_cpu_item *cpu_task; > > + struct task_item *task; > > + unsigned long long val; > > + int pid; > > + > > + tep_read_number_field(data->wakeup_pid, record->data, &val); > > + pid = val; > > + cpu_task = get_cpu_task(cpu_data, pid); > > Should this use get_task() instead of get_cpu_task()? > > I was trying this out and it provides a great overview of a trace but I > found it strange that tasks affine to CPU N were also appearing in the > list for CPU M. Yes, you are correct, because I noticed the same issue :-D I'll fix it in v2. Thanks, -- Steve
On Fri, 25 Mar 2022 16:13:33 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> I'll fix it in v2.
BTW, I plan on rewriting a lot of this to move it into libtracecmd. That
way other tools can easily extract data from it.
For example, instead of hard coding "page faults" and "wake ups" I plan on
adding a way to register events to trigger on. Something like:
tracecmd_analyze_register_event_timing(handle, name, start_event,
start_field, start_record, end_event, end_field,
end_record);
Where "start_record" and "end_record" is if the field is a task to record.
enum {
TRACECMD_ANALYZE_FL_NONE,
TRACECMD_ANALYZE_FL_PID,
TRACECMD_ANALYZE_FL_PID_CPU,
};
Thus, we could have:
tracecmd_analyze_register_event_timing(handle,
"Wakeup", sched_waking,
pid_field, TRACECMD_ANALYZE_FL_PID,
sched_switch, next_pid,
TRACECMD_ANALYZE_FL_PID_CPU);
Or something like the above, that will end up turning into the
process_wakeup() function (and properly record the waking pid globally).
One tool I want to use this with, is my ktrace shell (development has
stalled, but I do plan on starting it again).
https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/ktrace.git/
And then I could have commands like:
ktrace> open trace.dat
ktrace> analyze register timing sched.sched_switch pid global \
sched.sched_switch next_pid cpu
ktrace> analyze
ktrace> analyze show cpu 1
And it only shows cpu 1 data.
ktrace> analyze show pid 512
to see just pid 512
etc.
-- Steve
On Fri, Mar 25, 2022 at 04:31:48PM -0400, Steven Rostedt wrote: > On Fri, 25 Mar 2022 16:13:33 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > I'll fix it in v2. > > BTW, I plan on rewriting a lot of this to move it into libtracecmd. That > way other tools can easily extract data from it. > > For example, instead of hard coding "page faults" and "wake ups" I plan on > adding a way to register events to trigger on. Something like: > > tracecmd_analyze_register_event_timing(handle, name, start_event, > start_field, start_record, end_event, end_field, > end_record); > > Where "start_record" and "end_record" is if the field is a task to record. > > enum { > TRACECMD_ANALYZE_FL_NONE, > TRACECMD_ANALYZE_FL_PID, > TRACECMD_ANALYZE_FL_PID_CPU, > }; > > Thus, we could have: > > tracecmd_analyze_register_event_timing(handle, > "Wakeup", sched_waking, > pid_field, TRACECMD_ANALYZE_FL_PID, > sched_switch, next_pid, > TRACECMD_ANALYZE_FL_PID_CPU); > > Or something like the above, that will end up turning into the > process_wakeup() function (and properly record the waking pid globally). This sounds really neat. I started having a look at whether it's possible to add an overview of interrupts from irq:irq_handler_entry & irq:irq_handler_exit and I guess that would fit in TRACECMD_ANALYZE_FL_NONE above, but it would be good to have per-cpu stats for irqs as well so does that mean an additional record type is needed?
On Sat, 26 Mar 2022 11:14:06 +0000 John Keeping <john@metanate.com> wrote: > I started having a look at whether it's possible to add an overview of > interrupts from irq:irq_handler_entry & irq:irq_handler_exit and I guess > that would fit in TRACECMD_ANALYZE_FL_NONE above, but it would be good > to have per-cpu stats for irqs as well so does that mean an additional > record type is needed? I actually created that enum for exactly that ;-) To be able to add different metrics in the future. Which is why it states "PID". I actually did think about adding things like TRACECMD_ANALYZE_FL_IRQ To say this is to add irq interruptions. Perhaps even add: TRACECMD_ANALYZE_FL_STOLEN for guest stolen times. I haven't decided on that API yet. Perhaps we could add new custom timings too, and include them somehow. I have to think about a good API to use. -- Steve
diff --git a/tracecmd/trace-analyze.c b/tracecmd/trace-analyze.c index 4db93367727b..7cf63923b962 100644 --- a/tracecmd/trace-analyze.c +++ b/tracecmd/trace-analyze.c @@ -33,10 +33,12 @@ struct analysis_data { unsigned long long start_ts; unsigned long long last_ts; struct tep_event *switch_event; + struct tep_event *wakeup_event; struct tep_format_field *prev_comm; struct tep_format_field *prev_state; struct tep_format_field *next_comm; struct tep_format_field *next_pid; + struct tep_format_field *wakeup_pid; struct cpu_data *cpu_data; struct trace_hash tasks; int nr_tasks; @@ -60,11 +62,13 @@ struct task_item { struct sched_timings sleep; struct sched_timings blocked; struct sched_timings other; + struct sched_timings wakeup; char *comm; struct trace_hash_item hash; int pid; int last_cpu; int last_state; + bool woken; }; struct task_cpu_item { @@ -420,9 +424,31 @@ static void process_switch(struct analysis_data *data, comm = (char *)(record->data + data->next_comm->offset); task->comm = strdup(comm); } + + if (task->woken) + update_sched_timings(&task->wakeup, record->ts); + task->woken = false; } } +static void process_wakeup(struct analysis_data *data, + struct tep_handle *tep, + struct tep_record *record) +{ + struct cpu_data *cpu_data = &data->cpu_data[record->cpu]; + struct task_cpu_item *cpu_task; + struct task_item *task; + unsigned long long val; + int pid; + + tep_read_number_field(data->wakeup_pid, record->data, &val); + pid = val; + cpu_task = get_cpu_task(cpu_data, pid); + task = cpu_task->task; + task->wakeup.last = record->ts; + task->woken = true; +} + static bool match_type(int type, struct tep_event *event) { return event && type == event->id; @@ -446,6 +472,9 @@ static void process_cpu(struct analysis_data *data, type = tep_data_type(tep, record); if (match_type(type, data->switch_event)) process_switch(data, tep, pid, record); + + else if (match_type(type, data->wakeup_event)) + process_wakeup(data, tep, record); } static int cmp_tasks(const void *A, const void *B) @@ -637,6 +666,7 @@ static void print_task(struct tep_handle *tep, struct task_item *task) if (task->migrated) printf("Migrated:\t%llu\n", task->migrated); print_timings_title("Type"); + print_sched_timings("Wakeup", &task->wakeup); print_sched_timings("Preempted", &task->preempt); print_sched_timings("Blocked", &task->blocked); print_sched_timings("Sleeping", &task->sleep); @@ -788,6 +818,9 @@ static void do_trace_analyze(struct tracecmd_input *handle) trace_hash_init(&data.tasks, 128); data.switch_event = tep_find_event_by_name(tep, "sched", "sched_switch"); + data.wakeup_event = tep_find_event_by_name(tep, "sched", "sched_waking"); + if (!data.wakeup_event) + data.wakeup_event = tep_find_event_by_name(tep, "sched", "sched_wakeup"); /* Set to a very large number */ data.start_ts = -1ULL; @@ -799,6 +832,12 @@ static void do_trace_analyze(struct tracecmd_input *handle) data.prev_state = tep_find_field(data.switch_event, "prev_state"); } + if (data.wakeup_event) { + data.wakeup_pid = tep_find_field(data.wakeup_event, "pid"); + if (!data.wakeup_pid) + data.wakeup_event = NULL; + } + do { record = tracecmd_read_next_data(handle, NULL); if (record)