diff mbox series

[17/17] libtracefs: Add man page for tracefs_sql()

Message ID 20210730221824.595597-18-rostedt@goodmis.org
State New
Headers show
Series libtracefs: Introducing tracefs_sql() to create synthetice events with an SQL line | expand

Commit Message

Steven Rostedt July 30, 2021, 10:18 p.m. UTC
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add a man page for tracefs_sql(). Included in that man page is a full
working sql parser example program that can allow you to create synthetic
events from writing SQL on the command line.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/libtracefs-sql.txt | 367 +++++++++++++++++++++++++++++++
 1 file changed, 367 insertions(+)
 create mode 100644 Documentation/libtracefs-sql.txt

Comments

Ahmed S. Darwish Aug. 1, 2021, 1:39 p.m. UTC | #1
On Fri, Jul 30, 2021, Steven Rostedt wrote:
> +

> +The SQL format is as follows:

> +

> +*SELECT* <fields> FROM <start-event> JOIN <end-event> ON <matching-fields> WHERE <filter>

> +

> +Note, although the examples show the SQL commands in uppercase, they are not required to

> +be so. That is, you can use "SELECT" or "select" or "sElEct".

> +


Maybe it would be helpful to mention that, unlike normal SELECT queries,
the JOIN and ON parts above are _not_ optional?

That is, generic "one event source" queries:

  SELECT common_pid,msr,val FROM write_msr WHERE msr=72

are not supported. (I wish they were though ;-))

BTW, thanks a lot for this work. It will finally make synthetic events
usable!

Kind regards,

--
Ahmed S. Darwish
Linutronix GmbH
Steven Rostedt Aug. 1, 2021, 10:29 p.m. UTC | #2
On Sun, 1 Aug 2021 15:39:25 +0200
"Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:

> On Fri, Jul 30, 2021, Steven Rostedt wrote:

> > +

> > +The SQL format is as follows:

> > +

> > +*SELECT* <fields> FROM <start-event> JOIN <end-event> ON <matching-fields> WHERE <filter>

> > +

> > +Note, although the examples show the SQL commands in uppercase, they are not required to

> > +be so. That is, you can use "SELECT" or "select" or "sElEct".

> > +  

> 

> Maybe it would be helpful to mention that, unlike normal SELECT queries,

> the JOIN and ON parts above are _not_ optional?

> 

> That is, generic "one event source" queries:

> 

>   SELECT common_pid,msr,val FROM write_msr WHERE msr=72

> 

> are not supported. (I wish they were though ;-))


Actually, the sql parser should support it, but it will fail on the
creation of events. That's because I started trying to make this create
normal histograms. The problem is, that it can't really do a 1 to 1 on
histograms and selects, so I gave up. But perhaps for the subset it can
create, maybe I can still have it do so. That may require changing the
API slightly. 

I'm not a big SQL person, so I don't know all the magic and I have no
idea how to add the "values" part of the hist trigger.

I could use SQL CAST to let people redefine the types that histograms
allow, like:

   SELECT CAST(common_pid AS execname), CAST(id AS syscall) FROM sys_enter

and have that produce:

 echo 'hist:keys=common_pid.execname,id.syscall' > events/raw_syscalls/sys_enter/trigger

Any ideas for the syntax to get values? Or do people care about the
value parameter any more than just "hitcount"? I don't think I have
ever used it for anything.

> 

> BTW, thanks a lot for this work. It will finally make synthetic events

> usable!

> 


Thanks! It's also something to make it easier for me to use. And
hopefully, allow for even more complex synthetic events.

-- Steve
Ahmed S. Darwish Aug. 4, 2021, 12:27 p.m. UTC | #3
On Sun, Aug 01, 2021, Steven Rostedt wrote:
> On Sun, 1 Aug 2021 15:39:25 +0200

> "Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:

>

> > On Fri, Jul 30, 2021, Steven Rostedt wrote:

> > > +

> > > +The SQL format is as follows:

> > > +

> > > +*SELECT* <fields> FROM <start-event> JOIN <end-event> ON <matching-fields> WHERE <filter>

> > > +

> > > +Note, although the examples show the SQL commands in uppercase, they are not required to

> > > +be so. That is, you can use "SELECT" or "select" or "sElEct".

> > > +

> >

> > Maybe it would be helpful to mention that, unlike normal SELECT queries,

> > the JOIN and ON parts above are _not_ optional?

> >

> > That is, generic "one event source" queries:

> >

> >   SELECT common_pid,msr,val FROM write_msr WHERE msr=72

> >

> > are not supported. (I wish they were though ;-))

>

> Actually, the sql parser should support it, but it will fail on the

> creation of events. That's because I started trying to make this create

> normal histograms. The problem is, that it can't really do a 1 to 1 on

> histograms and selects, so I gave up. But perhaps for the subset it can

> create, maybe I can still have it do so. That may require changing the

> API slightly.

>

> I'm not a big SQL person, so I don't know all the magic and I have no

> idea how to add the "values" part of the hist trigger.

>


Thanks! I've replied at the v2 thread.

(Discovered after-the-fact that a v3 was already sent, sorry..)

--
Ahmed S. Darwish
Linutronix GmbH
diff mbox series

Patch

diff --git a/Documentation/libtracefs-sql.txt b/Documentation/libtracefs-sql.txt
new file mode 100644
index 000000000000..e9b3d1ddbc22
--- /dev/null
+++ b/Documentation/libtracefs-sql.txt
@@ -0,0 +1,367 @@ 
+libtracefs(3)
+=============
+
+NAME
+----
+tracefs_sql - Create a synthetitc event via an SQL statement
+
+SYNOPSIS
+--------
+[verse]
+--
+*#include <tracefs.h>*
+
+struct tracefs_synth *tracefs_sql(struct tep_handle pass:[*]tep, const char pass:[*]name,
+				  const char pass:[*]sql_buffer, char pass:[**]err);
+--
+
+DESCRIPTION
+-----------
+Synthetic events are dynamically created events that attach two existing events
+together via one or more matching fields between the two events. It can be used
+to find the latency between the events, or to simply pass fields of the first event
+on to the second event to display as one event.
+
+The Linux kernel interface to create synthetic events is complex, and there needs
+to be a better way to create synthetic events that is easy and can be understood
+via existing technology.
+
+If you think of each event as a table, where the fields are the column of the table
+and each instance of the event as a row, you can understand how SQL can be used
+to attach two events together and for another event (table). Utilizing the
+SQL *SELECT* *FROM* *JOIN* *ON* [ *WHERE* ] syntax, a synthetic event can easily
+be created from two different events.
+
+
+*tracefs_sql*() takes in a *tep* handler (See _tep_local_events_(3)) that is used to
+verify the events within the _sql_buffer_ expression. The _name_ is the name of the
+synthetic event to create. If _err_ points to an address of a string, it will be filled
+with a detailed message on any type of parsing error, including fields that do not belong
+to an event, or if the events or fields are not properly compared.
+
+The example program below is a fully functional parser where it will create a synthetic
+event from a SQL syntax passed in via the command line or a file. 
+
+The SQL format is as follows:
+
+*SELECT* <fields> FROM <start-event> JOIN <end-event> ON <matching-fields> WHERE <filter>
+
+Note, although the examples show the SQL commands in uppercase, they are not required to
+be so. That is, you can use "SELECT" or "select" or "sElEct".
+
+For example:
+[source,c]
+--
+SELECT syscalls.sys_enter_read.fd, syscalls.sys_exit_read.ret FROM syscalls.sys_enter_read
+   JOIN syscalls.sys_exit_read
+   ON syscalls.sys_enter_read.common_pid = syscalls.sys_exit_write.common_pid
+--
+
+Will create a synthetic event that with the fields:
+
+  u64 fd; s64 ret;
+
+Because the function takes a _tep_ handle, and usually all event names are unique, you can
+leave off the system (group) name of the event, and *tracefs_sql*() will discover the
+system for you.
+
+That is, the above statement would work with:
+
+[source,c]
+--
+SELECT sys_enter_read.fd, sys_exit_read.ret FROM sys_enter_read JOIN sys_exit_read
+   ON sys_enter_read.common_pid = sys_exit_write.common_pid
+--
+
+The *AS* keyword can be used to name the fields as well as to give an alias to the
+events, such that the above can be simplified even more as:
+
+[source,c]
+--
+SELECT start.fd, end.ret FROM sys_enter_read AS start JOIN sys_exit_read AS end ON start.common_pid = end.common_pid
+--
+
+The above aliases _sys_enter_read_ as *start* and _sys_exit_read_ as *end* and uses
+those aliases to reference the event throughout the statement.
+
+Using the *AS* keyword in the selection portion of the SQL statement will define what
+those fields will be called in the synthetic event.
+
+[source,c]
+--
+SELECT start.fd AS filed, end.ret AS return FROM sys_enter_read AS start JOIN sys_exit_read AS end
+   ON start.common_pid = end.common_pid
+--
+
+The above labels the _fd_ of _start_ as *filed* and the _ret_ of _end_ as *return* where
+the synthetic event that is created will now have the fields:
+
+  u64 filed; s64 return;
+
+The fields can also be calculated with results passed to the synthetic event:
+
+[source,c]
+--
+select start.truesize, end.len, (start.truesize - end.len) as diff from napi_gro_receive_entry as start
+   JOIN netif_receive_skb as end ON start.skbaddr = end.skbaddr
+--
+
+Which would show the *truesize" of the _napi_gro_receive_entry_ event, the actual
+_len_ of the content, shown by the _netif_receive_skb_, and alse the delta between
+the two and expressed by the field *diff*.
+
+The code also supports recording the timestamps at either event, and performing calculations
+on them. For wakeup latency, you have:
+
+[source,c]
+--
+select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
+   JOIN sched_switch as end ON start.pid = end.next_pid
+--
+
+The above will create a synthetic event that records the _pid_ of the task being woken up,
+and the time difference between the _sched_waking_ event and the _sched_switch_ event.
+The *TIMESTAMP_USECS* will truncate the time down to microseconds as the timestamp usually
+recorded in the tracing buffer has nanosecond resolution. If you do not want that
+truncation, use *TIMESTAMP* instead of *TIMESTAMP_USECS*.
+
+Finally, the *WHERE* clause can be added, that will let you add filters on either or both events.
+
+[source,c]
+--
+select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
+   JOIN sched_switch as end ON start.pid = end.next_pid
+   WHERE start.prio < 100 && (!(end.prev_pid < 1 || end.prev_prio > 100) || end.prev_pid == 0)
+--
+
+*NOTE*
+
+Although both events can be used together in the *WHERE* clause, they must not be mixed outside
+the top most "&&" statements. You can not OR (||) the events together, where a filter of one
+event is OR'd to a filter of the other event. This does not make sense, as the synthetic event
+requires both events to take place to be recorded. If one is filtered out, then the synthetic
+event does not execute.
+
+[source,c]
+--
+select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
+   JOIN sched_switch as end ON start.pid = end.next_pid
+   WHERE start.prio < 100 && end.prev_prio < 100
+--
+
+The above is valid.
+
+Where as the below is not.
+
+[source,c]
+--
+select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
+   JOIN sched_switch as end ON start.pid = end.next_pid
+   WHERE start.prio < 100 || end.prev_prio < 100
+--
+
+
+RETURN VALUE
+------------
+Returns 0 on success and -1 on failure. On failure, if _err_ is defined, it will be
+allocated to hold a detailed description of what went wrong if it the error was caused
+by a parsing error, or that an event, field does not exist or is not compatible with
+what it was combined with.
+
+EXAMPLE
+-------
+[source,c]
+--
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <tracefs.h>
+
+static void usage(char **argv)
+{
+	fprintf(stderr, "usage: %s [-ed][-n name][-t dir][-f file | sql-command-line]\n"
+		"  -n name - name of synthetic event 'Anonymous' if left off\n"
+		"  -t dir - use dir instead of /sys/kernel/tracing\n"
+		"  -e - execute the commands to create the synthetic event\n"
+		"  -d - delete the synthetic event that would be created\n"
+		"  -f file - read sql lines from file otherwise from the command line\n"
+		"            if file is '-' then read from standard input.\n",
+		argv[0]);
+	exit(-1);
+}
+
+static int do_sql(const char *buffer, const char *name,
+		  const char *trace_dir, bool execute)
+{
+	struct tracefs_synth *synth;
+	struct tep_handle *tep;
+	struct trace_seq seq;
+	char *err;
+
+	if (!name)
+		name = "Anonymous";
+
+	trace_seq_init(&seq);
+	tep = tracefs_local_events(trace_dir);
+	if (!tep) {
+		if (!trace_dir)
+			trace_dir = "tracefs directory";
+		perror(trace_dir);
+		exit(-1);
+	}
+
+	synth = tracefs_sql(tep, name, buffer, &err);
+	if (!synth) {
+		perror("Failed creating synthetic event!");
+		if (err)
+			fprintf(stderr, "%s", err);
+		exit(-1);
+	}
+
+	tracefs_synth_show(&seq, NULL, synth);
+	if (execute)
+		tracefs_synth_create(NULL, synth);
+	tracefs_synth_free(synth);
+
+	trace_seq_do_printf(&seq);
+	trace_seq_destroy(&seq);
+	return 0;
+}
+
+int main (int argc, char **argv)
+{
+	char *trace_dir = NULL;
+	char *buffer = NULL;
+	char buf[BUFSIZ];
+	int buffer_size = 0;
+	const char *file = NULL;
+	bool execute = false;
+	const char *name;
+	FILE *fp;
+	size_t r;
+	int c;
+	int i;
+
+	for (;;) {
+		c = getopt(argc, argv, "ht:f:edn:");
+		if (c == -1)
+			break;
+
+		switch(c) {
+		case 'h':
+			usage(argv);
+		case 't':
+			trace_dir = optarg;
+			break;
+		case 'f':
+			file = optarg;
+			break;
+		case 'e':
+			execute = true;
+			break;
+		case 'n':
+			name = optarg;
+			break;
+		}
+	}
+
+	if (file) {
+		if (!strcmp(file, "-"))
+			fp = stdin;
+		else
+			fp = fopen(file, "r");
+		if (!fp) {
+			perror(file);
+			exit(-1);
+		}
+		while ((r = fread(buf, 1, BUFSIZ, fp)) > 0) {
+			buffer = realloc(buffer, buffer_size + r + 1);
+			strncpy(buffer + buffer_size, buf, r);
+			buffer_size += r;
+		}
+		fclose(fp);
+		if (buffer_size)
+			buffer[buffer_size] = '\0';
+	} else if (argc == optind) {
+		usage(argv);
+	} else {
+		for (i = optind; i < argc; i++) {
+			r = strlen(argv[i]);
+			buffer = realloc(buffer, buffer_size + r + 2);
+			if (i != optind)
+				buffer[buffer_size++] = ' ';
+			strcpy(buffer + buffer_size, argv[i]);
+			buffer_size += r;
+		}
+	}
+
+	do_sql(buffer, name, trace_dir, execute);
+	free(buffer);
+
+	return 0;
+}
+--
+
+FILES
+-----
+[verse]
+--
+*tracefs.h*
+	Header file to include in order to have access to the library APIs.
+*-ltracefs*
+	Linker switch to add when building a program that uses the library.
+--
+
+SEE ALSO
+--------
+_libtracefs(3)_,
+_libtraceevent(3)_,
+_trace-cmd(1)_,
+_tracefs_synth_init(3)_,
+_tracefs_synth_add_match_field(3)_,
+_tracefs_synth_add_compare_field(3)_,
+_tracefs_synth_add_start_field(3)_,
+_tracefs_synth_add_end_field(3)_,
+_tracefs_synth_append_start_filter(3)_,
+_tracefs_synth_append_end_filter(3)_,
+_tracefs_synth_create(3)_,
+_tracefs_synth_destroy(3)_,
+_tracefs_synth_free(3)_,
+_tracefs_synth_show(3)_,
+_tracefs_hist_alloc(3)_,
+_tracefs_hist_free(3)_,
+_tracefs_hist_add_key(3)_,
+_tracefs_hist_add_value(3)_,
+_tracefs_hist_add_name(3)_,
+_tracefs_hist_start(3)_,
+_tracefs_hist_destory(3)_,
+_tracefs_hist_add_sort_key(3)_,
+_tracefs_hist_sort_key_direction(3)_
+
+AUTHOR
+------
+[verse]
+--
+*Steven Rostedt* <rostedt@goodmis.org>
+*Tzvetomir Stoyanov* <tz.stoyanov@gmail.com>
+*sameeruddin shaik* <sameeruddin.shaik8@gmail.com>
+--
+REPORTING BUGS
+--------------
+Report bugs to  <linux-trace-devel@vger.kernel.org>
+
+LICENSE
+-------
+libtracefs is Free Software licensed under the GNU LGPL 2.1
+
+RESOURCES
+---------
+https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/
+
+COPYING
+-------
+Copyright \(C) 2020 VMware, Inc. Free use of this software is granted under
+the terms of the GNU Public License (GPL).