diff mbox series

[RESEND,net-next] netlink: add tracepoint at NL_SET_ERR_MSG

Message ID fb6e25a4833e6a0e055633092b05bae3c6e1c0d3.1611934253.git.marcelo.leitner@gmail.com
State New
Headers show
Series [RESEND,net-next] netlink: add tracepoint at NL_SET_ERR_MSG | expand

Commit Message

Marcelo Ricardo Leitner Feb. 1, 2021, 6:12 p.m. UTC
Often userspace won't request the extack information, or they don't log it
because of log level or so, and even when they do, sometimes it's not
enough to know exactly what caused the error.

Netlink extack is the standard way of reporting erros with descriptive
error messages. With a trace point on it, we then can know exactly where
the error happened, regardless of userspace app. Also, we can even see if
the err msg was overwritten.

The wrapper do_trace_netlink_extack() is because trace points shouldn't be
called from .h files, as trace points are not that small, and the function
call to do_trace_netlink_extack() on the macros is not protected by
tracepoint_enabled() because the macros are called from modules, and this
would require exporting some trace structs. As this is error path, it's
better to export just the wrapper instead.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/linux/netlink.h        |  8 ++++++++
 include/trace/events/netlink.h | 29 +++++++++++++++++++++++++++++
 net/netlink/af_netlink.c       |  8 ++++++++
 3 files changed, 45 insertions(+)
 create mode 100644 include/trace/events/netlink.h

Comments

Jakub Kicinski Feb. 2, 2021, 1:34 a.m. UTC | #1
On Mon,  1 Feb 2021 15:12:19 -0300 Marcelo Ricardo Leitner wrote:
> Often userspace won't request the extack information, or they don't log it

> because of log level or so, and even when they do, sometimes it's not

> enough to know exactly what caused the error.

> 

> Netlink extack is the standard way of reporting erros with descriptive

> error messages. With a trace point on it, we then can know exactly where

> the error happened, regardless of userspace app. Also, we can even see if

> the err msg was overwritten.

> 

> The wrapper do_trace_netlink_extack() is because trace points shouldn't be

> called from .h files, as trace points are not that small, and the function

> call to do_trace_netlink_extack() on the macros is not protected by

> tracepoint_enabled() because the macros are called from modules, and this

> would require exporting some trace structs. As this is error path, it's

> better to export just the wrapper instead.

> 

> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>


Did you measure the allyesconfig bloat from this?

How valuable is it to have the tracepoint in at the time it's set?
IIRC extack is passed in to the callbacks from the netlink core, it's
just not reported to user space if not requested. So we could capture
the message in af_netlink.c, no?
Marcelo Ricardo Leitner Feb. 2, 2021, 12:30 p.m. UTC | #2
On Mon, Feb 01, 2021 at 05:34:00PM -0800, Jakub Kicinski wrote:
> On Mon,  1 Feb 2021 15:12:19 -0300 Marcelo Ricardo Leitner wrote:

> > Often userspace won't request the extack information, or they don't log it

> > because of log level or so, and even when they do, sometimes it's not

> > enough to know exactly what caused the error.

> > 

> > Netlink extack is the standard way of reporting erros with descriptive

> > error messages. With a trace point on it, we then can know exactly where

> > the error happened, regardless of userspace app. Also, we can even see if

> > the err msg was overwritten.

> > 

> > The wrapper do_trace_netlink_extack() is because trace points shouldn't be

> > called from .h files, as trace points are not that small, and the function

> > call to do_trace_netlink_extack() on the macros is not protected by

> > tracepoint_enabled() because the macros are called from modules, and this

> > would require exporting some trace structs. As this is error path, it's

> > better to export just the wrapper instead.

> > 

> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> 

> Did you measure the allyesconfig bloat from this?


Now I did:

$ ./scripts/bloat-o-meter -t out/{orig,new}/vmlinux
...
Total: Before=212077464, After=212108056, chg +0.01%

$ size out/{orig,new}/vmlinux
   text    data     bss     dec     hex filename
267409181       333328965       83018348        683756494       28c14bce      out/orig/vmlinux
267413171       333337273       83010156        683760600       28c15bd8      out/new/vmlinux

with the commit on top of 46eb3c108fe1744d0a6abfda69ef8c1d4f0e92d4.
It's not much because it's adding just a function call to the macro,
rather than the tracepoint itself.

> How valuable is it to have the tracepoint in at the time it's set?


To know exactly its source. It's very helpful to track down some
EINVALs reported to userspace. If not, we have to rely on the errmsg
and grep the code afterwards.

Also, if the message is a common one, one may not be able to easily
distinguish them. Ideally this shouldn't happen, but when debugging
applications such as OVS, where lots of netlink requests are flying,
it saves us time. I can, for example, look at a perf capture and
search for cls_flower or so. Otherwise, it will all show up as
"af_netlink: <err_msg>"

Also, it allows tracking when a previous errmsg (which would have been
a warning) is overwritten with a new one.

> IIRC extack is passed in to the callbacks from the netlink core, it's

> just not reported to user space if not requested. So we could capture

> the message in af_netlink.c, no?


If 4k is too much, yes. It looses practicality, per above, but should
be doable.

  Marcelo
David Ahern Feb. 2, 2021, 4:16 p.m. UTC | #3
On 2/2/21 5:30 AM, Marcelo Ricardo Leitner wrote:
> 

> Also, if the message is a common one, one may not be able to easily

> distinguish them. Ideally this shouldn't happen, but when debugging

> applications such as OVS, where lots of netlink requests are flying,

> it saves us time. I can, for example, look at a perf capture and

> search for cls_flower or so. Otherwise, it will all show up as

> "af_netlink: <err_msg>"


Modules should be using the NL_SET_ERR_MSG_MOD variant, so the message
would be ""af_netlink: cls_flower: <err_msg>"

I get the value in knowing the call site, so not arguing against that.
Just hoping that your experience matches theory.
Marcelo Ricardo Leitner Feb. 2, 2021, 7:30 p.m. UTC | #4
On Tue, Feb 02, 2021 at 09:16:28AM -0700, David Ahern wrote:
> On 2/2/21 5:30 AM, Marcelo Ricardo Leitner wrote:

> > 

> > Also, if the message is a common one, one may not be able to easily

> > distinguish them. Ideally this shouldn't happen, but when debugging

> > applications such as OVS, where lots of netlink requests are flying,

> > it saves us time. I can, for example, look at a perf capture and

> > search for cls_flower or so. Otherwise, it will all show up as

> > "af_netlink: <err_msg>"

> 

> Modules should be using the NL_SET_ERR_MSG_MOD variant, so the message

> would be ""af_netlink: cls_flower: <err_msg>"


Ah, right. They don't always do, though (and that probably should be
fixed). Also, currently there is no _MOD variant for NL_SET_ERR_MSG_ATTR.

For example:
$ git grep NL_SET_ERR_MSG -- cls_flower.c
cls_flower.c:                   NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
cls_flower.c:           NL_SET_ERR_MSG_ATTR(extack,
cls_flower.c:           NL_SET_ERR_MSG_ATTR(extack,
cls_flower.c:           NL_SET_ERR_MSG(extack, "Missing MPLS option \"depth\"");
...

> 

> I get the value in knowing the call site, so not arguing against that.

> Just hoping that your experience matches theory.


Okay.

Thanks,
Marcelo
Jakub Kicinski Feb. 3, 2021, 10:31 p.m. UTC | #5
On Mon,  1 Feb 2021 15:12:19 -0300 Marcelo Ricardo Leitner wrote:
> Often userspace won't request the extack information, or they don't log it

> because of log level or so, and even when they do, sometimes it's not

> enough to know exactly what caused the error.

> 

> Netlink extack is the standard way of reporting erros with descriptive

> error messages. With a trace point on it, we then can know exactly where

> the error happened, regardless of userspace app. Also, we can even see if

> the err msg was overwritten.

> 

> The wrapper do_trace_netlink_extack() is because trace points shouldn't be

> called from .h files, as trace points are not that small, and the function

> call to do_trace_netlink_extack() on the macros is not protected by

> tracepoint_enabled() because the macros are called from modules, and this

> would require exporting some trace structs. As this is error path, it's

> better to export just the wrapper instead.

> 

> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>


> --- a/include/linux/netlink.h

> +++ b/include/linux/netlink.h

> @@ -6,11 +6,15 @@

>  #include <linux/capability.h>

>  #include <linux/skbuff.h>

>  #include <linux/export.h>

> +#include <linux/tracepoint.h>


Do we need the include...

>  #include <net/scm.h>

>  #include <uapi/linux/netlink.h>

>  

>  struct net;

>  

> +DECLARE_TRACEPOINT(netlink_extack);


... and the declaration? Seems to be a leftover.

> +void do_trace_netlink_extack(const char *msg);

> +

>  static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)

>  {

>  	return (struct nlmsghdr *)skb->data;

> @@ -90,6 +94,8 @@ struct netlink_ext_ack {

>  	static const char __msg[] = msg;		\

>  	struct netlink_ext_ack *__extack = (extack);	\

>  							\

> +	do_trace_netlink_extack(__msg);			\

> +							\

>  	if (__extack)					\

>  		__extack->_msg = __msg;			\

>  } while (0)

> @@ -110,6 +116,8 @@ struct netlink_ext_ack {

>  	static const char __msg[] = msg;			\

>  	struct netlink_ext_ack *__extack = (extack);		\

>  								\

> +	do_trace_netlink_extack(__msg);				\

> +								\

>  	if (__extack) {						\

>  		__extack->_msg = __msg;				\

>  		__extack->bad_attr = (attr);			\
diff mbox series

Patch

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 9f118771e24808623287d46157046749ec96a2b5..fe77f30aabfdde8882f6de99ecf633b79e903b77 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -6,11 +6,15 @@ 
 #include <linux/capability.h>
 #include <linux/skbuff.h>
 #include <linux/export.h>
+#include <linux/tracepoint.h>
 #include <net/scm.h>
 #include <uapi/linux/netlink.h>
 
 struct net;
 
+DECLARE_TRACEPOINT(netlink_extack);
+void do_trace_netlink_extack(const char *msg);
+
 static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)
 {
 	return (struct nlmsghdr *)skb->data;
@@ -90,6 +94,8 @@  struct netlink_ext_ack {
 	static const char __msg[] = msg;		\
 	struct netlink_ext_ack *__extack = (extack);	\
 							\
+	do_trace_netlink_extack(__msg);			\
+							\
 	if (__extack)					\
 		__extack->_msg = __msg;			\
 } while (0)
@@ -110,6 +116,8 @@  struct netlink_ext_ack {
 	static const char __msg[] = msg;			\
 	struct netlink_ext_ack *__extack = (extack);		\
 								\
+	do_trace_netlink_extack(__msg);				\
+								\
 	if (__extack) {						\
 		__extack->_msg = __msg;				\
 		__extack->bad_attr = (attr);			\
diff --git a/include/trace/events/netlink.h b/include/trace/events/netlink.h
new file mode 100644
index 0000000000000000000000000000000000000000..3b7be3b386a4f3976738a107fe4b7e0915ae58bb
--- /dev/null
+++ b/include/trace/events/netlink.h
@@ -0,0 +1,29 @@ 
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM netlink
+
+#if !defined(_TRACE_NETLINK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NETLINK_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(netlink_extack,
+
+	TP_PROTO(const char *msg),
+
+	TP_ARGS(msg),
+
+	TP_STRUCT__entry(
+		__string(	msg,	msg	)
+	),
+
+	TP_fast_assign(
+		__assign_str(msg, msg);
+	),
+
+	TP_printk("msg=%s", __get_str(msg))
+);
+
+#endif /* _TRACE_NETLINK_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index daca50d6bb1283f3b04b585217f2aea6ba279b8b..dd488938447f9735daf1fb727c339a9874bab38b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -67,6 +67,8 @@ 
 #include <net/sock.h>
 #include <net/scm.h>
 #include <net/netlink.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/netlink.h>
 
 #include "af_netlink.h"
 
@@ -147,6 +149,12 @@  static BLOCKING_NOTIFIER_HEAD(netlink_chain);
 
 static const struct rhashtable_params netlink_rhashtable_params;
 
+void do_trace_netlink_extack(const char *msg)
+{
+	trace_netlink_extack(msg);
+}
+EXPORT_SYMBOL(do_trace_netlink_extack);
+
 static inline u32 netlink_group_mask(u32 group)
 {
 	return group ? 1 << (group - 1) : 0;