diff mbox series

[nf-next,v4,1/5] net: sched: Micro-optimize egress handling

Message ID a2a8af1622dff2bfd51d446aa8da2c1d2f6f543c.1611304190.git.lukas@wunner.de
State New
Headers show
Series Netfilter egress hook | expand

Commit Message

Lukas Wunner Jan. 22, 2021, 8:47 a.m. UTC
sch_handle_egress() returns either the skb or NULL to signal to its
caller __dev_queue_xmit() whether a packet should continue to be
processed.

The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a
NULL pointer deref right at its top.

But the compiler doesn't know that.  So if sch_handle_egress() signals
success by returning the skb, the "if (!skb) goto out;" statement
results in a gratuitous NULL pointer check in the Assembler output.

Avoid by telling the compiler that __dev_queue_xmit() is never passed a
NULL skb.  This also eliminates another gratuitous NULL pointer check in
__dev_queue_xmit()
  qdisc_pkt_len_init()
    skb_header_pointer()
      __skb_header_pointer()

The speedup is barely measurable:
Before: 1877 1875 1878 1874 1882 1873 Mb/sec
After:  1877 1877 1880 1883 1888 1886 Mb/sec

However we're about to add a netfilter egress hook to __dev_queue_xmit()
and without the micro-optimization, it will result in a performance
degradation which is indeed measurable:
With netfilter hook:               1853 1852 1850 1848 1849 1851 Mb/sec
With netfilter hook + micro-optim: 1874 1877 1881 1875 1876 1876 Mb/sec

The performance degradation is caused by a JNE instruction ("if (skb)")
being flipped to a JE instruction ("if (!skb)") once the netfilter hook
is added.  The micro-optimization removes the test and jump instructions
altogether.

Measurements were performed on a Core i7-3615QM.  Reproducer:
ip link add dev foo type dummy
ip link set dev foo up
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jakub Kicinski Jan. 24, 2021, 3:26 a.m. UTC | #1
On Fri, 22 Jan 2021 09:47:01 +0100 Lukas Wunner wrote:
> sch_handle_egress() returns either the skb or NULL to signal to its

> caller __dev_queue_xmit() whether a packet should continue to be

> processed.

> 

> The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a

> NULL pointer deref right at its top.

> 

> But the compiler doesn't know that.  So if sch_handle_egress() signals

> success by returning the skb, the "if (!skb) goto out;" statement

> results in a gratuitous NULL pointer check in the Assembler output.


Which exact compiler are we talking about it? Did you report this?
As Eric pointed the compiler should be able to figure this out quite
easily.

> Avoid by telling the compiler that __dev_queue_xmit() is never passed a

> NULL skb.  This also eliminates another gratuitous NULL pointer check in

> __dev_queue_xmit()

>   qdisc_pkt_len_init()

>     skb_header_pointer()

>       __skb_header_pointer()

> 

> The speedup is barely measurable:

> Before: 1877 1875 1878 1874 1882 1873 Mb/sec

> After:  1877 1877 1880 1883 1888 1886 Mb/sec

> 

> However we're about to add a netfilter egress hook to __dev_queue_xmit()

> and without the micro-optimization, it will result in a performance

> degradation which is indeed measurable:

> With netfilter hook:               1853 1852 1850 1848 1849 1851 Mb/sec

> With netfilter hook + micro-optim: 1874 1877 1881 1875 1876 1876 Mb/sec

> 

> The performance degradation is caused by a JNE instruction ("if (skb)")

> being flipped to a JE instruction ("if (!skb)") once the netfilter hook

> is added.  The micro-optimization removes the test and jump instructions

> altogether.

> 

> Measurements were performed on a Core i7-3615QM.  Reproducer:

> ip link add dev foo type dummy

> ip link set dev foo up

> tc qdisc add dev foo clsact

> tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'

> modprobe pktgen

> echo "add_device foo" > /proc/net/pktgen/kpktgend_3

> samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

> 

> Signed-off-by: Lukas Wunner <lukas@wunner.de>

> Cc: John Fastabend <john.fastabend@gmail.com>

> Cc: Daniel Borkmann <daniel@iogearbox.net>

> Cc: Alexei Starovoitov <ast@kernel.org>

> Cc: Eric Dumazet <edumazet@google.com>

> Cc: Thomas Graf <tgraf@suug.ch>

> ---

>  net/core/dev.c | 1 +

>  1 file changed, 1 insertion(+)

> 

> diff --git a/net/core/dev.c b/net/core/dev.c

> index 7afbb642e203..4c16b9932823 100644

> --- a/net/core/dev.c

> +++ b/net/core/dev.c

> @@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,

>   *      the BH enable code must have IRQs enabled so that it will not deadlock.

>   *          --BLG

>   */

> +__attribute__((nonnull(1)))

>  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)

>  {

>  	struct net_device *dev = skb->dev;
Lukas Wunner Jan. 24, 2021, 10:33 a.m. UTC | #2
On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote:
> On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote:

> > sch_handle_egress() returns either the skb or NULL to signal to its

> > caller __dev_queue_xmit() whether a packet should continue to be

> > processed.

> >

> > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a

> > NULL pointer deref right at its top.

> >

> > But the compiler doesn't know that.  So if sch_handle_egress() signals

> > success by returning the skb, the "if (!skb) goto out;" statement

> > results in a gratuitous NULL pointer check in the Assembler output.

> >

> > Avoid by telling the compiler that __dev_queue_xmit() is never passed a

> > NULL skb.

[...]
> > we're about to add a netfilter egress hook to __dev_queue_xmit()

> > and without the micro-optimization, it will result in a performance

> > degradation which is indeed measurable:

[...]
> > --- a/net/core/dev.c

> > +++ b/net/core/dev.c

> > +__attribute__((nonnull(1)))

> >  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)

> >  {

> >         struct net_device *dev = skb->dev;

> 

> It is a bit sad the compilers do not automatically get this knowledge

> from the very first instruction :

> 

>  struct net_device *dev = skb->dev;


The compiler (gcc) is capable of doing that, but the feature was disabled by:

    commit a3ca86aea507904148870946d599e07a340b39bf
    Author: Eugene Teo <eteo@redhat.com>
    Date:   Wed Jul 15 14:59:10 2009 +0800
    
    Add '-fno-delete-null-pointer-checks' to gcc CFLAGS

If -fno-delete-null-pointer-checks is dropped from the top-level Makefile
then the gratuitous NULL pointer checks disappear from the Assembler output,
obviating the need to litter hot paths with __attribute__((nonnull(1)))
annotations.

Taking a closer look at that commit, its rationale appears questionable:
It says that broken code such as ...

	struct agnx_priv *priv = dev->priv;

	if (!dev)
		return;

... would result in the NULL pointer check being optimized away.
The commit message claims that keeping the NULL pointer check in
"makes it harder to abuse" the broken code.

I don't see how that's the case:  If dev is NULL, the NULL pointer
dereference at the function's top causes termination of the task
in kernel/exit.c:do_exit().  So the NULL pointer check is never
reached by the task.  If on the other hand dev is non-NULL,
the task isn't terminated but then the NULL pointer check is
unnecessary as well.

So the point of the commit remains elusive to me.  I could submit
an RFC patch which drops -fno-delete-null-pointer-checks and see
if any security folks cry foul.  Thoughts?

Thanks,

Lukas
Lukas Wunner Jan. 24, 2021, 10:46 a.m. UTC | #3
On Sat, Jan 23, 2021 at 07:26:24PM -0800, Jakub Kicinski wrote:
> On Fri, 22 Jan 2021 09:47:01 +0100 Lukas Wunner wrote:

> > sch_handle_egress() returns either the skb or NULL to signal to its

> > caller __dev_queue_xmit() whether a packet should continue to be

> > processed.

> > 

> > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a

> > NULL pointer deref right at its top.

> > 

> > But the compiler doesn't know that.  So if sch_handle_egress() signals

> > success by returning the skb, the "if (!skb) goto out;" statement

> > results in a gratuitous NULL pointer check in the Assembler output.

> 

> Which exact compiler are we talking about it? Did you report this?

> As Eric pointed the compiler should be able to figure this out quite

> easily.


I tested with gcc 8, 9, 10.

No need to report as it's the expected behavior with
-fno-delete-null-pointer-checks, whose motivation appears
questionable though (per my preceding e-mail).

Thanks,

Lukas
Jakub Kicinski Jan. 25, 2021, 7:39 p.m. UTC | #4
On Sun, 24 Jan 2021 11:33:01 +0100 Lukas Wunner wrote:
> On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote:

> > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote:  

> > > sch_handle_egress() returns either the skb or NULL to signal to its

> > > caller __dev_queue_xmit() whether a packet should continue to be

> > > processed.

> > >

> > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a

> > > NULL pointer deref right at its top.

> > >

> > > But the compiler doesn't know that.  So if sch_handle_egress() signals

> > > success by returning the skb, the "if (!skb) goto out;" statement

> > > results in a gratuitous NULL pointer check in the Assembler output.

> > >

> > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a

> > > NULL skb.  

> [...]

> > > we're about to add a netfilter egress hook to __dev_queue_xmit()

> > > and without the micro-optimization, it will result in a performance

> > > degradation which is indeed measurable:  

> [...]

> > > --- a/net/core/dev.c

> > > +++ b/net/core/dev.c

> > > +__attribute__((nonnull(1)))

> > >  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)

> > >  {

> > >         struct net_device *dev = skb->dev;  

> > 

> > It is a bit sad the compilers do not automatically get this knowledge

> > from the very first instruction :

> > 

> >  struct net_device *dev = skb->dev;  

> 

> The compiler (gcc) is capable of doing that, but the feature was disabled by:

> 

>     commit a3ca86aea507904148870946d599e07a340b39bf

>     Author: Eugene Teo <eteo@redhat.com>

>     Date:   Wed Jul 15 14:59:10 2009 +0800

>     

>     Add '-fno-delete-null-pointer-checks' to gcc CFLAGS

> 

> If -fno-delete-null-pointer-checks is dropped from the top-level Makefile

> then the gratuitous NULL pointer checks disappear from the Assembler output,

> obviating the need to litter hot paths with __attribute__((nonnull(1)))

> annotations.

> 

> Taking a closer look at that commit, its rationale appears questionable:

> It says that broken code such as ...

> 

> 	struct agnx_priv *priv = dev->priv;

> 

> 	if (!dev)

> 		return;

> 

> ... would result in the NULL pointer check being optimized away.

> The commit message claims that keeping the NULL pointer check in

> "makes it harder to abuse" the broken code.

> 

> I don't see how that's the case:  If dev is NULL, the NULL pointer

> dereference at the function's top causes termination of the task

> in kernel/exit.c:do_exit().  So the NULL pointer check is never

> reached by the task.  If on the other hand dev is non-NULL,

> the task isn't terminated but then the NULL pointer check is

> unnecessary as well.

> 

> So the point of the commit remains elusive to me.  I could submit

> an RFC patch which drops -fno-delete-null-pointer-checks and see

> if any security folks cry foul.  Thoughts?


I wonder if modern compilers can't simply warn about this particular
case. Not to mention our static checkers..


Dan, do you think the concern from the above-quoted commit is still
valid? Is this something that smatch flags these days? We're apparently
paying a real performance price in networking for tying compiler's hands
with -fno-delete-null-pointer-checks
Dan Carpenter Jan. 26, 2021, 8:58 a.m. UTC | #5
On Mon, Jan 25, 2021 at 11:39:08AM -0800, Jakub Kicinski wrote:
> On Sun, 24 Jan 2021 11:33:01 +0100 Lukas Wunner wrote:

> > On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote:

> > > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote:  

> > > > sch_handle_egress() returns either the skb or NULL to signal to its

> > > > caller __dev_queue_xmit() whether a packet should continue to be

> > > > processed.

> > > >

> > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a

> > > > NULL pointer deref right at its top.

> > > >

> > > > But the compiler doesn't know that.  So if sch_handle_egress() signals

> > > > success by returning the skb, the "if (!skb) goto out;" statement

> > > > results in a gratuitous NULL pointer check in the Assembler output.

> > > >

> > > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a

> > > > NULL skb.  

> > [...]

> > > > we're about to add a netfilter egress hook to __dev_queue_xmit()

> > > > and without the micro-optimization, it will result in a performance

> > > > degradation which is indeed measurable:  

> > [...]

> > > > --- a/net/core/dev.c

> > > > +++ b/net/core/dev.c

> > > > +__attribute__((nonnull(1)))

> > > >  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)

> > > >  {

> > > >         struct net_device *dev = skb->dev;  

> > > 

> > > It is a bit sad the compilers do not automatically get this knowledge

> > > from the very first instruction :

> > > 

> > >  struct net_device *dev = skb->dev;  

> > 

> > The compiler (gcc) is capable of doing that, but the feature was disabled by:

> > 

> >     commit a3ca86aea507904148870946d599e07a340b39bf

> >     Author: Eugene Teo <eteo@redhat.com>

> >     Date:   Wed Jul 15 14:59:10 2009 +0800

> >     

> >     Add '-fno-delete-null-pointer-checks' to gcc CFLAGS

> > 

> > If -fno-delete-null-pointer-checks is dropped from the top-level Makefile

> > then the gratuitous NULL pointer checks disappear from the Assembler output,

> > obviating the need to litter hot paths with __attribute__((nonnull(1)))

> > annotations.

> > 

> > Taking a closer look at that commit, its rationale appears questionable:

> > It says that broken code such as ...

> > 

> > 	struct agnx_priv *priv = dev->priv;

> > 

> > 	if (!dev)

> > 		return;

> > 

> > ... would result in the NULL pointer check being optimized away.

> > The commit message claims that keeping the NULL pointer check in

> > "makes it harder to abuse" the broken code.

> > 

> > I don't see how that's the case:  If dev is NULL, the NULL pointer

> > dereference at the function's top causes termination of the task

> > in kernel/exit.c:do_exit().  So the NULL pointer check is never

> > reached by the task.  If on the other hand dev is non-NULL,

> > the task isn't terminated but then the NULL pointer check is

> > unnecessary as well.

> > 

> > So the point of the commit remains elusive to me.  I could submit

> > an RFC patch which drops -fno-delete-null-pointer-checks and see

> > if any security folks cry foul.  Thoughts?

> 


This was a famous tun.c bug back in the day.  In those days we weren't
careful to disallow remapping NULL to a different pointer.  See
/proc/sys/vm/mmap_min_addr.  The exploit was to remap NULL to be a valid
user controlled pointer.  It should have been impossible to exploit
because the code had a check for NULL, but the compiler optimized it
away.

https://lwn.net/Articles/342330/

> I wonder if modern compilers can't simply warn about this particular

> case. Not to mention our static checkers..

> 

> 

> Dan, do you think the concern from the above-quoted commit is still

> valid? Is this something that smatch flags these days? We're apparently

> paying a real performance price in networking for tying compiler's hands

> with -fno-delete-null-pointer-checks


If I had to guess why GCC doesn't warn about this I would say that
probably it's because a lot of macros have NULL checks built in.  Most
static analysis tools have a warning about inconsistent NULL checks but
Smatch won't warn about it unless it can lead to a NULL dereference.
The fact that pointless NULL checks slow down the code has never
bothered anyone up to now.

regards,
dan carpenter
Lukas Wunner Jan. 30, 2021, 4 p.m. UTC | #6
On Tue, Jan 26, 2021 at 11:58:17AM +0300, Dan Carpenter wrote:
> On Mon, Jan 25, 2021 at 11:39:08AM -0800, Jakub Kicinski wrote:

> > On Sun, 24 Jan 2021 11:33:01 +0100 Lukas Wunner wrote:

> > > On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote:

> > > > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote:  

> > > > > sch_handle_egress() returns either the skb or NULL to signal to its

> > > > > caller __dev_queue_xmit() whether a packet should continue to be

> > > > > processed.

> > > > >

> > > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a

> > > > > NULL pointer deref right at its top.

> > > > >

> > > > > But the compiler doesn't know that.  So if sch_handle_egress() signals

> > > > > success by returning the skb, the "if (!skb) goto out;" statement

> > > > > results in a gratuitous NULL pointer check in the Assembler output.

> > > > >

> > > > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a

> > > > > NULL skb.  

> > > [...]

> > > > > we're about to add a netfilter egress hook to __dev_queue_xmit()

> > > > > and without the micro-optimization, it will result in a performance

> > > > > degradation which is indeed measurable:  

> > > [...]

> > > > > --- a/net/core/dev.c

> > > > > +++ b/net/core/dev.c

> > > > > +__attribute__((nonnull(1)))

> > > > >  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)

> > > > >  {

> > > > >         struct net_device *dev = skb->dev;  

> > > > 

> > > > It is a bit sad the compilers do not automatically get this knowledge

> > > > from the very first instruction :

> > > > 

> > > >  struct net_device *dev = skb->dev;  

> > > 

> > > The compiler (gcc) is capable of doing that, but the feature was disabled by:

> > > 

> > >     commit a3ca86aea507904148870946d599e07a340b39bf

> > >     Author: Eugene Teo <eteo@redhat.com>

> > >     Date:   Wed Jul 15 14:59:10 2009 +0800

> > >     

> > >     Add '-fno-delete-null-pointer-checks' to gcc CFLAGS

> > > 

> > > If -fno-delete-null-pointer-checks is dropped from the top-level Makefile

> > > then the gratuitous NULL pointer checks disappear from the Assembler output,

> > > obviating the need to litter hot paths with __attribute__((nonnull(1)))

> > > annotations.

> > > 

> > > Taking a closer look at that commit, its rationale appears questionable:

> > > It says that broken code such as ...

> > > 

> > > 	struct agnx_priv *priv = dev->priv;

> > > 

> > > 	if (!dev)

> > > 		return;

> > > 

> > > ... would result in the NULL pointer check being optimized away.

> > > The commit message claims that keeping the NULL pointer check in

> > > "makes it harder to abuse" the broken code.

> > > 

> > > I don't see how that's the case:  If dev is NULL, the NULL pointer

> > > dereference at the function's top causes termination of the task

> > > in kernel/exit.c:do_exit().  So the NULL pointer check is never

> > > reached by the task.  If on the other hand dev is non-NULL,

> > > the task isn't terminated but then the NULL pointer check is

> > > unnecessary as well.

> > > 

> > > So the point of the commit remains elusive to me.  I could submit

> > > an RFC patch which drops -fno-delete-null-pointer-checks and see

> > > if any security folks cry foul.  Thoughts?

> 

> This was a famous tun.c bug back in the day.  In those days we weren't

> careful to disallow remapping NULL to a different pointer.  See

> /proc/sys/vm/mmap_min_addr.  The exploit was to remap NULL to be a valid

> user controlled pointer.  It should have been impossible to exploit

> because the code had a check for NULL, but the compiler optimized it

> away.

> 

> https://lwn.net/Articles/342330/


Thanks Dan, that's valuable information.  I'll add it to the commit message
when respinning the series.

So to summarize, keeping seemingly useless NULL pointer checks in
is a mitigation in case accesses to the zero page don't result in
an exception because an attacker managed to map something to that
page.

I'm assuming for now that selectively dropping those NULL pointer
checks from hotpaths is acceptable if otherwise a measurable
performance degradation occurs.

Maintainers please speak up if you disagree.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 7afbb642e203..4c16b9932823 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4072,6 +4072,7 @@  struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
  *      the BH enable code must have IRQs enabled so that it will not deadlock.
  *          --BLG
  */
+__attribute__((nonnull(1)))
 static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 {
 	struct net_device *dev = skb->dev;