diff mbox series

[bpf-next,V1,5/6] bpf: Add MTU check for TC-BPF packets after egress hook

Message ID 160200019184.719143.17780588544420986957.stgit@firesoul
State New
Headers show
Series bpf: New approach for BPF MTU handling and enforcement | expand

Commit Message

Jesper Dangaard Brouer Oct. 6, 2020, 4:03 p.m. UTC
The MTU should only apply to transmitted packets.

When TC-ingress redirect packet to egress on another netdev, then the
normal netstack MTU checks are skipped (and driver level will not catch
any MTU violation, checked ixgbe).

This patch choose not to add MTU check in the egress code path of
skb_do_redirect() prior to calling dev_queue_xmit(), because it is still
possible to run another BPF egress program that will shrink/consume
headers, which will make packet comply with netdev MTU. This use-case
might already be in production use (if ingress MTU is larger than egress).

Instead do the MTU check after sch_handle_egress() step, for the cases
that require this.

The cases need a bit explaining. Ingress to egress redirected packets
could be detected via skb->tc_at_ingress bit, but it is not reliable,
because sch_handle_egress() could steal the packet and redirect this
(again) to another egress netdev, which will then have the
skb->tc_at_ingress cleared. There is also the case of TC-egress prog
increase packet size and then redirect it egress. Thus, it is more
reliable to do the MTU check for any redirected packet (both ingress and
egress), which is available via skb_is_redirected() in earlier patch.
Also handle case where egress BPF-prog increased size.

One advantage of this approach is that it ingress-to-egress BPF-prog can
send information via packet data. With the MTU checks removed in the
helpers, and also not done in skb_do_redirect() call, this allows for an
ingress BPF-prog to communicate with an egress BPF-prog via packet data,
as long as egress BPF-prog remove this prior to transmitting packet.

Troubleshooting: MTU violations are recorded in TX dropped counter, and
kprobe on dev_queue_xmit() have retval -EMSGSIZE.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

kernel test robot Oct. 6, 2020, 8:09 p.m. UTC | #1
Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: s390-randconfig-r026-20201006 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1127662c6dc2a276839c75a42238b11a3ad00f32)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/2065cee7d6b74c8f1dabae4e4e15999a841e3349
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
        git checkout 2065cee7d6b74c8f1dabae4e4e15999a841e3349
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x0000ff00UL) <<  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x00ff0000UL) >>  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0xff000000UL) >> 24)))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
           __fswab32(x))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   net/core/dev.c:4068:7: warning: unused variable 'mtu_check' [-Wunused-variable]
           bool mtu_check = false;
                ^
>> net/core/dev.c:4176:1: warning: unused label 'drop' [-Wunused-label]

   drop:
   ^~~~~
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress' [-Wunused-function]
   sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress' [-Wunused-function]
   static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
                     ^
   In file included from net/core/dev.c:71:
   In file included from include/linux/uaccess.h:6:
   In file included from include/linux/sched.h:14:
   In file included from include/linux/pid.h:5:
   In file included from include/linux/rculist.h:10:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:12:
   In file included from include/linux/bitops.h:29:
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   24 warnings and 14 errors generated.
--
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x0000ff00UL) <<  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x00ff0000UL) >>  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0xff000000UL) >> 24)))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
           __fswab32(x))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> net/core/dev.c:4176:1: warning: unused label 'drop' [-Wunused-label]

   drop:
   ^~~~~
   net/core/dev.c:4068:7: warning: unused variable 'mtu_check' [-Wunused-variable]
           bool mtu_check = false;
                ^
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress' [-Wunused-function]
   sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress' [-Wunused-function]
   static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
                     ^
   In file included from net/core/dev.c:71:
   In file included from include/linux/uaccess.h:6:
   In file included from include/linux/sched.h:14:
   In file included from include/linux/pid.h:5:
   In file included from include/linux/rculist.h:10:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:12:
   In file included from include/linux/bitops.h:29:
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   24 warnings and 14 errors generated.

vim +/drop +4176 net/core/dev.c

  4037	
  4038	/**
  4039	 *	__dev_queue_xmit - transmit a buffer
  4040	 *	@skb: buffer to transmit
  4041	 *	@sb_dev: suboordinate device used for L2 forwarding offload
  4042	 *
  4043	 *	Queue a buffer for transmission to a network device. The caller must
  4044	 *	have set the device and priority and built the buffer before calling
  4045	 *	this function. The function can be called from an interrupt.
  4046	 *
  4047	 *	A negative errno code is returned on a failure. A success does not
  4048	 *	guarantee the frame will be transmitted as it may be dropped due
  4049	 *	to congestion or traffic shaping.
  4050	 *
  4051	 * -----------------------------------------------------------------------------------
  4052	 *      I notice this method can also return errors from the queue disciplines,
  4053	 *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
  4054	 *      be positive.
  4055	 *
  4056	 *      Regardless of the return value, the skb is consumed, so it is currently
  4057	 *      difficult to retry a send to this method.  (You can bump the ref count
  4058	 *      before sending to hold a reference for retry if you are careful.)
  4059	 *
  4060	 *      When calling this method, interrupts MUST be enabled.  This is because
  4061	 *      the BH enable code must have IRQs enabled so that it will not deadlock.
  4062	 *          --BLG
  4063	 */
  4064	static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
  4065	{
  4066		struct net_device *dev = skb->dev;
  4067		struct netdev_queue *txq;
  4068		bool mtu_check = false;
  4069		bool again = false;
  4070		struct Qdisc *q;
  4071		int rc = -ENOMEM;
  4072	
  4073		skb_reset_mac_header(skb);
  4074	
  4075		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
  4076			__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
  4077	
  4078		/* Disable soft irqs for various locks below. Also
  4079		 * stops preemption for RCU.
  4080		 */
  4081		rcu_read_lock_bh();
  4082	
  4083		skb_update_prio(skb);
  4084	
  4085		qdisc_pkt_len_init(skb);
  4086	#ifdef CONFIG_NET_CLS_ACT
  4087		mtu_check = skb_is_redirected(skb);
  4088		skb->tc_at_ingress = 0;
  4089	# ifdef CONFIG_NET_EGRESS
  4090		if (static_branch_unlikely(&egress_needed_key)) {
  4091			unsigned int len_orig = skb->len;
  4092	
  4093			skb = sch_handle_egress(skb, &rc, dev);
  4094			if (!skb)
  4095				goto out;
  4096			/* BPF-prog ran and could have changed packet size beyond MTU */
  4097			if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
  4098				mtu_check = true;
  4099		}
  4100	# endif
  4101		/* MTU-check only happens on "last" net_device in a redirect sequence
  4102		 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
  4103		 * either ingress or egress to another device).
  4104		 */
  4105		if (mtu_check && !is_skb_forwardable(dev, skb)) {
  4106			rc = -EMSGSIZE;
  4107			goto drop;
  4108		}
  4109	#endif
  4110		/* If device/qdisc don't need skb->dst, release it right now while
  4111		 * its hot in this cpu cache.
  4112		 */
  4113		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
  4114			skb_dst_drop(skb);
  4115		else
  4116			skb_dst_force(skb);
  4117	
  4118		txq = netdev_core_pick_tx(dev, skb, sb_dev);
  4119		q = rcu_dereference_bh(txq->qdisc);
  4120	
  4121		trace_net_dev_queue(skb);
  4122		if (q->enqueue) {
  4123			rc = __dev_xmit_skb(skb, q, dev, txq);
  4124			goto out;
  4125		}
  4126	
  4127		/* The device has no queue. Common case for software devices:
  4128		 * loopback, all the sorts of tunnels...
  4129	
  4130		 * Really, it is unlikely that netif_tx_lock protection is necessary
  4131		 * here.  (f.e. loopback and IP tunnels are clean ignoring statistics
  4132		 * counters.)
  4133		 * However, it is possible, that they rely on protection
  4134		 * made by us here.
  4135	
  4136		 * Check this and shot the lock. It is not prone from deadlocks.
  4137		 *Either shot noqueue qdisc, it is even simpler 8)
  4138		 */
  4139		if (dev->flags & IFF_UP) {
  4140			int cpu = smp_processor_id(); /* ok because BHs are off */
  4141	
  4142			if (txq->xmit_lock_owner != cpu) {
  4143				if (dev_xmit_recursion())
  4144					goto recursion_alert;
  4145	
  4146				skb = validate_xmit_skb(skb, dev, &again);
  4147				if (!skb)
  4148					goto out;
  4149	
  4150				HARD_TX_LOCK(dev, txq, cpu);
  4151	
  4152				if (!netif_xmit_stopped(txq)) {
  4153					dev_xmit_recursion_inc();
  4154					skb = dev_hard_start_xmit(skb, dev, txq, &rc);
  4155					dev_xmit_recursion_dec();
  4156					if (dev_xmit_complete(rc)) {
  4157						HARD_TX_UNLOCK(dev, txq);
  4158						goto out;
  4159					}
  4160				}
  4161				HARD_TX_UNLOCK(dev, txq);
  4162				net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
  4163						     dev->name);
  4164			} else {
  4165				/* Recursion is detected! It is possible,
  4166				 * unfortunately
  4167				 */
  4168	recursion_alert:
  4169				net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
  4170						     dev->name);
  4171			}
  4172		}
  4173	
  4174		rc = -ENETDOWN;
  4175		rcu_read_unlock_bh();
> 4176	drop:

  4177		atomic_long_inc(&dev->tx_dropped);
  4178		kfree_skb_list(skb);
  4179		return rc;
  4180	out:
  4181		rcu_read_unlock_bh();
  4182		return rc;
  4183	}
  4184	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot Oct. 7, 2020, 12:26 a.m. UTC | #2
Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: mips-randconfig-r024-20201005 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1127662c6dc2a276839c75a42238b11a3ad00f32)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # https://github.com/0day-ci/linux/commit/2065cee7d6b74c8f1dabae4e4e15999a841e3349
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
        git checkout 2065cee7d6b74c8f1dabae4e4e15999a841e3349
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   net/core/dev.c:4176:1: warning: unused label 'drop'
   drop:
   ^~~~~
>> net/core/dev.c:4068:7: warning: unused variable 'mtu_check'

   bool mtu_check = false;
   ^
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress'
   sch_handle_ingress(struct sk_buff struct packet_type int
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress'
   static inline int nf_ingress(struct sk_buff struct packet_type
   ^
   fatal error: error in backend: Nested variants found in inline asm string: ' .set push
   .set noat
   .set push
   .set arch=r4000
   .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   1: ll $0, $2 # __cmpxchg_asm
   bne $0, ${3:z}, 2f
   .set pop
   move $$1, ${4:z}
   .set arch=r4000
   sc $$1, $1
   beqz $$1, 1b
   .set pop
   2: .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   '
   clang-12: error: clang frontend command failed with exit code 70 (use -v to see invocation)
   clang version 12.0.0 (git://gitmirror/llvm_project 1127662c6dc2a276839c75a42238b11a3ad00f32)
   Target: mipsel-unknown-linux-gnu
   Thread model: posix
   InstalledDir: /opt/cross/clang-1127662c6d/bin
   clang-12: note: diagnostic msg:
   Makefile arch drivers include kernel net scripts source usr
--
>> net/core/dev.c:4068:7: warning: unused variable 'mtu_check'

   bool mtu_check = false;
   ^
   net/core/dev.c:4176:1: warning: unused label 'drop'
   drop:
   ^~~~~
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress'
   sch_handle_ingress(struct sk_buff struct packet_type int
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress'
   static inline int nf_ingress(struct sk_buff struct packet_type
   ^
   fatal error: error in backend: Nested variants found in inline asm string: ' .set push
   .set noat
   .set push
   .set arch=r4000
   .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   1: ll $0, $2 # __cmpxchg_asm
   bne $0, ${3:z}, 2f
   .set pop
   move $$1, ${4:z}
   .set arch=r4000
   sc $$1, $1
   beqz $$1, 1b
   .set pop
   2: .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   '
   clang-12: error: clang frontend command failed with exit code 70 (use -v to see invocation)
   clang version 12.0.0 (git://gitmirror/llvm_project 1127662c6dc2a276839c75a42238b11a3ad00f32)
   Target: mipsel-unknown-linux-gnu
   Thread model: posix
   InstalledDir: /opt/cross/clang-1127662c6d/bin
   clang-12: note: diagnostic msg:
   Makefile arch drivers include kernel net scripts source usr

vim +/mtu_check +4068 net/core/dev.c

  4037	
  4038	/**
  4039	 *	__dev_queue_xmit - transmit a buffer
  4040	 *	@skb: buffer to transmit
  4041	 *	@sb_dev: suboordinate device used for L2 forwarding offload
  4042	 *
  4043	 *	Queue a buffer for transmission to a network device. The caller must
  4044	 *	have set the device and priority and built the buffer before calling
  4045	 *	this function. The function can be called from an interrupt.
  4046	 *
  4047	 *	A negative errno code is returned on a failure. A success does not
  4048	 *	guarantee the frame will be transmitted as it may be dropped due
  4049	 *	to congestion or traffic shaping.
  4050	 *
  4051	 * -----------------------------------------------------------------------------------
  4052	 *      I notice this method can also return errors from the queue disciplines,
  4053	 *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
  4054	 *      be positive.
  4055	 *
  4056	 *      Regardless of the return value, the skb is consumed, so it is currently
  4057	 *      difficult to retry a send to this method.  (You can bump the ref count
  4058	 *      before sending to hold a reference for retry if you are careful.)
  4059	 *
  4060	 *      When calling this method, interrupts MUST be enabled.  This is because
  4061	 *      the BH enable code must have IRQs enabled so that it will not deadlock.
  4062	 *          --BLG
  4063	 */
  4064	static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
  4065	{
  4066		struct net_device *dev = skb->dev;
  4067		struct netdev_queue *txq;
> 4068		bool mtu_check = false;

  4069		bool again = false;
  4070		struct Qdisc *q;
  4071		int rc = -ENOMEM;
  4072	
  4073		skb_reset_mac_header(skb);
  4074	
  4075		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
  4076			__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
  4077	
  4078		/* Disable soft irqs for various locks below. Also
  4079		 * stops preemption for RCU.
  4080		 */
  4081		rcu_read_lock_bh();
  4082	
  4083		skb_update_prio(skb);
  4084	
  4085		qdisc_pkt_len_init(skb);
  4086	#ifdef CONFIG_NET_CLS_ACT
  4087		mtu_check = skb_is_redirected(skb);
  4088		skb->tc_at_ingress = 0;
  4089	# ifdef CONFIG_NET_EGRESS
  4090		if (static_branch_unlikely(&egress_needed_key)) {
  4091			unsigned int len_orig = skb->len;
  4092	
  4093			skb = sch_handle_egress(skb, &rc, dev);
  4094			if (!skb)
  4095				goto out;
  4096			/* BPF-prog ran and could have changed packet size beyond MTU */
  4097			if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
  4098				mtu_check = true;
  4099		}
  4100	# endif
  4101		/* MTU-check only happens on "last" net_device in a redirect sequence
  4102		 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
  4103		 * either ingress or egress to another device).
  4104		 */
  4105		if (mtu_check && !is_skb_forwardable(dev, skb)) {
  4106			rc = -EMSGSIZE;
  4107			goto drop;
  4108		}
  4109	#endif
  4110		/* If device/qdisc don't need skb->dst, release it right now while
  4111		 * its hot in this cpu cache.
  4112		 */
  4113		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
  4114			skb_dst_drop(skb);
  4115		else
  4116			skb_dst_force(skb);
  4117	
  4118		txq = netdev_core_pick_tx(dev, skb, sb_dev);
  4119		q = rcu_dereference_bh(txq->qdisc);
  4120	
  4121		trace_net_dev_queue(skb);
  4122		if (q->enqueue) {
  4123			rc = __dev_xmit_skb(skb, q, dev, txq);
  4124			goto out;
  4125		}
  4126	
  4127		/* The device has no queue. Common case for software devices:
  4128		 * loopback, all the sorts of tunnels...
  4129	
  4130		 * Really, it is unlikely that netif_tx_lock protection is necessary
  4131		 * here.  (f.e. loopback and IP tunnels are clean ignoring statistics
  4132		 * counters.)
  4133		 * However, it is possible, that they rely on protection
  4134		 * made by us here.
  4135	
  4136		 * Check this and shot the lock. It is not prone from deadlocks.
  4137		 *Either shot noqueue qdisc, it is even simpler 8)
  4138		 */
  4139		if (dev->flags & IFF_UP) {
  4140			int cpu = smp_processor_id(); /* ok because BHs are off */
  4141	
  4142			if (txq->xmit_lock_owner != cpu) {
  4143				if (dev_xmit_recursion())
  4144					goto recursion_alert;
  4145	
  4146				skb = validate_xmit_skb(skb, dev, &again);
  4147				if (!skb)
  4148					goto out;
  4149	
  4150				HARD_TX_LOCK(dev, txq, cpu);
  4151	
  4152				if (!netif_xmit_stopped(txq)) {
  4153					dev_xmit_recursion_inc();
  4154					skb = dev_hard_start_xmit(skb, dev, txq, &rc);
  4155					dev_xmit_recursion_dec();
  4156					if (dev_xmit_complete(rc)) {
  4157						HARD_TX_UNLOCK(dev, txq);
  4158						goto out;
  4159					}
  4160				}
  4161				HARD_TX_UNLOCK(dev, txq);
  4162				net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
  4163						     dev->name);
  4164			} else {
  4165				/* Recursion is detected! It is possible,
  4166				 * unfortunately
  4167				 */
  4168	recursion_alert:
  4169				net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
  4170						     dev->name);
  4171			}
  4172		}
  4173	
  4174		rc = -ENETDOWN;
  4175		rcu_read_unlock_bh();
  4176	drop:
  4177		atomic_long_inc(&dev->tx_dropped);
  4178		kfree_skb_list(skb);
  4179		return rc;
  4180	out:
  4181		rcu_read_unlock_bh();
  4182		return rc;
  4183	}
  4184	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index b433098896b2..33529022b30d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3870,6 +3870,7 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+		*ret = NET_XMIT_SUCCESS;
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
@@ -4064,9 +4065,10 @@  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 {
 	struct net_device *dev = skb->dev;
 	struct netdev_queue *txq;
+	bool mtu_check = false;
+	bool again = false;
 	struct Qdisc *q;
 	int rc = -ENOMEM;
-	bool again = false;
 
 	skb_reset_mac_header(skb);
 
@@ -4082,14 +4084,28 @@  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
+	mtu_check = skb_is_redirected(skb);
 	skb->tc_at_ingress = 0;
 # ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
+		unsigned int len_orig = skb->len;
+
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
+		/* BPF-prog ran and could have changed packet size beyond MTU */
+		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
+			mtu_check = true;
 	}
 # endif
+	/* MTU-check only happens on "last" net_device in a redirect sequence
+	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
+	 * either ingress or egress to another device).
+	 */
+	if (mtu_check && !is_skb_forwardable(dev, skb)) {
+		rc = -EMSGSIZE;
+		goto drop;
+	}
 #endif
 	/* If device/qdisc don't need skb->dst, release it right now while
 	 * its hot in this cpu cache.
@@ -4157,7 +4173,7 @@  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	rc = -ENETDOWN;
 	rcu_read_unlock_bh();
-
+drop:
 	atomic_long_inc(&dev->tx_dropped);
 	kfree_skb_list(skb);
 	return rc;