Message ID | 20210304195252.16360-1-michael-dev@fami-braun.de |
---|---|
State | New |
Headers | show |
Series | [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash | expand |
Hi Michael, On Thu, Mar 04, 2021 at 08:52:52PM +0100, michael-dev@fami-braun.de wrote: > From: Michael Braun <michael-dev@fami-braun.de> > > When using jumbo packets and overrunning rx queue with napi enabled, > the following sequence is observed in gfar_add_rx_frag: > > | lstatus | | skb | > t | lstatus, size, flags | first | len, data_len, *ptr | > ---+--------------------------------------+-------+-----------------------+ > 13 | 18002348, 9032, INTERRUPT LAST | 0 | 9600, 8000, f554c12e | > 12 | 10000640, 1600, INTERRUPT | 0 | 8000, 6400, f554c12e | > 11 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, f554c12e | > 10 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, f554c12e | > 09 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, f554c12e | > 08 | 14000640, 1600, INTERRUPT FIRST | 0 | 1600, 0, f554c12e | > 07 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, f554c12e | > 06 | 1c000080, 128, INTERRUPT LAST FIRST | 1 | 0, 0, abf3bd6e | > 05 | 18002348, 9032, INTERRUPT LAST | 0 | 8000, 6400, c5a57780 | > 04 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, c5a57780 | > 03 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, c5a57780 | > 02 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, c5a57780 | > 01 | 10000640, 1600, INTERRUPT | 0 | 1600, 0, c5a57780 | > 00 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, c5a57780 | > > So at t=7 a new packets is started but not finished, probably due to rx > overrun - but rx overrun is not indicated in the flags. Instead a new > packets starts at t=8. This results in skb->len to exceed size for the LAST > fragment at t=13 and thus a negative fragment size added to the skb. > > This then crashes: > > kernel BUG at include/linux/skbuff.h:2277! > Oops: Exception in kernel mode, sig: 5 [#1] > ... > NIP [c04689f4] skb_pull+0x2c/0x48 > LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844 > Call Trace: > [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable) > [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4 > [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c > [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0 > [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc > [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70 > [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc > [ec4bff08] [c0062718] kthread+0x140/0x144 > [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c > > This patch fixes this by checking for computed LAST fragment size, so a > negative sized fragment is never added. > In order to prevent the newer rx frame from getting corrupted, the FIRST > flag is checked to discard the incomplete older frame. > > Signed-off-by: Michael Braun <michael-dev@fami-braun.de> > --- Just for my understanding, do you have a reproducer for the issue? I notice you haven't answered Claudiu's questions posted on v1. On LS1021A I cannot trigger this apparent hardware issue even if I force RX overruns (by reducing the ring size). Judging from the "NIP" register from your stack trace, this is a PowerPC device, which one is it?
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Thu, 4 Mar 2021 20:52:52 +0100 you wrote: > From: Michael Braun <michael-dev@fami-braun.de> > > When using jumbo packets and overrunning rx queue with napi enabled, > the following sequence is observed in gfar_add_rx_frag: > > | lstatus | | skb | > t | lstatus, size, flags | first | len, data_len, *ptr | > > [...] Here is the summary with links: - [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash https://git.kernel.org/netdev/net/c/d8861bab48b6 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Am 04.03.2021 21:17, schrieb Vladimir Oltean: > Just for my understanding, do you have a reproducer for the issue? > I notice you haven't answered Claudiu's questions posted on v1. > On LS1021A I cannot trigger this apparent hardware issue even if I > force > RX overruns (by reducing the ring size). Judging from the "NIP" > register > from your stack trace, this is a PowerPC device, which one is it? This is on P1020WLAN (AP) devices and it happens reproducably when I use ipsec + iperf3 --udp -b 1000M on some other server targetting the AP. Yes this is PPC. Regards, Michael
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 541de32ea662..1cf8ef717453 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus, if (lstatus & BD_LFLAG(RXBD_LAST)) size -= skb->len; + WARN(size < 0, "gianfar: rx fragment size underflow"); + if (size < 0) + return false; + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, rxb->page_offset + RXBUF_ALIGNMENT, size, GFAR_RXB_TRUESIZE); @@ -2552,6 +2556,17 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, if (lstatus & BD_LFLAG(RXBD_EMPTY)) break; + /* lost RXBD_LAST descriptor due to overrun */ + if (skb && + (lstatus & BD_LFLAG(RXBD_FIRST))) { + /* discard faulty buffer */ + dev_kfree_skb(skb); + skb = NULL; + rx_queue->stats.rx_dropped++; + + /* can continue normally */ + } + /* order rx buffer descriptor reads */ rmb();