[PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash

Message ID	20210304195252.16360-1-michael-dev@fami-braun.de
State	New
Headers	show Return-Path: <netdev-owner@kernel.org> From: michael-dev@fami-braun.de To: claudiu.manoil@nxp.com Cc: netdev@vger.kernel.org, Michael Braun <michael-dev@fami-braun.de> Subject: [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash Date: Thu, 4 Mar 2021 20:52:52 +0100 Message-Id: <20210304195252.16360-1-michael-dev@fami-braun.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash \| expand [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash

Message ID

20210304195252.16360-1-michael-dev@fami-braun.de

State

New

Headers

From: michael-dev@fami-braun.de
To: claudiu.manoil@nxp.com
Cc: netdev@vger.kernel.org, Michael Braun <michael-dev@fami-braun.de>
Subject: [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash
Date: Thu,  4 Mar 2021 20:52:52 +0100
Message-Id: <20210304195252.16360-1-michael-dev@fami-braun.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

[PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash | expand

Commit Message

michael-dev March 4, 2021, 7:52 p.m. UTC

From: Michael Braun <michael-dev@fami-braun.de>

When using jumbo packets and overrunning rx queue with napi enabled,
the following sequence is observed in gfar_add_rx_frag:

   | lstatus                              |       | skb                   |
t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |

So at t=7 a new packets is started but not finished, probably due to rx
overrun - but rx overrun is not indicated in the flags. Instead a new
packets starts at t=8. This results in skb->len to exceed size for the LAST
fragment at t=13 and thus a negative fragment size added to the skb.

This then crashes:

kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c

This patch fixes this by checking for computed LAST fragment size, so a
negative sized fragment is never added.
In order to prevent the newer rx frame from getting corrupted, the FIRST
flag is checked to discard the incomplete older frame.

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
---
 drivers/net/ethernet/freescale/gianfar.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Vladimir Oltean March 4, 2021, 8:17 p.m. UTC | #1

Hi Michael,

On Thu, Mar 04, 2021 at 08:52:52PM +0100, michael-dev@fami-braun.de wrote:
> From: Michael Braun <michael-dev@fami-braun.de>
>
> When using jumbo packets and overrunning rx queue with napi enabled,
> the following sequence is observed in gfar_add_rx_frag:
>
>    | lstatus                              |       | skb                   |
> t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
> ---+--------------------------------------+-------+-----------------------+
> 13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
> 12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
> 11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
> 10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
> 09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
> 08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
> 07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
> 06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
> 05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
> 04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
> 03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
> 02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
> 01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
> 00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |
>
> So at t=7 a new packets is started but not finished, probably due to rx
> overrun - but rx overrun is not indicated in the flags. Instead a new
> packets starts at t=8. This results in skb->len to exceed size for the LAST
> fragment at t=13 and thus a negative fragment size added to the skb.
>
> This then crashes:
>
> kernel BUG at include/linux/skbuff.h:2277!
> Oops: Exception in kernel mode, sig: 5 [#1]
> ...
> NIP [c04689f4] skb_pull+0x2c/0x48
> LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
> Call Trace:
> [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
> [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
> [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
> [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
> [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
> [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
> [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
> [ec4bff08] [c0062718] kthread+0x140/0x144
> [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
>
> This patch fixes this by checking for computed LAST fragment size, so a
> negative sized fragment is never added.
> In order to prevent the newer rx frame from getting corrupted, the FIRST
> flag is checked to discard the incomplete older frame.
>
> Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
> ---

Just for my understanding, do you have a reproducer for the issue?
I notice you haven't answered Claudiu's questions posted on v1.
On LS1021A I cannot trigger this apparent hardware issue even if I force
RX overruns (by reducing the ring size). Judging from the "NIP" register
from your stack trace, this is a PowerPC device, which one is it?

patchwork-bot+netdevbpf@kernel.org March 5, 2021, 9:30 p.m. UTC | #2

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Thu,  4 Mar 2021 20:52:52 +0100 you wrote:
> From: Michael Braun <michael-dev@fami-braun.de>

> 

> When using jumbo packets and overrunning rx queue with napi enabled,

> the following sequence is observed in gfar_add_rx_frag:

> 

>    | lstatus                              |       | skb                   |

> t  | lstatus,  size, flags                | first | len, data_len, *ptr   |

> 

> [...]


Here is the summary with links:
  - [PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash
    https://git.kernel.org/netdev/net/c/d8861bab48b6

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

michael-dev March 7, 2021, 8:43 p.m. UTC | #3

Am 04.03.2021 21:17, schrieb Vladimir Oltean:
> Just for my understanding, do you have a reproducer for the issue?

> I notice you haven't answered Claudiu's questions posted on v1.

> On LS1021A I cannot trigger this apparent hardware issue even if I 

> force

> RX overruns (by reducing the ring size). Judging from the "NIP" 

> register

> from your stack trace, this is a PowerPC device, which one is it?


This is on P1020WLAN (AP) devices and it happens reproducably when I use 
ipsec + iperf3 --udp -b 1000M on some other server targetting the AP.
Yes this is PPC.

Regards,
Michael

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 541de32ea662..1cf8ef717453 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2390,6 +2390,10 @@  static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus,
 		if (lstatus & BD_LFLAG(RXBD_LAST))
 			size -= skb->len;
 
+		WARN(size < 0, "gianfar: rx fragment size underflow");
+		if (size < 0)
+			return false;
+
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
 				rxb->page_offset + RXBUF_ALIGNMENT,
 				size, GFAR_RXB_TRUESIZE);
@@ -2552,6 +2556,17 @@  static int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue,
 		if (lstatus & BD_LFLAG(RXBD_EMPTY))
 			break;
 
+		/* lost RXBD_LAST descriptor due to overrun */
+		if (skb &&
+		    (lstatus & BD_LFLAG(RXBD_FIRST))) {
+			/* discard faulty buffer */
+			dev_kfree_skb(skb);
+			skb = NULL;
+			rx_queue->stats.rx_dropped++;
+
+			/* can continue normally */
+		}
+
 		/* order rx buffer descriptor reads */
 		rmb();

[PATCHv2] gianfar: fix jumbo packets+napi+rx overrun crash

Commit Message

Comments

Patch