[net-next,00/15] bnxt_en: Error recovery improvements.

Message ID 1611558501-11022-1-git-send-email-michael.chan@broadcom.com
Series bnxt_en: Error recovery improvements.

Message

Michael Chan Jan. 25, 2021, 7:08 a.m. UTC
This series contains a number of improvements in the area of error
recovery.  Most error recovery scenarios are tightly coordinated with
the firmware.  A number of patches add retry logic to establish
connection with the firmware if there are indications that the
firmware is still alive and will likely transition back to the
normal state.  Some patches speed up the recovery process and make
it more reliable.  There are some cleanup patches as well.

Edwin Peer (3):
  bnxt_en: handle CRASH_NO_MASTER during bnxt_open()
  bnxt_en: log firmware debug notifications
  bnxt_en: attempt to reinitialize after aborted reset

Michael Chan (9):
  bnxt_en: Update firmware interface to 1.10.2.11.
  bnxt_en: Define macros for the various health register states.
  bnxt_en: Retry sending the first message to firmware if it is under
    reset.
  bnxt_en: Add bnxt_fw_reset_timeout() helper.
  bnxt_en: Add a new BNXT_STATE_NAPI_DISABLED flag to keep track of NAPI
    state.
  bnxt_en: Modify bnxt_disable_int_sync() to be called more than once.
  bnxt_en: Improve firmware fatal error shutdown sequence.
  bnxt_en: Consolidate firmware reset event logging.
  bnxt_en: Do not process completion entries after fatal condition
    detected.

Vasundhara Volam (3):
  bnxt_en: Move reading VPD info after successful handshake with fw.
  bnxt_en: Add an upper bound for all firmware command timeouts.
  bnxt_en: Retry open if firmware is in reset.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 228 ++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  22 ++
 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |   7 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 249 ++++++++++++++----
 4 files changed, 393 insertions(+), 113 deletions(-)
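
The retry behaviour summarized in the cover letter (keep trying while the firmware looks like it will recover, but never wait longer than a fixed upper bound) can be pictured with a small user-space sketch. This is not the driver's code: the state names, `wait_for_fw()`, `MAX_WAIT_MS`, and the `fake_*` test doubles are all hypothetical and only illustrate the general pattern.

```c
/* Hypothetical health states; the real driver reads a firmware
 * health register whose encoding is not shown in this thread. */
enum fw_state { FW_HEALTHY, FW_RESETTING, FW_DEAD };

typedef enum fw_state (*fw_state_fn)(void *ctx);
typedef void (*wait_fn)(unsigned int ms);

#define MAX_WAIT_MS 5000	/* hypothetical upper bound on any wait */

/*
 * Poll until the firmware reports healthy.
 * Returns 0 on success, -1 if the firmware shows no sign of
 * recovering or the (clamped) timeout is exhausted.
 */
static int wait_for_fw(fw_state_fn read_state, void *ctx, wait_fn wait,
		       unsigned int timeout_ms, unsigned int step_ms)
{
	unsigned int waited = 0;

	if (timeout_ms > MAX_WAIT_MS)
		timeout_ms = MAX_WAIT_MS;	/* clamp caller-supplied timeout */
	if (step_ms == 0)
		step_ms = 1;			/* guarantee forward progress */

	for (;;) {
		enum fw_state st = read_state(ctx);

		if (st == FW_HEALTHY)
			return 0;
		if (st != FW_RESETTING)
			return -1;	/* not expected to come back: do not retry */
		if (waited >= timeout_ms)
			return -1;	/* bounded wait exhausted */
		wait(step_ms);
		waited += step_ms;
	}
}

/* Test double: reports FW_RESETTING for resets_left polls, then final. */
struct fake_fw {
	int resets_left;
	enum fw_state final;
};

static enum fw_state fake_read(void *ctx)
{
	struct fake_fw *fw = ctx;

	if (fw->resets_left > 0) {
		fw->resets_left--;
		return FW_RESETTING;
	}
	return fw->final;
}

/* Test double: a real caller would sleep here. */
static void fake_wait(unsigned int ms)
{
	(void)ms;
}
```

With these stand-ins, `wait_for_fw(fake_read, &fw, fake_wait, 1000, 100)` succeeds for a firmware that leaves reset after a few polls, fails immediately for one that reports `FW_DEAD`, and gives up once the clamped timeout is reached for one that never leaves reset.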

Comments

Joe Perches Jan. 25, 2021, 9:24 a.m. UTC | #1
On Mon, 2021-01-25 at 02:08 -0500, Michael Chan wrote:
> From: Edwin Peer <edwin.peer@broadcom.com>
>
> Firmware is capable of generating asynchronous debug notifications.
> The event data is opaque to the driver and is simply logged. Debug
> notifications can be enabled by turning on hardware status messages
> using the ethtool msglvl interface.
[]
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
[]
> @@ -2072,6 +2073,13 @@ static int bnxt_async_event_process(struct bnxt *bp,
>  			bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
>  		goto async_event_process_exit;
>  	}
> +	case ASYNC_EVENT_CMPL_EVENT_ID_DEBUG_NOTIFICATION:
> +		if (netif_msg_hw(bp)) {
> +			netdev_notice(bp->dev,
> +				      "Received firmware debug notification, data1: 0x%x, data2: 0x%x\n",
> +				      data1, data2);
> +		}

		netif_notice(bp, hw, bp->dev,
			     "Received firmware debug notification, data1: 0x%x, data2: 0x%x\n",
			     data1, data2);
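
The suggestion above works because the kernel's netif_notice() helper performs the message-level check itself before calling netdev_notice(). A minimal user-space sketch of that equivalence follows. The macros mirror the shape of the kernel ones, but `struct fake_priv`, the printf-based `netdev_notice()`, the `logged` counter, and the "eth0" device name are illustrative stand-ins, not kernel code.

```c
#include <stdio.h>

/* Same value as the kernel's NETIF_MSG_HW message-level bit. */
#define NETIF_MSG_HW	0x2000

/* Hypothetical stand-in for the driver's private structure. */
struct fake_priv {
	unsigned int msg_enable;	/* bitmask set via ethtool msglvl */
};

#define netif_msg_hw(p)	((p)->msg_enable & NETIF_MSG_HW)

/* Stand-in for netdev_notice(): print and count each logged line. */
static int logged;
#define netdev_notice(dev, fmt, ...)					\
	do {								\
		logged++;						\
		printf("%s: " fmt, dev, ##__VA_ARGS__);			\
	} while (0)

/* Same shape as the kernel helper: check the level, then log. */
#define netif_notice(priv, type, dev, fmt, ...)				\
	do {								\
		if (netif_msg_##type(priv))				\
			netdev_notice(dev, fmt, ##__VA_ARGS__);		\
	} while (0)

/* Open-coded form from the patch. Returns the number of lines logged. */
static int log_open_coded(struct fake_priv *bp, unsigned int data1,
			  unsigned int data2)
{
	int before = logged;

	if (netif_msg_hw(bp)) {
		netdev_notice("eth0",
			      "Received firmware debug notification, data1: 0x%x, data2: 0x%x\n",
			      data1, data2);
	}
	return logged - before;
}

/* Suggested form using the helper. Returns the number of lines logged. */
static int log_with_helper(struct fake_priv *bp, unsigned int data1,
			   unsigned int data2)
{
	int before = logged;

	netif_notice(bp, hw, "eth0",
		     "Received firmware debug notification, data1: 0x%x, data2: 0x%x\n",
		     data1, data2);
	return logged - before;
}
```

In this sketch both functions log exactly one line when the NETIF_MSG_HW bit is set in msg_enable and nothing otherwise, which is why the explicit if can be folded into the helper call.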

> +		goto async_event_process_exit;
>  	case ASYNC_EVENT_CMPL_EVENT_ID_RING_MONITOR_MSG: {
>  		struct bnxt_rx_ring_info *rxr;
>  		u16 grp_idx;

Willem de Bruijn Jan. 26, 2021, 1:37 a.m. UTC | #2
On Mon, Jan 25, 2021 at 3:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
>
> This series contains a number of improvements in the area of error
> recovery.  Most error recovery scenarios are tightly coordinated with
> the firmware.  A number of patches add retry logic to establish
> connection with the firmware if there are indications that the
> firmware is still alive and will likely transition back to the
> normal state.  Some patches speed up the recovery process and make
> it more reliable.  There are some cleanup patches as well.
>
> Edwin Peer (3):
>   bnxt_en: handle CRASH_NO_MASTER during bnxt_open()
>   bnxt_en: log firmware debug notifications
>   bnxt_en: attempt to reinitialize after aborted reset
>
> Michael Chan (9):
>   bnxt_en: Update firmware interface to 1.10.2.11.
>   bnxt_en: Define macros for the various health register states.
>   bnxt_en: Retry sending the first message to firmware if it is under
>     reset.
>   bnxt_en: Add bnxt_fw_reset_timeout() helper.
>   bnxt_en: Add a new BNXT_STATE_NAPI_DISABLED flag to keep track of NAPI
>     state.
>   bnxt_en: Modify bnxt_disable_int_sync() to be called more than once.
>   bnxt_en: Improve firmware fatal error shutdown sequence.
>   bnxt_en: Consolidate firmware reset event logging.
>   bnxt_en: Do not process completion entries after fatal condition
>     detected.
>
> Vasundhara Volam (3):
>   bnxt_en: Move reading VPD info after successful handshake with fw.
>   bnxt_en: Add an upper bound for all firmware command timeouts.
>   bnxt_en: Retry open if firmware is in reset.
>
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 228 ++++++++++++----
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  22 ++
>  .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |   7 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 249 ++++++++++++++----
>  4 files changed, 393 insertions(+), 113 deletions(-)

For netdrv:

Acked-by: Willem de Bruijn <willemb@google.com>

Jakub Kicinski Jan. 26, 2021, 3:23 a.m. UTC | #3
On Mon, 25 Jan 2021 20:37:52 -0500 Willem de Bruijn wrote:
> On Mon, Jan 25, 2021 at 3:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > This series contains a number of improvements in the area of error
> > recovery.  Most error recovery scenarios are tightly coordinated with
> > the firmware.  A number of patches add retry logic to establish
> > connection with the firmware if there are indications that the
> > firmware is still alive and will likely transition back to the
> > normal state.  Some patches speed up the recovery process and make
> > it more reliable.  There are some cleanup patches as well.
>
> Acked-by: Willem de Bruijn <willemb@google.com>

Thanks!

Applied.