From patchwork Thu Mar 11 03:26:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 398235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0842FC433E9 for ; Thu, 11 Mar 2021 03:27:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D67C464FA8 for ; Thu, 11 Mar 2021 03:27:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230461AbhCKD02 (ORCPT ); Wed, 10 Mar 2021 22:26:28 -0500 Received: from mail.kernel.org ([198.145.29.99]:49354 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230122AbhCKD0R (ORCPT ); Wed, 10 Mar 2021 22:26:17 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 75EA164F9F; Thu, 11 Mar 2021 03:26:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1615433176; bh=KUEW2t6cH7hWlD+QXDaWwlVzpmeN19XinrPOSNGScNE=; h=From:To:Cc:Subject:Date:Reply-To:From; b=sc62NkeBpY9PA7CkhUb5mJTZ0WjnPiR29CC0VAH7hZmm4va6QnP8oxgB1jAQgF2PM UeB4aFFUGKUlJUdk2e4ohPmFGWWYicLu3AlzsMPPzyy+eUuO8KG9HISzU1Zrv/ulJh OfAMBlMYnGI4hMx22pSvYcIA69+eG88HDAA9L6+gT0DzGDQvT1iu6HA6r/Jysle+TJ pGiRrpLPB+IRvkNfmcSFNTB6XhIi6XaVGOmwDvInaMbMOV39ednekLe1zdubDzySmA yB2kbjbKGyoSDVN9IV72gvcS1ZkSEqmwimovKbM7ZKItIDXS7JpHh2BetC4gvRNSI2 6kgTFbG92FLcA== From: Jakub Kicinski To: netdev@vger.kernel.org Cc: jiri@resnulli.us, saeedm@nvidia.com, 
andrew.gospodarek@broadcom.com, jacob.e.keller@intel.com, guglielmo.morandin@broadcom.com, eugenem@fb.com, eranbe@mellanox.com, Jakub Kicinski Subject: [RFC net-next v2 1/3] devlink: move health state to uAPI Date: Wed, 10 Mar 2021 19:26:11 -0800 Message-Id: <20210311032613.1533100-1-kuba@kernel.org> X-Mailer: git-send-email 2.29.2 Reply-To: f242ed68-d31b-527d-562f-c5a35123861a@intel.com MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Move the health states into uAPI, so applications can use them. Note that we need to change the name of the enum because user space is likely already defining the same values. E.g. iproute2 does. Use this opportunity to shorten the names. Signed-off-by: Jakub Kicinski --- .../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 4 ++-- .../ethernet/mellanox/mlx5/core/en/health.c | 4 ++-- include/net/devlink.h | 7 +------ include/uapi/linux/devlink.h | 12 ++++++++++++ net/core/devlink.c | 18 +++++++++--------- 5 files changed, 26 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index 64381be935a8..cafc98ab4b5e 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -252,9 +252,9 @@ void bnxt_dl_health_status_update(struct bnxt *bp, bool healthy) u8 state; if (healthy) - state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; + state = DL_HEALTH_STATE_HEALTHY; else - state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; + state = DL_HEALTH_STATE_ERROR; if (health->fatal) devlink_health_reporter_state_update(health->fw_fatal_reporter, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c index 84e501e057b4..c526e31e562c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c @@ -151,10 +151,10 @@ void mlx5e_health_channels_update(struct 
mlx5e_priv *priv) { if (priv->tx_reporter) devlink_health_reporter_state_update(priv->tx_reporter, - DEVLINK_HEALTH_REPORTER_STATE_HEALTHY); + DL_HEALTH_STATE_HEALTHY); if (priv->rx_reporter) devlink_health_reporter_state_update(priv->rx_reporter, - DEVLINK_HEALTH_REPORTER_STATE_HEALTHY); + DL_HEALTH_STATE_HEALTHY); } int mlx5e_health_sq_to_ready(struct mlx5_core_dev *mdev, struct net_device *dev, u32 sqn) diff --git a/include/net/devlink.h b/include/net/devlink.h index 853420db5d32..b424328af658 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -656,11 +656,6 @@ struct devlink_port_region_ops { struct devlink_fmsg; struct devlink_health_reporter; -enum devlink_health_reporter_state { - DEVLINK_HEALTH_REPORTER_STATE_HEALTHY, - DEVLINK_HEALTH_REPORTER_STATE_ERROR, -}; - /** * struct devlink_health_reporter_ops - Reporter operations * @name: reporter name @@ -1675,7 +1670,7 @@ int devlink_health_report(struct devlink_health_reporter *reporter, const char *msg, void *priv_ctx); void devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, - enum devlink_health_reporter_state state); + enum devlink_health_state state); void devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter); diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index f6008b2fa60f..41a6ea3b2256 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -608,4 +608,16 @@ enum devlink_port_fn_opstate { DEVLINK_PORT_FN_OPSTATE_ATTACHED, }; +/** + * enum devlink_health_state - indicates the state of a health reporter + * @DL_HEALTH_STATE_HEALTHY: fully operational, working state + * @DL_HEALTH_STATE_ERROR: error state, running health reporter's recovery + * may fix the issue, otherwise user needs to try + * power cycling or other forms of reset + */ +enum devlink_health_state { + DL_HEALTH_STATE_HEALTHY, + DL_HEALTH_STATE_ERROR, +}; + #endif /* _UAPI_LINUX_DEVLINK_H_ */ diff --git a/net/core/devlink.c 
b/net/core/devlink.c index 737b61c2976e..8e4e4bd7bb36 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -6346,7 +6346,7 @@ devlink_health_reporter_recover(struct devlink_health_reporter *reporter, { int err; - if (reporter->health_state == DEVLINK_HEALTH_REPORTER_STATE_HEALTHY) + if (reporter->health_state == DL_HEALTH_STATE_HEALTHY) return 0; if (!reporter->ops->recover) @@ -6357,7 +6357,7 @@ devlink_health_reporter_recover(struct devlink_health_reporter *reporter, return err; devlink_health_reporter_recovery_done(reporter); - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; + reporter->health_state = DL_HEALTH_STATE_HEALTHY; devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); return 0; @@ -6416,7 +6416,7 @@ static int devlink_health_do_dump(struct devlink_health_reporter *reporter, int devlink_health_report(struct devlink_health_reporter *reporter, const char *msg, void *priv_ctx) { - enum devlink_health_reporter_state prev_health_state; + enum devlink_health_state prev_health_state; struct devlink *devlink = reporter->devlink; unsigned long recover_ts_threshold; @@ -6425,14 +6425,14 @@ int devlink_health_report(struct devlink_health_reporter *reporter, trace_devlink_health_report(devlink, reporter->ops->name, msg); reporter->error_count++; prev_health_state = reporter->health_state; - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; + reporter->health_state = DL_HEALTH_STATE_ERROR; devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); /* abort if the previous error wasn't recovered */ recover_ts_threshold = reporter->last_recovery_ts + msecs_to_jiffies(reporter->graceful_period); if (reporter->auto_recover && - (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY || + (prev_health_state != DL_HEALTH_STATE_HEALTHY || (reporter->last_recovery_ts && reporter->recovery_count && time_is_after_jiffies(recover_ts_threshold)))) { trace_devlink_health_recover_aborted(devlink, @@ -6443,7 
+6443,7 @@ int devlink_health_report(struct devlink_health_reporter *reporter, return -ECANCELED; } - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; + reporter->health_state = DL_HEALTH_STATE_ERROR; if (reporter->auto_dump) { mutex_lock(&reporter->dump_lock); @@ -6520,10 +6520,10 @@ devlink_health_reporter_get_from_cb(struct netlink_callback *cb) void devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, - enum devlink_health_reporter_state state) + enum devlink_health_state state) { - if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY && - state != DEVLINK_HEALTH_REPORTER_STATE_ERROR)) + if (WARN_ON(state != DL_HEALTH_STATE_HEALTHY && + state != DL_HEALTH_STATE_ERROR)) return; if (reporter->health_state == state) From patchwork Thu Mar 11 03:26:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 398236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3AFAC433E0 for ; Thu, 11 Mar 2021 03:27:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8432764F9F for ; Thu, 11 Mar 2021 03:27:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230391AbhCKD01 (ORCPT ); Wed, 10 Mar 2021 22:26:27 -0500 Received: from mail.kernel.org ([198.145.29.99]:49370 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230125AbhCKD0R (ORCPT ); Wed, 10 Mar 2021 
22:26:17 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 22E7D64FC9; Thu, 11 Mar 2021 03:26:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1615433177; bh=c68gw+qPQpsb6OkRTN6dewkYJxoVacZSeWgUNGpNM5c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To:From; b=SyWuyMaOzQnnEj2uSBDlQmzFLj0iGzLflYyyW+BoNjr55Z4e+4OSI2cnS55hqRblz oZOTJfViYrgCHf2gegVP3X74wtNS7I9b+F9vVgIWfGmgm3AEdXi/4MFqpo9aAUvt29 9kadAoAuHUWlpcIiO4txMnwjZzl+kUadb+7ycD2m4HzQjF/lsvU8/FA9Xy1CIEhfVy n7RgjVVJG6Epzsuo/dPP9KV8rCi7ACbq+NomVuYsf7uCcFbXM5/krrqz3V6SGrVF6h EqzepchTqdXmC67l4pUqsGaAqAzyRP1RcRLw0T54rTHJHu4z917XHYOBfu8ABQSxAs QEHjmRccDUCkA== From: Jakub Kicinski To: netdev@vger.kernel.org Cc: jiri@resnulli.us, saeedm@nvidia.com, andrew.gospodarek@broadcom.com, jacob.e.keller@intel.com, guglielmo.morandin@broadcom.com, eugenem@fb.com, eranbe@mellanox.com, Jakub Kicinski Subject: [RFC net-next v2 2/3] devlink: health: add remediation type Date: Wed, 10 Mar 2021 19:26:12 -0800 Message-Id: <20210311032613.1533100-2-kuba@kernel.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210311032613.1533100-1-kuba@kernel.org> References: <20210311032613.1533100-1-kuba@kernel.org> Reply-To: f242ed68-d31b-527d-562f-c5a35123861a@intel.com MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Currently devlink health does not give the user any clear information about what kind of remediation the ->recover callback will perform. This makes it difficult to understand the impact of enabling auto-remediation, and the severity of the error itself. To allow users to make a more informed decision, add a new remediation type attribute. Note that we only allow one remediation type per reporter; this is intentional. devlink health is not built for mixing issues of different severity into one reporter, since it only maintains one dump (of the first event) and a single error counter. 
Nudging vendors towards categorizing issues beyond coarse groups is an added bonus. Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 2 ++ include/uapi/linux/devlink.h | 25 +++++++++++++++++++++++++ net/core/devlink.c | 7 ++++++- 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index b424328af658..72b37769761f 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -659,6 +659,7 @@ struct devlink_health_reporter; /** * struct devlink_health_reporter_ops - Reporter operations * @name: reporter name + * @remedy: severity of the remediation required * @recover: callback to recover from reported error * if priv_ctx is NULL, run a full recover * @dump: callback to dump an object @@ -669,6 +670,7 @@ struct devlink_health_reporter; struct devlink_health_reporter_ops { char *name; + enum devlink_health_remedy remedy; int (*recover)(struct devlink_health_reporter *reporter, void *priv_ctx, struct netlink_ext_ack *extack); int (*dump)(struct devlink_health_reporter *reporter, diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 41a6ea3b2256..8cd1508b525b 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -534,6 +534,9 @@ enum devlink_attr { DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */ DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */ + + DEVLINK_ATTR_HEALTH_REPORTER_REMEDY, /* u32 */ + /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, @@ -620,4 +623,26 @@ enum devlink_health_state { DL_HEALTH_STATE_ERROR, }; +/** + * enum devlink_health_remedy - severity of remediation procedure + * @DL_HEALTH_REMEDY_NONE: transient error, no remediation required + * @DL_HEALTH_REMEDY_KICK: device stalled, processing will be re-triggered + * @DL_HEALTH_REMEDY_COMP_RESET: associated device component (e.g. 
device queue) + * will be reset + * @DL_HEALTH_REMEDY_RESET: full device reset, will result in temporary + * unavailability of the device, device configuration + * should not be lost + * @DL_HEALTH_REMEDY_REINIT: device will be reinitialized and configuration lost + * + * Used in %DEVLINK_ATTR_HEALTH_REPORTER_REMEDY, categorizes the health reporter + * by the severity of the remediation. + */ +enum devlink_health_remedy { + DL_HEALTH_REMEDY_NONE = 1, + DL_HEALTH_REMEDY_KICK, + DL_HEALTH_REMEDY_COMP_RESET, + DL_HEALTH_REMEDY_RESET, + DL_HEALTH_REMEDY_REINIT, +}; + #endif /* _UAPI_LINUX_DEVLINK_H_ */ diff --git a/net/core/devlink.c b/net/core/devlink.c index 8e4e4bd7bb36..09d77d43ff63 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -6095,7 +6095,8 @@ __devlink_health_reporter_create(struct devlink *devlink, { struct devlink_health_reporter *reporter; - if (WARN_ON(graceful_period && !ops->recover)) + if (WARN_ON(graceful_period && !ops->recover) || + WARN_ON(ops->recover && !ops->remedy)) return ERR_PTR(-EINVAL); reporter = kzalloc(sizeof(*reporter), GFP_KERNEL); @@ -6265,6 +6266,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg, if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME, reporter->ops->name)) goto reporter_nest_cancel; + if (reporter->ops->remedy && + nla_put_u32(msg, DEVLINK_ATTR_HEALTH_REPORTER_REMEDY, + reporter->ops->remedy)) + goto reporter_nest_cancel; if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE, reporter->health_state)) goto reporter_nest_cancel; From patchwork Thu Mar 11 03:26:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 399528 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, 
MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 004EAC43381 for ; Thu, 11 Mar 2021 03:27:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B5DC264FA1 for ; Thu, 11 Mar 2021 03:27:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230477AbhCKD03 (ORCPT ); Wed, 10 Mar 2021 22:26:29 -0500 Received: from mail.kernel.org ([198.145.29.99]:49388 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230173AbhCKD0S (ORCPT ); Wed, 10 Mar 2021 22:26:18 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id B34BE64FCC; Thu, 11 Mar 2021 03:26:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1615433178; bh=Nqfuf8iNSzg9GNwxT+jhFRRMPbs5xnVjaEsqiYuZFEw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To:From; b=ipV/dIP2Eo4cwOnCnBY1MuUP4td8K69xLYIS2eqT5Dvb8/+f8+2utzD7pixnf98nD VnatlybbjpP51IAWEQMtrGWWBp3cALEFjhk205Kgn9EdjZDD3ltXii8+aQZ8bgfEHc /8qOLp60IelRqvZbuGqBY5yRdBYaLJ4Rsi7M3w/woMxzMVP97o3JNJYN/A1XwiT0Iq nLs9K+XK2jC7ltzhbS0RNCm1EgnM+OID0wMXFvNjXMIV1IuEGjTM926i6KDfgwof7X KLCC9RxXRcSQs2WjUjgJcIXZBmTkVa572a1yo8RC/yLLq+T+OMPArZgtzYwEpmwjCA Rdo1+tHW+uhsA== From: Jakub Kicinski To: netdev@vger.kernel.org Cc: jiri@resnulli.us, saeedm@nvidia.com, andrew.gospodarek@broadcom.com, jacob.e.keller@intel.com, guglielmo.morandin@broadcom.com, eugenem@fb.com, eranbe@mellanox.com, Jakub Kicinski Subject: [RFC net-next v2 3/3] devlink: add more failure modes Date: Wed, 10 Mar 2021 19:26:13 -0800 Message-Id: <20210311032613.1533100-3-kuba@kernel.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210311032613.1533100-1-kuba@kernel.org> References: <20210311032613.1533100-1-kuba@kernel.org> Reply-To: 
f242ed68-d31b-527d-562f-c5a35123861a@intel.com MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org >> Pending vendors adding the right reporters. << Extend the applicability of devlink health reporters beyond what can be locally remedied. Add failure modes which require re-flashing the NVM image or HW changes. The expectation is that the driver will call devlink_health_reporter_state_update() to put hardware health reporters into a bad state. Signed-off-by: Jakub Kicinski --- include/uapi/linux/devlink.h | 7 +++++++ net/core/devlink.c | 3 +-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 8cd1508b525b..f623bbc63489 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -617,10 +617,17 @@ enum devlink_port_fn_opstate { * @DL_HEALTH_STATE_ERROR: error state, running health reporter's recovery * may fix the issue, otherwise user needs to try * power cycling or other forms of reset + * @DL_HEALTH_STATE_BAD_IMAGE: device's non-volatile memory needs + * to be re-written, usually due to block corruption + * @DL_HEALTH_STATE_BAD_HW: hardware errors detected, device, host + * or the connection between the two may be at fault */ enum devlink_health_state { DL_HEALTH_STATE_HEALTHY, DL_HEALTH_STATE_ERROR, + + DL_HEALTH_STATE_BAD_IMAGE, + DL_HEALTH_STATE_BAD_HW, }; /** diff --git a/net/core/devlink.c b/net/core/devlink.c index 09d77d43ff63..4a9fa6288a4a 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -6527,8 +6527,7 @@ void devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, enum devlink_health_state state) { - if (WARN_ON(state != DL_HEALTH_STATE_HEALTHY && - state != DL_HEALTH_STATE_ERROR)) + if (WARN_ON(state > DL_HEALTH_STATE_BAD_HW)) return; if (reporter->health_state == state)
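Illustration only, not part of the series: a minimal user-space sketch of how an application might consume the state and remedy values once they are uAPI. The two enums are copied locally from the hunks in this series so the snippet builds without the patched header. The helper names (dl_health_state_valid() and friends) and the printable strings are hypothetical, invented for this example, and are not claimed to match iproute2 or any existing tool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of the enums proposed in patches 1/3 to 3/3. Copied here only
 * so the sketch compiles without the patched include/uapi/linux/devlink.h. */
enum devlink_health_state {
	DL_HEALTH_STATE_HEALTHY,
	DL_HEALTH_STATE_ERROR,

	DL_HEALTH_STATE_BAD_IMAGE,
	DL_HEALTH_STATE_BAD_HW,
};

enum devlink_health_remedy {
	DL_HEALTH_REMEDY_NONE = 1,
	DL_HEALTH_REMEDY_KICK,
	DL_HEALTH_REMEDY_COMP_RESET,
	DL_HEALTH_REMEDY_RESET,
	DL_HEALTH_REMEDY_REINIT,
};

/* Hypothetical helper: the same range check that the WARN_ON() in
 * devlink_health_reporter_state_update() performs after patch 3/3. */
static bool dl_health_state_valid(unsigned int state)
{
	return state <= DL_HEALTH_STATE_BAD_HW;
}

/* Hypothetical helper: printable name for a state, NULL if out of range.
 * The strings are made up for this example. */
static const char *dl_health_state_name(unsigned int state)
{
	static const char * const names[] = {
		[DL_HEALTH_STATE_HEALTHY]   = "healthy",
		[DL_HEALTH_STATE_ERROR]     = "error",
		[DL_HEALTH_STATE_BAD_IMAGE] = "bad_image",
		[DL_HEALTH_STATE_BAD_HW]    = "bad_hw",
	};

	if (!dl_health_state_valid(state))
		return NULL;
	return names[state];
}

/* Hypothetical helper: printable name for a remedy. Valid values start at 1,
 * so 0 (attribute not reported by the kernel) yields NULL. */
static const char *dl_health_remedy_name(unsigned int remedy)
{
	switch (remedy) {
	case DL_HEALTH_REMEDY_NONE:       return "none";
	case DL_HEALTH_REMEDY_KICK:       return "kick";
	case DL_HEALTH_REMEDY_COMP_RESET: return "comp_reset";
	case DL_HEALTH_REMEDY_RESET:      return "reset";
	case DL_HEALTH_REMEDY_REINIT:     return "reinit";
	default:                          return NULL;
	}
}

/* Hypothetical policy check based on the kernel-doc in patch 2/3: every
 * remedy up to and including RESET should preserve device configuration,
 * while REINIT loses it. */
static bool dl_health_remedy_keeps_config(unsigned int remedy)
{
	return remedy >= DL_HEALTH_REMEDY_NONE &&
	       remedy <= DL_HEALTH_REMEDY_RESET;
}
```

A monitoring tool could, for instance, refuse to enable auto-recovery on a reporter when dl_health_remedy_keeps_config() returns false for the value carried in DEVLINK_ATTR_HEALTH_REPORTER_REMEDY. Since the enum starts at 1, a zero value can safely be treated as "attribute absent".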