From patchwork Thu Mar 11 22:37:14 2021
From: Saeed Mahameed
To: "David S. Miller", Jakub Kicinski
Cc: netdev@vger.kernel.org, Leon Romanovsky, Moshe Shemesh, Saeed Mahameed
Subject: [net-next 06/15] net/mlx5: Check returned value from health recover sequence
Date: Thu, 11 Mar 2021 14:37:14 -0800
Message-Id: <20210311223723.361301-7-saeed@kernel.org>
In-Reply-To: <20210311223723.361301-1-saeed@kernel.org>
References: <20210311223723.361301-1-saeed@kernel.org>

From: Leon Romanovsky

MLX5_INTERFACE_STATE_UP is far from being a reliable check for recovery
success, because it can change at any time and the health logic holds no
lock that protects it. Such locks are not needed here, because health
recovery is nice to have but is not required to succeed. Instead, rely on
the value returned by mlx5_recover_device() to mark success or failure.

Reviewed-by: Moshe Shemesh
Signed-off-by: Leon Romanovsky
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/health.c    | 6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c      | 7 +++++--
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 0c32c485eb58..a0a851640804 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -335,12 +335,12 @@ static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
 		return -EIO;
 	}
 	mlx5_core_err(dev, "starting health recovery flow\n");
-	mlx5_recover_device(dev);
-	if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state) ||
-	    mlx5_health_check_fatal_sensors(dev)) {
+	if (mlx5_recover_device(dev) || mlx5_health_check_fatal_sensors(dev)) {
 		mlx5_core_err(dev, "health recovery failed\n");
 		return -EIO;
 	}
+
+	mlx5_core_info(dev, "health recovery succeeded\n");
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 363bc3e917c2..e3a417d17707 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1721,11 +1721,14 @@ void mlx5_disable_device(struct mlx5_core_dev *dev)
 	mlx5_unload_one(dev);
 }
 
-void mlx5_recover_device(struct mlx5_core_dev *dev)
+int mlx5_recover_device(struct mlx5_core_dev *dev)
 {
+	int ret = -EIO;
+
 	mlx5_pci_disable_device(dev);
 	if (mlx5_pci_slot_reset(dev->pdev) == PCI_ERS_RESULT_RECOVERED)
-		mlx5_pci_resume(dev->pdev);
+		ret = mlx5_load_one(dev);
+	return ret;
 }
 
 static struct pci_driver mlx5_core_driver = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 02993a51b114..37c8ec7d2217 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -134,7 +134,7 @@ void mlx5_error_sw_reset(struct mlx5_core_dev *dev);
 u32 mlx5_health_check_fatal_sensors(struct mlx5_core_dev *dev);
 int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev);
 void mlx5_disable_device(struct mlx5_core_dev *dev);
-void mlx5_recover_device(struct mlx5_core_dev *dev);
+int mlx5_recover_device(struct mlx5_core_dev *dev);
 int mlx5_sriov_init(struct mlx5_core_dev *dev);
 void mlx5_sriov_cleanup(struct mlx5_core_dev *dev);
 int mlx5_sriov_attach(struct mlx5_core_dev *dev);
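A minimal user-space sketch of the pattern this patch adopts, offered only as an illustration under stated assumptions: struct sketch_dev, enum slot_result, sketch_load_one(), sketch_recover_device() and sketch_try_recover() are hypothetical stand-ins, not mlx5 or kernel APIs, and the PCI slot reset, the reload and the fatal-sensor check are simulated by plain fields. It shows recovery success being decided by the returned error code (-EIO by default, otherwise whatever the reload returns) rather than by a state flag that another context may change concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical simulation types; these are NOT mlx5 driver structures. */
enum slot_result { SLOT_RECOVERED, SLOT_DISCONNECT };

struct sketch_dev {
	enum slot_result slot_reset_result; /* what the simulated slot reset reports */
	int load_result;                    /* what the simulated reload returns */
	bool fatal_sensor;                  /* simulated fatal-sensor reading */
	bool state_up;                      /* racy flag; deliberately never read */
};

/* Stand-in for the reload step: just returns the configured result. */
static int sketch_load_one(struct sketch_dev *dev)
{
	assert(dev != NULL);
	return dev->load_result;
}

/* Mirrors the shape of the patched recover function: fail with -EIO
 * unless the slot reset recovered, otherwise propagate the reload result. */
static int sketch_recover_device(struct sketch_dev *dev)
{
	int ret = -EIO;

	assert(dev != NULL);
	if (dev->slot_reset_result == SLOT_RECOVERED)
		ret = sketch_load_one(dev);
	return ret;
}

/* Mirrors the shape of the patched try-recover logic: trust the return
 * value and the sensor check, never the concurrently changing state_up. */
static int sketch_try_recover(struct sketch_dev *dev)
{
	if (sketch_recover_device(dev) || dev->fatal_sensor) {
		fprintf(stderr, "health recovery failed\n");
		return -EIO;
	}
	printf("health recovery succeeded\n");
	return 0;
}
```

In the failing cases one can set state_up to true and the result is still -EIO, which is the point of the change: the outcome no longer depends on a flag the health logic cannot read consistently without a lock.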