From patchwork Tue Sep 14 19:25:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Linus_L=C3=BCssing?= X-Patchwork-Id: 511452 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 580C1C433FE for ; Tue, 14 Sep 2021 19:33:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3A231610CE for ; Tue, 14 Sep 2021 19:33:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233235AbhINTe5 (ORCPT ); Tue, 14 Sep 2021 15:34:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54314 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232815AbhINTey (ORCPT ); Tue, 14 Sep 2021 15:34:54 -0400 X-Greylist: delayed 484 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Tue, 14 Sep 2021 12:33:36 PDT Received: from mail.aperture-lab.de (mail.aperture-lab.de [IPv6:2a01:4f8:c2c:665b::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CFFCC061574; Tue, 14 Sep 2021 12:33:36 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 7FA8241008; Tue, 14 Sep 2021 21:25:30 +0200 (CEST) From: =?utf-8?q?Linus_L=C3=BCssing?= To: Kalle Valo , Felix Fietkau , Sujith Manoharan , ath9k-devel@qca.qualcomm.com Cc: linux-wireless@vger.kernel.org, "David S . Miller" , Jakub Kicinski , "John W . Linville" , Felix Fietkau , Simon Wunderlich , Sven Eckelmann , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, =?utf-8?q?Linus_L=C3=BCssing?= Subject: [PATCH 1/3] ath9k: add option to reset the wifi chip via debugfs Date: Tue, 14 Sep 2021 21:25:13 +0200 Message-Id: <20210914192515.9273-2-linus.luessing@c0d3.blue> In-Reply-To: <20210914192515.9273-1-linus.luessing@c0d3.blue> References: <20210914192515.9273-1-linus.luessing@c0d3.blue> MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org From: Linus Lüssing Sometimes, in yet unknown cases the wifi chip stops working. To allow a watchdog in userspace to easily and quickly reset the wifi chip, add the according functionality to userspace. A reset can then be triggered via: $ echo 1 > /sys/kernel/debug/ieee80211/phy0/ath9k/reset The number of user resets can further be tracked in the row "User reset" in the same file. So far people usually used "iw scan" to fix ath9k chip hangs from userspace. Which triggers the ath9k_queue_reset(), too. The reset file however has the advantage of less overhead, which makes debugging bugs within ath9k_queue_reset() easier. Signed-off-by: Linus Lüssing --- drivers/net/wireless/ath/ath9k/debug.c | 57 ++++++++++++++++++++++++-- drivers/net/wireless/ath/ath9k/debug.h | 1 + 2 files changed, 54 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c index 4c81b1d7f417..fb7a2952d0ce 100644 --- a/drivers/net/wireless/ath/ath9k/debug.c +++ b/drivers/net/wireless/ath/ath9k/debug.c @@ -749,9 +749,9 @@ static int read_file_misc(struct seq_file *file, void *data) static int read_file_reset(struct seq_file *file, void *data) { - struct ieee80211_hw *hw = dev_get_drvdata(file->private); - struct ath_softc *sc = hw->priv; + struct ath_softc *sc = file->private; static const char * const reset_cause[__RESET_TYPE_MAX] = { + [RESET_TYPE_USER] = "User reset", [RESET_TYPE_BB_HANG] = "Baseband Hang", [RESET_TYPE_BB_WATCHDOG] = "Baseband Watchdog", [RESET_TYPE_FATAL_INT] = "Fatal HW Error", @@ -779,6 +779,55 @@ static int read_file_reset(struct seq_file *file, void *data) return 0; } +static int open_file_reset(struct inode *inode, struct file *f) +{ + return single_open(f, read_file_reset, inode->i_private); +} + +static ssize_t write_file_reset(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct ath_softc *sc = file_inode(file)->i_private; + struct ath_hw *ah = sc->sc_ah; + struct ath_common *common = ath9k_hw_common(ah); + unsigned long val; + char buf[32]; + ssize_t len; + + len = min(count, sizeof(buf) - 1); + if (copy_from_user(buf, user_buf, len)) + return -EFAULT; + + buf[len] = '\0'; + if (kstrtoul(buf, 0, &val)) + return -EINVAL; + + if (val != 1) + return -EINVAL; + + /* avoid rearming hw_reset_work on shutdown */ + mutex_lock(&sc->mutex); + if (test_bit(ATH_OP_INVALID, &common->op_flags)) { + mutex_unlock(&sc->mutex); + return -EBUSY; + } + + ath9k_queue_reset(sc, RESET_TYPE_USER); + mutex_unlock(&sc->mutex); + + return count; +} + +static const struct file_operations fops_reset = { + .read = seq_read, + .write = write_file_reset, + .open = open_file_reset, + .owner = THIS_MODULE, + .llseek = seq_lseek, + .release = single_release, +}; + void ath_debug_stat_tx(struct ath_softc *sc, struct ath_buf *bf, struct ath_tx_status *ts, struct ath_txq *txq, unsigned int flags) @@ -1393,8 +1442,8 @@ int ath9k_init_debug(struct ath_hw *ah) read_file_queues); debugfs_create_devm_seqfile(sc->dev, "misc", sc->debug.debugfs_phy, read_file_misc); - debugfs_create_devm_seqfile(sc->dev, "reset", sc->debug.debugfs_phy, - read_file_reset); + debugfs_create_file("reset", 0600, sc->debug.debugfs_phy, + sc, &fops_reset); ath9k_cmn_debug_recv(sc->debug.debugfs_phy, &sc->debug.stats.rxstats); ath9k_cmn_debug_phy_err(sc->debug.debugfs_phy, &sc->debug.stats.rxstats); diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h index 33826aa13687..389459c04d14 100644 --- a/drivers/net/wireless/ath/ath9k/debug.h +++ b/drivers/net/wireless/ath/ath9k/debug.h @@ -39,6 +39,7 @@ struct fft_sample_tlv; #endif enum ath_reset_type { + RESET_TYPE_USER, RESET_TYPE_BB_HANG, RESET_TYPE_BB_WATCHDOG, RESET_TYPE_FATAL_INT, From patchwork Tue Sep 14 19:25:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Linus_L=C3=BCssing?= X-Patchwork-Id: 511453 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5108FC433F5 for ; Tue, 14 Sep 2021 19:33:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3AE3C60E9B for ; Tue, 14 Sep 2021 19:33:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233322AbhINTe6 (ORCPT ); Tue, 14 Sep 2021 15:34:58 -0400 Received: from mail.aperture-lab.de ([116.203.183.178]:44930 "EHLO mail.aperture-lab.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232113AbhINTey (ORCPT ); Tue, 14 Sep 2021 15:34:54 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id D4D2641015; Tue, 14 Sep 2021 21:25:34 +0200 (CEST) From: =?utf-8?q?Linus_L=C3=BCssing?= To: Kalle Valo , Felix Fietkau , Sujith Manoharan , ath9k-devel@qca.qualcomm.com Cc: linux-wireless@vger.kernel.org, "David S . Miller" , Jakub Kicinski , "John W . Linville" , Felix Fietkau , Simon Wunderlich , Sven Eckelmann , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, =?utf-8?q?Linus_L=C3=BCssing?= , =?utf-8?q?Linus_L=C3=BCssing?= Subject: [PATCH 3/3] ath9k: Fix potential hw interrupt resume during reset Date: Tue, 14 Sep 2021 21:25:15 +0200 Message-Id: <20210914192515.9273-4-linus.luessing@c0d3.blue> In-Reply-To: <20210914192515.9273-1-linus.luessing@c0d3.blue> References: <20210914192515.9273-1-linus.luessing@c0d3.blue> MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org From: Linus Lüssing There is a small risk of the ath9k hw interrupts being reenabled in the following way: 1) ath_reset_internal() ... -> disable_irq() ... <- returns 2) ath9k_tasklet() ... -> ath9k_hw_resume_interrupts() ... 1) ath_reset_internal() continued: -> tasklet_disable(&sc->intr_tq); (= ath9k_tasklet() off) By first disabling the ath9k interrupt there is a small window afterwards which allows ath9k hw interrupts being reenabled through the ath9k_tasklet() before we disable this tasklet in ath_reset_internal(). Leading to having the ath9k hw interrupts enabled during the reset, which we should avoid. Fixing this by first disabling all ath9k tasklets. And only after they are not running anymore also disabling the overall ath9k interrupt. Either ath9k_queue_reset()->ath9k_kill_hw_interrupts() or ath_reset_internal()->disable_irq()->ath_isr()->ath9k_kill_hw_interrupts() should then have ensured that no ath9k hw interrupts are running during the actual ath9k reset. We could reproduce this issue with two Lima boards from 8devices (QCA4531) on OpenWrt 19.07 while sending UDP traffic between the two and triggering an ath9k_queue_reset() and with added msleep()s between disable_irq() and tasklet_disable() in ath_reset_internal(). Cc: Sven Eckelmann Cc: Simon Wunderlich Cc: Linus Lüssing Fixes: e3f31175a3ee ("ath9k: fix race condition in irq processing during hardware reset") Signed-off-by: Linus Lüssing --- drivers/net/wireless/ath/ath9k/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c index 98090e40e1cf..b9f9a8ae3b56 100644 --- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -292,9 +292,9 @@ static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan) __ath_cancel_work(sc); - disable_irq(sc->irq); tasklet_disable(&sc->intr_tq); tasklet_disable(&sc->bcon_tasklet); + disable_irq(sc->irq); spin_lock_bh(&sc->sc_pcu_lock); if (!sc->cur_chan->offchannel) {