From patchwork Tue Jan 26 01:11:07 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 371174
Date: Mon, 25 Jan 2021 17:11:07 -0800
Message-Id: <20210126011109.2425966-2-weiwan@google.com>
In-Reply-To: <20210126011109.2425966-1-weiwan@google.com>
Subject: [PATCH net-next v8 1/3] net: extract napi poll functionality to __napi_poll()
From: Wei Wang
To: David Miller, netdev@vger.kernel.org, Jakub Kicinski
Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau, Alexander Duyck
X-Mailing-List: netdev@vger.kernel.org

From: Felix Fietkau

This commit introduces a new function, __napi_poll(), which performs the
main logic of the existing napi_poll() function, and
will be called by other functions in later commits.
The idea and implementation are by Felix Fietkau, proposed as part of
the patch to move napi work to work_queue context.
This commit by itself is a code restructure.

Signed-off-by: Felix Fietkau
Signed-off-by: Wei Wang
---
 net/core/dev.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0332f2e8f7da..7d23bff03864 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6768,15 +6768,10 @@ void __netif_napi_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(__netif_napi_del);
 
-static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+static int __napi_poll(struct napi_struct *n, bool *repoll)
 {
-	void *have;
 	int work, weight;
 
-	list_del_init(&n->poll_list);
-
-	have = netpoll_poll_lock(n);
-
 	weight = n->weight;
 
 	/* This NAPI_STATE_SCHED test is for avoiding a race
@@ -6796,7 +6791,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 			    n->poll, work, weight);
 
 	if (likely(work < weight))
-		goto out_unlock;
+		return work;
 
 	/* Drivers must not modify the NAPI state if they
	 * consume the entire weight.  In such cases this code
@@ -6805,7 +6800,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	 */
 	if (unlikely(napi_disable_pending(n))) {
 		napi_complete(n);
-		goto out_unlock;
+		return work;
 	}
 
 	/* The NAPI context has more processing work, but busy-polling
@@ -6818,7 +6813,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 			 */
 			napi_schedule(n);
 		}
-		goto out_unlock;
+		return work;
 	}
 
 	if (n->gro_bitmask) {
@@ -6836,9 +6831,29 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	if (unlikely(!list_empty(&n->poll_list))) {
 		pr_warn_once("%s: Budget exhausted after napi rescheduled\n",
 			     n->dev ?
 n->dev->name : "backlog");
-		goto out_unlock;
+		return work;
 	}
 
+	*repoll = true;
+
+	return work;
+}
+
+static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+{
+	bool do_repoll = false;
+	void *have;
+	int work;
+
+	list_del_init(&n->poll_list);
+
+	have = netpoll_poll_lock(n);
+
+	work = __napi_poll(n, &do_repoll);
+
+	if (!do_repoll)
+		goto out_unlock;
+
 	list_add_tail(&n->poll_list, repoll);
 
 out_unlock:

From patchwork Tue Jan 26 01:11:08 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 371210
Date: Mon, 25 Jan 2021 17:11:08 -0800
Message-Id: <20210126011109.2425966-3-weiwan@google.com>
In-Reply-To: <20210126011109.2425966-1-weiwan@google.com>
Subject: [PATCH net-next v8 2/3] net:
implement threaded-able napi poll loop support
From: Wei Wang
To: David Miller, netdev@vger.kernel.org, Jakub Kicinski
Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau, Alexander Duyck
X-Mailing-List: netdev@vger.kernel.org

This patch allows running each napi poll loop inside its own kernel
thread. The kthread is created during netif_napi_add() if dev->threaded
is set, and threaded mode is enabled in napi_enable(). A way to set
dev->threaded and enable threaded mode without a device up/down is
provided in the following patch.

Once threaded mode is enabled and the kthread is started,
napi_schedule() wakes up that thread instead of scheduling the softirq.
The threaded poll loop behaves much like net_rx_action(), but it does
not have to manipulate local irqs and uses an explicit scheduling point
based on netdev_budget.

Co-developed-by: Paolo Abeni
Signed-off-by: Paolo Abeni
Co-developed-by: Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa
Co-developed-by: Jakub Kicinski
Signed-off-by: Jakub Kicinski
Signed-off-by: Wei Wang
---
 include/linux/netdevice.h |  19 ++-----
 net/core/dev.c            | 117 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 122 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 02dcef4d66e2..8cb8d43ea5fa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct task_struct	*thread;
 };
 
 enum {
@@ -358,6 +359,7 @@ enum {
 	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing */
+	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread */
 };
 
 enum {
@@ -369,6 +371,7 @@ enum {
 	NAPIF_STATE_NO_BUSY_POLL	= BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
+	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -503,20 +506,7 @@ static inline bool napi_complete(struct napi_struct *n)
  */
 void napi_disable(struct napi_struct *n);
 
-/**
- *	napi_enable - enable NAPI scheduling
- *	@n: NAPI context
- *
- * Resume NAPI from being scheduled on this context.
- * Must be paired with napi_disable.
- */
-static inline void napi_enable(struct napi_struct *n)
-{
-	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	smp_mb__before_atomic();
-	clear_bit(NAPI_STATE_SCHED, &n->state);
-	clear_bit(NAPI_STATE_NPSVC, &n->state);
-}
+void napi_enable(struct napi_struct *n);
 
 /**
  *	napi_synchronize - wait until NAPI is not running
@@ -2143,6 +2133,7 @@ struct net_device {
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+	unsigned		threaded:1;
 
 	struct list_head	net_notifier_list;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 7d23bff03864..743dd69fba19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -91,6 +91,7 @@
 #include
 #include
 #include
+#include <linux/kthread.h>
 #include
 #include
 #include
@@ -1493,6 +1494,37 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+static int napi_threaded_poll(void *data);
+
+static int napi_kthread_create(struct napi_struct *n)
+{
+	int err = 0;
+
+	/* Create and wake up the kthread once to put it in
+	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
+	 * warning and work with loadavg.
+	 */
+	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+				n->dev->name, n->napi_id);
+	if (IS_ERR(n->thread)) {
+		err = PTR_ERR(n->thread);
+		pr_err("kthread_run failed with err %d\n", err);
+		n->thread = NULL;
+	}
+
+	return err;
+}
+
+static void napi_kthread_stop(struct napi_struct *n)
+{
+	if (!n->thread)
+		return;
+
+	kthread_stop(n->thread);
+	clear_bit(NAPI_STATE_THREADED, &n->state);
+	n->thread = NULL;
+}
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -4252,6 +4284,21 @@ int gro_normal_batch __read_mostly = 8;
 static inline void ____napi_schedule(struct softnet_data *sd,
 				     struct napi_struct *napi)
 {
+	struct task_struct *thread;
+
+	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
+		/* Paired with smp_mb__before_atomic() in
+		 * napi_enable(). Use READ_ONCE() to guarantee
+		 * a complete read on napi->thread. Only call
+		 * wake_up_process() when it's not NULL.
+		 */
+		thread = READ_ONCE(napi->thread);
+		if (thread) {
+			wake_up_process(thread);
+			return;
+		}
+	}
+
 	list_add_tail(&napi->poll_list, &sd->poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
@@ -6720,6 +6767,12 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
 	napi_hash_add(napi);
+	/* Create kthread for this napi if dev->threaded is set.
+	 * Clear dev->threaded if kthread creation failed so that
+	 * threaded mode will not be enabled in napi_enable().
+	 */
+	if (dev->threaded && napi_kthread_create(napi))
+		dev->threaded = 0;
 }
 EXPORT_SYMBOL(netif_napi_add);
 
@@ -6734,12 +6787,31 @@ void napi_disable(struct napi_struct *n)
 		msleep(1);
 	hrtimer_cancel(&n->timer);
 
+	napi_kthread_stop(n);
 	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
 
+/**
+ *	napi_enable - enable NAPI scheduling
+ *	@n: NAPI context
+ *
+ * Resume NAPI from being scheduled on this context.
+ * Must be paired with napi_disable.
+ */
+void napi_enable(struct napi_struct *n)
+{
+	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
+	smp_mb__before_atomic();
+	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+	if (n->dev->threaded && n->thread)
+		set_bit(NAPI_STATE_THREADED, &n->state);
+}
+EXPORT_SYMBOL(napi_enable);
+
 static void flush_gro_hash(struct napi_struct *napi)
 {
 	int i;
@@ -6862,6 +6934,51 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	return work;
 }
 
+static int napi_thread_wait(struct napi_struct *napi)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
+		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+			WARN_ON(!list_empty(&napi->poll_list));
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
+
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return -1;
+}
+
+static int napi_threaded_poll(void *data)
+{
+	struct napi_struct *napi = data;
+	void *have;
+
+	while (!napi_thread_wait(napi)) {
+		for (;;) {
+			bool repoll = false;
+
+			local_bh_disable();
+
+			have = netpoll_poll_lock(napi);
+			__napi_poll(napi, &repoll);
+			netpoll_poll_unlock(have);
+
+			__kfree_skb_flush();
+			local_bh_enable();
+
+			if (!repoll)
+				break;
+
+			cond_resched();
+		}
+	}
+	return 0;
+}
+
 static __latent_entropy void net_rx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);

From patchwork Tue Jan
26 01:11:09 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 371209
Date: Mon, 25 Jan 2021 17:11:09 -0800
Message-Id: <20210126011109.2425966-4-weiwan@google.com>
In-Reply-To: <20210126011109.2425966-1-weiwan@google.com>
Subject: [PATCH net-next v8 3/3] net: add sysfs attribute to control napi threaded mode
From: Wei Wang
To: David Miller, netdev@vger.kernel.org, Jakub Kicinski
Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau, Alexander Duyck
X-Mailing-List: netdev@vger.kernel.org

This patch adds a new sysfs attribute to the network device class.
Said attribute provides a per-device control to enable/disable threaded
mode for all the napi instances of the given network device, without
requiring a device up/down. A user writes 1 or 0 to enable or disable
threaded mode.

Co-developed-by: Paolo Abeni
Signed-off-by: Paolo Abeni
Co-developed-by: Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa
Co-developed-by: Felix Fietkau
Signed-off-by: Felix Fietkau
Signed-off-by: Wei Wang
---
 Documentation/ABI/testing/sysfs-class-net | 15 ++++++
 include/linux/netdevice.h                 |  2 +
 net/core/dev.c                            | 61 ++++++++++++++++++++++-
 net/core/net-sysfs.c                      | 50 +++++++++++++++++++
 4 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 1f2002df5ba2..1419103d11f9 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -337,3 +337,18 @@ Contact: netdev@vger.kernel.org
 Description:
 		32-bit unsigned integer counting the number of times the link has
 		been down
+
+What:		/sys/class/net/<iface>/threaded
+Date:		Jan 2021
+KernelVersion:	5.12
+Contact:	netdev@vger.kernel.org
+Description:
+		Boolean value to control the threaded mode per device. User could
+		set this value to enable/disable threaded mode for all napi
+		belonging to this device, without the need to do device up/down.
+
+		Possible values:
+		== ==================================
+		0  threaded mode disabled for this dev
+		1  threaded mode enabled for this dev
+		== ==================================
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8cb8d43ea5fa..26c3e8cf4c01 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
+int dev_set_threaded(struct net_device *dev, bool threaded);
+
 /**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: NAPI context
diff --git a/net/core/dev.c b/net/core/dev.c
index 743dd69fba19..1897af6a46eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4288,8 +4288,9 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 
 	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
 		/* Paired with smp_mb__before_atomic() in
-		 * napi_enable(). Use READ_ONCE() to guarantee
-		 * a complete read on napi->thread. Only call
+		 * napi_enable()/napi_set_threaded().
+		 * Use READ_ONCE() to guarantee a complete
+		 * read on napi->thread. Only call
 		 * wake_up_process() when it's not NULL.
 		 */
 		thread = READ_ONCE(napi->thread);
@@ -6740,6 +6741,62 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+static int napi_set_threaded(struct napi_struct *n, bool threaded)
+{
+	int err = 0;
+
+	if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
+		return 0;
+
+	if (!threaded) {
+		clear_bit(NAPI_STATE_THREADED, &n->state);
+		return 0;
+	}
+
+	if (!n->thread) {
+		err = napi_kthread_create(n);
+		if (err)
+			return err;
+	}
+
+	/* Make sure kthread is created before THREADED bit
+	 * is set.
+	 */
+	smp_mb__before_atomic();
+	set_bit(NAPI_STATE_THREADED, &n->state);
+
+	return 0;
+}
+
+static void dev_disable_threaded_all(struct net_device *dev)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &dev->napi_list, dev_list)
+		napi_set_threaded(napi, false);
+	dev->threaded = 0;
+}
+
+int dev_set_threaded(struct net_device *dev, bool threaded)
+{
+	struct napi_struct *napi;
+	int ret;
+
+	dev->threaded = threaded;
+	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+		ret = napi_set_threaded(napi, threaded);
+		if (ret) {
+			/* Error occurred on one of the napi,
+			 * reset threaded mode on all napi.
+			 */
+			dev_disable_threaded_all(dev);
+			break;
+		}
+	}
+
+	return ret;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index daf502c13d6d..884f049ee395 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -538,6 +538,55 @@ static ssize_t phys_switch_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_switch_id);
 
+static ssize_t threaded_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	int ret;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (!dev_isalive(netdev)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	if (list_empty(&netdev->napi_list)) {
+		ret = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	ret = sprintf(buf, fmt_dec, netdev->threaded);
+
+unlock:
+	rtnl_unlock();
+	return ret;
+}
+
+static int modify_napi_threaded(struct net_device *dev, unsigned long val)
+{
+	int ret;
+
+	if (list_empty(&dev->napi_list))
+		return -EOPNOTSUPP;
+
+	if (val != 0 && val != 1)
+		return -EOPNOTSUPP;
+
+	ret = dev_set_threaded(dev, val);
+
+	return ret;
+}
+
+static ssize_t threaded_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, modify_napi_threaded);
+}
+static DEVICE_ATTR_RW(threaded);
+
 static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -570,6 +619,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_proto_down.attr,
 	&dev_attr_carrier_up_count.attr,
 	&dev_attr_carrier_down_count.attr,
+	&dev_attr_threaded.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);