[net,v7,1/3] net: sched: fix packet stuck problem for lockless qdisc

Lockless qdisc has below concurrent problem:
    cpu0                 cpu1
     .                     .
q->enqueue                 .
     .                     .
qdisc_run_begin()          .
     .                     .
dequeue_skb()              .
     .                     .
sch_direct_xmit()          .
     .                     .
     .                q->enqueue
     .             qdisc_run_begin()
     .            return and do nothing
     .                     .
qdisc_run_end()            .

cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().

Lockless qdisc has below another concurrent problem when
tx_action is involved:

cpu0(serving tx_action)     cpu1             cpu2
          .                   .                .
          .              q->enqueue            .
          .            qdisc_run_begin()       .
          .              dequeue_skb()         .
          .                   .            q->enqueue
          .                   .                .
          .             sch_direct_xmit()      .
          .                   .         qdisc_run_begin()
          .                   .       return and do nothing
          .                   .                .
 clear __QDISC_STATE_SCHED    .                .
 qdisc_run_begin()            .                .
 return and do nothing        .                .
          .                   .                .
          .            qdisc_run_end()         .

This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
   not set, set STATE_MISSED and retry another spin_trylock() in
   case other CPU may not see STATE_MISSED after it releases the
   lock.
2. reschedule if STATE_MISSED is set after the lock is released
   at the end of qdisc_run_end().

For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.

Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.

Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().

The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:

 threads  without+this_patch   with+this_patch      delta
    1        2.61Mpps            2.60Mpps           -0.3%
    2        3.97Mpps            3.82Mpps           -3.7%
    4        5.62Mpps            5.59Mpps           -0.5%
    8        2.78Mpps            2.77Mpps           -0.3%
   16        2.22Mpps            2.22Mpps           -0.0%

Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Tested-by: Juergen Gross <jgross@suse.com>
---
V7: Clear STATE_MISSED before calling __netif_schedule()
    as suggested by Jakub.
V6: Check MISSED after the first trylock, and remove the
    automic test and set for performance sake as suggested
    by Jakub.
V4: Change STATE_NEED_RESCHEDULE to STATE_MISSED mirroring
    NAPI's NAPIF_STATE_MISSED, and add Juergen's "Tested-by"
    tag for there is only renaming and typo fixing between
    V4 and V3.
V3: Fix a compile error and a few comment typo, remove the
    __QDISC_STATE_DEACTIVATED checking, and update the
    performance data.
V2: Avoid the overhead of fixing the data race as much as
    possible.
---
 include/net/sch_generic.h | 35 ++++++++++++++++++++++++++++++++++-
 net/sched/sch_generic.c   | 19 +++++++++++++++++++
 2 files changed, 53 insertions(+), 1 deletion(-)

Message ID	1620868260-32984-2-git-send-email-linyunsheng@huawei.com
State	Superseded
Headers	show Return-Path: <netdev-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDCA3C43462 for <netdev@archiver.kernel.org>; Thu, 13 May 2021 01:11:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE2C161406 for <netdev@archiver.kernel.org>; Thu, 13 May 2021 01:11:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239242AbhEMBM0 (ORCPT <rfc822;netdev@archiver.kernel.org>); Wed, 12 May 2021 21:12:26 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:2570 "EHLO szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234981AbhEMBMX (ORCPT <rfc822;netdev@vger.kernel.org>); Wed, 12 May 2021 21:12:23 -0400 Received: from DGGEMS407-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4FgYSb6QPkzsRNP; Thu, 13 May 2021 09:08:31 +0800 (CST) Received: from localhost.localdomain (10.69.192.56) by DGGEMS407-HUB.china.huawei.com (10.3.19.207) with Microsoft SMTP Server id 14.3.498.0; Thu, 13 May 2021 09:10:58 +0800 From: Yunsheng Lin <linyunsheng@huawei.com> To: <davem@davemloft.net>, <kuba@kernel.org> CC: <olteanv@gmail.com>, <ast@kernel.org>, <daniel@iogearbox.net>, <andriin@fb.com>, <edumazet@google.com>, <weiwan@google.com>, <cong.wang@bytedance.com>, <ap420073@gmail.com>, <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linuxarm@openeuler.org>, <mkl@pengutronix.de>, <linux-can@vger.kernel.org>, <jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>, <jiri@resnulli.us>, <andrii@kernel.org>, <kafai@fb.com>, <songliubraving@fb.com>, <yhs@fb.com>, <john.fastabend@gmail.com>, <kpsingh@kernel.org>, <bpf@vger.kernel.org>, <jonas.bonn@netrounds.com>, <pabeni@redhat.com>, <mzhivich@akamai.com>, <johunt@akamai.com>, <albcamus@gmail.com>, <kehuan.feng@gmail.com>, <a.fatoum@pengutronix.de>, <atenart@kernel.org>, <alexander.duyck@gmail.com>, <hdanton@sina.com>, <jgross@suse.com>, <JKosina@suse.com>, <mkubecek@suse.cz>, <bjorn@kernel.org>, <alobakin@pm.me> Subject: [PATCH net v7 1/3] net: sched: fix packet stuck problem for lockless qdisc Date: Thu, 13 May 2021 09:10:58 +0800 Message-ID: <1620868260-32984-2-git-send-email-linyunsheng@huawei.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1620868260-32984-1-git-send-email-linyunsheng@huawei.com> References: <1620868260-32984-1-git-send-email-linyunsheng@huawei.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.69.192.56] X-CFilter-Loop: Reflected Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org
Series	fix packet stuck problem for lockless qdisc \| expand [net,v7,0/3] fix packet stuck problem for lockless qdisc [net,v7,1/3] net: sched: fix packet stuck problem for lockless qdisc [net,v7,2/3] net: sched: fix endless tx action reschedule during deactivation [net,v7,3/3] net: sched: fix tx action reschedule issue with stopped queue

[net,v7,1/3] net: sched: fix packet stuck problem for lockless qdisc

Commit Message

Patch