From patchwork Wed Apr 1 15:29:56 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stuart Haslam
X-Patchwork-Id: 47090
Date: Wed, 1 Apr 2015 16:29:56 +0100
From: Stuart Haslam
To: Maxim Uvarov
Message-ID: <20150401152956.GA16425@localhost>
References: <1427895059-13898-1-git-send-email-maxim.uvarov@linaro.org>
 <20150401135736.GA13254@localhost> <551C0517.7030200@linaro.org>
In-Reply-To: <551C0517.7030200@linaro.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: petri.savolainen@nokia.com, lng-odp@lists.linaro.org
Subject: Re: [lng-odp] [PATCH] linux-generic: sched fix polling input queue
X-Original-Sender: stuart.haslam@linaro.org

On Wed, Apr 01, 2015 at 05:47:51PM +0300, Maxim Uvarov wrote:
> On 04/01/15 16:57, Stuart Haslam wrote:
> >On Wed, Apr 01, 2015 at 04:30:59PM +0300, Maxim Uvarov wrote:
> >>Commit:
> >>  8fa64fb9fc4e43fbd9d3c226ed89229863bdb771
> >>  linux-generic: scheduler: restructured queue and pktio integration
> >>Implement race with schedule termination and polling input queue.
> >>This patch locks pktio while doing poll to prevent destroy linked
> >>queue in the middle of this poll.
> >>
> >Admittedly I've not looked in detail at the terminate sequence after the
> >scheduler changes, so don't really understand what you're fixing, but
> >this feels like a workaround rather than a fix. Shouldn't the pktin
> >queue have been removed from the scheduler before it's closed? What's
> >the sequence that leads to the problem?
> as I understand that right one thread goes to
> schedule()->pktin_poll(sched_cmd->pe))
> then successful do:
> odp_pktio_recv(entry->s.handle, pkt_tbl, QUEUE_MULTI_MAX);
>
> after that other thread calls terminate and bellow in the pktio_poll code:
> queue_enq_multi(qentry, hdr_tbl, num_enq);
>
> qentry here is corrupted due to it has been destroyed by other thread.
>

I see from the trace that you can reproduce this with the odp_pktio.c
validation test, which is single threaded so there's no thread race.

> Because of qentry is linked to pktio entry we have to lock pktio
> entry for that
> queue to make sure that is was not modified while pktin_poll execution.
>
> I did make check from the root to find that problem (it occurs about
> 1 of 10 run times).
>
> I sent back trace some to mailing list some time ago:
>
> Core was generated by `./test/validation/odp_pktio'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000000411dc4 in odp_atomic_fetch_inc_u32 (atom=0x2baaaadfff00)
> at ./include/odp/atomic.h:70
> 70        return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
> (gdb) bt
> #0  0x0000000000411dc4 in odp_atomic_fetch_inc_u32 (atom=0x2baaaadfff00)
> at ./include/odp/atomic.h:70
> #1  0x0000000000411e8a in odp_ticketlock_lock
> (ticketlock=0x2baaaadfff00) at odp_ticketlock.c:28
> #2  0x000000000040f0f8 in queue_enq_multi (queue=0x2baaaadfff00,
> buf_hdr=0x7fff1fccb0b0, num=1) at odp_queue.c:376
> #3  0x000000000040987d in pktin_poll (entry=0x2aaaab200600) at
> odp_packet_io.c:713
> #4  0x0000000000410378 in schedule (out_queue=0x0,
> out_ev=0x7fff1fccb1d8, max_num=1, max_deq=4) at odp_schedule.c:455
> #5  0x000000000041050a in schedule_loop (out_queue=0x0, wait=1,
> out_ev=0x7fff1fccb1d8, max_num=1, max_deq=4) at odp_schedule.c:518
> #6  0x00000000004105a4 in odp_schedule (out_queue=0x0, wait=1) at
> odp_schedule.c:551
> #7  0x0000000000402b83 in destroy_inq (pktio=0x2) at odp_pktio.c:320

So the problem sequence is this:

1. destroy_inq() calls odp_pktio_inq_remdef()
2. odp_pktio_inq_remdef() sets the pktio's inq_default to
   ODP_QUEUE_INVALID and does nothing to remove the queue from the
   scheduler (not sure if it should?)
3. destroy_inq() calls odp_schedule()
4. odp_schedule() dequeues the event to poll the pktin queue and then
   calls pktin_poll()
5. pktin_poll() attempts to fetch some packets from the pktin, and if it
   receives any, attempts to enqueue them using inq_default, which by
   this point is ODP_QUEUE_INVALID.

So there's a fundamental breakage.
I'm not sure yet how it should be fixed, but the patch below will make
it go away.

The race is that you'll only get the crash if some packets were actually
received between the last time the validation test itself called
odp_schedule() and calling destroy_inq(). You'd never see it on the
linux-generic "loop" interface as traffic doesn't just mysteriously
appear there.

diff --git a/platform/linux-generic/odp_packet_io.c b/platform/linux-generic/odp_packet_io.c
index 15335d1..ae1de3c 100644
--- a/platform/linux-generic/odp_packet_io.c
+++ b/platform/linux-generic/odp_packet_io.c
@@ -685,6 +685,9 @@ int pktin_poll(pktio_entry_t *entry)
 	if (odp_unlikely(is_free(entry)))
 		return -1;
 
+	if (entry->s.inq_default == ODP_QUEUE_INVALID)
+		return -1;
+
 	num = odp_pktio_recv(entry->s.handle, pkt_tbl, QUEUE_MULTI_MAX);
 
 	if (num < 0) {
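
For illustration only, here is a minimal stand-alone C sketch of the
sequence described above and of what the inq_default check does. It is
not ODP code: mock_pktio_t, mock_inq_remdef(), mock_pktin_poll() and
MOCK_QUEUE_INVALID are hypothetical stand-ins for the real types and
functions, and the "queues" are just an array of counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ODP_QUEUE_INVALID; not the real ODP value. */
#define MOCK_QUEUE_INVALID (-1)

/* Hypothetical, heavily simplified model of a pktio entry. */
typedef struct {
	int inq_default; /* index of the default input queue, or invalid */
	int rx_pending;  /* packets waiting on the interface */
} mock_pktio_t;

/* Models step 2: the default input queue is detached, but nothing
 * cancels a poll command that the scheduler may still be holding. */
static void mock_inq_remdef(mock_pktio_t *p)
{
	p->inq_default = MOCK_QUEUE_INVALID;
}

/* Models steps 4 and 5 with the guard applied. Returns -1 when there is
 * no input queue, otherwise the number of packets moved to the queue. */
static int mock_pktin_poll(mock_pktio_t *p, int *queue_depth,
			   size_t num_queues)
{
	/* The guard: without it, the code below would index the queue
	 * array with an invalid handle. */
	if (p->inq_default == MOCK_QUEUE_INVALID)
		return -1;

	assert((size_t)p->inq_default < num_queues);

	int num = p->rx_pending;

	queue_depth[p->inq_default] += num;
	p->rx_pending = 0;

	return num;
}
```

In this model, a poll issued after mock_inq_remdef() returns -1 and
leaves any pending packets untouched, rather than enqueueing to an
invalid queue. Whether the real fix should instead remove the poll
command from the scheduler is left open, as it is in the text above.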