From patchwork Thu Aug 4 16:41:57 2016
X-Patchwork-Submitter: Bill Fischofer
X-Patchwork-Id: 73287
From: Bill Fischofer
Date: Thu, 4 Aug 2016 11:41:57 -0500
To: Brian Brooks
Cc: LNG ODP Mailman List
Subject: Re: [lng-odp] [PATCH] test:linux-generic: run odp_scheduling in process mode
In-Reply-To: <20160804161746.GA115829@ubuntu>
References: <1470316694-17100-1-git-send-email-mike.holmes@linaro.org>
 <20160804152621.GA70275@ubuntu>
 <20160804161746.GA115829@ubuntu>
List-Id: "The OpenDataPlane (ODP) List"
On Thu, Aug 4, 2016 at 11:17 AM, Brian Brooks wrote:
> On 08/04 11:01:09, Bill Fischofer wrote:
> > On Thu, Aug 4, 2016 at 10:59 AM, Mike Holmes wrote:
> > >
> > > On 4 August 2016 at 11:47, Bill Fischofer wrote:
> > >
> > >> On Thu, Aug 4, 2016 at 10:36 AM, Mike Holmes wrote:
> > >>
> > >>> On my vanilla x86 I don't get any issues. I'm keen to get this in and
> > >>> have CI run it on lots of HW to see what happens; many of the other
> > >>> tests completely fail in process mode, so I think we will expose a lot
> > >>> as we add them.
> > >>>
> > >>> On 4 August 2016 at 11:33, Bill Fischofer wrote:
> > >>>
> > >>>> On Thu, Aug 4, 2016 at 10:26 AM, Brian Brooks <brian.brooks@linaro.org>
> > >>>> wrote:
> > >>>>
> > >>>>> Reviewed-by: Brian Brooks
> > >>>>>
> > >>>>> On 08/04 09:18:14, Mike Holmes wrote:
> > >>>>> > +ret=0
> > >>>>> > +
> > >>>>> > +run()
> > >>>>> > +{
> > >>>>> > +	echo odp_scheduling_run_proc starts with $1 worker threads
> > >>>>> > +	echo =====================================================
> > >>>>> > +
> > >>>>> > +	$PERFORMANCE/odp_scheduling${EXEEXT} --odph_proc -c $1 || ret=1
> > >>>>> > +}
> > >>>>> > +
> > >>>>> > +run 1
> > >>>>> > +run 8
> > >>>>> > +
> > >>>>> > +exit $ret
> > >>>>>
> > >>>>> Seeing this randomly in both multithread and multiprocess modes:
> > >>>>
> > >>>> Before or after you apply this patch? What environment are you seeing
> > >>>> these errors in? They should definitely not be happening.
> > >>>>
> > >>>>> ../../../odp/platform/linux-generic/odp_queue.c:328:odp_queue_destroy():queue "sched_00_07" not empty
> > >>>>> ../../../odp/platform/linux-generic/odp_schedule.c:271:schedule_term_global():Queue not empty
> > >>>>> ../../../odp/platform/linux-generic/odp_schedule.c:294:schedule_term_global():Pool destroy fail.
> > >>>>> ../../../odp/platform/linux-generic/odp_init.c:188:_odp_term_global():ODP schedule term failed.
> > >>>>> ../../../odp/platform/linux-generic/odp_queue.c:170:odp_queue_term_global():Not destroyed queue: sched_00_07
> > >>>>> ../../../odp/platform/linux-generic/odp_init.c:195:_odp_term_global():ODP queue term failed.
> > >>>>> ../../../odp/platform/linux-generic/odp_pool.c:149:odp_pool_term_global():Not destroyed pool: odp_sched_pool
> > >>>>> ../../../odp/platform/linux-generic/odp_pool.c:149:odp_pool_term_global():Not destroyed pool: msg_pool
> > >>>>> ../../../odp/platform/linux-generic/odp_init.c:202:_odp_term_global():ODP buffer pool term failed.
> > >>>>> ~/odp_incoming/odp_build/test/common_plat/performance$ echo $?
> > >>>>> 0
> > >>>>>
> > >> Looks like we have a real issue that somehow crept into master. I can
> > >> sporadically reproduce these same errors on my x86 system. It looks like
> > >> this is also present in the monarch_lts branch.
> > >
> > > I think we agreed that Monarch would not support process mode because
> > > we never tested for it, but for TgrM we need to start fixing it.
> >
> > Unfortunately the issue Brian identified has nothing to do with process
> > mode. This happens in regular pthread mode on all levels past v1.10.0.0
> > as far as I can see.
>
> The issue seems to emerge only under high event rates.
> The application asks for more work, but none will be scheduled. However,
> there actually will be work in the queue. So, the teardown will fail
> because the queue is not empty.
>
> There may be a disconnect between the scheduling and the queueing, or some
> other synchronization-related bug. I think I've seen something similar on
> an ARM platform, so it may be architecture independent.

Well, now that I'm trying to find the root issue, it's proving elusive. I
was able to get this failure in < 10 runs before, but now it doesn't want
to show itself. If you can repro this more readily, can you get a core dump
of the failure? I've been running with this patch:

---
From ba1fa0eb943fa7a3a3c9202b9e5bf5fc2ed5d1f4 Mon Sep 17 00:00:00 2001
From: Bill Fischofer
Date: Thu, 4 Aug 2016 11:38:01 -0500
Subject: [PATCH] debug: abort on cleanup errors

Signed-off-by: Bill Fischofer
---
 test/performance/odp_scheduling.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/test/performance/odp_scheduling.c b/test/performance/odp_scheduling.c
index c575b70..7505df9 100644
--- a/test/performance/odp_scheduling.c
+++ b/test/performance/odp_scheduling.c
@@ -785,6 +785,7 @@ int main(int argc, char *argv[])
 	char cpumaskstr[ODP_CPUMASK_STR_SIZE];
 	odp_pool_param_t params;
 	int ret = 0;
+	int rc = 0;
 	odp_instance_t instance;
 	odph_odpthread_params_t thr_params;
@@ -953,15 +954,17 @@ int main(int argc, char *argv[])
 		for (j = 0; j < QUEUES_PER_PRIO; j++) {
 			queue = globals->queue[i][j];
-			odp_queue_destroy(queue);
+			rc += odp_queue_destroy(queue);
 		}
 	}

-	odp_shm_free(shm);
-	odp_queue_destroy(plain_queue);
-	odp_pool_destroy(pool);
-	odp_term_local();
-	odp_term_global(instance);
+	rc += odp_shm_free(shm);
+	rc += odp_queue_destroy(plain_queue);
+	rc += odp_pool_destroy(pool);
+	rc += odp_term_local();
+	rc += odp_term_global(instance);
+
+	if (rc != 0)
+		abort();

 	return ret;
 }
-- 
2.7.4
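
For comparison, one way to rule out leftover in-flight events as the cause of
the "queue not empty" teardown failures is to drain the scheduler explicitly
before the odp_queue_destroy() loop. The snippet below is only a sketch
against the public ODP scheduler API (odp_schedule(), odp_schedule_wait_time(),
odp_event_free()); the helper name and the 100 ms wait are illustrative, not
part of the patch above. If the failure still reproduces with a drain like
this in place, that points at a genuine scheduler/queue synchronization bug
rather than events the test simply never consumed.

#include <odp_api.h>

/* Hypothetical helper: keep asking the scheduler for work and freeing
 * whatever it hands back, until it returns nothing for a full wait
 * period (100 ms here). Call it just before destroying the scheduled
 * queues. */
static void drain_scheduled_events(void)
{
	uint64_t wait = odp_schedule_wait_time(100 * ODP_TIME_MSEC_IN_NS);
	odp_event_t ev;

	while ((ev = odp_schedule(NULL, wait)) != ODP_EVENT_INVALID)
		odp_event_free(ev);
}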