From patchwork Wed Jan 28 17:27:49 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Julien Grall
X-Patchwork-Id: 43882
Message-ID: <54C91C15.7030709@linaro.org>
Date: Wed, 28 Jan 2015 17:27:49 +0000
From: Julien Grall
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0
To: David Vrabel, Wei Liu
References: <54C7B6E8.9080106@linaro.org>
 <20150127164539.GJ24026@zion.uk.xensource.com>
 <54C7C131.9030502@linaro.org>
 <20150127165312.GK24026@zion.uk.xensource.com>
 <54C91229.8090104@linaro.org>
 <54C91712.3020806@citrix.com>
In-Reply-To: <54C91712.3020806@citrix.com>
Cc: Ian Campbell, xen-devel
Subject: Re: [Xen-devel] rcu_sched self-detect stall when disable vif device
X-BeenThere: xen-devel@lists.xen.org
Precedence: list
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
X-Original-Sender: julien.grall@linaro.org
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.215.52 as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org
Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org
X-Google-Group-Id: 836684582541
List-Archive:

On 28/01/15 17:06, David Vrabel wrote:
> On 28/01/15 16:45, Julien Grall wrote:
>> On 27/01/15 16:53, Wei Liu wrote:
>>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>>>> On 27/01/15 16:45, Wei Liu wrote:
>>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> While I'm working on support for 64K pages in netfront, I got
>>>>>> an rcu_sched self-detect message. It happens when netback is
>>>>>> disabling the vif device due to an error.
>>>>>>
>>>>>> I'm using Linux 3.19-rc5 on seattle (ARM64). Any idea why
>>>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>>>
>>>>>
>>>>> When you try to release an SKB, the core network driver needs to enter
>>>>> some RCU critical region to clean up. dst_release, for one, calls call_rcu.
>>>>
>>>> But this message shouldn't happen under normal conditions or because of
>>>> netfront. Right?
>>>>
>>>
>>> Never saw a report like this before, even in the case that netfront is
>>> buggy.
>>
>> This only happens when preemption is not enabled (i.e.
>> CONFIG_PREEMPT_NONE in the config file) in the backend kernel.
>>
>> When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
>> into an infinite loop. In my case, the code executed looks like:
>>
>>
>> 1. for (;;) {
>> 2.     xenvif_wait_for_rx_work(queue);
>> 3.
>> 4.     if (kthread_should_stop())
>> 5.         break;
>> 6.
>> 7.     if (unlikely(vif->disabled && queue->id == 0)) {
>> 8.         xenvif_carrier_off(vif);
>> 9.         xenvif_rx_queue_purge(queue);
>> 10.        continue;
>> 11.    }
>> 12. }
>>
>> The wait on line 2 will return directly because the vif is disabled
>> (see xenvif_have_rx_work).
>>
>> We are on queue 0, so the condition on line 7 is true. Therefore we will
>> loop on line 10. And so on...
>>
>> On platforms where preemption is not enabled, this thread will never
>> yield to another thread (unless the domain is destroyed).
>
> I'm not sure why we have a continue in the vif->disabled case and not
> just a break.  Can you try that?

So I applied this small patch (the diff is at the end of this message).

While I no longer get the rcu_sched stall message, when I destroy the
guest, the backend hits a NULL pointer dereference:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffff800000a50000
[00000000] *pgd=00000083de82a003, *pud=00000083de82b003, *pmd=00000083de82c003, *pte=00600000e1110707
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 4 PID: 34 Comm: xenwatch Not tainted 3.19.0-rc5-xen-seattle+ #13
Hardware name: AMD Seattle (RevA) Development Board (Overdrive) (DT)
task: ffff80001ea39480 ti: ffff80001ea78000 task.ti: ffff80001ea78000
PC is at exit_creds+0x18/0x70
LR is at __put_task_struct+0x3c/0xd4
pc : [] lr : [] pstate: 80000145
sp : ffff80001ea7bc50
x29: ffff80001ea7bc50 x28: 0000000000000000
x27: 0000000000000000 x26: 0000000000000000
x25: 0000000000000000 x24: ffff80001eb3c840
x23: ffff80001eb3c840 x22: 000000000006c560
x21: ffff0000011f7000 x20: 0000000000000000
x19: ffff80001ba06680 x18: 0000ffffd2635bd0
x17: 0000ffff839e4074 x16: 00000000deadbeef
x15: ffffffffffffffff x14: 0ffffffffffffffe
x13: 0000000000000028 x12: 0000000000000010
x11: 0000000000000030 x10: 0101010101010101
x9 : ffff80001ea7b8e0 x8 : ffff7c01cf6e2740
x7 : 0000000000000000 x6 : 0000000000002fc9
x5 : 0000000000000000 x4 : 0000000000000001
x3 : 0000000000000000 x2 : ffff80001ba06690
x1 : 0000000000000000 x0 : 0000000000000000

Process xenwatch (pid: 34, stack limit = 0xffff80001ea78058)
Stack: (0xffff80001ea7bc50 to 0xffff80001ea7c000)
bc40:                                     1ea7bc70 ffff8000 00094990 ffff8000
bc60: 1ba06680 ffff8000 008b45a8 ffff8000 1ea7bc90 ffff8000 000b15f0 ffff8000
bc80: 1ba06680 ffff8000 005bcab8 ffff8000 1ea7bcc0 ffff8000 00541efc ffff8000
bca0: 011ed000 ffff0000 00000000 00000000 011f7000 ffff0000 00000006 00000000
bcc0: 1ea7bd00 ffff8000 00540984 ffff8000 1ce23680 ffff8000 00000006 00000000
bce0: 00752cf0 ffff8000 00000001 00000000 00752e38 ffff8000 1ea7bd98 ffff8000
bd00: 1ea7bd40 ffff8000 00540bcc ffff8000 1ce23680 ffff8000 1cce0c00 ffff8000
bd20: 00000000 00000000 1cce0c00 ffff8000 009b0288 ffff8000 1ea7be20 ffff8000
bd40: 1ea7bd70 ffff8000 0048011c ffff8000 1ce23700 ffff8000 1cf71000 ffff8000
bd60: 009a6258 ffff8000 00a36d38 00000000 1ea7bdb0 ffff8000 00480ea4 ffff8000
bd80: 1b89d800 ffff8000 009a62b0 ffff8000 009a6258 ffff8000 00a36d38 ffff8000
bda0: 00a36e30 ffff8000 0047f7c0 ffff8000 1ea7bdc0 ffff8000 0047f82c ffff8000
bdc0: 1ea7be30 ffff8000 000b1064 ffff8000 1ea48cc0 ffff8000 009dbfe8 ffff8000
bde0: 008552d8 ffff8000 00000000 00000000 0047f778 ffff8000 00000000 00000000
be00: 1ea7be30 ffff8000 00000000 ffff8000 1ea39480 ffff8000 000c75f8 ffff8000
be20: 1ea7be20 ffff8000 1ea7be20 ffff8000 00000000 00000000 00085930 ffff8000
be40: 000b0f88 ffff8000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be60: 00000000 00000000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bea0: 1ea7bea0 ffff8000 1ea7bea0 ffff8000 00000000 ffff8000 00000000 00000000
bec0: 1ea7bec0 ffff8000 1ea7bec0 ffff8000 00000000 00000000 00000000 00000000
bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000
bfe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call trace:
[] exit_creds+0x18/0x70
[] __put_task_struct+0x38/0xd4
[] kthread_stop+0xc0/0x130
[] xenvif_disconnect+0x58/0xd0
[] set_backend_state+0x134/0x278
[] frontend_changed+0x8c/0xec
[] xenbus_otherend_changed+0x9c/0xa4
[] frontend_changed+0xc/0x18
[] xenwatch_thread+0xb0/0x140
[] kthread+0xd8/0xf0
Code: f9000bf3 aa0003f3 f9422401 f9422000 (b9400021)
---[ end trace af11d521ee530da8 ]---

Regards,

Reported-by: Julien Grall
Tested-by: Julien Grall

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..9448c6c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -2110,7 +2110,7 @@ int xenvif_kthread_guest_rx(void *data)
 		if (unlikely(vif->disabled && queue->id == 0)) {
 			xenvif_carrier_off(vif);
 			xenvif_rx_queue_purge(queue);
-			continue;
+			break;
 		}
 
 		if (!skb_queue_empty(&queue->rx_queue))