From patchwork Wed Nov 19 16:50:37 2014
X-Patchwork-Submitter: Stefano Stabellini
X-Patchwork-Id: 41196
Date: Wed, 19 Nov 2014 16:50:37 +0000
From: Stefano Stabellini
To: Andrii Tseglytskyi
Cc: Julien Grall, "xen-devel@lists.xen.org", Ian Campbell, Stefano Stabellini
Subject: Re: [Xen-devel] Xen 4.5 random freeze question

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> wrote:
> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> > wrote:
> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> Hi Stefano,
> >> >>>
> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >>> wrote:
> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> Hi Stefano,
> >> >>> >>
> >> >>> >> > >          if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >>> >> > > -            GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >>> >> > > +            GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >>> >> > >          else
> >> >>> >> > > -            GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >>> >> > > +            GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >>> >> > >
> >> >>> >> > >  }
> >> >>> >> >
> >> >>> >> > Yes, exactly
> >> >>> >>
> >> >>> >> I tried, hang still occurs with this change
> >> >>> >
> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >>> > them to be cleared.
> >> >>> >
> >> >>>
> >> >>> I see that I have free LRs during maintenance interrupt
> >> >>>
> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >>> (XEN) HW_LR[0]=9a015856
> >> >>> (XEN) HW_LR[1]=0
> >> >>> (XEN) HW_LR[2]=0
> >> >>> (XEN) HW_LR[3]=0
> >> >>> (XEN) Inflight irq=86 lr=0
> >> >>> (XEN) Inflight irq=2 lr=255
> >> >>> (XEN) Pending irq=2
> >> >>>
> >> >>> But I see that after I got the hang, maintenance interrupts are generated
> >> >>> continuously. The platform keeps printing the same log until reboot.
> >> >>
> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> That is very very suspicious.
> >> >
> >> > Yes, exactly the same log. And it looks like that means the LRs are flushed
> >> > correctly.
> >> >
> >> >>
> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> new maintenance interrupt immediately, causing an infinite loop.
> >> >>
> >> >
> >> > Yes, this is what I'm thinking about. Taking into account all the collected
> >> > debug info, it looks like once the LRs are overloaded with SGIs a
> >> > maintenance interrupt occurs.
> >> > It is then not handled properly, and occurs again and again, so the
> >> > platform hangs inside its handler.
> >> >
> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> >> hypervisor entry.
> >> >>
> >> >
> >> > Now trying.
> >> >
> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> --- a/xen/arch/arm/gic.c
> >> >> +++ b/xen/arch/arm/gic.c
> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >>      if ( is_idle_vcpu(v) )
> >> >>          return;
> >> >>
> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> +
> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >>
> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >>
> >> >>      gic_restore_pending_irqs(current);
> >> >>
> >> >> -
> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> -    else
> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> -
> >> >>  }
> >> >>
> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >> >
> >>
> >> Heh - I don't see hangs with this patch :) But I also see that the
> >> maintenance interrupt doesn't occur (and no hang as a result).
> >> Stefano - is this expected?
> >
> > No maintenance interrupts at all? That's strange. You should be
> > receiving them when the LRs are full and you still have interrupts pending
> > to be added to them.
> >
> > You could add another printk here to see if you should be receiving
> > them:
> >
> >       if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > +     {
> > +         gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >           GICH[GICH_HCR] |= GICH_HCR_UIE;
> > -     else
> > -         GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > -
> > +     }
> >  }
> >
>
> Requested properly:
>
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>
> But it does not occur

OK, let's see what's going on then by printing the irq number of the
maintenance interrupt:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..fed3167 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -55,6 +55,7 @@ static struct {
 static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static uint8_t nr_lrs;
+static bool uie_on;
 #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
 
 /* The GIC mapping of CPU interfaces does not necessarily match the
@@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
 {
     int i = 0;
     unsigned long flags;
+    unsigned long hcr;
 
     /* The idle domain has no LRs to be cleared. Since gic_restore_state
      * doesn't write any LR registers for the idle domain they could be
@@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    hcr = GICH[GICH_HCR];
+    if ( hcr & GICH_HCR_UIE )
+    {
+        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+        uie_on = 1;
+    }
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
     intack = GICC[GICC_IAR];
     irq = intack & GICC_IA_IRQ;
 
+    if ( uie_on )
+    {
+        uie_on = 0;
+        printk("received maintenance interrupt irq=%d\n", irq);
+    }
     if ( likely(irq >= 16 && irq < 1021) )
     {
         local_irq_enable();
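
For reference, the GICH_HCR maintenance-interrupt enable bits toggled throughout this
thread are defined by the GICv2 architecture. A minimal sketch follows, assuming the
macro names used in Xen's ARM GIC header of this period; the bit positions come from
the GICv2 specification, not from the messages above:

/* Sketch of the GICv2 GICH_HCR control bits relevant to this discussion. */
#define GICH_HCR_EN       (1 << 0)  /* enable the virtual CPU interface */
#define GICH_HCR_UIE      (1 << 1)  /* underflow: none or only one valid LR remains */
#define GICH_HCR_LRENPIE  (1 << 2)  /* EOI received for an interrupt with no LR entry */
#define GICH_HCR_NPIE     (1 << 3)  /* no pending: no LR holds a pending interrupt */

With UIE set, the GIC asserts a maintenance interrupt for as long as at most one LR is
still valid, which is why leaving it enabled across hypervisor entry can look like a
continuous stream of maintenance interrupts.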