From patchwork Wed Nov 19 17:07:28 2014
X-Patchwork-Submitter: Stefano Stabellini
X-Patchwork-Id: 41201
Date: Wed, 19 Nov 2014 17:07:28 +0000
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
To: Andrii Tseglytskyi
Cc: Julien Grall, "xen-devel@lists.xen.org", Ian Campbell, Stefano Stabellini
Subject: Re: [Xen-devel] Xen 4.5 random freeze question
I think that's OK: it looks like on your board, for some reason, when
UIE is set you get irq 1023 (spurious interrupt) instead of your normal
maintenance interrupt. But everything should work anyway without issues.

This is the same patch as before, but rebased on top of the latest
xen-unstable tree. Please confirm if it works.

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> I got this strange log:
>
> (XEN) received maintenance interrupt irq=1023
>
> And the platform does not hang, due to this:
> +    hcr = GICH[GICH_HCR];
> +    if ( hcr & GICH_HCR_UIE )
> +    {
> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +        uie_on = 1;
> +    }
>
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi wrote:
> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini wrote:
> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> Hi Stefano,
> >> >> >>>
> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini wrote:
> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> >> Hi Stefano,
> >> >> >>> >>
> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >> >>> >> > >      else
> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >> >>> >> > >
> >> >> >>> >> > >  }
> >> >> >>> >> >
> >> >> >>> >> > Yes, exactly
> >> >> >>> >>
> >> >> >>> >> I tried, the hang still occurs with this change
> >> >> >>> >
> >> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >> >>> > them to be cleared.
> >> >> >>> >
> >> >> >>>
> >> >> >>> I see that I have free LRs during the maintenance interrupt:
> >> >> >>>
> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >> >>> (XEN)    HW_LR[0]=9a015856
> >> >> >>> (XEN)    HW_LR[1]=0
> >> >> >>> (XEN)    HW_LR[2]=0
> >> >> >>> (XEN)    HW_LR[3]=0
> >> >> >>> (XEN) Inflight irq=86 lr=0
> >> >> >>> (XEN) Inflight irq=2 lr=255
> >> >> >>> (XEN) Pending irq=2
> >> >> >>>
> >> >> >>> But I see that after the hang, maintenance interrupts are generated
> >> >> >>> continuously. The platform keeps printing the same log until reboot.
> >> >> >>
> >> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> >> That is very, very suspicious.
> >> >> >
> >> >> > Yes, exactly the same log. And it looks like it means that the LRs are
> >> >> > flushed correctly.
> >> >> >
> >> >> >>
> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly, and
> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> >> new maintenance interrupt immediately, causing an infinite loop.
> >> >> >>
> >> >> >
> >> >> > Yes, this is what I'm thinking too. Taking into account all the collected
> >> >> > debug info, it looks like once the LRs are overloaded with SGIs, a
> >> >> > maintenance interrupt occurs.
> >> >> > Then it is not handled properly and occurs again and again, so the
> >> >> > platform hangs inside its handler.
> >> >> >
> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> >> >> hypervisor entry.
> >> >> >>
> >> >> >
> >> >> > Now trying.
> >> >> >
> >> >> >>
> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> >> --- a/xen/arch/arm/gic.c
> >> >> >> +++ b/xen/arch/arm/gic.c
> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >> >>      if ( is_idle_vcpu(v) )
> >> >> >>          return;
> >> >> >>
> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> +
> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >> >>
> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >> >>
> >> >> >>      gic_restore_pending_irqs(current);
> >> >> >>
> >> >> >> -
> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >> -    else
> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> -
> >> >> >>  }
> >> >> >>
> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >> >> >
> >> >>
> >> >> Heh - I don't see hangs with this patch :) But I also see that the
> >> >> maintenance interrupt doesn't occur (and no hang as a result).
> >> >> Stefano - is this expected?
> >> >
> >> > No maintenance interrupts at all? That's strange. You should be
> >> > receiving them when the LRs are full and you still have interrupts
> >> > pending to be added to them.
> >> >
> >> > You could add another printk here to see if you should be receiving
> >> > them:
> >> >
> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> > +    {
> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> > -    else
> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> > -
> >> > +    }
> >> >  }
> >> >
> >>
> >> Requested properly:
> >>
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>
> >> But it does not occur
> >
> > OK, let's see what's going on then by printing the irq number of the
> > maintenance interrupt:
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 4d2a92d..fed3167 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -55,6 +55,7 @@ static struct {
> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
> >
> >  static uint8_t nr_lrs;
> > +static bool uie_on;
> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
> >
> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
> >  {
> >      int i = 0;
> >      unsigned long flags;
> > +    unsigned long hcr;
> >
> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
> >       * doesn't write any LR registers for the idle domain they could be
> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> >
> > +    hcr = GICH[GICH_HCR];
> > +    if ( hcr & GICH_HCR_UIE )
> > +    {
> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > +        uie_on = 1;
> > +    }
> > +
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >
> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >      intack = GICC[GICC_IAR];
> >      irq = intack & GICC_IA_IRQ;
> >
> > +    if ( uie_on )
> > +    {
> > +        uie_on = 0;
> > +        printk("received maintenance interrupt irq=%d\n", irq);
> > +    }
> >      if ( likely(irq >= 16 && irq < 1021) )
> >      {
> >          local_irq_enable();
> >
> >
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
>

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)