diff mbox

generic: Add the exception case checking routine for ppi interrupt

Message ID 1472530639-21616-1-git-send-email-majun258@huawei.com
State New
Headers show

Commit Message

majun (F) Aug. 30, 2016, 4:17 a.m. UTC
From: Ma Jun <majun258@huawei.com>


During system booting, if the interrupt which has no action registered
is triggered, it would cause system panic when try to access the
action member.

Signed-off-by: Ma Jun <majun258@huawei.com>

---
 kernel/irq/chip.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

-- 
1.7.1

Comments

majun (F) Aug. 30, 2016, 10:35 a.m. UTC | #1
在 2016/8/30 16:50, Marc Zyngier 写道:
> On 30/08/16 05:17, MaJun wrote:

>> From: Ma Jun <majun258@huawei.com>

>>

>> During system booting, if the interrupt which has no action registered

>> is triggered, it would cause system panic when try to access the

>> action member.

> 

> And why would that interrupt be enabled? If you enable a PPI before

> registering a handler, you're doing something wrong.

> 


Actually,the problem described above happened during the capture kernel booting.

In my system, sometimes there is a pending physical timer interrupt(30)
when the first kernel panic and the status is kept until the capture kernel booting.

So, this interrupt will be handled during capture kernel booting.

Becasue we use virt timer interrupt but not physical timer interrupt in capture kernel,
the interrupt 30 has no action handler.

Besides, I think we need to do exception check in this function just
like "handle_fasteoi_irq" does.

Thanks
MaJun

> Thanks,

> 

> 	M.

>
Mark Rutland Aug. 30, 2016, 11:21 a.m. UTC | #2
On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
> +Mark

> On 30/08/16 11:35, majun (F) wrote:

> > 在 2016/8/30 16:50, Marc Zyngier 写道:

> >> On 30/08/16 05:17, MaJun wrote:

> >>> From: Ma Jun <majun258@huawei.com>

> >>>

> >>> During system booting, if the interrupt which has no action registered

> >>> is triggered, it would cause system panic when try to access the

> >>> action member.

> >>

> >> And why would that interrupt be enabled? If you enable a PPI before

> >> registering a handler, you're doing something wrong.

> > 

> > Actually,the problem described above happened during the capture

> > kernel booting.

> > 

> > In my system, sometimes there is a pending physical timer

> > interrupt(30) when the first kernel panic and the status is kept

> > until the capture kernel booting.

> 

> And that's perfectly fine. The interrupt can be pending forever, as it

> shouldn't get enabled.

> 

> > So, this interrupt will be handled during capture kernel booting.

> 

> Why? Who enables it?

> 

> > Becasue we use virt timer interrupt but not physical timer interrupt

> > in capture kernel, the interrupt 30 has no action handler.

> 

> Again: who enables this interrupt? Whichever driver enables it should be

> fixed.


I'm also at a loss.

In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
arch_timer_register(), we'll only request_percpu_irq the virt PPI.
arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
enable_percpu_irq() the virt PPI.

We don't fiddle with arch_timer_uses_ppi after calling
arch_timer_register(). So I can't see how we could enable another IRQ in
this case.

Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
we've succesfully requested, so it doesnt' seem like there's an issue
there.

From a quick look at teh GIC driver, it looks like we reset PPIs
correctly, so it doesn't look like we have a "latent enable".

Thanks,
Mark.
majun (F) Aug. 31, 2016, 6:35 a.m. UTC | #3
Hi Marc & Mark:

在 2016/8/30 19:21, Mark Rutland 写道:
> On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:

>> +Mark

>> On 30/08/16 11:35, majun (F) wrote:

>>> 在 2016/8/30 16:50, Marc Zyngier 写道:

>>>> On 30/08/16 05:17, MaJun wrote:

>>>>> From: Ma Jun <majun258@huawei.com>

>>>>>

>>>>> During system booting, if the interrupt which has no action registered

>>>>> is triggered, it would cause system panic when try to access the

>>>>> action member.

>>>>

>>>> And why would that interrupt be enabled? If you enable a PPI before

>>>> registering a handler, you're doing something wrong.

>>>

>>> Actually,the problem described above happened during the capture

>>> kernel booting.

>>>

>>> In my system, sometimes there is a pending physical timer

>>> interrupt(30) when the first kernel panic and the status is kept

>>> until the capture kernel booting.

>>

>> And that's perfectly fine. The interrupt can be pending forever, as it

>> shouldn't get enabled.

>>

>>> So, this interrupt will be handled during capture kernel booting.

>>

>> Why? Who enables it?

>>

>>> Becasue we use virt timer interrupt but not physical timer interrupt

>>> in capture kernel, the interrupt 30 has no action handler.

>>

>> Again: who enables this interrupt? Whichever driver enables it should be

>> fixed.

> 

> I'm also at a loss.

> 

> In this case, arch_timer_uses_ppi must be VIRT_PPI. So in

> arch_timer_register(), we'll only request_percpu_irq the virt PPI.

> arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi

> is VIRT_PPI, so in arch_timer_starting_cpu() we'll only

> enable_percpu_irq() the virt PPI.

> 

> We don't fiddle with arch_timer_uses_ppi after calling

> arch_timer_register(). So I can't see how we could enable another IRQ in

> this case.

> 

> Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what

> we've succesfully requested, so it doesnt' seem like there's an issue

> there.

> 

>>From a quick look at teh GIC driver, it looks like we reset PPIs

> correctly, so it doesn't look like we have a "latent enable".

> 


I just checked the status of irq 30 during capture kernel booting.

The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
Because irq 30 triggered only 1 time during capture kernel booting,
I think this problem maybe happened in the case like:
1:irq 30 triggered, but not acked by cpu yet.
2:local_irq_disable() called
3:system reboot -->capture kernel booting
4:local_irq_enable()
5:irq 30 acked by CPU.

Is this case possible?

Thanks
MaJun

> Thanks,

> Mark.

> 

> .

>
majun (F) Sept. 1, 2016, 8:15 a.m. UTC | #4
在 2016/8/31 16:35, Marc Zyngier 写道:
> On 31/08/16 07:35, majun (F) wrote:

[...]
>>>

>>

>> I just checked the status of irq 30 during capture kernel booting.

>>

>> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.

>> Because irq 30 triggered only 1 time during capture kernel booting,

>> I think this problem maybe happened in the case like:

>> 1:irq 30 triggered, but not acked by cpu yet.

>> 2:local_irq_disable() called

>> 3:system reboot -->capture kernel booting

>> 4:local_irq_enable()

>> 5:irq 30 acked by CPU.

>>

>> Is this case possible?

> 

> I can't see how, because you've missed:

> 

> 3b: All PPIs are disabled as each CPU comes up

> 

> So for (5) to occur, I can only see two possibilities:

> (a) either something else is enabling the timer PPI


I checked the whole process, the irq 30 alway keeping disabled.

> (b) your GIC doesn't correctly retire a pending PPI that is being disabled


According to our hardware guy said, GIC in our system has problem in this case.
Usually, when we mask irq 30, the interrupt which in pending status but not acked by cpu
should be released/cleared by hardware, but actually, we did't do like this in our system.

So, this conclusion just same as you assumption.

Do you have any suggestion or workaround for this problem?

Thanks!
Majun

> 

> I'm discounting (b) because I can't see how the system would work

> otherwise, so (a) must be happening somehow.

> 



> Thanks,

> 

> 	M.

>
diff mbox

Patch

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 8114d06..9a0e872 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -766,11 +766,23 @@  handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
  */
 void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 {
-	struct irq_chip *chip = irq_desc_get_chip(desc);
-	struct irqaction *action = desc->action;
-	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
+	struct irq_chip *chip = NULL;
+	struct irqaction *action;
+	void *dev_id;
 	irqreturn_t res;
 
+	action = desc->action;
+
+	/* Unexpected interrupt in some execption case
+	 * we just send eoi to end this interrupt
+	 */
+	if (unlikely(!action)) {
+		mask_irq(desc);
+		goto out;
+	}
+	dev_id = raw_cpu_ptr(action->percpu_dev_id);
+
+	chip = irq_desc_get_chip(desc);
 	kstat_incr_irqs_this_cpu(irq, desc);
 
 	if (chip->irq_ack)
@@ -779,7 +791,7 @@  void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 	trace_irq_handler_entry(irq, action);
 	res = action->handler(irq, dev_id);
 	trace_irq_handler_exit(irq, action, res);
-
+out:
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);
 }