[RFC,4/4] xen/pvhvm: Make MSI IRQs work after kexec

Message ID alpine.DEB.2.02.1407211504390.2295@kaball.uk.xensource.com
State New

Commit Message

Stefano Stabellini July 21, 2014, 2:13 p.m. UTC
On Wed, 16 Jul 2014, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> > 
> > > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> > >> When kexec was performed, MSI IRQs for passed-through devices were already
> > >> mapped and we see a non-zero pirq extracted from the MSI msg. xen_irq_from_pirq()
> > >> fails as we have no IRQ mapping information for that. Requesting a new
> > >> mapping with __write_msi_msg() does not result in the MSI IRQ being remapped so
> > >> we don't recieve these IRQs.
> > >
> > > receive
> > >
> > 
> > Thanks for your comments!
> 
> Thank you for quick turnaround with the answers!
> > 
> > > How come '__write_msi_msg' does not result in new MSI IRQs?
> > >
> > 
> > Actually that was the hidden question in my RFC :-)
> > 
> > Let me describe what I see. When normal boot is performed we have the
> > following in xen_hvm_setup_msi_irqs():
> > 
> > __read_msi_msg()
> >  pirq -> 0
> > 
> > then we allocate new pirq with
> >  pirq = xen_allocate_pirq_msi()
> >  pirq -> 54
> > 
> > and we have the following mapping:
> > xen: msi --> pirq=54 --> irq=72
> > 
> > in 'xl debug-keys i':
> > (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> > 
> > After kexec we see the following:
> > __read_msi_msg()
> >  pirq -> 54
> > 
> > but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
> >  pirq = xen_allocate_pirq_msi()
> >  pirq -> 55
> > 
> > and we have the following mapping:
> > xen: msi --> pirq=55 --> irq=75
> > 
> > However (afaict) mapping in xen wasn't updated:
> > 
> > in 'xl debug-keys i':
> > (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
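The failure Vitaly describes can be reproduced in a small userspace model of the pre-patch check. This is a sketch under stated assumptions: it models only the guest kernel's PIRQ to IRQ bookkeeping, not the hypercalls or Xen's own state, and every `model_*` name and `MODEL_XEN_PIRQ_MSI_DATA` (a placeholder, not the real `XEN_PIRQ_MSI_DATA` value) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder for XEN_PIRQ_MSI_DATA; only "non-zero" matters. */
#define MODEL_XEN_PIRQ_MSI_DATA 0x1u
#define MODEL_MAX_PIRQ 128

/* One guest kernel's PIRQ -> IRQ bookkeeping (-1 = no mapping). A kernel
 * started via kexec begins with an empty table, even though the device
 * still carries the PIRQ programmed by the previous kernel. */
struct model_kernel {
	int irq_of_pirq[MODEL_MAX_PIRQ];
	int next_pirq; /* next PIRQ the modelled allocator hands out */
	int next_irq;  /* next Linux IRQ number */
};

static void model_boot(struct model_kernel *k, int first_pirq, int first_irq)
{
	for (int i = 0; i < MODEL_MAX_PIRQ; i++)
		k->irq_of_pirq[i] = -1;
	k->next_pirq = first_pirq;
	k->next_irq = first_irq;
}

/* Stand-in for xen_irq_from_pirq(): negative if this kernel has no mapping. */
static int model_irq_from_pirq(const struct model_kernel *k, int pirq)
{
	assert(pirq >= 0 && pirq < MODEL_MAX_PIRQ);
	return k->irq_of_pirq[pirq];
}

/* Pre-patch condition: allocate a new PIRQ unless msg.data carries the
 * Xen marker AND this kernel already knows the PIRQ. Returns the PIRQ used. */
static int model_setup_old(struct model_kernel *k, uint32_t msg_data, int msg_pirq)
{
	int pirq = msg_pirq;

	if (msg_data != MODEL_XEN_PIRQ_MSI_DATA ||
	    model_irq_from_pirq(k, pirq) < 0) {
		pirq = k->next_pirq++;                 /* ~ xen_allocate_pirq_msi() */
		assert(pirq < MODEL_MAX_PIRQ);
		k->irq_of_pirq[pirq] = k->next_irq++;  /* ~ binding PIRQ to an IRQ */
	}
	return pirq;
}
```

With `model_boot(&k, 54, 72)`, a first call on an untouched message (data and PIRQ both 0) returns PIRQ 54, and a repeated call in the same kernel keeps 54. In a freshly booted model kernel started with the values 55 and 75 taken from the log, the device still reports PIRQ 54 with the Xen data value, yet the old check returns 55 because the new kernel's table is empty.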
> 
> I am wondering if that is related to in QEMU traditional:
> 
>     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
> 
> (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
> 
> If you have that patch in, is the PIRQ value correctly updated?
> 
> > 
> > > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> > > from the device and updating the internal PIRQ<->IRQ code to match
> > > with the reality?
> > >
> > 
> > Yea, 'always trust the device'.
> > 
> > >> 
> > >> RFC: I wasn't able to understand why commit af42b8d1, which introduced the
> > >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs(), checks that instead
> > >> of checking pirq > 0: if the mapping was already done (and we have pirq > 0 here)
> > >> we don't need to request a new pirq. We're losing the existing PIRQ, and I'm also
> > >> not sure when __write_msi_msg() with a new PIRQ will result in a new mapping.
> > >
> > > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> > > At least that is how the commit you mentioned worked.
> > >
> > 
> > I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> > xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> > xen_allocate_pirq_msi() which brings us new pirq.
> > 
> > > As for why we use 'xen_irq_from_pirq' instead of just checking the PIRQ:
> > > we might be called twice by a buggy driver. As such we want to check
> > > our PIRQ<->IRQ mapping to figure this out.
> > 
> > But if we're called twice we'll see the same pirq, right? Or are there
> > some cases when we see 'crap' instead of pirq here?
> 
> Good point.
> 
> For PCI passthrough devices they will be zero until they are enabled.
> But I am not sure about the emulated devices, such as e1000 or such, which
> would also go through this path (I think - do we have MSI devices that
> we emulate in QEMU?)
> 
> > 
> > I think it would be nice to use the same pirq after kexec instead of
> > allocating a new one, even if we can make remapping work.
> 
> I concur.
> 
> Stefano, do you recall why you used xen_irq_from_pirq instead of just
> trusting the 'pirq' value? Was it to workaround broken QEMU?

If I recall correctly the problem is that pirq == 0 is a valid pirq
number.  So the check pirq <= 0 is wrong.
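Why PIRQ 0 defeats a `pirq <= 0` test can be seen with a small model of how a PIRQ is packed into an MSI message. The decode mirrors the two lines in the patch (upper bits from `address_hi`, low 8 bits from bits 12..19 of `address_lo`); `MSI_ADDR_DEST_ID_SHIFT` is assumed to be 12, `0xfee00000` is the standard x86 MSI address base, and `model_compose`, `model_pirq_of`, `mapped_by_pirq`, `mapped_by_data` and the data value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder for XEN_PIRQ_MSI_DATA; only "non-zero" matters. */
#define MODEL_XEN_PIRQ_MSI_DATA 0x1u

struct model_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Packs a PIRQ the way the decode below expects: low 8 bits into
 * address_lo bits 12..19 (on top of the 0xfee00000 MSI base), the
 * remaining bits into address_hi, plus the Xen marker in data. */
static struct model_msi_msg model_compose(int pirq)
{
	assert(pirq >= 0);
	struct model_msi_msg m = {
		.address_lo = 0xfee00000u | (((uint32_t)pirq & 0xffu) << 12),
		.address_hi = (uint32_t)pirq & 0xffffff00u,
		.data = MODEL_XEN_PIRQ_MSI_DATA,
	};
	return m;
}

/* Same arithmetic as the pirq extraction in xen_hvm_setup_msi_irqs(). */
static int model_pirq_of(const struct model_msi_msg *m)
{
	return (int)((m->address_hi & 0xffffff00u) |
		     ((m->address_lo >> 12) & 0xffu));
}

/* Rejected heuristic: "a positive PIRQ means already mapped". */
static bool mapped_by_pirq(const struct model_msi_msg *m)
{
	return model_pirq_of(m) > 0;
}

/* Heuristic used by the patch: trust the Xen marker in msg.data. */
static bool mapped_by_data(const struct model_msi_msg *m)
{
	return m->data == MODEL_XEN_PIRQ_MSI_DATA;
}
```

A message composed for PIRQ 0 decodes to 0, so `mapped_by_pirq()` wrongly reports it as unmapped, while `mapped_by_data()` still tells it apart from an untouched all-zero message. This only holds if `msg.data` really reads as something other than the marker before Xen has programmed the device, which is exactly the open question below.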

Can we rely on the fact that msg.data is always 0 on first read?  If so,
then we could simply:

Patch

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 905956f..d824743 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -231,8 +231,7 @@  static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		__read_msi_msg(msidesc, &msg);
 		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
 			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
+		if (msg.data != XEN_PIRQ_MSI_DATA) {
 			pirq = xen_allocate_pirq_msi(dev, msidesc);
 			if (pirq < 0) {
 				irq = -ENODEV;
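A minimal userspace model of the patched condition, where only `msg.data` decides whether a new PIRQ is allocated. Beyond what the truncated hunk shows, it assumes the function goes on to bind whichever PIRQ it ends up with to a Linux IRQ; binding once and reusing the entry on a repeated call is a simplification. All `model_*` names and the data value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder for XEN_PIRQ_MSI_DATA; only "non-zero" matters. */
#define MODEL_XEN_PIRQ_MSI_DATA 0x1u
#define MODEL_MAX_PIRQ 128

/* One guest kernel's PIRQ -> IRQ bookkeeping (-1 = no mapping). */
struct model_kernel {
	int irq_of_pirq[MODEL_MAX_PIRQ];
	int next_pirq; /* next PIRQ the modelled allocator hands out */
	int next_irq;  /* next Linux IRQ number */
};

static void model_boot(struct model_kernel *k, int first_pirq, int first_irq)
{
	for (int i = 0; i < MODEL_MAX_PIRQ; i++)
		k->irq_of_pirq[i] = -1;
	k->next_pirq = first_pirq;
	k->next_irq = first_irq;
}

/* Patched condition: allocate only when msg.data lacks the Xen marker;
 * otherwise "trust the device" and reuse the PIRQ read from the message. */
static int model_setup_new(struct model_kernel *k, uint32_t msg_data, int msg_pirq)
{
	int pirq = msg_pirq;

	if (msg_data != MODEL_XEN_PIRQ_MSI_DATA)
		pirq = k->next_pirq++;                 /* ~ xen_allocate_pirq_msi() */

	assert(pirq >= 0 && pirq < MODEL_MAX_PIRQ);
	if (k->irq_of_pirq[pirq] < 0)              /* assumed later bind step */
		k->irq_of_pirq[pirq] = k->next_irq++;
	return pirq;
}
```

In a model kernel booted after kexec with the allocator at 55, a message still carrying the marker and PIRQ 54 keeps PIRQ 54 and gets a fresh IRQ in the new kernel's table, and nothing new is allocated. On a normal first boot (data 0) the allocation path runs as before.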