diff mbox

[Xen-devel,1/3] x86/xen: Introduce a global flag to fix the MSI mask bug

Message ID 1414377878-23497-2-git-send-email-wangyijing@huawei.com
State New
Headers show

Commit Message

wangyijing Oct. 27, 2014, 2:44 a.m. UTC
Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
fixed MSI mask bug which may cause kernel crash. But the commit
made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
And the commit 0e4ccb1505a9 will be reverted in the later patch.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
---
 arch/x86/pci/xen.c  |    2 ++
 drivers/pci/msi.c   |    7 ++++++-
 include/linux/msi.h |    1 +
 3 files changed, 9 insertions(+), 1 deletions(-)

Comments

David Vrabel Oct. 27, 2014, 11:09 a.m. UTC | #1
On 27/10/14 02:44, Yijing Wang wrote:
> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> fixed MSI mask bug which may cause kernel crash. But the commit
> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> And the commit 0e4ccb1505a9 will be reverted in the later patch.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

In the sense that it keeps the odd Xen behaviour.  But...

Konrad, why was this fixed like this in the first place?  IMO, it would
have been better to get Xen to trap-and-emulate accesses to the relevant
MSI/MSI-X registers.  The mask/unmask on setup/teardown isn't
performance critical.

David
Konrad Rzeszutek Wilk Oct. 27, 2014, 7:45 p.m. UTC | #2
On Mon, Oct 27, 2014 at 11:09:42AM +0000, David Vrabel wrote:
> On 27/10/14 02:44, Yijing Wang wrote:
> > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> > fixed MSI mask bug which may cause kernel crash. But the commit
> > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> > And the commit 0e4ccb1505a9 will be reverted in the later patch.
> 
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
> 
> In the sense that it keeps the odd Xen behaviour.  But...
> 
> Konrad, why was this fixed like this in the first place?  IMO, it would

As 0e4ccb1505a9 explains:
    PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
    
    Certain platforms do not allow writes in the MSI-X BARs to setup or tear
    down vector values.  To combat against the generic code trying to write to
    that and either silently being ignored or crashing due to the pagetables
    being marked R/O this patch introduces a platform override.
    
    Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
    and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
    and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
    code.
    
    For Xen, which does not allow the guest to write to MSI-X tables - as the
    hypervisor is solely responsible for setting the vector values - we
    implement two nops.
    
    This fixes a Xen guest crash when passing a PCI device with MSI-X to the
    guest.  See the bugzilla for more details.
    
    [bhelgaas: add bugzilla info]
    Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581

> have been better to get Xen to trap-and-emulate accesses to the relevant
> MSI/MSI-X registers.  The mask/unmask on setup/teardown isn't
> performance critical.
> 
> David
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
David Vrabel Oct. 28, 2014, 4:44 p.m. UTC | #3
On 27/10/14 19:45, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 27, 2014 at 11:09:42AM +0000, David Vrabel wrote:
>> On 27/10/14 02:44, Yijing Wang wrote:
>>> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
>>> fixed MSI mask bug which may cause kernel crash. But the commit
>>> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
>>> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
>>> And the commit 0e4ccb1505a9 will be reverted in the later patch.
>>
>> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
>>
>> In the sense that it keeps the odd Xen behaviour.  But...
>>
>> Konrad, why was this fixed like this in the first place?  IMO, it would
> 
> As 0e4ccb1505a9 explains:
>     PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
>     
>     Certain platforms do not allow writes in the MSI-X BARs to setup or tear
>     down vector values.  To combat against the generic code trying to write to
>     that and either silently being ignored or crashing due to the pagetables
>     being marked R/O this patch introduces a platform override.
>     
>     Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
>     and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
>     and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
>     code.
>     
>     For Xen, which does not allow the guest to write to MSI-X tables - as the
>     hypervisor is solely responsible for setting the vector values - we
>     implement two nops.

My question specifically was: could this be fixed by allowing writes to
set/clear the mask bit?  If so, why was this not done and could we do
this now?

David
Konrad Rzeszutek Wilk Oct. 28, 2014, 4:57 p.m. UTC | #4
On Tue, Oct 28, 2014 at 04:44:44PM +0000, David Vrabel wrote:
> On 27/10/14 19:45, Konrad Rzeszutek Wilk wrote:
> > On Mon, Oct 27, 2014 at 11:09:42AM +0000, David Vrabel wrote:
> >> On 27/10/14 02:44, Yijing Wang wrote:
> >>> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> >>> fixed MSI mask bug which may cause kernel crash. But the commit
> >>> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> >>> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> >>> And the commit 0e4ccb1505a9 will be reverted in the later patch.
> >>
> >> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
> >>
> >> In the sense that it keeps the odd Xen behaviour.  But...
> >>
> >> Konrad, why was this fixed like this in the first place?  IMO, it would
> > 
> > As 0e4ccb1505a9 explains:
> >     PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
> >     
> >     Certain platforms do not allow writes in the MSI-X BARs to setup or tear
> >     down vector values.  To combat against the generic code trying to write to
> >     that and either silently being ignored or crashing due to the pagetables
> >     being marked R/O this patch introduces a platform override.
> >     
> >     Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
> >     and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
> >     and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
> >     code.
> >     
> >     For Xen, which does not allow the guest to write to MSI-X tables - as the
> >     hypervisor is solely responsible for setting the vector values - we
> >     implement two nops.
> 
> My question specifically was: could this be fixed by allowing writes to
> set/clear the mask bit?  If so, why was this not done and could we do

No.
> this now?
> 
> David
Bjorn Helgaas Nov. 11, 2014, 12:04 a.m. UTC | #5
On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote:
> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> fixed MSI mask bug which may cause kernel crash. But the commit
> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> And the commit 0e4ccb1505a9 will be reverted in the later patch.

The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs.
But it makes mask_irq a no-op for both MSI-X and MSI.  The MSI mask bit is
in config space, not in memory space.  So why does mask_irq need to be a
no-op for MSI as well?  Are Xen guests prohibited from writing to config
space, too?  (It's fine if they are; it's just that the changelog
specifically mentioned MSI-X memory space tables, and it didn't mention
config space at all.)

And I *assume* there's some Xen mechanism that accomplishes the mask_irq in
a different way, since the actual mask_irq interface does nothing?  (This
is really a question for 0e4ccb1505a9, since I don't think this particular
patch changes anything in that respect.)

> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> CC: David Vrabel <david.vrabel@citrix.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: xen-devel@lists.xenproject.org
> ---
>  arch/x86/pci/xen.c  |    2 ++
>  drivers/pci/msi.c   |    7 ++++++-
>  include/linux/msi.h |    1 +
>  3 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 093f5f4..5ef62ed 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -427,6 +427,7 @@ int __init pci_xen_init(void)
>  	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
>  	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
>  	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
> +	pci_msi_ignore_mask = 1;
>  #endif
>  	return 0;
>  }
> @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void)
>  	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
>  	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
>  	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
> +	pci_msi_ignore_mask = 1;
>  #endif
>  	xen_setup_acpi_sci();
>  	__acpi_register_gsi = acpi_register_gsi_xen;
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 38511d9..ecb5f54 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -23,6 +23,7 @@
>  #include "pci.h"
>  
>  static int pci_msi_enable = 1;
> +int pci_msi_ignore_mask;
>  
>  #define msix_table_size(flags)	((flags & PCI_MSIX_FLAGS_QSIZE) + 1)
>  
> @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
>  {
>  	u32 mask_bits = desc->masked;
>  
> -	if (!desc->msi_attrib.maskbit)
> +	if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit)
>  		return 0;
>  
>  	mask_bits &= ~mask;
> @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag)
>  	u32 mask_bits = desc->masked;
>  	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
>  						PCI_MSIX_ENTRY_VECTOR_CTRL;
> +
> +	if (pci_msi_ignore_mask)
> +		return 0;
> +
>  	mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
>  	if (flag)
>  		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 44f4746..86dc501 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -10,6 +10,7 @@ struct msi_msg {
>  	u32	data;		/* 16 bits of msi message data */
>  };
>  
> +extern int pci_msi_ignore_mask;
>  /* Helper functions */
>  struct irq_data;
>  struct msi_desc;
> -- 
> 1.7.1
>
wangyijing Nov. 11, 2014, 1:49 a.m. UTC | #6
On 2014/11/11 8:04, Bjorn Helgaas wrote:
> On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote:
>> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
>> fixed MSI mask bug which may cause kernel crash. But the commit
>> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
>> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
>> And the commit 0e4ccb1505a9 will be reverted in the later patch.
> 
> The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs.
> But it makes mask_irq a no-op for both MSI-X and MSI.  The MSI mask bit is
> in config space, not in memory space.  So why does mask_irq need to be a
> no-op for MSI as well?  Are Xen guests prohibited from writing to config
> space, too?  (It's fine if they are; it's just that the changelog
> specifically mentioned MSI-X memory space tables, and it didn't mention
> config space at all.)
> 
> And I *assume* there's some Xen mechanism that accomplishes the mask_irq in
> a different way, since the actual mask_irq interface does nothing?  (This
> is really a question for 0e4ccb1505a9, since I don't think this particular
> patch changes anything in that respect.)

Yes, it's another history problem, maybe Konrad know the detail.

> 
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> CC: David Vrabel <david.vrabel@citrix.com>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> CC: xen-devel@lists.xenproject.org
>> ---
>>  arch/x86/pci/xen.c  |    2 ++
>>  drivers/pci/msi.c   |    7 ++++++-
>>  include/linux/msi.h |    1 +
>>  3 files changed, 9 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index 093f5f4..5ef62ed 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -427,6 +427,7 @@ int __init pci_xen_init(void)
>>  	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
>>  	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
>>  	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
>> +	pci_msi_ignore_mask = 1;
>>  #endif
>>  	return 0;
>>  }
>> @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void)
>>  	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
>>  	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
>>  	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
>> +	pci_msi_ignore_mask = 1;
>>  #endif
>>  	xen_setup_acpi_sci();
>>  	__acpi_register_gsi = acpi_register_gsi_xen;
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 38511d9..ecb5f54 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -23,6 +23,7 @@
>>  #include "pci.h"
>>  
>>  static int pci_msi_enable = 1;
>> +int pci_msi_ignore_mask;
>>  
>>  #define msix_table_size(flags)	((flags & PCI_MSIX_FLAGS_QSIZE) + 1)
>>  
>> @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
>>  {
>>  	u32 mask_bits = desc->masked;
>>  
>> -	if (!desc->msi_attrib.maskbit)
>> +	if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit)
>>  		return 0;
>>  
>>  	mask_bits &= ~mask;
>> @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag)
>>  	u32 mask_bits = desc->masked;
>>  	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
>>  						PCI_MSIX_ENTRY_VECTOR_CTRL;
>> +
>> +	if (pci_msi_ignore_mask)
>> +		return 0;
>> +
>>  	mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
>>  	if (flag)
>>  		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>> index 44f4746..86dc501 100644
>> --- a/include/linux/msi.h
>> +++ b/include/linux/msi.h
>> @@ -10,6 +10,7 @@ struct msi_msg {
>>  	u32	data;		/* 16 bits of msi message data */
>>  };
>>  
>> +extern int pci_msi_ignore_mask;
>>  /* Helper functions */
>>  struct irq_data;
>>  struct msi_desc;
>> -- 
>> 1.7.1
>>
> 
> .
>
Konrad Rzeszutek Wilk Nov. 11, 2014, 3:45 p.m. UTC | #7
On Mon, Nov 10, 2014 at 05:04:56PM -0700, Bjorn Helgaas wrote:
> On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote:
> > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> > fixed MSI mask bug which may cause kernel crash. But the commit
> > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> > And the commit 0e4ccb1505a9 will be reverted in the later patch.
> 
> The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs.
> But it makes mask_irq a no-op for both MSI-X and MSI.  The MSI mask bit is
> in config space, not in memory space.  So why does mask_irq need to be a
> no-op for MSI as well?  Are Xen guests prohibited from writing to config

The PV guests it refers do not do write to config space. They have
an PCI frontend and backend driver which communicates to the backend (the
trusted domain) to setup the MSI and MSI-X. The 'pci_xen_init' (arch/x86/pci/xen.c)
is the one that sets up the overrides. When an MSI or MSI-X is requested
it is done via the request_irq which ends up calling 'xen_setup_msi_irqs'.
If you look there you can see:

173         if (type == PCI_CAP_ID_MSIX)                                            
174                 ret = xen_pci_frontend_enable_msix(dev, v, nvec);               
175         else                                                                    
176                 ret = xen_pci_frontend_enable_msi(dev, v);                      
177         if (ret)                                                                
178                 goto error;                                                     

Which are the calls to the Xen PCI driver to communicate with the
backend to setup the MSI.

> space, too?  (It's fine if they are; it's just that the changelog
> specifically mentioned MSI-X memory space tables, and it didn't mention
> config space at all.)

Correct. The config space is accessible to the guest but if it writes
to it - all of those values are ignored by the hypervisor - and that
is because the backend is suppose to communicate to the hypervisor
whether the guest can indeed setup MSI or MSI-X.

> 
> And I *assume* there's some Xen mechanism that accomplishes the mask_irq in
> a different way, since the actual mask_irq interface does nothing?  (This
> is really a question for 0e4ccb1505a9, since I don't think this particular
> patch changes anything in that respect.)

Correct. 'request_irq' ends up doing that. Or rather it ends up
calling xen_setup_msi_irqs which takes care of that.

The Xen PV guests (not to be confused with Xen HVM guests) run without
any emulated devices. That means most of the x86 platform things - ioports,
VGA, etc - are removed. Instead that functionality is provided via 
frontend drivers that communicate to the backend via a ring.

Hopefully this clarifies it?
> 
> > Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> > CC: David Vrabel <david.vrabel@citrix.com>
> > CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > CC: xen-devel@lists.xenproject.org
> > ---
> >  arch/x86/pci/xen.c  |    2 ++
> >  drivers/pci/msi.c   |    7 ++++++-
> >  include/linux/msi.h |    1 +
> >  3 files changed, 9 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> > index 093f5f4..5ef62ed 100644
> > --- a/arch/x86/pci/xen.c
> > +++ b/arch/x86/pci/xen.c
> > @@ -427,6 +427,7 @@ int __init pci_xen_init(void)
> >  	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
> >  	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
> >  	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
> > +	pci_msi_ignore_mask = 1;
> >  #endif
> >  	return 0;
> >  }
> > @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void)
> >  	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
> >  	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
> >  	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
> > +	pci_msi_ignore_mask = 1;
> >  #endif
> >  	xen_setup_acpi_sci();
> >  	__acpi_register_gsi = acpi_register_gsi_xen;
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 38511d9..ecb5f54 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -23,6 +23,7 @@
> >  #include "pci.h"
> >  
> >  static int pci_msi_enable = 1;
> > +int pci_msi_ignore_mask;
> >  
> >  #define msix_table_size(flags)	((flags & PCI_MSIX_FLAGS_QSIZE) + 1)
> >  
> > @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
> >  {
> >  	u32 mask_bits = desc->masked;
> >  
> > -	if (!desc->msi_attrib.maskbit)
> > +	if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit)
> >  		return 0;
> >  
> >  	mask_bits &= ~mask;
> > @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag)
> >  	u32 mask_bits = desc->masked;
> >  	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
> >  						PCI_MSIX_ENTRY_VECTOR_CTRL;
> > +
> > +	if (pci_msi_ignore_mask)
> > +		return 0;
> > +
> >  	mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> >  	if (flag)
> >  		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index 44f4746..86dc501 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -10,6 +10,7 @@ struct msi_msg {
> >  	u32	data;		/* 16 bits of msi message data */
> >  };
> >  
> > +extern int pci_msi_ignore_mask;
> >  /* Helper functions */
> >  struct irq_data;
> >  struct msi_desc;
> > -- 
> > 1.7.1
> >
Bjorn Helgaas Nov. 11, 2014, 8:24 p.m. UTC | #8
On Tue, Nov 11, 2014 at 10:45:38AM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 10, 2014 at 05:04:56PM -0700, Bjorn Helgaas wrote:
> > On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote:
> > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> > > fixed MSI mask bug which may cause kernel crash. But the commit
> > > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> > > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> > > And the commit 0e4ccb1505a9 will be reverted in the later patch.
> > 
> > The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs.
> > But it makes mask_irq a no-op for both MSI-X and MSI.  The MSI mask bit is
> > in config space, not in memory space.  So why does mask_irq need to be a
> > no-op for MSI as well?  Are Xen guests prohibited from writing to config
> 
> The PV guests it refers do not do write to config space. They have
> an PCI frontend and backend driver which communicates to the backend (the
> trusted domain) to setup the MSI and MSI-X. The 'pci_xen_init' (arch/x86/pci/xen.c)
> is the one that sets up the overrides. When an MSI or MSI-X is requested
> it is done via the request_irq which ends up calling 'xen_setup_msi_irqs'.
> If you look there you can see:
> 
> 173         if (type == PCI_CAP_ID_MSIX)
> 174                 ret = xen_pci_frontend_enable_msix(dev, v, nvec);
> 175         else
> 176                 ret = xen_pci_frontend_enable_msi(dev, v);
> 177         if (ret)
> 178                 goto error;
> 
> Which are the calls to the Xen PCI driver to communicate with the
> backend to setup the MSI.
> 
> > space, too?  (It's fine if they are; it's just that the changelog
> > specifically mentioned MSI-X memory space tables, and it didn't mention
> > config space at all.)
> 
> Correct. The config space is accessible to the guest but if it writes
> to it - all of those values are ignored by the hypervisor - and that
> is because the backend is suppose to communicate to the hypervisor
> whether the guest can indeed setup MSI or MSI-X.
> 
> > 
> > And I *assume* there's some Xen mechanism that accomplishes the mask_irq in
> > a different way, since the actual mask_irq interface does nothing?  (This
> > is really a question for 0e4ccb1505a9, since I don't think this particular
> > patch changes anything in that respect.)
> 
> Correct. 'request_irq' ends up doing that. Or rather it ends up
> calling xen_setup_msi_irqs which takes care of that.
> 
> The Xen PV guests (not to be confused with Xen HVM guests) run without
> any emulated devices. That means most of the x86 platform things - ioports,
> VGA, etc - are removed. Instead that functionality is provided via 
> frontend drivers that communicate to the backend via a ring.
> 
> Hopefully this clarifies it?

I think so.  I propose the following changelog.  Let me know if it's still
inaccurate:

  PCI/MSI: Add pci_msi_ignore_mask to prevent MSI/MSI-X BAR writes

  MSI-X vector Mask Bits are in MSI-X Tables in PCI memory space.  Xen guests
  can't write to those tables.  MSI vector Mask Bits are in PCI configuration
  space.  Xen guests can write to config space, but those writes are ignored.

  Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and
  msix_mask_irq()") added a way to override default_mask_msi_irqs() and
  default_mask_msix_irqs() so they can be no-ops in Xen guests, but this is
  more complicated than necessary.

  Add "pci_msi_ignore_mask" in the core PCI MSI code.  If set,
  default_mask_msi_irqs() and default_mask_msix_irqs() return without doing
  anything.  This is less flexible, but much simpler.

I guess you mentioned PV and HVM guests, and it sounds like all this only
applies to HVM guests.

Bjorn
Konrad Rzeszutek Wilk Nov. 11, 2014, 10:04 p.m. UTC | #9
On Tue, Nov 11, 2014 at 01:24:30PM -0700, Bjorn Helgaas wrote:
> On Tue, Nov 11, 2014 at 10:45:38AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 10, 2014 at 05:04:56PM -0700, Bjorn Helgaas wrote:
> > > On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote:
> > > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()")
> > > > fixed MSI mask bug which may cause kernel crash. But the commit
> > > > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask"
> > > > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution.
> > > > And the commit 0e4ccb1505a9 will be reverted in the later patch.
> > > 
> > > The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs.
> > > But it makes mask_irq a no-op for both MSI-X and MSI.  The MSI mask bit is
> > > in config space, not in memory space.  So why does mask_irq need to be a
> > > no-op for MSI as well?  Are Xen guests prohibited from writing to config
> > 
> > The PV guests it refers do not do write to config space. They have
> > an PCI frontend and backend driver which communicates to the backend (the
> > trusted domain) to setup the MSI and MSI-X. The 'pci_xen_init' (arch/x86/pci/xen.c)
> > is the one that sets up the overrides. When an MSI or MSI-X is requested
> > it is done via the request_irq which ends up calling 'xen_setup_msi_irqs'.
> > If you look there you can see:
> > 
> > 173         if (type == PCI_CAP_ID_MSIX)
> > 174                 ret = xen_pci_frontend_enable_msix(dev, v, nvec);
> > 175         else
> > 176                 ret = xen_pci_frontend_enable_msi(dev, v);
> > 177         if (ret)
> > 178                 goto error;
> > 
> > Which are the calls to the Xen PCI driver to communicate with the
> > backend to setup the MSI.
> > 
> > > space, too?  (It's fine if they are; it's just that the changelog
> > > specifically mentioned MSI-X memory space tables, and it didn't mention
> > > config space at all.)
> > 
> > Correct. The config space is accessible to the guest but if it writes
> > to it - all of those values are ignored by the hypervisor - and that
> > is because the backend is suppose to communicate to the hypervisor
> > whether the guest can indeed setup MSI or MSI-X.
> > 
> > > 
> > > And I *assume* there's some Xen mechanism that accomplishes the mask_irq in
> > > a different way, since the actual mask_irq interface does nothing?  (This
> > > is really a question for 0e4ccb1505a9, since I don't think this particular
> > > patch changes anything in that respect.)
> > 
> > Correct. 'request_irq' ends up doing that. Or rather it ends up
> > calling xen_setup_msi_irqs which takes care of that.
> > 
> > The Xen PV guests (not to be confused with Xen HVM guests) run without
> > any emulated devices. That means most of the x86 platform things - ioports,
> > VGA, etc - are removed. Instead that functionality is provided via 
> > frontend drivers that communicate to the backend via a ring.
> > 
> > Hopefully this clarifies it?
> 
> I think so.  I propose the following changelog.  Let me know if it's still
> inaccurate:
> 
>   PCI/MSI: Add pci_msi_ignore_mask to prevent MSI/MSI-X BAR writes
> 
>   MSI-X vector Mask Bits are in MSI-X Tables in PCI memory space.  Xen guests

Xen PV
>   can't write to those tables.  MSI vector Mask Bits are in PCI configuration
>   space.  Xen guests can write to config space, but those writes are ignored.
> 
>   Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and
>   msix_mask_irq()") added a way to override default_mask_msi_irqs() and
>   default_mask_msix_irqs() so they can be no-ops in Xen guests, but this is

Xen PV guests
>   more complicated than necessary.
> 
>   Add "pci_msi_ignore_mask" in the core PCI MSI code.  If set,
>   default_mask_msi_irqs() and default_mask_msix_irqs() return without doing
>   anything.  This is less flexible, but much simpler.
> 
> I guess you mentioned PV and HVM guests, and it sounds like all this only
> applies to HVM guests.

No, PV guests :-)

HVM guests will do the normal x86 type machinery.
> 
> Bjorn
diff mbox

Patch

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 093f5f4..5ef62ed 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -427,6 +427,7 @@  int __init pci_xen_init(void)
 	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
 	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
 	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
+	pci_msi_ignore_mask = 1;
 #endif
 	return 0;
 }
@@ -508,6 +509,7 @@  int __init pci_xen_initial_domain(void)
 	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
 	x86_msi.msi_mask_irq = xen_nop_msi_mask_irq;
 	x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
+	pci_msi_ignore_mask = 1;
 #endif
 	xen_setup_acpi_sci();
 	__acpi_register_gsi = acpi_register_gsi_xen;
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 38511d9..ecb5f54 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -23,6 +23,7 @@ 
 #include "pci.h"
 
 static int pci_msi_enable = 1;
+int pci_msi_ignore_mask;
 
 #define msix_table_size(flags)	((flags & PCI_MSIX_FLAGS_QSIZE) + 1)
 
@@ -166,7 +167,7 @@  u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
 {
 	u32 mask_bits = desc->masked;
 
-	if (!desc->msi_attrib.maskbit)
+	if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit)
 		return 0;
 
 	mask_bits &= ~mask;
@@ -198,6 +199,10 @@  u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag)
 	u32 mask_bits = desc->masked;
 	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 						PCI_MSIX_ENTRY_VECTOR_CTRL;
+
+	if (pci_msi_ignore_mask)
+		return 0;
+
 	mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	if (flag)
 		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 44f4746..86dc501 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -10,6 +10,7 @@  struct msi_msg {
 	u32	data;		/* 16 bits of msi message data */
 };
 
+extern int pci_msi_ignore_mask;
 /* Helper functions */
 struct irq_data;
 struct msi_desc;