[RFC] Revert "arm64: PCI: Exclude ACPI "consumer" resources from host bridge windows"

Message ID 20210510234020.1330087-1-luzmaximilian@gmail.com
State New
Series
  • [RFC] Revert "arm64: PCI: Exclude ACPI "consumer" resources from host bridge windows"

Commit Message

Maximilian Luz May 10, 2021, 11:40 p.m.
The Microsoft Surface Pro X has host bridges defined as

    Name (_HID, EisaId ("PNP0A08") /* PCI Express Bus */)  // _HID: Hardware ID
    Name (_CID, EisaId ("PNP0A03") /* PCI Bus */)  // _CID: Compatible ID

    Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
    {
        Name (RBUF, ResourceTemplate ()
        {
            Memory32Fixed (ReadWrite,
                0x60200000,         // Address Base
                0x01DF0000,         // Address Length
                )
            WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
                0x0000,             // Granularity
                0x0000,             // Range Minimum
                0x0001,             // Range Maximum
                0x0000,             // Translation Offset
                0x0002,             // Length
                ,, )
        })
        Return (RBUF) /* \_SB_.PCI0._CRS.RBUF */
    }

meaning that the memory resource is not (explicitly) defined as a
"producer", i.e. a host bridge window: Memory32Fixed descriptors carry no
producer/consumer flag at all.

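A small userspace model may help make that distinction concrete. It is a sketch, not kernel code: the enum, struct and function names below are hypothetical and only mirror the ASL macro names. It assumes the ACPI rule that address-space descriptors such as DWordMemory/QWordMemory carry a producer/consumer bit (ASL defaults to ResourceConsumer), while Memory32Fixed has no such bit and therefore can never describe a window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the memory descriptors found in a host bridge _CRS. */
enum crs_desc_kind {
	CRS_MEMORY32_FIXED,   /* Memory32Fixed(): no producer/consumer bit */
	CRS_DWORD_MEMORY,     /* DWordMemory(): has a ResourceUsage bit */
	CRS_QWORD_MEMORY,     /* QWordMemory(): has a ResourceUsage bit */
};

struct crs_desc {
	enum crs_desc_kind kind;
	bool producer;        /* only meaningful for address-space descriptors */
};

/* Returns true if the descriptor would be treated as a host bridge window.
 * Fixed memory descriptors cannot say "producer", so they never qualify. */
static bool crs_desc_is_window(const struct crs_desc *d)
{
	assert(d != NULL);
	switch (d->kind) {
	case CRS_DWORD_MEMORY:
	case CRS_QWORD_MEMORY:
		return d->producer;
	case CRS_MEMORY32_FIXED:
	default:
		return false;
	}
}
```

Under this model, the Surface Pro X _CRS above contains no memory window at all, only a flag-less fixed range.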
Commit 8fd4391ee717 ("arm64: PCI: Exclude ACPI "consumer" resources from
host bridge windows") introduced a check that removes such resources,
causing BAR allocation failures later on:

    [ 0.150731] pci 0002:00:00.0: BAR 14: no space for [mem size 0x00100000]
    [ 0.150744] pci 0002:00:00.0: BAR 14: failed to assign [mem size 0x00100000]
    [ 0.150758] pci 0002:01:00.0: BAR 0: no space for [mem size 0x00004000 64bit]
    [ 0.150769] pci 0002:01:00.0: BAR 0: failed to assign [mem size 0x00004000 64bit]
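The failure can be modelled with a simplified allocator. This is a hypothetical userspace sketch: `RES_WINDOW` stands in for `IORESOURCE_WINDOW` (the bit value here is made up), `keep_windows_only()` mimics the filter in `pci_acpi_root_prepare_resources()`, and `assign_bar()` is a naive first-fit placement that ignores prefetchable windows, bridge windows and the other rules the real PCI core applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RES_WINDOW 0x1u  /* hypothetical stand-in for IORESOURCE_WINDOW */

struct res {
	uint64_t start;
	uint64_t len;
	unsigned int flags;
};

/* Keep only window resources, compacting the array in place.
 * Mirrors the intent of the filter removed by this patch. */
static size_t keep_windows_only(struct res *list, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (list[i].flags & RES_WINDOW)
			list[kept++] = list[i];
	}
	return kept;
}

/* First-fit placement of a naturally aligned BAR of `size` bytes
 * (size must be a power of two). Returns false for "no space". */
static bool assign_bar(const struct res *win, size_t n, uint64_t size,
		       uint64_t *addr)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t base = (win[i].start + size - 1) & ~(size - 1);
		uint64_t end = win[i].start + win[i].len;

		/* base >= start guards against wrap-around */
		if (base >= win[i].start && base + size <= end) {
			*addr = base;
			return true;
		}
	}
	return false;
}
```

Feeding in the single Memory32Fixed entry from the Surface Pro X _CRS (no window flag), a 0x00100000 BAR fits at 0x60200000 before filtering; after filtering no window remains and placement fails, which corresponds to the "no space" messages above.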

This ultimately leaves the PCIe NVMe drive inaccessible.

On x86, we already skip the producer/window check because of a history
of negligent firmware. It seems that Microsoft intends to continue that
history on its ARM devices, so let's drop the check here too.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---

Please note: I am not sure whether this is the right way to fix this.
For example, I don't know whether additional checks like those on IA64
or x86 might be required instead, or whether this might break other
devices. So please consider this a bug report rather than a fix.

Apologies for the re-send; I seem to have unintentionally added a blank
line before the subject.

---
 arch/arm64/kernel/pci.c | 14 --------------
 1 file changed, 14 deletions(-)

Comments

Will Deacon May 26, 2021, 8:58 p.m. | #1
On Tue, May 11, 2021 at 01:40:20AM +0200, Maximilian Luz wrote:
> The Microsoft Surface Pro X has host bridges defined as

[...]

Adding Lorenzo to cc, as he'll have a much better idea about this than me.

This is:

https://lore.kernel.org/r/20210510234020.1330087-1-luzmaximilian@gmail.com

Will
Lorenzo Pieralisi May 27, 2021, 9:32 a.m. | #2
On Wed, May 26, 2021 at 09:58:36PM +0100, Will Deacon wrote:
> On Tue, May 11, 2021 at 01:40:20AM +0200, Maximilian Luz wrote:
> > The Microsoft Surface Pro X has host bridges defined as

[...]

> Adding Lorenzo to cc, as he'll have a much better idea about this than me.
>
> This is:
>
> https://lore.kernel.org/r/20210510234020.1330087-1-luzmaximilian@gmail.com

Sigh. We can't apply this patch since it would trigger regressions on
other platforms (IIUC the root complex registers would end up in the
host bridge memory windows).
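To illustrate that regression, here is a hypothetical sketch of a platform that follows the spec: its _CRS lists the root complex registers as a consumer range plus a separate producer window. The addresses are invented for illustration, and `addr_in_window()` is not a kernel function; `producer_only` toggles between the current arm64 filter and the behaviour after the proposed revert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct crs_range {
	uint64_t start;
	uint64_t len;
	bool producer;   /* ResourceProducer vs ResourceConsumer */
};

/* Would the PCI core be allowed to place a BAR at `addr`?
 * producer_only = true models the current arm64 filter (8fd4391ee717);
 * producer_only = false models the proposed revert. */
static bool addr_in_window(const struct crs_range *crs, size_t n,
			   bool producer_only, uint64_t addr)
{
	assert(crs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (producer_only && !crs[i].producer)
			continue;   /* consumer: host bridge registers, not a window */
		if (addr >= crs[i].start && addr - crs[i].start < crs[i].len)
			return true;
	}
	return false;
}
```

With a consumer range at 0x40000000 and a producer window at 0x50000000, the filter keeps 0x40000000 out of allocation; without it, register space becomes a candidate BAR address.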

I am not keen on reverting commit 8fd4391ee717 because it does the
right thing.

I think this requires a quirk and immediate reporting to Microsoft.
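One possible shape for such a quirk, sketched under assumptions: the table is keyed on the ACPI OEM ID and OEM table ID, loosely in the spirit of the MCFG quirk table in drivers/acpi/pci_mcfg.c. The struct, the function and the "OEMXYZ"/"TABLEXYZ" entries are hypothetical placeholders, not real affected platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: platforms whose host bridge _CRS describes
 * windows with consumer (or flag-less) descriptors. */
struct crs_quirk {
	const char *oem_id;        /* 6-character ACPI OEM ID */
	const char *oem_table_id;  /* 8-character ACPI OEM table ID */
};

/* Placeholder IDs only; real entries would come from affected DSDTs. */
static const struct crs_quirk crs_quirks[] = {
	{ "OEMXYZ", "TABLEXYZ" },
};

/* Returns true if the producer-only filter should be skipped.
 * ACPI ID fields are fixed-width, hence strncmp with the field sizes. */
static bool skip_producer_filter(const char *oem_id, const char *oem_table_id)
{
	assert(oem_id != NULL && oem_table_id != NULL);
	for (size_t i = 0; i < sizeof(crs_quirks) / sizeof(crs_quirks[0]); i++) {
		if (strncmp(oem_id, crs_quirks[i].oem_id, 6) == 0 &&
		    strncmp(oem_table_id, crs_quirks[i].oem_table_id, 8) == 0)
			return true;
	}
	return false;
}
```

The design choice here is to keep the spec-compliant filter as the default and opt individual firmware out of it, rather than weakening the check for every arm64 platform.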

Bjorn, what are your thoughts on this?

Thanks,
Lorenzo
Maximilian Luz May 27, 2021, 11:31 a.m. | #3
On 5/27/21 11:32 AM, Lorenzo Pieralisi wrote:
> On Wed, May 26, 2021 at 09:58:36PM +0100, Will Deacon wrote:

[...]

> Sigh. We can't apply this patch since it would trigger regressions on
> other platforms (IIUC the root complex registers would end up in the
> host bridge memory windows).
>
> I am not keen on reverting commit 8fd4391ee717 because it does the
> right thing.
>
> I think this requires a quirk and immediate reporting to Microsoft.

Since I wrote this, I have found other arm64 devices with the same
problem. I no longer think this is exclusive to Microsoft; rather, it
looks like a Qualcomm problem (a Qualcomm SoC seems to be the common
thread). See e.g. the DSDTs in [1]. So it should probably be reported
to Qualcomm.

Regards,
Max

[1]: https://github.com/aarch64-laptops/build/tree/dfce25bc12655713c7e1e0422b191e9c944e4fb2/misc
Bjorn Helgaas May 27, 2021, 4:34 p.m. | #4
On Thu, May 27, 2021 at 10:32:00AM +0100, Lorenzo Pieralisi wrote:
> On Wed, May 26, 2021 at 09:58:36PM +0100, Will Deacon wrote:

[...]

> I think this requires a quirk and immediate reporting to Microsoft.
>
> Bjorn, what are your thoughts on this?

In retrospect, I think 8fd4391ee717 (which I wrote) was probably a
mistake.

Sure, it's a nice idea to have PNP0A03 _CRS methods that work as
designed, by describing host bridge registers as "consumer" resources
and host bridge windows as "producer" resources, instead of having the
bridge registers in the _CRS of an unrelated PNP0C02 device.

But realistically, the PNP0A03/PNP0C02 issue is a solved problem, even
though it's ugly, and I'm not sure why I thought Microsoft would see
value in doing this differently on arm64 than on x86 and ia64.
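The "solved problem" can be pictured with a conceptual sketch: every memory range in the PNP0A03 _CRS is treated as a window, and register space stays out of BAR allocation only because a separate PNP0C02 device claims it. This is a hypothetical model (the types, function and addresses are made up) and does not describe how the Linux PNP or PCI code is actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct span {
	uint64_t start;
	uint64_t len;
};

static bool span_contains(const struct span *s, uint64_t addr)
{
	return addr >= s->start && addr - s->start < s->len;
}

/* Conceptual x86-style policy: all PNP0A03 memory ranges are windows,
 * minus whatever a PNP0C02 device reserves for host bridge registers. */
static bool addr_allocatable(const struct span *pnp0a03, size_t nw,
			     const struct span *pnp0c02, size_t nr,
			     uint64_t addr)
{
	bool in_window = false;

	assert(pnp0a03 != NULL || nw == 0);
	for (size_t i = 0; i < nw; i++)
		in_window = in_window || span_contains(&pnp0a03[i], addr);
	if (!in_window)
		return false;
	for (size_t i = 0; i < nr; i++)
		if (span_contains(&pnp0c02[i], addr))
			return false;   /* claimed by the PNP0C02 device */
	return true;
}
```

In this model, dropping the PNP0C02 reservation makes the register range allocatable again, which is why the scheme only works when firmware consistently provides both devices.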

What would break if we reverted 8fd4391ee717?  I guess any arm64
platforms that described host bridge register space in PNP0A03 _CRS
"consumer" resources?  And Windows probably doesn't work or isn't
supported on those platforms?

Bjorn
Lorenzo Pieralisi May 27, 2021, 4:56 p.m. | #5
On Thu, May 27, 2021 at 11:34:52AM -0500, Bjorn Helgaas wrote:

[...]

> > > https://lore.kernel.org/r/20210510234020.1330087-1-luzmaximilian@gmail.com
> >
> > Sigh. We can't apply this patch since it would trigger regressions on
> > other platforms (IIUC the root complex registers would end up in the
> > host bridge memory windows).
> >
> > I am not keen on reverting commit 8fd4391ee717 because it does the
> > right thing.
> >
> > I think this requires a quirk and immediate reporting to Microsoft.
> >
> > Bjorn, what are your thoughts on this?
>
> In retrospect, I think 8fd4391ee717 (which I wrote) was probably a
> mistake.
>
> Sure, it's a nice idea to have PNP0A03 _CRS methods that work as
> designed, by describing host bridge registers as "consumer" resources
> and host bridge windows as "producer" resources, instead of having the
> bridge registers in the _CRS of an unrelated PNP0C02 device.
>
> But realistically, the PNP0A03/PNP0C02 issue is a solved problem, even
> though it's ugly, and I'm not sure why I thought Microsoft would see
> value in doing this differently on arm64 than on x86 and ia64.

We hoped we could comply with the specs, given that we were starting
from a clean slate (rather than from cut-and-pasted ACPI tables).

> What would break if we reverted 8fd4391ee717?  I guess any arm64
> platforms that described host bridge register space in PNP0A03 _CRS
> "consumer" resources?

Yes. We would end up with that register space in the host bridge memory
windows, which does not sound right.

> And Windows probably doesn't work or isn't supported on those
> platforms?

By the look of it, the answer is yes: Windows was not brought up on
those platforms, given that I *assume* Windows does not distinguish
between producer and consumer resources at all.

Lorenzo

Patch

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 1006ed2d7c60..80f87fe0a2b8 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -94,19 +94,6 @@  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 	return 0;
 }
 
-static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
-{
-	struct resource_entry *entry, *tmp;
-	int status;
-
-	status = acpi_pci_probe_root_resources(ci);
-	resource_list_for_each_entry_safe(entry, tmp, &ci->resources) {
-		if (!(entry->res->flags & IORESOURCE_WINDOW))
-			resource_list_destroy_entry(entry);
-	}
-	return status;
-}
-
 /*
  * Lookup the bus range for the domain in MCFG, and set up config space
  * mapping.
@@ -184,7 +171,6 @@  struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	}
 
 	root_ops->release_info = pci_acpi_generic_release_info;
-	root_ops->prepare_resources = pci_acpi_root_prepare_resources;
 	root_ops->pci_ops = (struct pci_ops *)&ri->cfg->ops->pci_ops;
 	bus = acpi_pci_root_create(root, root_ops, &ri->common, ri->cfg);
 	if (!bus)