mbox series

[v2,00/12] arm64: dts: qcom: sc8280xp: PCIe fixes and GICv3 ITS enable

Message ID 20240223152124.20042-1-johan+linaro@kernel.org
Headers show
Series arm64: dts: qcom: sc8280xp: PCIe fixes and GICv3 ITS enable | expand

Message

Johan Hovold Feb. 23, 2024, 3:21 p.m. UTC
This series addresses a few problems with the sc8280xp PCIe
implementation.

The DWC PCIe controller can either use its internal MSI controller or an
external one such as the GICv3 ITS. Enabling the latter allows for
assigning affinity to individual interrupts, but results in a large
amount of Correctable Errors being logged on both the Lenovo ThinkPad
X13s and the sc8280xp-crd reference design.

It turns out that these errors are always generated, but for some yet to
be determined reason, the AER interrupts are never received when using
the internal MSI controller, which makes the link errors harder to
notice.

On the X13s, there is a large number of errors generated when bringing
up the link on boot. This is related to the fact that UEFI firmware has
already enabled the Wi-Fi PCIe link at Gen2 speed and restarting the
link at Gen3 generates a massive amount of errors until the Wi-Fi
firmware is restarted. This has now also been shown to cause the Wi-Fi
to sometimes not start at all on boot for some users.

A recent commit enabling ASPM on certain Qualcomm platforms introduced
further errors when using the Wi-Fi on the X13s as well as when
accessing the NVMe on the CRD. The exact reason for this has not yet
been identified, but disabling ASPM L0s makes the errors go away. This
could suggest that either the current ASPM implementation is incomplete
or that L0s is not supported with these devices.

Note that the X13s and CRD use the same Wi-Fi controller, but the errors
are only generated on the X13s. The NVMe controller on my X13s does not
support L0s so there are no issues there, unlike on the CRD which uses a
different controller. The modem on the CRD does not generate any errors,
but both the NVMe and modem keeps bouncing in and out of L0s/L1 also
when not used, which could indicate that there are bigger problems with
the ASPM implementation. I don't have a modem on my X13s so I have not
been able to test whether L0s causes any trouble there.

Enabling AER error reporting on sc8280xp could similarly also reveal
existing problems with the related sa8295p and sa8540p platforms as they
share the base dtsi.

After discussing this with Bjorn Andersson at Qualcomm we have decided
to go ahead and disable L0s for all controllers on the CRD and the
X13s.

Note that disabling ASPM L0s for the X13s Wi-Fi does not seem to have a
significant impact on the power consumption (and there are indications
that this applies generally for L0s on these platforms).

***

As this series fixes a regression in 6.7 (which enabled ASPM) and fixes
a user-reported problem with the Wi-Fi often not starting at boot, I
think we should merge this series for the 6.8 cycle. The final patch
enabling the GIC ITS should wait for 6.9.

The DT bindings and PCI patch are expected to go through the PCI tree,
while Bjorn A takes the devicetree updates through the Qualcomm tree.

Note that the devicetree binding updates will conflict with the
following series which was just merged for 6.9 today:

	https://lore.kernel.org/lkml/513dfb69-aea1-4848-8755-613e03c65843@linaro.org/

The resolution is simply to add 'required-opps' and 'aspm-no-l0s' to the
new 'qcom,pcie-common.yaml' schema file instead of 'qcom,pcie.yaml',
while removing 'msi-map-mask'.

***

Johan


Changes in v2
 - drop RFC from ASPM patches and add stable tags
 - reorder patches and move ITS patch last
 - fix s/GB/MB/ typo in Gen2 speed commit messages
 - fix an incorrect Fixes tag
 - amend commit message X13 wifi link speed patch after user
   confirmation that this fixes the wifi startup issue
 - disable L0s also for modem and wifi on CRD
 - disable L0s also for nvme and modem on X13s


Johan Hovold (12):
  dt-bindings: PCI: qcom: Allow 'required-opps'
  dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
  dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
  PCI: qcom: Add support for disabling ASPM L0s in devicetree
  arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
  arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
  arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
  arm64: dts: qcom: sc8280xp-crd: disable ASPM L0s for NVMe
  arm64: dts: qcom: sc8280xp-crd: disable ASPM L0s for modem and Wi-Fi
  arm64: dts: qcom: sc8280xp-x13s: disable ASPM L0s for Wi-Fi
  arm64: dts: qcom: sc8280xp-x13s: disable ASPM L0s for NVMe and modem
  arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe

 .../devicetree/bindings/pci/qcom,pcie.yaml    |  6 +++++-
 arch/arm64/boot/dts/qcom/sc8280xp-crd.dts     |  7 +++++++
 .../qcom/sc8280xp-lenovo-thinkpad-x13s.dts    |  7 +++++++
 arch/arm64/boot/dts/qcom/sc8280xp.dtsi        | 17 +++++++++++++++-
 drivers/pci/controller/dwc/pcie-qcom.c        | 20 +++++++++++++++++++
 5 files changed, 55 insertions(+), 2 deletions(-)

Comments

Konrad Dybcio Feb. 23, 2024, 9:25 p.m. UTC | #1
On 23.02.2024 16:21, Johan Hovold wrote:
> Limit the WiFi PCIe link speed to Gen2 speed (500 MB/s), which is the
> speed that Windows uses.
> 
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---

Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
Bjorn Helgaas Feb. 28, 2024, 10:08 p.m. UTC | #2
[+to Mani]

On Fri, Feb 23, 2024 at 04:21:12PM +0100, Johan Hovold wrote:
> This series addresses a few problems with the sc8280xp PCIe
> implementation.
> ...

> A recent commit enabling ASPM on certain Qualcomm platforms introduced
> further errors when using the Wi-Fi on the X13s as well as when
> accessing the NVMe on the CRD. The exact reason for this has not yet
> been identified, but disabling ASPM L0s makes the errors go away. This
> could suggest that either the current ASPM implementation is incomplete
> or that L0s is not supported with these devices.
> ...

> As this series fixes a regression in 6.7 (which enabled ASPM) and fixes
> a user-reported problem with the Wi-Fi often not starting at boot, I
> think we should merge this series for the 6.8 cycle. The final patch
> enabling the GIC ITS should wait for 6.9.
> 
> The DT bindings and PCI patch are expected to go through the PCI tree,
> while Bjorn A takes the devicetree updates through the Qualcomm tree.
> ...

> Johan Hovold (12):
>   dt-bindings: PCI: qcom: Allow 'required-opps'
>   dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
>   dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
>   PCI: qcom: Add support for disabling ASPM L0s in devicetree

The ASPM patches fix a v6.7 regression, so it would be good to fix
that in v6.8.

Mani, if you are OK with them, I can add them to for-linus for v6.8.  

What about the 'required-opps' and 'msi-map-mask' patches?  If they're
important, I can merge them for v6.8, too, but it's late in the cycle
and it's not clear from the commit logs why they shouldn't wait for
v6.9.

Bjorn
Konrad Dybcio Feb. 28, 2024, 11:30 p.m. UTC | #3
On 2/28/24 23:08, Bjorn Helgaas wrote:
> [+to Mani]
> 
> On Fri, Feb 23, 2024 at 04:21:12PM +0100, Johan Hovold wrote:
>> This series addresses a few problems with the sc8280xp PCIe
>> implementation.
>> ...
> 
>> A recent commit enabling ASPM on certain Qualcomm platforms introduced
>> further errors when using the Wi-Fi on the X13s as well as when
>> accessing the NVMe on the CRD. The exact reason for this has not yet
>> been identified, but disabling ASPM L0s makes the errors go away. This
>> could suggest that either the current ASPM implementation is incomplete
>> or that L0s is not supported with these devices.
>> ...
> 
>> As this series fixes a regression in 6.7 (which enabled ASPM) and fixes
>> a user-reported problem with the Wi-Fi often not starting at boot, I
>> think we should merge this series for the 6.8 cycle. The final patch
>> enabling the GIC ITS should wait for 6.9.
>>
>> The DT bindings and PCI patch are expected to go through the PCI tree,
>> while Bjorn A takes the devicetree updates through the Qualcomm tree.
>> ...
> 
>> Johan Hovold (12):
>>    dt-bindings: PCI: qcom: Allow 'required-opps'
>>    dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
>>    dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
>>    PCI: qcom: Add support for disabling ASPM L0s in devicetree
> 
> The ASPM patches fix a v6.7 regression, so it would be good to fix
> that in v6.8.
> 
> Mani, if you are OK with them, I can add them to for-linus for v6.8.
> 
> What about the 'required-opps' and 'msi-map-mask' patches?  If they're
> important, I can merge them for v6.8, too, but it's late in the cycle
> and it's not clear from the commit logs why they shouldn't wait for
> v6.9.

Either way, I believe they should go through the qcom tree, as it's
a very hot one with lots of different changes to a given file.

Unless the soc-late window is already closed... Bjorn A will know.

Konrad
Johan Hovold Feb. 29, 2024, 7:45 a.m. UTC | #4
On Wed, Feb 28, 2024 at 04:08:43PM -0600, Bjorn Helgaas wrote:
> [+to Mani]
> 
> On Fri, Feb 23, 2024 at 04:21:12PM +0100, Johan Hovold wrote:
> > This series addresses a few problems with the sc8280xp PCIe
> > implementation.
> > ...
> 
> > A recent commit enabling ASPM on certain Qualcomm platforms introduced
> > further errors when using the Wi-Fi on the X13s as well as when
> > accessing the NVMe on the CRD. The exact reason for this has not yet
> > been identified, but disabling ASPM L0s makes the errors go away. This
> > could suggest that either the current ASPM implementation is incomplete
> > or that L0s is not supported with these devices.
> > ...
> 
> > As this series fixes a regression in 6.7 (which enabled ASPM) and fixes
> > a user-reported problem with the Wi-Fi often not starting at boot, I
> > think we should merge this series for the 6.8 cycle. The final patch
> > enabling the GIC ITS should wait for 6.9.
> > 
> > The DT bindings and PCI patch are expected to go through the PCI tree,
> > while Bjorn A takes the devicetree updates through the Qualcomm tree.
> > ...
> 
> > Johan Hovold (12):
> >   dt-bindings: PCI: qcom: Allow 'required-opps'
> >   dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
> >   dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
> >   PCI: qcom: Add support for disabling ASPM L0s in devicetree
> 
> The ASPM patches fix a v6.7 regression, so it would be good to fix
> that in v6.8.
> 
> Mani, if you are OK with them, I can add them to for-linus for v6.8.  
> 
> What about the 'required-opps' and 'msi-map-mask' patches?  If they're
> important, I can merge them for v6.8, too, but it's late in the cycle
> and it's not clear from the commit logs why they shouldn't wait for
> v6.9.

The 'required-opps' binding patch is a prerequisite for the
corresponding DT fix, which is tagged for stable and that should go in
6.8.

The 'msi-map-mask' binding update is strictly only needed for enabling
the ITS, which is 6.9 material, even if it's arguably also a fix for the
current binding. So this one could possibly wait for 6.9 even if it may
make sense to just take all three now for 6.8 to only have to deal with
the mentioned binding conflict once.

Johan
Manivannan Sadhasivam Feb. 29, 2024, 10:08 a.m. UTC | #5
On Wed, Feb 28, 2024 at 04:08:43PM -0600, Bjorn Helgaas wrote:
> [+to Mani]
> 
> On Fri, Feb 23, 2024 at 04:21:12PM +0100, Johan Hovold wrote:
> > This series addresses a few problems with the sc8280xp PCIe
> > implementation.
> > ...
> 
> > A recent commit enabling ASPM on certain Qualcomm platforms introduced
> > further errors when using the Wi-Fi on the X13s as well as when
> > accessing the NVMe on the CRD. The exact reason for this has not yet
> > been identified, but disabling ASPM L0s makes the errors go away. This
> > could suggest that either the current ASPM implementation is incomplete
> > or that L0s is not supported with these devices.
> > ...
> 
> > As this series fixes a regression in 6.7 (which enabled ASPM) and fixes
> > a user-reported problem with the Wi-Fi often not starting at boot, I
> > think we should merge this series for the 6.8 cycle. The final patch
> > enabling the GIC ITS should wait for 6.9.
> > 
> > The DT bindings and PCI patch are expected to go through the PCI tree,
> > while Bjorn A takes the devicetree updates through the Qualcomm tree.
> > ...
> 
> > Johan Hovold (12):
> >   dt-bindings: PCI: qcom: Allow 'required-opps'
> >   dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
> >   dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
> >   PCI: qcom: Add support for disabling ASPM L0s in devicetree
> 
> The ASPM patches fix a v6.7 regression, so it would be good to fix
> that in v6.8.
> 
> Mani, if you are OK with them, I can add them to for-linus for v6.8.  
> 
> What about the 'required-opps' and 'msi-map-mask' patches?  If they're
> important, I can merge them for v6.8, too, but it's late in the cycle
> and it's not clear from the commit logs why they shouldn't wait for
> v6.9.
> 

I'm checking with Qcom HW team on the ASPM behavior. So please hold off the ASPM
related patches until I get an answer. But 'required-opps' and 'msi-map-mask'
patches can be applied for 6.9 (not strictly fixing anything in 6.8).

- Mani
Johan Hovold Feb. 29, 2024, 10:25 a.m. UTC | #6
On Thu, Feb 29, 2024 at 03:38:53PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Feb 28, 2024 at 04:08:43PM -0600, Bjorn Helgaas wrote:
> > On Fri, Feb 23, 2024 at 04:21:12PM +0100, Johan Hovold wrote:

> > > Johan Hovold (12):
> > >   dt-bindings: PCI: qcom: Allow 'required-opps'
> > >   dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
> > >   dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
> > >   PCI: qcom: Add support for disabling ASPM L0s in devicetree
> > 
> > The ASPM patches fix a v6.7 regression, so it would be good to fix
> > that in v6.8.
> > 
> > Mani, if you are OK with them, I can add them to for-linus for v6.8.  
> > 
> > What about the 'required-opps' and 'msi-map-mask' patches?  If they're
> > important, I can merge them for v6.8, too, but it's late in the cycle
> > and it's not clear from the commit logs why they shouldn't wait for
> > v6.9.
> > 
> 
> I'm checking with Qcom HW team on the ASPM behavior. So please hold off the ASPM
> related patches until I get an answer. But 'required-opps' and 'msi-map-mask'
> patches can be applied for 6.9 (not strictly fixing anything in 6.8).

As I mentioned, the 'required-opps' binding update is needed to fix the
missing OPP vote so blocking the binding patch would block merging the
DT fix which could otherwise go into 6.8.

The 'msi-map-mask' is arguably a fix of the binding which should never
have had that property, but sure, it's strictly only needed for 6.9.

And Bjorn A has already checked with the Qualcomm PCI team regarding
ASPM. It's also been two weeks since you said you were going to check
with your contacts. Is it really worth waiting more for an answer from
that part of the team? We can always amend the ASPM fixes later when/if
we learn more.

Note that this is also a blocker for merging ITS support for 6.9.

Johan
Manivannan Sadhasivam Feb. 29, 2024, 12:24 p.m. UTC | #7
On Thu, Feb 29, 2024 at 11:25:48AM +0100, Johan Hovold wrote:
> On Thu, Feb 29, 2024 at 03:38:53PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Feb 28, 2024 at 04:08:43PM -0600, Bjorn Helgaas wrote:
> > > On Fri, Feb 23, 2024 at 04:21:12PM +0100, Johan Hovold wrote:
> 
> > > > Johan Hovold (12):
> > > >   dt-bindings: PCI: qcom: Allow 'required-opps'
> > > >   dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
> > > >   dt-bindings: PCI: qcom: Allow 'aspm-no-l0s'
> > > >   PCI: qcom: Add support for disabling ASPM L0s in devicetree
> > > 
> > > The ASPM patches fix a v6.7 regression, so it would be good to fix
> > > that in v6.8.
> > > 
> > > Mani, if you are OK with them, I can add them to for-linus for v6.8.  
> > > 
> > > What about the 'required-opps' and 'msi-map-mask' patches?  If they're
> > > important, I can merge them for v6.8, too, but it's late in the cycle
> > > and it's not clear from the commit logs why they shouldn't wait for
> > > v6.9.
> > > 
> > 
> > I'm checking with Qcom HW team on the ASPM behavior. So please hold off the ASPM
> > related patches until I get an answer. But 'required-opps' and 'msi-map-mask'
> > patches can be applied for 6.9 (not strictly fixing anything in 6.8).
> 
> As I mentioned, the 'required-opps' binding update is needed to fix the
> missing OPP vote so blocking the binding patch would block merging the
> DT fix which could otherwise go into 6.8.
> 

I agree that the fix gets the priority. But some maintainers perfer to merge fix
patches _only_ if they are fixing the issue introduced in the ongoing release.
But if Bjorn has no issues in merging these for 6.8, then it is fine.

> The 'msi-map-mask' is arguably a fix of the binding which should never
> have had that property, but sure, it's strictly only needed for 6.9.
> 
> And Bjorn A has already checked with the Qualcomm PCI team regarding
> ASPM. It's also been two weeks since you said you were going to check
> with your contacts. Is it really worth waiting more for an answer from
> that part of the team? We can always amend the ASPM fixes later when/if
> we learn more.
> 
> Note that this is also a blocker for merging ITS support for 6.9.
> 

I got it, but we cannot just merge the patches without finding the rootcause. I
heard from Qcom that this AER error could also be due to PHY init sequence as
spotted on some other platforms, so if that is the case then the DT property is
not correct.

Since this is not the hot target now (for Qcom), it takes time to check things.

- Mani
Johan Hovold Feb. 29, 2024, 1:10 p.m. UTC | #8
On Thu, Feb 29, 2024 at 05:54:16PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Feb 29, 2024 at 11:25:48AM +0100, Johan Hovold wrote:

> > As I mentioned, the 'required-opps' binding update is needed to fix the
> > missing OPP vote so blocking the binding patch would block merging the
> > DT fix which could otherwise go into 6.8.

> I agree that the fix gets the priority. But some maintainers perfer to merge fix
> patches _only_ if they are fixing the issue introduced in the ongoing release.
> But if Bjorn has no issues in merging these for 6.8, then it is fine.

It also depends on the severity of the issue and to some extent the
complexity of the fix. These binding fixes are certainly low risk. :)

> > The 'msi-map-mask' is arguably a fix of the binding which should never
> > have had that property, but sure, it's strictly only needed for 6.9.
> > 
> > And Bjorn A has already checked with the Qualcomm PCI team regarding
> > ASPM. It's also been two weeks since you said you were going to check
> > with your contacts. Is it really worth waiting more for an answer from
> > that part of the team? We can always amend the ASPM fixes later when/if
> > we learn more.
> > 
> > Note that this is also a blocker for merging ITS support for 6.9.

> I got it, but we cannot just merge the patches without finding the rootcause. I
> heard from Qcom that this AER error could also be due to PHY init sequence as
> spotted on some other platforms, so if that is the case then the DT property is
> not correct.

I've verified the PHY sequences both against what the UEFI firmware (and
hence Windows) uses as well as against the internal Qualcomm
documentation (with the help of Bjorn A). And Qualcomm did say that such
errors are also seen under Windows on these platforms.

But the fact that the symptoms differ between the CRD and X13s, which
use the same Atheros Wi-Fi controller (and the same PHY initialisation)
also suggests that this may to some extent be due to some property of
the machine.

Notably, on the X13s there are lot of errors when pushing data
(e.g. using iperf3), whereas on the CRD the are no errors when running
such tests.

When leaving the CRD on for long periods of time with the Wi-Fi
disconnected, I do however see occasional correctable errors.

> Since this is not the hot target now (for Qcom), it takes time to
> check things.

I think that based on the available data it's reasonable to go ahead and
merge these patches. In the event that this turns out to be a
configuration issue, we can just drop the 'aspm-no-l0s' properties
again.

Johan
Manivannan Sadhasivam Feb. 29, 2024, 1:54 p.m. UTC | #9
On Thu, Feb 29, 2024 at 02:10:21PM +0100, Johan Hovold wrote:
> On Thu, Feb 29, 2024 at 05:54:16PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Feb 29, 2024 at 11:25:48AM +0100, Johan Hovold wrote:
> 
> > > As I mentioned, the 'required-opps' binding update is needed to fix the
> > > missing OPP vote so blocking the binding patch would block merging the
> > > DT fix which could otherwise go into 6.8.
> 
> > I agree that the fix gets the priority. But some maintainers perfer to merge fix
> > patches _only_ if they are fixing the issue introduced in the ongoing release.
> > But if Bjorn has no issues in merging these for 6.8, then it is fine.
> 
> It also depends on the severity of the issue and to some extent the
> complexity of the fix. These binding fixes are certainly low risk. :)
> 

Right.

> > > The 'msi-map-mask' is arguably a fix of the binding which should never
> > > have had that property, but sure, it's strictly only needed for 6.9.
> > > 
> > > And Bjorn A has already checked with the Qualcomm PCI team regarding
> > > ASPM. It's also been two weeks since you said you were going to check
> > > with your contacts. Is it really worth waiting more for an answer from
> > > that part of the team? We can always amend the ASPM fixes later when/if
> > > we learn more.
> > > 
> > > Note that this is also a blocker for merging ITS support for 6.9.
> 
> > I got it, but we cannot just merge the patches without finding the rootcause. I
> > heard from Qcom that this AER error could also be due to PHY init sequence as
> > spotted on some other platforms, so if that is the case then the DT property is
> > not correct.
> 
> I've verified the PHY sequences both against what the UEFI firmware (and
> hence Windows) uses as well as against the internal Qualcomm
> documentation (with the help of Bjorn A). And Qualcomm did say that such
> errors are also seen under Windows on these platforms.
> 

Well, sometimes the init sequence published by qualcomm could turn out to be
faulty. That's why they publish v2 sequence and such. And I want to rule out
that possibility in this case.

> But the fact that the symptoms differ between the CRD and X13s, which
> use the same Atheros Wi-Fi controller (and the same PHY initialisation)
> also suggests that this may to some extent be due to some property of
> the machine.
> 
> Notably, on the X13s there are lot of errors when pushing data
> (e.g. using iperf3), whereas on the CRD the are no errors when running
> such tests.
> 
> When leaving the CRD on for long periods of time with the Wi-Fi
> disconnected, I do however see occasional correctable errors.
> 

This may be due to hardware design that I agree (possibly influenced by driver
defect).

> > Since this is not the hot target now (for Qcom), it takes time to
> > check things.
> 
> I think that based on the available data it's reasonable to go ahead and
> merge these patches. In the event that this turns out to be a
> configuration issue, we can just drop the 'aspm-no-l0s' properties
> again.
> 

Well the problem is, if you are not sure, then adding the DT properties is
certainly not correct. As that implies a hardware defect, but it may not be.
So let's wait for some time to find out the actual issue.

- Mani
Johan Hovold Feb. 29, 2024, 3:37 p.m. UTC | #10
On Thu, Feb 29, 2024 at 07:24:07PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Feb 29, 2024 at 02:10:21PM +0100, Johan Hovold wrote:

> > I think that based on the available data it's reasonable to go ahead and
> > merge these patches. In the event that this turns out to be a
> > configuration issue, we can just drop the 'aspm-no-l0s' properties
> > again.
> 
> Well the problem is, if you are not sure, then adding the DT properties is
> certainly not correct. As that implies a hardware defect, but it may not be.
> So let's wait for some time to find out the actual issue.

Our devicetrees are always going to be a tentative description of the
hardware, and this especially true for Qualcomm that don't publish any
documentation so that people are forced to rely on informed guesses
based on downstream devicetrees and drivers and reverse engineering.

As far as I can tell, after having spent a lot of time on this and
checking with sources inside Qualcomm, the hardware is to blame here. If
this turns out not to be true, we can always revise later. We do this
all the time, as you know.

I'm all for digging further into this issue with the help of Qualcomm,
but I don't think that should block this series as that would leave the
link errors that we hit since 6.7 in place and effectively prevent us
from enabling the ITS in 6.9.

Johan
Bjorn Helgaas Feb. 29, 2024, 8:52 p.m. UTC | #11
On Thu, Feb 29, 2024 at 02:10:21PM +0100, Johan Hovold wrote:
> On Thu, Feb 29, 2024 at 05:54:16PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Feb 29, 2024 at 11:25:48AM +0100, Johan Hovold wrote:
> 
> > > As I mentioned, the 'required-opps' binding update is needed to
> > > fix the missing OPP vote so blocking the binding patch would
> > > block merging the DT fix which could otherwise go into 6.8.
> 
> > I agree that the fix gets the priority. But some maintainers
> > perfer to merge fix patches _only_ if they are fixing the issue
> > introduced in the ongoing release.  But if Bjorn has no issues in
> > merging these for 6.8, then it is fine.

I do prefer to merge only regression and important fixes after the
merge window, so I want to be able to provide justification.

> It also depends on the severity of the issue and to some extent the
> complexity of the fix. These binding fixes are certainly low risk.
> :)

IIUC we're talking about:

  arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
  dt-bindings: PCI: qcom: Allow 'required-opps'

These don't look like a regression fix (correct me if I'm wrong), and
I can't tell whether they fix a user-visible problem, since
sc8280xp.dtsi does already contain 'required-opps' for ufs_mem_hc,
usb_0, and usb_1, which are mentioned in the commit log as covering up
the issue.

If these patches wait until v6.9, what badness ensues?

Bjorn
Johan Hovold March 1, 2024, 8:09 a.m. UTC | #12
On Thu, Feb 29, 2024 at 02:52:40PM -0600, Bjorn Helgaas wrote:
> On Thu, Feb 29, 2024 at 02:10:21PM +0100, Johan Hovold wrote:

> > It also depends on the severity of the issue and to some extent the
> > complexity of the fix. These binding fixes are certainly low risk.
> > :)
> 
> IIUC we're talking about:
> 
>   arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
>   dt-bindings: PCI: qcom: Allow 'required-opps'

Right.

> These don't look like a regression fix (correct me if I'm wrong), and
> I can't tell whether they fix a user-visible problem, since
> sc8280xp.dtsi does already contain 'required-opps' for ufs_mem_hc,
> usb_0, and usb_1, which are mentioned in the commit log as covering up
> the issue.

The issue has been there since PCIe support was added for this platform
and does not cause any issues until the USB and UFS controllers are
runtime suspended.

When that happens nothing is currently making sure that we have enough
power to run PCIe at gen3 speeds, something which can potentially result
in system instability (e.g. resets).

> If these patches wait until v6.9, what badness ensues?

We'd have a few more weeks where users enabling runtime PM for USB on
the X13s could hit this before we can get the fix backported to stable.

I could have put some more details in the commit message for the DT
patch, but I did not think that amending the PCIe binding would be
controversial. (I guess we can also take the DT fix without waiting for
the binding update as it has been acked by a DT maintainer even if that
would result in some DT checker warnings until things are aligned
again.)

Let me know what you decide regarding getting the whole series into 6.8,
and then I can spend some more time on rewording, splitting and rebasing
this series if needed.

Johan
Manivannan Sadhasivam March 1, 2024, 12:24 p.m. UTC | #13
On Thu, Feb 29, 2024 at 04:37:27PM +0100, Johan Hovold wrote:
> On Thu, Feb 29, 2024 at 07:24:07PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Feb 29, 2024 at 02:10:21PM +0100, Johan Hovold wrote:
> 
> > > I think that based on the available data it's reasonable to go ahead and
> > > merge these patches. In the event that this turns out to be a
> > > configuration issue, we can just drop the 'aspm-no-l0s' properties
> > > again.
> > 
> > Well the problem is, if you are not sure, then adding the DT properties is
> > certainly not correct. As that implies a hardware defect, but it may not be.
> > So let's wait for some time to find out the actual issue.
> 
> Our devicetrees are always going to be a tentative description of the
> hardware, and this especially true for Qualcomm that don't publish any
> documentation so that people are forced to rely on informed guesses
> based on downstream devicetrees and drivers and reverse engineering.
> 
> As far as I can tell, after having spent a lot of time on this and
> checking with sources inside Qualcomm, the hardware is to blame here. If
> this turns out not to be true, we can always revise later. We do this
> all the time, as you know.
> 
> I'm all for digging further into this issue with the help of Qualcomm,
> but I don't think that should block this series as that would leave the
> link errors that we hit since 6.7 in place and effectively prevent us
> from enabling the ITS in 6.9.
> 

Sounds fair. I will report back, perhaps with a fix based on what I get to know.

But I think it is better to disable L0s in the SoC dtsi itself. That's not only
because there are patches to essentially disable L0s in 2 of the available
platforms making use of this Soc, but also you are enabling GIC ITS in the SoC
dtsi and that may affect sa8540p which is making use of this dtsi.

The users of that SoC may have not noticed the errors as you did before, but
enabling GIC ITS will certainly make the issue visible to them (more likely).

Also, if it turns out to be a hardware IP issue, then we can leave the patches
as it is, otherwise we can revert them easily.

- Mani
Johan Hovold March 1, 2024, 12:46 p.m. UTC | #14
On Fri, Mar 01, 2024 at 05:54:06PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Feb 29, 2024 at 04:37:27PM +0100, Johan Hovold wrote:

> > I'm all for digging further into this issue with the help of Qualcomm,
> > but I don't think that should block this series as that would leave the
> > link errors that we hit since 6.7 in place and effectively prevent us
> > from enabling the ITS in 6.9.
> 
> Sounds fair. I will report back, perhaps with a fix based on what I get to know.

Sounds good, thanks.

> But I think it is better to disable L0s in the SoC dtsi itself. That's not only
> because there are patches to essentially disable L0s in 2 of the available
> platforms making use of this Soc, but also you are enabling GIC ITS in the SoC
> dtsi and that may affect sa8540p which is making use of this dtsi.

I did not do so on purpose as I'm only disabling L0s on machines where
I've confirmed the issue. And the assumption for now is that this is a
machine-level issue.

> The users of that SoC may have not noticed the errors as you did before, but
> enabling GIC ITS will certainly make the issue visible to them (more likely).

Sure and that would be good to know as that would give us another data
point which may help determine where the problem lies. Enabling the ITS
will (hopefully) be done in 6.9 so we'll have a whole cycle to disable
L0s where needed. I don't think this should be done before then.

Johan
Manivannan Sadhasivam March 1, 2024, 1:14 p.m. UTC | #15
On Fri, Mar 01, 2024 at 01:46:15PM +0100, Johan Hovold wrote:
> On Fri, Mar 01, 2024 at 05:54:06PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Feb 29, 2024 at 04:37:27PM +0100, Johan Hovold wrote:
> 
> > > I'm all for digging further into this issue with the help of Qualcomm,
> > > but I don't think that should block this series as that would leave the
> > > link errors that we hit since 6.7 in place and effectively prevent us
> > > from enabling the ITS in 6.9.
> > 
> > Sounds fair. I will report back, perhaps with a fix based on what I get to know.
> 
> Sounds good, thanks.
> 
> > But I think it is better to disable L0s in the SoC dtsi itself. That's not only
> > because there are patches to essentially disable L0s in 2 of the available
> > platforms making use of this Soc, but also you are enabling GIC ITS in the SoC
> > dtsi and that may affect sa8540p which is making use of this dtsi.
> 
> I did not do so on purpose as I'm only disabling L0s on machines where
> I've confirmed the issue. And the assumption for now is that this is a
> machine-level issue.
> 
> > The users of that SoC may have not noticed the errors as you did before, but
> > enabling GIC ITS will certainly make the issue visible to them (more likely).
> 
> Sure and that would be good to know as that would give us another data
> point which may help determine where the problem lies. Enabling the ITS
> will (hopefully) be done in 6.9 so we'll have a whole cycle to disable
> L0s where needed. I don't think this should be done before then.
> 

Ok. Let's see what happens :)

For the series:

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

- Mani