[v5,2/2] acpi: PM: Add quirks for AMD Renoir/Lucienne CPUs to force the D3 hint

Message ID 20210604165403.2317-2-mario.limonciello@amd.com
State New
Series
  • Untitled series #133835

Commit Message

Limonciello, Mario June 4, 2021, 4:54 p.m.
AMD systems based on Renoir and Lucienne require that the NVMe controller
is put into D3 over a Modern Standby / suspend-to-idle
cycle.  This is "typically" accomplished using the `StorageD3Enable`
property in the _DSD, but this property was introduced after many
of these systems launched and most OEM systems don't have it in
their BIOS.

On AMD Renoir, if these drives do not go into D3 over suspend-to-idle,
the resume will fail with the NVMe controller being reset and a trace
like this in the kernel logs:
```
[   83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting
[   83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting
[   83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting
[   83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting
[   95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller
[   95.332843] nvme nvme0: Abort status: 0x371
[   95.332852] nvme nvme0: Abort status: 0x371
[   95.332856] nvme nvme0: Abort status: 0x371
[   95.332859] nvme nvme0: Abort status: 0x371
[   95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16
[   95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16
```

The Microsoft documentation for StorageD3Enable mentions that Windows has
a hardcoded allowlist for D3 support, which was used for these platforms.
Introduce quirks to hardcode them for Linux as well.

As this property is now "standardized", OEM systems using AMD Cezanne and
newer APUs have adopted this property, and quirks like this should not be
necessary.

CC: Julian Sikorski <belegdol@gmail.com>
CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com>
CC: Alexander Deucher <Alexander.Deucher@amd.com>
CC: Rafael J. Wysocki <rjw@rjwysocki.net>
CC: Prike Liang <prike.liang@amd.com>
Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/acpi/device_pm.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Changes from v4->v5:
 * Add this patch back in as it's been made apparent that the
   system needs to be hardcoded for these.
   Changes:
   - Drop Cezanne - it's now covered by StorageD3Enable
   - Rebase on top of acpi_storage_d3 outside of NVMe

Comments

Raul Rangel June 4, 2021, 5:43 p.m. | #1
On Fri, Jun 4, 2021 at 10:54 AM Mario Limonciello
<mario.limonciello@amd.com> wrote:

> +
> +#ifdef CONFIG_X86
> +static const struct x86_cpu_id storage_d3_cpu_ids[] = {
> +       X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL),  /* Renoir */
> +       X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */
> +       {}
> +};
> +#endif
> +

Is this the same matching logic that Windows is using?
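
For reference on what the quoted table matches: the entries name a vendor, a family and a model, with no stepping or feature restriction. A minimal userspace sketch of that comparison follows. The `cpu_ident` struct, the `cpu_vendor` enum and `cpu_forces_storage_d3()` are hypothetical stand-ins for illustration, not the kernel's `x86_cpu_id` / `x86_match_cpu()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's CPU identification data. */
enum cpu_vendor { VENDOR_UNKNOWN = 0, VENDOR_INTEL, VENDOR_AMD };

struct cpu_ident {
	enum cpu_vendor vendor;
	uint8_t family;
	uint8_t model;
};

/* Same vendor/family/model triples as the quoted table (decimal values). */
static const struct cpu_ident storage_d3_cpus[] = {
	{ VENDOR_AMD, 23, 96 },		/* Renoir */
	{ VENDOR_AMD, 23, 104 },	/* Lucienne */
};

/* Return true when the CPU equals any table entry; stepping is ignored. */
static bool cpu_forces_storage_d3(const struct cpu_ident *cpu)
{
	size_t i;

	assert(cpu != NULL);
	for (i = 0; i < sizeof(storage_d3_cpus) / sizeof(storage_d3_cpus[0]); i++) {
		const struct cpu_ident *e = &storage_d3_cpus[i];

		if (e->vendor == cpu->vendor && e->family == cpu->family &&
		    e->model == cpu->model)
			return true;
	}
	return false;
}
```

Any other family or model, including newer AMD parts, falls through to `false` and would rely on the firmware property instead.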
Deucher, Alexander June 4, 2021, 6:48 p.m. | #2
[AMD Public Use]

> -----Original Message-----
> From: Limonciello, Mario <Mario.Limonciello@amd.com>
> Sent: Friday, June 4, 2021 1:58 PM
> To: Raul Rangel <rrangel@chromium.org>
> Cc: Keith Busch <kbusch@kernel.org>; Jens Axboe <axboe@fb.com>;
> Christoph Hellwig <hch@lst.de>; Sagi Grimberg <sagi@grimberg.me>;
> Rafael J. Wysocki <rjw@rjwysocki.net>;
> open list:NVM EXPRESS DRIVER <linux-nvme@lists.infradead.org>;
> linux-acpi@vger.kernel.org; david.e.box@linux.intel.com;
> S-k, Shyam-sundar <Shyam-sundar.S-k@amd.com>;
> Deucher, Alexander <Alexander.Deucher@amd.com>;
> Liang, Prike <Prike.Liang@amd.com>; Julian Sikorski <belegdol@gmail.com>
> Subject: Re: [PATCH v5 2/2] acpi: PM: Add quirks for AMD Renoir/Lucienne
> CPUs to force the D3 hint
>
> On 6/4/2021 12:43, Raul Rangel wrote:
> > On Fri, Jun 4, 2021 at 10:54 AM Mario Limonciello
> > <mario.limonciello@amd.com> wrote:
> >
> >> +
> >> +#ifdef CONFIG_X86
> >> +static const struct x86_cpu_id storage_d3_cpu_ids[] = {
> >> +       X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL),  /* Renoir */
> >> +       X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */
> >> +       {}
> >> +};
> >> +#endif
> >> +
> >
> > Is this the same matching logic that Windows is using?
> >
>
> I don't have access to confirm their logic for it - but they do have an allowlist
> that was used for systems before StorageD3Enable was introduced as well as
> a registry key to override it in Windows.



My understanding from the Windows team is that these AMD platforms use the allowlist.

Alex

>
> In Linux we can do it a number of ways:
>
> * Detect the CPU in RN/LCN platforms
>    - Like I did in this patch
>
> * Detect some other PCI device only present in RN/LCN platforms and set
>   this hint
>    - Like some earlier versions of this patch series from Prike did
>
> * Introduce a tri-state module parameter like d3=auto,off,on
>    - Set up logic behind auto to use acpi_storage_d3 primarily and look at a
>      quirk list as a fallback if that was false.
>
> * Add a compile time option to include these quirks in either acpi or nvme.ko
>   only if a user selected them.
>
> * Enumerate all the field systems SMBIOS we can find with these CPUs
>    - Expect this to be a large quirk list.
>
> I don't have a strong opinion between those two first options, but suspect
> the 3rd through 5th aren't really acceptable or scalable.
>
> I'm open to other suggestions but testers of the patches thus far have made
> it clear that /something/ needs to be done to avoid the problems on RN with
> Linux though.
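
The tri-state option from the quoted list (d3=auto,off,on) was only described in words. A rough userspace sketch of the decision it implies is below. The names `d3_mode`, `parse_d3_mode()` and `use_storage_d3()` are hypothetical, and treating an unrecognized string as "auto" is an assumption of this sketch, not something stated in the thread.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tri-state setting mirroring the proposed d3=auto,off,on. */
enum d3_mode { D3_AUTO, D3_OFF, D3_ON };

/* Parse the parameter string; anything unrecognized is treated as auto. */
static enum d3_mode parse_d3_mode(const char *s)
{
	if (s && strcmp(s, "on") == 0)
		return D3_ON;
	if (s && strcmp(s, "off") == 0)
		return D3_OFF;
	return D3_AUTO;
}

/*
 * fw_hint:  result of the firmware check (acpi_storage_d3() in the patch).
 * on_quirk: whether the platform is on a hardcoded quirk list.
 */
static bool use_storage_d3(enum d3_mode mode, bool fw_hint, bool on_quirk)
{
	switch (mode) {
	case D3_ON:
		return true;
	case D3_OFF:
		return false;
	case D3_AUTO:
	default:
		/* Firmware property first, quirk list only as a fallback. */
		return fw_hint || on_quirk;
	}
}
```

With "auto", a Renoir system whose BIOS lacks StorageD3Enable would still get D3 through the quirk list, while "off" lets a user opt out regardless of either source.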
Rafael J. Wysocki June 7, 2021, 2:39 p.m. | #3
On Fri, Jun 4, 2021 at 6:54 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> AMD systems from Renoir and Lucienne require that the NVME controller
> is put into D3 over a Modern Standby / suspend-to-idle
> cycle.  This is "typically" accomplished using the `StorageD3Enable`
> property in the _DSD, but this property was introduced after many
> of these systems launched and most OEM systems don't have it in
> their BIOS.
>
> On AMD Renoir without these drives going into D3 over suspend-to-idle
> the resume will fail with the NVME controller being reset and a trace
> like this in the kernel logs:
> ```
> [   83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting
> [   83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting
> [   83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting
> [   83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting
> [   95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller
> [   95.332843] nvme nvme0: Abort status: 0x371
> [   95.332852] nvme nvme0: Abort status: 0x371
> [   95.332856] nvme nvme0: Abort status: 0x371
> [   95.332859] nvme nvme0: Abort status: 0x371
> [   95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16
> [   95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16
> ```
>
> The Microsoft documentation for StorageD3Enable mentioned that Windows has
> a hardcoded allowlist for D3 support, which was used for these platforms.
> Introduce quirks to hardcode them for Linux as well.
>
> As this property is now "standardized", OEM systems using AMD Cezanne and
> newer APU's have adopted this property, and quirks like this should not be
> necessary.
>
> CC: Julian Sikorski <belegdol@gmail.com>
> CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com>
> CC: Alexander Deucher <Alexander.Deucher@amd.com>
> CC: Rafael J. Wysocki <rjw@rjwysocki.net>
> CC: Prike Liang <prike.liang@amd.com>
> Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/acpi/device_pm.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> Changes from v4->v5:
>  * Add this patch back in as it's been made apparent that the
>    system needs to be hardcoded for these.
>    Changes:
>    - Drop Cezanne - it's now covered by StorageD3Enable
>    - Rebase ontop of acpi_storage_d3 outside of NVME
>
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 1edb68d00b8e..8fd2a15bf478 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -20,6 +20,10 @@
>  #include <linux/pm_runtime.h>
>  #include <linux/suspend.h>
>
> +#ifdef CONFIG_X86
> +#include <asm/cpu_device_id.h>
> +#endif

This is a generic file, so no x86 (or any other arch-specific)
#ifdeffery in it, please.

There is the x86/ subdir under drivers/acpi/ for x86-specific stuff.

> +
>  #include "internal.h"
>
>  /**
> @@ -1341,6 +1345,15 @@ int acpi_dev_pm_attach(struct device *dev, bool power_on)
>  }
>  EXPORT_SYMBOL_GPL(acpi_dev_pm_attach);
>
> +
> +#ifdef CONFIG_X86
> +static const struct x86_cpu_id storage_d3_cpu_ids[] = {
> +       X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL),  /* Renoir */
> +       X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */
> +       {}
> +};
> +#endif
> +
>  /**
>   * acpi_storage_d3 - Check if a storage device should use D3.
>   * @dev: Device to check
> @@ -1356,6 +1369,12 @@ bool acpi_storage_d3(struct device *dev)
>         struct acpi_device *adev = ACPI_COMPANION(dev);
>         u8 val;
>
> +#ifdef CONFIG_X86
> +       /* Devices requiring D3, but from before StorageD3Enable was "standardized" */
> +       if (x86_match_cpu(storage_d3_cpu_ids))
> +               return true;
> +#endif
> +
>         if (!adev)
>                 return false;
>         if (fwnode_property_read_u8(acpi_fwnode_handle(adev), "StorageD3Enable",
> --
> 2.25.1
>
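
One way to read the request above (no arch #ifdefs in the generic file, x86-specific code under drivers/acpi/x86/) is a helper that the arch side provides and a stub everywhere else, so the generic caller stays unconditional. The userspace sketch below only models that shape. `SKETCH_CONFIG_X86`, `sketch_cpu_on_quirk_list`, `sketch_force_storage_d3()` and `sketch_storage_d3()` are hypothetical names, and treating a property value of 1 as "enabled" is an assumption, since the quoted function body is cut off before its return statement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set to 0 to model a non-x86 build; stands in for CONFIG_X86. */
#define SKETCH_CONFIG_X86 1

/* Stand-in for "the running CPU is on the Renoir/Lucienne list". */
static bool sketch_cpu_on_quirk_list;

/* ---- arch-specific side (would live under drivers/acpi/x86/) ---- */
#if SKETCH_CONFIG_X86
static bool sketch_force_storage_d3(void)
{
	return sketch_cpu_on_quirk_list;
}
#else
/* Stub so the generic caller needs no #ifdef. */
static inline bool sketch_force_storage_d3(void)
{
	return false;
}
#endif

/* ---- generic side (device_pm.c in the patch) ---- */
/*
 * has_companion: models adev != NULL
 * has_prop, val: model the result of reading "StorageD3Enable"
 */
static bool sketch_storage_d3(bool has_companion, bool has_prop, uint8_t val)
{
	if (sketch_force_storage_d3())
		return true;
	if (!has_companion || !has_prop)
		return false;
	/* Assumption: a value of 1 means D3 is requested. */
	return val == 1;
}
```

The conditional compilation then lives only next to the arch helper and its stub, and the generic function body reads the same on every architecture.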

Patch

diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index 1edb68d00b8e..8fd2a15bf478 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -20,6 +20,10 @@ 
 #include <linux/pm_runtime.h>
 #include <linux/suspend.h>
 
+#ifdef CONFIG_X86
+#include <asm/cpu_device_id.h>
+#endif
+
 #include "internal.h"
 
 /**
@@ -1341,6 +1345,15 @@  int acpi_dev_pm_attach(struct device *dev, bool power_on)
 }
 EXPORT_SYMBOL_GPL(acpi_dev_pm_attach);
 
+
+#ifdef CONFIG_X86
+static const struct x86_cpu_id storage_d3_cpu_ids[] = {
+	X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL),	/* Renoir */
+	X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL),	/* Lucienne */
+	{}
+};
+#endif
+
 /**
  * acpi_storage_d3 - Check if a storage device should use D3.
  * @dev: Device to check
@@ -1356,6 +1369,12 @@  bool acpi_storage_d3(struct device *dev)
 	struct acpi_device *adev = ACPI_COMPANION(dev);
 	u8 val;
 
+#ifdef CONFIG_X86
+	/* Devices requiring D3, but from before StorageD3Enable was "standardized" */
+	if (x86_match_cpu(storage_d3_cpu_ids))
+		return true;
+#endif
+
 	if (!adev)
 		return false;
 	if (fwnode_property_read_u8(acpi_fwnode_handle(adev), "StorageD3Enable",