diff mbox series

[v2,3/4] mm: Change ghes code to allow poison of non-struct pfn

Message ID 20231123003513.24292-4-ankita@nvidia.com
State New
Headers show
Series mm: Implement ECC handling for pfn with no struct page | expand

Commit Message

Ankit Agrawal Nov. 23, 2023, 12:35 a.m. UTC
From: Ankit Agrawal <ankita@nvidia.com>

The GHES code allows calling of memory_failure() on the PFNs that pass the
pfn_valid() check. This contract is broken for the remapped PFNs which
fails the check and ghes_do_memory_failure() returns without triggering
memory_failure().

Update code to allow memory_failure() call on PFNs failing pfn_valid().

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 drivers/acpi/apei/ghes.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

Comments

Borislav Petkov Dec. 2, 2023, 11:23 p.m. UTC | #1
On Thu, Nov 23, 2023 at 06:05:11AM +0530, ankita@nvidia.com wrote:
> -	pfn = PHYS_PFN(physical_addr);
> -	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
> -		pr_warn_ratelimited(FW_WARN GHES_PFX
> -		"Invalid address in generic error data: %#llx\n",
> -		physical_addr);
> -		return false;
> -	}

You don't just remove a pfn valid test just because your weird device
can't stomach it - you extend it, like

  3ad6fd77a2d6 ("x86/sgx: Add check for SGX pages to ghes_do_memory_failure()")

did, for example.
Jason Gunthorpe Dec. 4, 2023, 2:36 p.m. UTC | #2
On Sun, Dec 03, 2023 at 12:23:19AM +0100, Borislav Petkov wrote:
> On Thu, Nov 23, 2023 at 06:05:11AM +0530, ankita@nvidia.com wrote:
> > -	pfn = PHYS_PFN(physical_addr);
> > -	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
> > -		pr_warn_ratelimited(FW_WARN GHES_PFX
> > -		"Invalid address in generic error data: %#llx\n",
> > -		physical_addr);
> > -		return false;
> > -	}
> 
> You don't just remove a pfn valid test just because your weird device
> can't stomach it - you extend it, like

It wasn't removed. patch 1 moved it to memory_failure() where it makes
a lot more sense.

Jason
Borislav Petkov Dec. 4, 2023, 3:36 p.m. UTC | #3
On Mon, Dec 04, 2023 at 10:36:50AM -0400, Jason Gunthorpe wrote:
> It wasn't removed. patch 1 moved it to memory_failure() where it makes
> a lot more sense.

Why is this a separate patch then?
Ankit Agrawal Dec. 4, 2023, 3:54 p.m. UTC | #4
>> It wasn't removed. patch 1 moved it to memory_failure() where it makes
>> a lot more sense.
>
> Why is this a separate patch then?

This was done to keep ghes code separate from the memory failure code.
I can merge them if that is preferable.
Jason Gunthorpe Dec. 4, 2023, 3:55 p.m. UTC | #5
On Mon, Dec 04, 2023 at 03:54:52PM +0000, Ankit Agrawal wrote:
> >> It wasn't removed. patch 1 moved it to memory_failure() where it makes
> >> a lot more sense.
> >
> > Why is this a separate patch then?
> 
> This was done to keep ghes code separate from the memory failure code.
> I can merge them if that is preferable.

A single patch to move just this code could be a good idea

Jason
diff mbox series

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 63ad0541db38..0ca6ab9fccbe 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -471,20 +471,10 @@  static void ghes_kick_task_work(struct callback_head *head)
 
 static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 {
-	unsigned long pfn;
-
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
 		return false;
 
-	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
-		pr_warn_ratelimited(FW_WARN GHES_PFX
-		"Invalid address in generic error data: %#llx\n",
-		physical_addr);
-		return false;
-	}
-
-	memory_failure_queue(pfn, flags);
+	memory_failure_queue(PHYS_PFN(physical_addr), flags);
 	return true;
 }