diff mbox series

[23/29] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

Message ID 20210313050820.EoPGLS3gj%akpm@linux-foundation.org
State New
Headers show
Series None | expand

Commit Message

Andrew Morton March 13, 2021, 5:08 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

Currently me_huge_page() temporary unlocks page to perform some actions
then locks it again later.  My testcase (which calls hard-offline on
some tail page in a hugetlb, then accesses the address of the hugetlb
range) showed that page allocation code detects this page lock on buddy
page and printed out "BUG: Bad page state" message.

check_new_page_bad() does not consider a page with __PG_HWPOISON as bad
page, so this flag works as kind of filter, but this filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page.

This patch suggests to drop the 2nd page lock to fix the issue.

Link: https://lkml.kernel.org/r/20210304064437.962442-1-nao.horiguchi@gmail.com
Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory-failure.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Linus Torvalds March 13, 2021, 7:23 p.m. UTC | #1
On Fri, Mar 12, 2021 at 9:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Subject: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

This patch can not possibly be correct, and is dangerous and very very wrong.

>  out:
> -       unlock_page(head);
> +       if (PageLocked(head))
> +               unlock_page(head);

You absolutely CANNOT do things like this. It is fundamentally insane.

You have two situations:

 (a) you know you locked the page.

     In this case an unconditional "unlock_page()" is the only correct
thing to do.

 (b) you don't know whether you maybe unlocked it again.

     In this case SOMEBODY ELSE might have locked the page, and you
doing "if it's locked, then unlock it" is pure and utter drivel, and
fundamentally and seriously wrong. You're unlocking SOMEBODY ELSES
lock, after having already unlocked your own once.

Now, it is possible that this kind of pattern could be ok if you
absolutely 100% know - and it is obvious from the code - that you are
the only thread that can possibly access the page. But if that is the
case, then the page should never have been locked in the first place,
because that would be an insane and pointless operation, since the
whole - and only - point of locking is to enforce some kind of
exclusivity - and if exclusivity is explicit in the code-path, then
locking the page is pointless.

And yes, in this case I can see a remote theoretical model: "all good
cases keep the page locked, and the only trhing that unlocks the page
also guarantees nothing else can possiblly see it".

But no. I don't actually believe this is fuaranteed to the case here,
and even if it were, I don't think this is a code sequence we can or
should accept.

End result: there is no possible situation that I can think of where the above

       if (PageLocked(head))
               unlock_page(head);

kind of sequence can EVER possibly be correct, and it shows a complete
disregard of everything that locking is all about.

Now, the memory error handling may be so special that this effectively
"works" in practice, but there is no way I can apply such a
fundamentally broken patch.

I see two options:

 - make the result of identify_page_state() *tell* you whether the
page was already unlocked (and then make the unlock conditional on
*that*)

   This is valid, because now you removed that whole "somebody else
might have transferred the lock" issue: you know you already unlocked
the page based on the return value, so you don't unlock it again.
That's perfectly valid.

 - probably better: make identify_page_state() itself always unlock
the page, and turn the two different code sequences that currently do

        res = identify_page_state(pfn, p, page_flags);
  out:
        unlock_page(head);
        return res;

into just doing

        return identify_page_state(pfn, p, page_flags);
  out:
        unlock_page(head);
        return -EBUSY;

instead, and move that now pointless "res" variable into the only
if-statement that actually uses it (for other things entirely! It
should be a "enum mf_result" there!)

And yes, that second (and imho better) rule means that now
{page_action()" itself needs to be the thing that unlocks the page,
and that in turn might be done a few different ways:

 (a) either add a separate "MF_UNLOCKED" state bit to the possible
return codes and make that me_huge_page() that unlocks the page set
that bit (NOTE! It still needs to also return either MF_FAILED or
MF_RECOVERED, so this "MF_UNLOCKED" bit really should be a separate
bitmask entirely from the other MF_xyz bits)

 (b) make the rule be that *all* the ->action() functions need to just
unlock the page.

( suspect (b) is the better model to aim for, because honestly, static
code rules for locking are basically almost always superior to dynamic
"based on this flag" locking rules. You can in theory actually have
static tooling check that the locking nests correctly with the
unlocking that way (and it's just conceptually simpler to have a hard
rule about "this function always unlocks the page").

But that existing patch is *so* fundamentally wrong that I cannot
possibly apply it, and I feel bad about the fact that it has made it
all the way to me with sign-offs and reviewed-by's and everything.

                  Linus
HORIGUCHI NAOYA(堀口 直也) March 14, 2021, 6:36 a.m. UTC | #2
On Sat, Mar 13, 2021 at 11:23:40AM -0800, Linus Torvalds wrote:
> On Fri, Mar 12, 2021 at 9:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:

> >

> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>

> > Subject: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

> 

> This patch can not possibly be correct, and is dangerous and very very wrong.

> 

> >  out:

> > -       unlock_page(head);

> > +       if (PageLocked(head))

> > +               unlock_page(head);

> 

> You absolutely CANNOT do things like this. It is fundamentally insane.

> 

> You have two situations:

> 

>  (a) you know you locked the page.

> 

>      In this case an unconditional "unlock_page()" is the only correct

> thing to do.

> 

>  (b) you don't know whether you maybe unlocked it again.

> 

>      In this case SOMEBODY ELSE might have locked the page, and you

> doing "if it's locked, then unlock it" is pure and utter drivel, and

> fundamentally and seriously wrong. You're unlocking SOMEBODY ELSES

> lock, after having already unlocked your own once.

> 

> Now, it is possible that this kind of pattern could be ok if you

> absolutely 100% know - and it is obvious from the code - that you are

> the only thread that can possibly access the page. But if that is the

> case, then the page should never have been locked in the first place,

> because that would be an insane and pointless operation, since the

> whole - and only - point of locking is to enforce some kind of

> exclusivity - and if exclusivity is explicit in the code-path, then

> locking the page is pointless.

> 

> And yes, in this case I can see a remote theoretical model: "all good

> cases keep the page locked, and the only trhing that unlocks the page

> also guarantees nothing else can possiblly see it".

> 

> But no. I don't actually believe this is fuaranteed to the case here,

> and even if it were, I don't think this is a code sequence we can or

> should accept.

> 

> End result: there is no possible situation that I can think of where the above

> 

>        if (PageLocked(head))

>                unlock_page(head);

> 

> kind of sequence can EVER possibly be correct, and it shows a complete

> disregard of everything that locking is all about.

> 

> Now, the memory error handling may be so special that this effectively

> "works" in practice, but there is no way I can apply such a

> fundamentally broken patch.


OK, I'll update the patch with the second approach you mentioned below.

> 

> I see two options:

> 

>  - make the result of identify_page_state() *tell* you whether the

> page was already unlocked (and then make the unlock conditional on

> *that*)

> 

>    This is valid, because now you removed that whole "somebody else

> might have transferred the lock" issue: you know you already unlocked

> the page based on the return value, so you don't unlock it again.

> That's perfectly valid.

> 

>  - probably better: make identify_page_state() itself always unlock

> the page, and turn the two different code sequences that currently do

> 

>         res = identify_page_state(pfn, p, page_flags);

>   out:

>         unlock_page(head);

>         return res;

> 

> into just doing

> 

>         return identify_page_state(pfn, p, page_flags);

>   out:

>         unlock_page(head);

>         return -EBUSY;

> 

> instead, and move that now pointless "res" variable into the only

> if-statement that actually uses it (for other things entirely! It

> should be a "enum mf_result" there!)

> 

> And yes, that second (and imho better) rule means that now

> {page_action()" itself needs to be the thing that unlocks the page,

> and that in turn might be done a few different ways:

> 

>  (a) either add a separate "MF_UNLOCKED" state bit to the possible

> return codes and make that me_huge_page() that unlocks the page set

> that bit (NOTE! It still needs to also return either MF_FAILED or

> MF_RECOVERED, so this "MF_UNLOCKED" bit really should be a separate

> bitmask entirely from the other MF_xyz bits)

> 

>  (b) make the rule be that *all* the ->action() functions need to just

> unlock the page.

> 

> ( suspect (b) is the better model to aim for, because honestly, static

> code rules for locking are basically almost always superior to dynamic

> "based on this flag" locking rules. You can in theory actually have

> static tooling check that the locking nests correctly with the

> unlocking that way (and it's just conceptually simpler to have a hard

> rule about "this function always unlocks the page").


I'll try (b). Thank you for the detailed advices.

- Naoya Horiguchi

> 

> But that existing patch is *so* fundamentally wrong that I cannot

> possibly apply it, and I feel bad about the fact that it has made it

> all the way to me with sign-offs and reviewed-by's and everything.

> 

>                   Linus

>
diff mbox series

Patch

--- a/mm/memory-failure.c~mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers
+++ a/mm/memory-failure.c
@@ -834,7 +834,6 @@  static int me_huge_page(struct page *p,
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
-		lock_page(hpage);
 	}
 
 	return res;
@@ -1290,7 +1289,8 @@  static int memory_failure_hugetlb(unsign
 
 	res = identify_page_state(pfn, p, page_flags);
 out:
-	unlock_page(head);
+	if (PageLocked(head))
+		unlock_page(head);
 	return res;
 }