diff mbox series

PM: hibernate: Fix a bug in copying the zero bitmap to safe pages

Message ID 20230929-hib_zero_bitmap_fix-v1-1-6cfdcb785250@quicinc.com
State New
Headers show
Series PM: hibernate: Fix a bug in copying the zero bitmap to safe pages | expand

Commit Message

Pavan Kondeti Sept. 29, 2023, 5:31 p.m. UTC
The following crash is observed 100% of the time during resume from
the hibernation on a x86 QEMU system.

[   12.931887]  ? __die_body+0x1a/0x60
[   12.932324]  ? page_fault_oops+0x156/0x420
[   12.932824]  ? search_exception_tables+0x37/0x50
[   12.933389]  ? fixup_exception+0x21/0x300
[   12.933889]  ? exc_page_fault+0x69/0x150
[   12.934371]  ? asm_exc_page_fault+0x26/0x30
[   12.934869]  ? get_buffer.constprop.0+0xac/0x100
[   12.935428]  snapshot_write_next+0x7c/0x9f0
[   12.935929]  ? submit_bio_noacct_nocheck+0x2c2/0x370
[   12.936530]  ? submit_bio_noacct+0x44/0x2c0
[   12.937035]  ? hib_submit_io+0xa5/0x110
[   12.937501]  load_image+0x83/0x1a0
[   12.937919]  swsusp_read+0x17f/0x1d0
[   12.938355]  ? create_basic_memory_bitmaps+0x1b7/0x240
[   12.938967]  load_image_and_restore+0x45/0xc0
[   12.939494]  software_resume+0x13c/0x180
[   12.939994]  resume_store+0xa3/0x1d0

The commit being fixed introduced a bug in copying the zero bitmap
to safe pages. A temporary bitmap is allocated in prepare_image()
to make a copy of zero bitmap after the unsafe pages are marked.
Freeing this temporary bitmap later results in an inconsistent state
of unsafe pages. Since free bit is left as is for this temporary bitmap
after free, these pages are treated as unsafe pages when they are
allocated again. This results in incorrect calculation of the number
of pages pre-allocated for the image.

nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;

The allocate_unsafe_pages is estimated to be higher than the actual
which results in running short of pages in safe_pages_list. Hence the
crash is observed in get_buffer() due to NULL pointer access of
safe_pages_list.

Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
---
 kernel/power/snapshot.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)


---
base-commit: 6465e260f48790807eef06b583b38ca9789b6072
change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae

Best regards,

Comments

Brian Geffon Sept. 30, 2023, 11:37 a.m. UTC | #1
On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti
<quic_pkondeti@quicinc.com> wrote:
>
Hi Pavankumar,

> The following crash is observed 100% of the time during resume from
> the hibernation on a x86 QEMU system.
>
> [   12.931887]  ? __die_body+0x1a/0x60
> [   12.932324]  ? page_fault_oops+0x156/0x420
> [   12.932824]  ? search_exception_tables+0x37/0x50
> [   12.933389]  ? fixup_exception+0x21/0x300
> [   12.933889]  ? exc_page_fault+0x69/0x150
> [   12.934371]  ? asm_exc_page_fault+0x26/0x30
> [   12.934869]  ? get_buffer.constprop.0+0xac/0x100
> [   12.935428]  snapshot_write_next+0x7c/0x9f0
> [   12.935929]  ? submit_bio_noacct_nocheck+0x2c2/0x370
> [   12.936530]  ? submit_bio_noacct+0x44/0x2c0
> [   12.937035]  ? hib_submit_io+0xa5/0x110
> [   12.937501]  load_image+0x83/0x1a0
> [   12.937919]  swsusp_read+0x17f/0x1d0
> [   12.938355]  ? create_basic_memory_bitmaps+0x1b7/0x240
> [   12.938967]  load_image_and_restore+0x45/0xc0
> [   12.939494]  software_resume+0x13c/0x180
> [   12.939994]  resume_store+0xa3/0x1d0
>
> The commit being fixed introduced a bug in copying the zero bitmap
> to safe pages. A temporary bitmap is allocated in prepare_image()
> to make a copy of zero bitmap after the unsafe pages are marked.
> Freeing this temporary bitmap later results in an inconsistent state
> of unsafe pages. Since free bit is left as is for this temporary bitmap
> after free, these pages are treated as unsafe pages when they are
> allocated again. This results in incorrect calculation of the number
> of pages pre-allocated for the image.
>
> nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;
>
> The allocate_unsafe_pages is estimated to be higher than the actual
> which results in running short of pages in safe_pages_list. Hence the
> crash is observed in get_buffer() due to NULL pointer access of
> safe_pages_list.

Rafael pulled https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=f0c7183008b41e92fa676406d87f18773724b48b
which addresses the null pointer dereference which regardless
shouldn't be touching the list directly and should be using
__get_safe_page(). I'll review this patch in the next day or two.

>
> Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
> Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
> ---
>  kernel/power/snapshot.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 87e9f7e2bdc0..cb7341a71a21 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>                 struct memory_bitmap *zero_bm)
>  {
>         unsigned int nr_pages, nr_highmem;
> -       struct memory_bitmap tmp;
> +       struct memory_bitmap tmp_zero_bm;
>         struct linked_page *lp;
>         int error;
>
> @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>         free_image_page(buffer, PG_UNSAFE_CLEAR);
>         buffer = NULL;
>
> +       /*
> +        * Allocate a temporary bitmap to create a copy of zero_bm in
> +        * safe pages. This allocation needs to be done before marking
> +        * unsafe pages below so that it can be freed without altering
> +        * the state of unsafe pages.
> +        */
> +       error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY);
> +       if (error)
> +               goto Free;
> +
>         nr_highmem = count_highmem_image_pages(bm);
>         mark_unsafe_pages(bm);
>
> @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>         duplicate_memory_bitmap(new_bm, bm);
>         memory_bm_free(bm, PG_UNSAFE_KEEP);
>
> -       /* Make a copy of zero_bm so it can be created in safe pages */
> -       error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY);
> -       if (error)
> -               goto Free;
> -
> -       duplicate_memory_bitmap(&tmp, zero_bm);
> +       duplicate_memory_bitmap(&tmp_zero_bm, zero_bm);
>         memory_bm_free(zero_bm, PG_UNSAFE_KEEP);
>
>         /* Recreate zero_bm in safe pages */
> @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>         if (error)
>                 goto Free;
>
> -       duplicate_memory_bitmap(zero_bm, &tmp);
> -       memory_bm_free(&tmp, PG_UNSAFE_KEEP);
> +       duplicate_memory_bitmap(zero_bm, &tmp_zero_bm);
> +       memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP);
>         /* At this point zero_bm is in safe pages and it can be used for restoring. */
>
>         if (nr_highmem > 0) {
>
> ---
> base-commit: 6465e260f48790807eef06b583b38ca9789b6072
> change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae
>
> Best regards,
> --
> Pavankumar Kondeti <quic_pkondeti@quicinc.com>

Thanks,
Brian

>
Brian Geffon Oct. 2, 2023, 5:56 p.m. UTC | #2
On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti
<quic_pkondeti@quicinc.com> wrote:
>
> The following crash is observed 100% of the time during resume from
> the hibernation on a x86 QEMU system.
>
> [   12.931887]  ? __die_body+0x1a/0x60
> [   12.932324]  ? page_fault_oops+0x156/0x420
> [   12.932824]  ? search_exception_tables+0x37/0x50
> [   12.933389]  ? fixup_exception+0x21/0x300
> [   12.933889]  ? exc_page_fault+0x69/0x150
> [   12.934371]  ? asm_exc_page_fault+0x26/0x30
> [   12.934869]  ? get_buffer.constprop.0+0xac/0x100
> [   12.935428]  snapshot_write_next+0x7c/0x9f0
> [   12.935929]  ? submit_bio_noacct_nocheck+0x2c2/0x370
> [   12.936530]  ? submit_bio_noacct+0x44/0x2c0
> [   12.937035]  ? hib_submit_io+0xa5/0x110
> [   12.937501]  load_image+0x83/0x1a0
> [   12.937919]  swsusp_read+0x17f/0x1d0
> [   12.938355]  ? create_basic_memory_bitmaps+0x1b7/0x240
> [   12.938967]  load_image_and_restore+0x45/0xc0
> [   12.939494]  software_resume+0x13c/0x180
> [   12.939994]  resume_store+0xa3/0x1d0
>
> The commit being fixed introduced a bug in copying the zero bitmap
> to safe pages. A temporary bitmap is allocated in prepare_image()
> to make a copy of zero bitmap after the unsafe pages are marked.
> Freeing this temporary bitmap later results in an inconsistent state
> of unsafe pages. Since free bit is left as is for this temporary bitmap
> after free, these pages are treated as unsafe pages when they are
> allocated again. This results in incorrect calculation of the number
> of pages pre-allocated for the image.
>
> nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;
>
> The allocate_unsafe_pages is estimated to be higher than the actual
> which results in running short of pages in safe_pages_list. Hence the
> crash is observed in get_buffer() due to NULL pointer access of
> safe_pages_list.

After reading through the code, perhaps I'm missing something, I'm not
sure that this is really fixing the problem.

It seems like the problem would be that memory_bm_create() results in
calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that
get_image_page() will not touch allocated_unsafe_pages and instead
will mark the page as in use by setting it in the forbidden_pages_map
and the free_pages_map simultaneously. The problem is that the
free_pages_map was already populated by the call to mark_unsafe_pages,
meaning that if we allocated a safe page in get_image_page() we just
set the free bit when it otherwise should not be set.

When the page is later free'd via the call to memory_bm_free(&tmp,
PG_UNSAFE_KEEP), it results in calls to free_image_page() w/
clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not
touch the free_pages_map because we don't call
swsusp_unset_page_free().

With all that being said it seems like the correct way to deal with
that would be to do:
   error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE);
Here we know that the pages were not in the free_pages_map initially.

Followed by freeing it as:
    memory_bm_free(&tmp, PG_UNSAFE_CLEAR);
And here we know that swsusp_unset_page_free() will be called making
sure the page is not in the free_pages_map afterwards.

And that should result in an unchanged free_pages_map. Does that make
sense? Please correct me if I'm misunderstanding something.

>
> Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
> Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
> ---
>  kernel/power/snapshot.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 87e9f7e2bdc0..cb7341a71a21 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>                 struct memory_bitmap *zero_bm)
>  {
>         unsigned int nr_pages, nr_highmem;
> -       struct memory_bitmap tmp;
> +       struct memory_bitmap tmp_zero_bm;
>         struct linked_page *lp;
>         int error;
>
> @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>         free_image_page(buffer, PG_UNSAFE_CLEAR);
>         buffer = NULL;
>
> +       /*
> +        * Allocate a temporary bitmap to create a copy of zero_bm in
> +        * safe pages. This allocation needs to be done before marking
> +        * unsafe pages below so that it can be freed without altering
> +        * the state of unsafe pages.
> +        */
> +       error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY);
> +       if (error)
> +               goto Free;
> +
>         nr_highmem = count_highmem_image_pages(bm);
>         mark_unsafe_pages(bm);
>
> @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>         duplicate_memory_bitmap(new_bm, bm);
>         memory_bm_free(bm, PG_UNSAFE_KEEP);
>
> -       /* Make a copy of zero_bm so it can be created in safe pages */
> -       error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY);
> -       if (error)
> -               goto Free;
> -
> -       duplicate_memory_bitmap(&tmp, zero_bm);
> +       duplicate_memory_bitmap(&tmp_zero_bm, zero_bm);
>         memory_bm_free(zero_bm, PG_UNSAFE_KEEP);
>
>         /* Recreate zero_bm in safe pages */
> @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
>         if (error)
>                 goto Free;
>
> -       duplicate_memory_bitmap(zero_bm, &tmp);
> -       memory_bm_free(&tmp, PG_UNSAFE_KEEP);
> +       duplicate_memory_bitmap(zero_bm, &tmp_zero_bm);
> +       memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP);
>         /* At this point zero_bm is in safe pages and it can be used for restoring. */
>
>         if (nr_highmem > 0) {
>
> ---
> base-commit: 6465e260f48790807eef06b583b38ca9789b6072
> change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae
>
> Best regards,
> --
> Pavankumar Kondeti <quic_pkondeti@quicinc.com>
>
Brian Geffon Oct. 2, 2023, 6:34 p.m. UTC | #3
On Mon, Oct 2, 2023 at 1:56 PM Brian Geffon <bgeffon@google.com> wrote:
>
> On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti
> <quic_pkondeti@quicinc.com> wrote:
> >
> > The following crash is observed 100% of the time during resume from
> > the hibernation on a x86 QEMU system.
> >
> > [   12.931887]  ? __die_body+0x1a/0x60
> > [   12.932324]  ? page_fault_oops+0x156/0x420
> > [   12.932824]  ? search_exception_tables+0x37/0x50
> > [   12.933389]  ? fixup_exception+0x21/0x300
> > [   12.933889]  ? exc_page_fault+0x69/0x150
> > [   12.934371]  ? asm_exc_page_fault+0x26/0x30
> > [   12.934869]  ? get_buffer.constprop.0+0xac/0x100
> > [   12.935428]  snapshot_write_next+0x7c/0x9f0
> > [   12.935929]  ? submit_bio_noacct_nocheck+0x2c2/0x370
> > [   12.936530]  ? submit_bio_noacct+0x44/0x2c0
> > [   12.937035]  ? hib_submit_io+0xa5/0x110
> > [   12.937501]  load_image+0x83/0x1a0
> > [   12.937919]  swsusp_read+0x17f/0x1d0
> > [   12.938355]  ? create_basic_memory_bitmaps+0x1b7/0x240
> > [   12.938967]  load_image_and_restore+0x45/0xc0
> > [   12.939494]  software_resume+0x13c/0x180
> > [   12.939994]  resume_store+0xa3/0x1d0
> >
> > The commit being fixed introduced a bug in copying the zero bitmap
> > to safe pages. A temporary bitmap is allocated in prepare_image()
> > to make a copy of zero bitmap after the unsafe pages are marked.
> > Freeing this temporary bitmap later results in an inconsistent state
> > of unsafe pages. Since free bit is left as is for this temporary bitmap
> > after free, these pages are treated as unsafe pages when they are
> > allocated again. This results in incorrect calculation of the number
> > of pages pre-allocated for the image.
> >
> > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;
> >
> > The allocate_unsafe_pages is estimated to be higher than the actual
> > which results in running short of pages in safe_pages_list. Hence the
> > crash is observed in get_buffer() due to NULL pointer access of
> > safe_pages_list.
>
> After reading through the code, perhaps I'm missing something, I'm not
> sure that this is really fixing the problem.
>
> It seems like the problem would be that memory_bm_create() results in
> calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that
> get_image_page() will not touch allocated_unsafe_pages and instead
> will mark the page as in use by setting it in the forbidden_pages_map
> and the free_pages_map simultaneously. The problem is that the
> free_pages_map was already populated by the call to mark_unsafe_pages,
> meaning that if we allocated a safe page in get_image_page() we just
> set the free bit when it otherwise should not be set.
>
> When the page is later free'd via the call to memory_bm_free(&tmp,
> PG_UNSAFE_KEEP), it results in calls to free_image_page() w/
> clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not
> touch the free_pages_map because we don't call
> swsusp_unset_page_free().
>
> With all that being said it seems like the correct way to deal with
> that would be to do:
>    error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE);
> Here we know that the pages were not in the free_pages_map initially.
>
> Followed by freeing it as:
>     memory_bm_free(&tmp, PG_UNSAFE_CLEAR);
> And here we know that swsusp_unset_page_free() will be called making
> sure the page is not in the free_pages_map afterwards.
>
> And that should result in an unchanged free_pages_map. Does that make
> sense? Please correct me if I'm misunderstanding something.
>

To restate this another way, if I'm reading it correctly, I think the
outcome is actually nearly the same, the difference is, when
allocating the bitmap before w/ PG_ANY we're setting bits in the
free_page_list which will be unset a few lines later in the call to
mark_unsafe_pages(), and then we won't touch the free_pages_list
during the memory_bm_free() because it's called with PG_UNSAFE_KEEP.

> >
> > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
> > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
> > ---
> >  kernel/power/snapshot.c | 23 ++++++++++++++---------
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> > index 87e9f7e2bdc0..cb7341a71a21 100644
> > --- a/kernel/power/snapshot.c
> > +++ b/kernel/power/snapshot.c
> > @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
> >                 struct memory_bitmap *zero_bm)
> >  {
> >         unsigned int nr_pages, nr_highmem;
> > -       struct memory_bitmap tmp;
> > +       struct memory_bitmap tmp_zero_bm;
> >         struct linked_page *lp;
> >         int error;
> >
> > @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
> >         free_image_page(buffer, PG_UNSAFE_CLEAR);
> >         buffer = NULL;
> >
> > +       /*
> > +        * Allocate a temporary bitmap to create a copy of zero_bm in
> > +        * safe pages. This allocation needs to be done before marking
> > +        * unsafe pages below so that it can be freed without altering
> > +        * the state of unsafe pages.
> > +        */
> > +       error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY);
> > +       if (error)
> > +               goto Free;
> > +
> >         nr_highmem = count_highmem_image_pages(bm);
> >         mark_unsafe_pages(bm);
> >
> > @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
> >         duplicate_memory_bitmap(new_bm, bm);
> >         memory_bm_free(bm, PG_UNSAFE_KEEP);
> >
> > -       /* Make a copy of zero_bm so it can be created in safe pages */
> > -       error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY);
> > -       if (error)
> > -               goto Free;
> > -
> > -       duplicate_memory_bitmap(&tmp, zero_bm);
> > +       duplicate_memory_bitmap(&tmp_zero_bm, zero_bm);
> >         memory_bm_free(zero_bm, PG_UNSAFE_KEEP);
> >
> >         /* Recreate zero_bm in safe pages */
> > @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
> >         if (error)
> >                 goto Free;
> >
> > -       duplicate_memory_bitmap(zero_bm, &tmp);
> > -       memory_bm_free(&tmp, PG_UNSAFE_KEEP);
> > +       duplicate_memory_bitmap(zero_bm, &tmp_zero_bm);
> > +       memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP);
> >         /* At this point zero_bm is in safe pages and it can be used for restoring. */
> >
> >         if (nr_highmem > 0) {
> >
> > ---
> > base-commit: 6465e260f48790807eef06b583b38ca9789b6072
> > change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae
> >
> > Best regards,
> > --
> > Pavankumar Kondeti <quic_pkondeti@quicinc.com>
> >
Pavan Kondeti Oct. 3, 2023, 3:08 a.m. UTC | #4
On Sat, Sep 30, 2023 at 07:37:13AM -0400, Brian Geffon wrote:
> On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti
> <quic_pkondeti@quicinc.com> wrote:
> >
> Hi Pavankumar,
> 
> > The following crash is observed 100% of the time during resume from
> > the hibernation on a x86 QEMU system.
> >
> > [   12.931887]  ? __die_body+0x1a/0x60
> > [   12.932324]  ? page_fault_oops+0x156/0x420
> > [   12.932824]  ? search_exception_tables+0x37/0x50
> > [   12.933389]  ? fixup_exception+0x21/0x300
> > [   12.933889]  ? exc_page_fault+0x69/0x150
> > [   12.934371]  ? asm_exc_page_fault+0x26/0x30
> > [   12.934869]  ? get_buffer.constprop.0+0xac/0x100
> > [   12.935428]  snapshot_write_next+0x7c/0x9f0
> > [   12.935929]  ? submit_bio_noacct_nocheck+0x2c2/0x370
> > [   12.936530]  ? submit_bio_noacct+0x44/0x2c0
> > [   12.937035]  ? hib_submit_io+0xa5/0x110
> > [   12.937501]  load_image+0x83/0x1a0
> > [   12.937919]  swsusp_read+0x17f/0x1d0
> > [   12.938355]  ? create_basic_memory_bitmaps+0x1b7/0x240
> > [   12.938967]  load_image_and_restore+0x45/0xc0
> > [   12.939494]  software_resume+0x13c/0x180
> > [   12.939994]  resume_store+0xa3/0x1d0
> >
> > The commit being fixed introduced a bug in copying the zero bitmap
> > to safe pages. A temporary bitmap is allocated in prepare_image()
> > to make a copy of zero bitmap after the unsafe pages are marked.
> > Freeing this temporary bitmap later results in an inconsistent state
> > of unsafe pages. Since free bit is left as is for this temporary bitmap
> > after free, these pages are treated as unsafe pages when they are
> > allocated again. This results in incorrect calculation of the number
> > of pages pre-allocated for the image.
> >
> > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;
> >
> > The allocate_unsafe_pages is estimated to be higher than the actual
> > which results in running short of pages in safe_pages_list. Hence the
> > crash is observed in get_buffer() due to NULL pointer access of
> > safe_pages_list.
> 
> Rafael pulled https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=f0c7183008b41e92fa676406d87f18773724b48b
> which addresses the null pointer dereference which regardless
> shouldn't be touching the list directly and should be using
> __get_safe_page().

Thanks for pointing me to this. I have verified hibernation by pulling this
commit to v6.6-rc3 and it works as expected.

This commit is currently queued for v6.7, can it be included in next -rc or 
we have to apply the patch I have sent to make sure that hibernation works on
v6.6 when it gets released.

> 
> >
> > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
> > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>

Thanks,
Pavan
Pavan Kondeti Oct. 3, 2023, 2:50 p.m. UTC | #5
On Tue, Oct 03, 2023 at 10:36:25AM -0400, Brian Geffon wrote:
> On Mon, Oct 2, 2023 at 11:05 PM Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
> >
> > On Mon, Oct 02, 2023 at 02:34:20PM -0400, Brian Geffon wrote:
> > > On Mon, Oct 2, 2023 at 1:56 PM Brian Geffon <bgeffon@google.com> wrote:
> > > >
> > > > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti
> > > > <quic_pkondeti@quicinc.com> wrote:
> > > > >
> > > > > The following crash is observed 100% of the time during resume from
> > > > > the hibernation on a x86 QEMU system.
> > > > >
> > > > > [   12.931887]  ? __die_body+0x1a/0x60
> > > > > [   12.932324]  ? page_fault_oops+0x156/0x420
> > > > > [   12.932824]  ? search_exception_tables+0x37/0x50
> > > > > [   12.933389]  ? fixup_exception+0x21/0x300
> > > > > [   12.933889]  ? exc_page_fault+0x69/0x150
> > > > > [   12.934371]  ? asm_exc_page_fault+0x26/0x30
> > > > > [   12.934869]  ? get_buffer.constprop.0+0xac/0x100
> > > > > [   12.935428]  snapshot_write_next+0x7c/0x9f0
> > > > > [   12.935929]  ? submit_bio_noacct_nocheck+0x2c2/0x370
> > > > > [   12.936530]  ? submit_bio_noacct+0x44/0x2c0
> > > > > [   12.937035]  ? hib_submit_io+0xa5/0x110
> > > > > [   12.937501]  load_image+0x83/0x1a0
> > > > > [   12.937919]  swsusp_read+0x17f/0x1d0
> > > > > [   12.938355]  ? create_basic_memory_bitmaps+0x1b7/0x240
> > > > > [   12.938967]  load_image_and_restore+0x45/0xc0
> > > > > [   12.939494]  software_resume+0x13c/0x180
> > > > > [   12.939994]  resume_store+0xa3/0x1d0
> > > > >
> > > > > The commit being fixed introduced a bug in copying the zero bitmap
> > > > > to safe pages. A temporary bitmap is allocated in prepare_image()
> > > > > to make a copy of zero bitmap after the unsafe pages are marked.
> > > > > Freeing this temporary bitmap later results in an inconsistent state
> > > > > of unsafe pages. Since free bit is left as is for this temporary bitmap
> > > > > after free, these pages are treated as unsafe pages when they are
> > > > > allocated again. This results in incorrect calculation of the number
> > > > > of pages pre-allocated for the image.
> > > > >
> > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;
> > > > >
> > > > > The allocate_unsafe_pages is estimated to be higher than the actual
> > > > > which results in running short of pages in safe_pages_list. Hence the
> > > > > crash is observed in get_buffer() due to NULL pointer access of
> > > > > safe_pages_list.
> > > >
> > > > After reading through the code, perhaps I'm missing something, I'm not
> > > > sure that this is really fixing the problem.
> > > >
> > > > It seems like the problem would be that memory_bm_create() results in
> > > > calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that
> > > > get_image_page() will not touch allocated_unsafe_pages and instead
> > > > will mark the page as in use by setting it in the forbidden_pages_map
> > > > and the free_pages_map simultaneously. The problem is that the
> > > > free_pages_map was already populated by the call to mark_unsafe_pages,
> > > > meaning that if we allocated a safe page in get_image_page() we just
> > > > set the free bit when it otherwise should not be set.
> > > >
> > > > When the page is later free'd via the call to memory_bm_free(&tmp,
> > > > PG_UNSAFE_KEEP), it results in calls to free_image_page() w/
> > > > clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not
> > > > touch the free_pages_map because we don't call
> > > > swsusp_unset_page_free().
> > > >
> > > > With all that being said it seems like the correct way to deal with
> > > > that would be to do:
> > > >    error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE);
> > > > Here we know that the pages were not in the free_pages_map initially.
> > > >
> > > > Followed by freeing it as:
> > > >     memory_bm_free(&tmp, PG_UNSAFE_CLEAR);
> > > > And here we know that swsusp_unset_page_free() will be called making
> > > > sure the page is not in the free_pages_map afterwards.
> > > >
> > > > And that should result in an unchanged free_pages_map. Does that make
> > > > sense? Please correct me if I'm misunderstanding something.
> > > >
> >
> > Thanks for your review. Appreciate the detailed summary.
> >
> > >
> > > To restate this another way, if I'm reading it correctly, I think the
> > > outcome is actually nearly the same, the difference is, when
> > > allocating the bitmap before w/ PG_ANY we're setting bits in the
> > > free_page_list which will be unset a few lines later in the call to
> > > mark_unsafe_pages(), and then we won't touch the free_pages_list
> > > during the memory_bm_free() because it's called with PG_UNSAFE_KEEP.
> > >
> >
> > The current patch and your suggestion both gives the same effect like
> > you said. Since it is a temporary buffer to hold the zero bit map page, I
> > did not allocate safe pages. Allocating safe pages means a bit more
> > work. In this case this it is not completely throw away work but
> > re-ordering the call seems to be simple here. Pls let me know if you
> > want to me incorporate your suggestion.
> 
> My personal opinion is that PG_SAFE makes a bit more sense, it's not
> really wasted work as any pages which are not safe end up being added
> to the allocated_unsafe_pages pool.
> 

Yes, the extra bit of works does not go waste. I will send v2 with your
suggestion. Thanks a lot for your review and detailed comments.

Thanks,
Pavan
diff mbox series

Patch

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 87e9f7e2bdc0..cb7341a71a21 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -2628,7 +2628,7 @@  static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
 		struct memory_bitmap *zero_bm)
 {
 	unsigned int nr_pages, nr_highmem;
-	struct memory_bitmap tmp;
+	struct memory_bitmap tmp_zero_bm;
 	struct linked_page *lp;
 	int error;
 
@@ -2636,6 +2636,16 @@  static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
 	free_image_page(buffer, PG_UNSAFE_CLEAR);
 	buffer = NULL;
 
+	/*
+	 * Allocate a temporary bitmap to create a copy of zero_bm in
+	 * safe pages. This allocation needs to be done before marking
+	 * unsafe pages below so that it can be freed without altering
+	 * the state of unsafe pages.
+	 */
+	error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY);
+	if (error)
+		goto Free;
+
 	nr_highmem = count_highmem_image_pages(bm);
 	mark_unsafe_pages(bm);
 
@@ -2646,12 +2656,7 @@  static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
 	duplicate_memory_bitmap(new_bm, bm);
 	memory_bm_free(bm, PG_UNSAFE_KEEP);
 
-	/* Make a copy of zero_bm so it can be created in safe pages */
-	error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY);
-	if (error)
-		goto Free;
-
-	duplicate_memory_bitmap(&tmp, zero_bm);
+	duplicate_memory_bitmap(&tmp_zero_bm, zero_bm);
 	memory_bm_free(zero_bm, PG_UNSAFE_KEEP);
 
 	/* Recreate zero_bm in safe pages */
@@ -2659,8 +2664,8 @@  static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm,
 	if (error)
 		goto Free;
 
-	duplicate_memory_bitmap(zero_bm, &tmp);
-	memory_bm_free(&tmp, PG_UNSAFE_KEEP);
+	duplicate_memory_bitmap(zero_bm, &tmp_zero_bm);
+	memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP);
 	/* At this point zero_bm is in safe pages and it can be used for restoring. */
 
 	if (nr_highmem > 0) {