diff mbox series

[v3,5/6] mm: Add mirror flag back on initrd memory

Message ID 20220607093805.1354256-6-mawupeng1@huawei.com
State New
Headers show
Series introduce mirrored memory support for arm64 | expand

Commit Message

mawupeng June 7, 2022, 9:38 a.m. UTC
From: Ma Wupeng <mawupeng1@huawei.com>

Initrd memory will be removed and then added in arm64_memblock_init() and this
will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
the lower 4G range has some non-mirrored memory.

In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
reinstalled if the origin memblock has this flag.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 arch/arm64/mm/init.c     |  9 +++++++++
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 20 ++++++++++++++++++++
 3 files changed, 30 insertions(+)

Comments

David Hildenbrand June 7, 2022, 12:21 p.m. UTC | #1
On 07.06.22 11:38, Wupeng Ma wrote:
> From: Ma Wupeng <mawupeng1@huawei.com>
> 
> Initrd memory will be removed and then added in arm64_memblock_init() and this
> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
> the lower 4G range has some non-mirrored memory.
> 
> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
> reinstalled if the origin memblock has this flag.
> 
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>  arch/arm64/mm/init.c     |  9 +++++++++
>  include/linux/memblock.h |  1 +
>  mm/memblock.c            | 20 ++++++++++++++++++++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 339ee84e5a61..11641f924d08 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
>  			"initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>  			phys_initrd_size = 0;
>  		} else {
> +			int flags, ret;
> +
> +			ret = memblock_get_flags(base, &flags);
> +			if (ret)
> +				flags = 0;
> +
>  			memblock_remove(base, size); /* clear MEMBLOCK_ flags */
>  			memblock_add(base, size);
>  			memblock_reserve(base, size);

Can you explain why we're removing+re-adding here exactly? Is it just to
clear flags as the comment indicates?


If it's really just about clearing flags, I wonder if we rather want to
have an interface that does exactly that, and hides the way this is
actually implemented (obtain flags, remove, re-add ...), internally.

But most probably there is more magic in the code and clearing flags
isn't all it ends up doing.
mawupeng June 8, 2022, 7:27 a.m. UTC | #2
在 2022/6/7 22:49, Ard Biesheuvel 写道:
> On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.06.22 11:38, Wupeng Ma wrote:
>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>
>>> Initrd memory will be removed and then added in arm64_memblock_init() and this
>>> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
>>> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
>>> the lower 4G range has some non-mirrored memory.
>>>
>>> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
>>> reinstalled if the origin memblock has this flag.
>>>
>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>> ---
>>>   arch/arm64/mm/init.c     |  9 +++++++++
>>>   include/linux/memblock.h |  1 +
>>>   mm/memblock.c            | 20 ++++++++++++++++++++
>>>   3 files changed, 30 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 339ee84e5a61..11641f924d08 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
>>>                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>>>                        phys_initrd_size = 0;
>>>                } else {
>>> +                     int flags, ret;
>>> +
>>> +                     ret = memblock_get_flags(base, &flags);
>>> +                     if (ret)
>>> +                             flags = 0;
>>> +
>>>                        memblock_remove(base, size); /* clear MEMBLOCK_ flags */
>>>                        memblock_add(base, size);
>>>                        memblock_reserve(base, size);
>>
>> Can you explain why we're removing+re-adding here exactly? Is it just to
>> clear flags as the comment indicates?
>>
> 
> This should only happen if the placement of the initrd conflicts with
> a mem= command line parameter or it is not covered by memblock for
> some other reason.
> 
> IOW, this should never happen, and if re-memblock_add'ing this memory
> unconditionally is causing problems, we should fix that instead of
> working around it.

This will happen if we use initrdmem=3G,100M to reserve initrd memory below
the 4G limit to test this scenario(just for testing, I have trouble to boot
qemu with initrd enabled and memory below 4G are all mirror memory).

Re-memblock_add'ing this memory unconditionally seems fine but clear all
flags(especially MEMBLOCK_MIRROR) may lead to some error log.

> 
>> If it's really just about clearing flags, I wonder if we rather want to
>> have an interface that does exactly that, and hides the way this is
>> actually implemented (obtain flags, remove, re-add ...), internally.
>>
>> But most probably there is more magic in the code and clearing flags
>> isn't all it ends up doing.
>>
> 
> I don't remember exactly why we needed to clear the flags, but I think
> it had to do with some corner case we hit when the initrd was
> partially covered.
If "mem=" is set in command line, memblock_mem_limit_remove_map() will
remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
memory back if this initrd mem has the MEMBLOCK_NOMAP flag?

The rfc version [1] introduce and use memblock_clear_nomap() to clear the
MEMBLOCK_NOMAP of this initrd memblock.
So maybe the usage of memblock_remove() is just to avoid introducing new
function(memblock_clear_nomap)?

Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
to solve this problem rather than bring flag MEMBLOCK_MIRROR back?

[1] https://lore.kernel.org/linux-arm-kernel/20160202180622.GP10166@arm.com/T/#t
> .
Mike Rapoport June 8, 2022, 10:02 a.m. UTC | #3
On Wed, Jun 08, 2022 at 03:27:09PM +0800, mawupeng wrote:
> 
> 在 2022/6/7 22:49, Ard Biesheuvel 写道:
> > On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david@redhat.com> wrote:
> > > 
> > > On 07.06.22 11:38, Wupeng Ma wrote:
> > > > From: Ma Wupeng <mawupeng1@huawei.com>
> > > > 
> > > > Initrd memory will be removed and then added in arm64_memblock_init() and this
> > > > will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
> > > > flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
> > > > the lower 4G range has some non-mirrored memory.
> > > > 
> > > > In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
> > > > reinstalled if the origin memblock has this flag.
> > > > 
> > > > Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> > > > ---
> > > >   arch/arm64/mm/init.c     |  9 +++++++++
> > > >   include/linux/memblock.h |  1 +
> > > >   mm/memblock.c            | 20 ++++++++++++++++++++
> > > >   3 files changed, 30 insertions(+)
> > > > 
> > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > index 339ee84e5a61..11641f924d08 100644
> > > > --- a/arch/arm64/mm/init.c
> > > > +++ b/arch/arm64/mm/init.c
> > > > @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
> > > >                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
> > > >                        phys_initrd_size = 0;
> > > >                } else {
> > > > +                     int flags, ret;
> > > > +
> > > > +                     ret = memblock_get_flags(base, &flags);
> > > > +                     if (ret)
> > > > +                             flags = 0;
> > > > +
> > > >                        memblock_remove(base, size); /* clear MEMBLOCK_ flags */
> > > >                        memblock_add(base, size);
> > > >                        memblock_reserve(base, size);
> > > 
> > > Can you explain why we're removing+re-adding here exactly? Is it just to
> > > clear flags as the comment indicates?
> > > 
> > 
> > This should only happen if the placement of the initrd conflicts with
> > a mem= command line parameter or it is not covered by memblock for
> > some other reason.
> > 
> > IOW, this should never happen, and if re-memblock_add'ing this memory
> > unconditionally is causing problems, we should fix that instead of
> > working around it.
> 
> This will happen if we use initrdmem=3G,100M to reserve initrd memory below
> the 4G limit to test this scenario(just for testing, I have trouble to boot
> qemu with initrd enabled and memory below 4G are all mirror memory).
> 
> Re-memblock_add'ing this memory unconditionally seems fine but clear all
> flags(especially MEMBLOCK_MIRROR) may lead to some error log.
> 
> > 
> > > If it's really just about clearing flags, I wonder if we rather want to
> > > have an interface that does exactly that, and hides the way this is
> > > actually implemented (obtain flags, remove, re-add ...), internally.
> > > 
> > > But most probably there is more magic in the code and clearing flags
> > > isn't all it ends up doing.
> > > 
> > 
> > I don't remember exactly why we needed to clear the flags, but I think
> > it had to do with some corner case we hit when the initrd was
> > partially covered.
> If "mem=" is set in command line, memblock_mem_limit_remove_map() will
> remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
> memory back if this initrd mem has the MEMBLOCK_NOMAP flag?
> 
> The rfc version [1] introduce and use memblock_clear_nomap() to clear the
> MEMBLOCK_NOMAP of this initrd memblock.
> So maybe the usage of memblock_remove() is just to avoid introducing new
> function(memblock_clear_nomap)?
> 
> Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
> introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
> to solve this problem rather than bring flag MEMBLOCK_MIRROR back?

AFAICT, there are two corner cases that re-adding initrd memory covers:
* initrd memory is not a part of the memory reported to memblock, either
because of firmware weirdness or because it was cut out with mem=
* initrd memory overlaps a NOMAP region

So to make sure initrd memory is mapped properly and retains
MEMBLOCK_MIRROR I think the best we can do is

	memblock_add();
	memblock_clear_nomap();
	memblock_reserve();

 
> [1] https://lore.kernel.org/linux-arm-kernel/20160202180622.GP10166@arm.com/T/#t
> > .
David Hildenbrand June 8, 2022, 10:08 a.m. UTC | #4
On 08.06.22 12:02, Mike Rapoport wrote:
> On Wed, Jun 08, 2022 at 03:27:09PM +0800, mawupeng wrote:
>>
>> 在 2022/6/7 22:49, Ard Biesheuvel 写道:
>>> On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 07.06.22 11:38, Wupeng Ma wrote:
>>>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>>>
>>>>> Initrd memory will be removed and then added in arm64_memblock_init() and this
>>>>> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
>>>>> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
>>>>> the lower 4G range has some non-mirrored memory.
>>>>>
>>>>> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
>>>>> reinstalled if the origin memblock has this flag.
>>>>>
>>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>>>> ---
>>>>>   arch/arm64/mm/init.c     |  9 +++++++++
>>>>>   include/linux/memblock.h |  1 +
>>>>>   mm/memblock.c            | 20 ++++++++++++++++++++
>>>>>   3 files changed, 30 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>>> index 339ee84e5a61..11641f924d08 100644
>>>>> --- a/arch/arm64/mm/init.c
>>>>> +++ b/arch/arm64/mm/init.c
>>>>> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
>>>>>                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>>>>>                        phys_initrd_size = 0;
>>>>>                } else {
>>>>> +                     int flags, ret;
>>>>> +
>>>>> +                     ret = memblock_get_flags(base, &flags);
>>>>> +                     if (ret)
>>>>> +                             flags = 0;
>>>>> +
>>>>>                        memblock_remove(base, size); /* clear MEMBLOCK_ flags */
>>>>>                        memblock_add(base, size);
>>>>>                        memblock_reserve(base, size);
>>>>
>>>> Can you explain why we're removing+re-adding here exactly? Is it just to
>>>> clear flags as the comment indicates?
>>>>
>>>
>>> This should only happen if the placement of the initrd conflicts with
>>> a mem= command line parameter or it is not covered by memblock for
>>> some other reason.
>>>
>>> IOW, this should never happen, and if re-memblock_add'ing this memory
>>> unconditionally is causing problems, we should fix that instead of
>>> working around it.
>>
>> This will happen if we use initrdmem=3G,100M to reserve initrd memory below
>> the 4G limit to test this scenario(just for testing, I have trouble to boot
>> qemu with initrd enabled and memory below 4G are all mirror memory).
>>
>> Re-memblock_add'ing this memory unconditionally seems fine but clear all
>> flags(especially MEMBLOCK_MIRROR) may lead to some error log.
>>
>>>
>>>> If it's really just about clearing flags, I wonder if we rather want to
>>>> have an interface that does exactly that, and hides the way this is
>>>> actually implemented (obtain flags, remove, re-add ...), internally.
>>>>
>>>> But most probably there is more magic in the code and clearing flags
>>>> isn't all it ends up doing.
>>>>
>>>
>>> I don't remember exactly why we needed to clear the flags, but I think
>>> it had to do with some corner case we hit when the initrd was
>>> partially covered.
>> If "mem=" is set in command line, memblock_mem_limit_remove_map() will
>> remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
>> memory back if this initrd mem has the MEMBLOCK_NOMAP flag?
>>
>> The rfc version [1] introduce and use memblock_clear_nomap() to clear the
>> MEMBLOCK_NOMAP of this initrd memblock.
>> So maybe the usage of memblock_remove() is just to avoid introducing new
>> function(memblock_clear_nomap)?
>>
>> Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
>> introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
>> to solve this problem rather than bring flag MEMBLOCK_MIRROR back?
> 
> AFAICT, there are two corner cases that re-adding initrd memory covers:
> * initrd memory is not a part of the memory reported to memblock, either
> because of firmware weirdness or because it was cut out with mem=
> * initrd memory overlaps a NOMAP region
> 
> So to make sure initrd memory is mapped properly and retains
> MEMBLOCK_MIRROR I think the best we can do is
> 
> 	memblock_add();
> 	memblock_clear_nomap();
> 	memblock_reserve();

Would simply detect+rejecting to boot on such setups be an option? The
replies so far indicate to me that this is rather a corner case than a
reasonable use case.
Ard Biesheuvel June 8, 2022, 10:12 a.m. UTC | #5
On Wed, 8 Jun 2022 at 12:08, David Hildenbrand <david@redhat.com> wrote:
>
> On 08.06.22 12:02, Mike Rapoport wrote:
> > On Wed, Jun 08, 2022 at 03:27:09PM +0800, mawupeng wrote:
> >>
> >> 在 2022/6/7 22:49, Ard Biesheuvel 写道:
> >>> On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 07.06.22 11:38, Wupeng Ma wrote:
> >>>>> From: Ma Wupeng <mawupeng1@huawei.com>
> >>>>>
> >>>>> Initrd memory will be removed and then added in arm64_memblock_init() and this
> >>>>> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
> >>>>> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
> >>>>> the lower 4G range has some non-mirrored memory.
> >>>>>
> >>>>> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
> >>>>> reinstalled if the origin memblock has this flag.
> >>>>>
> >>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> >>>>> ---
> >>>>>   arch/arm64/mm/init.c     |  9 +++++++++
> >>>>>   include/linux/memblock.h |  1 +
> >>>>>   mm/memblock.c            | 20 ++++++++++++++++++++
> >>>>>   3 files changed, 30 insertions(+)
> >>>>>
> >>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >>>>> index 339ee84e5a61..11641f924d08 100644
> >>>>> --- a/arch/arm64/mm/init.c
> >>>>> +++ b/arch/arm64/mm/init.c
> >>>>> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
> >>>>>                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
> >>>>>                        phys_initrd_size = 0;
> >>>>>                } else {
> >>>>> +                     int flags, ret;
> >>>>> +
> >>>>> +                     ret = memblock_get_flags(base, &flags);
> >>>>> +                     if (ret)
> >>>>> +                             flags = 0;
> >>>>> +
> >>>>>                        memblock_remove(base, size); /* clear MEMBLOCK_ flags */
> >>>>>                        memblock_add(base, size);
> >>>>>                        memblock_reserve(base, size);
> >>>>
> >>>> Can you explain why we're removing+re-adding here exactly? Is it just to
> >>>> clear flags as the comment indicates?
> >>>>
> >>>
> >>> This should only happen if the placement of the initrd conflicts with
> >>> a mem= command line parameter or it is not covered by memblock for
> >>> some other reason.
> >>>
> >>> IOW, this should never happen, and if re-memblock_add'ing this memory
> >>> unconditionally is causing problems, we should fix that instead of
> >>> working around it.
> >>
> >> This will happen if we use initrdmem=3G,100M to reserve initrd memory below
> >> the 4G limit to test this scenario(just for testing, I have trouble to boot
> >> qemu with initrd enabled and memory below 4G are all mirror memory).
> >>
> >> Re-memblock_add'ing this memory unconditionally seems fine but clear all
> >> flags(especially MEMBLOCK_MIRROR) may lead to some error log.
> >>
> >>>
> >>>> If it's really just about clearing flags, I wonder if we rather want to
> >>>> have an interface that does exactly that, and hides the way this is
> >>>> actually implemented (obtain flags, remove, re-add ...), internally.
> >>>>
> >>>> But most probably there is more magic in the code and clearing flags
> >>>> isn't all it ends up doing.
> >>>>
> >>>
> >>> I don't remember exactly why we needed to clear the flags, but I think
> >>> it had to do with some corner case we hit when the initrd was
> >>> partially covered.
> >> If "mem=" is set in command line, memblock_mem_limit_remove_map() will
> >> remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
> >> memory back if this initrd mem has the MEMBLOCK_NOMAP flag?
> >>
> >> The rfc version [1] introduce and use memblock_clear_nomap() to clear the
> >> MEMBLOCK_NOMAP of this initrd memblock.
> >> So maybe the usage of memblock_remove() is just to avoid introducing new
> >> function(memblock_clear_nomap)?
> >>
> >> Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
> >> introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
> >> to solve this problem rather than bring flag MEMBLOCK_MIRROR back?
> >
> > AFAICT, there are two corner cases that re-adding initrd memory covers:
> > * initrd memory is not a part of the memory reported to memblock, either
> > because of firmware weirdness or because it was cut out with mem=
> > * initrd memory overlaps a NOMAP region
> >
> > So to make sure initrd memory is mapped properly and retains
> > MEMBLOCK_MIRROR I think the best we can do is
> >
> >       memblock_add();
> >       memblock_clear_nomap();
> >       memblock_reserve();
>
> Would simply detect+rejecting to boot on such setups be an option? The
> replies so far indicate to me that this is rather a corner case than a
> reasonable use case.
>

The sad reality is that mem= is known to be used in production for
limiting the amount of memory that the kernel takes control of, in
order to allow the remainder to be used in platform specific ways.

Of course, there are much better ways to achieve that, but given that
we currently support it, I don't think we can easily back that out.

I do think that there is no need to go out of our way to make this
case work seamlessly with mirrored memory, though. So I'd prefer to
make the remove+re-add conditional on there actually being a need to
do so. That way, we don't break the old use case or mirrored memory,
and whatever happens when the two are combined is DONTCARE.
mawupeng June 9, 2022, 8:15 a.m. UTC | #6
在 2022/6/8 18:12, Ard Biesheuvel 写道:
> On Wed, 8 Jun 2022 at 12:08, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.06.22 12:02, Mike Rapoport wrote:
>>> On Wed, Jun 08, 2022 at 03:27:09PM +0800, mawupeng wrote:
>>>>
>>>> 在 2022/6/7 22:49, Ard Biesheuvel 写道:
>>>>> On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 07.06.22 11:38, Wupeng Ma wrote:
>>>>>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>>>>>
>>>>>>> Initrd memory will be removed and then added in arm64_memblock_init() and this
>>>>>>> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
>>>>>>> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
>>>>>>> the lower 4G range has some non-mirrored memory.
>>>>>>>
>>>>>>> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
>>>>>>> reinstalled if the origin memblock has this flag.
>>>>>>>
>>>>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>>>>>> ---
>>>>>>>    arch/arm64/mm/init.c     |  9 +++++++++
>>>>>>>    include/linux/memblock.h |  1 +
>>>>>>>    mm/memblock.c            | 20 ++++++++++++++++++++
>>>>>>>    3 files changed, 30 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>>>>> index 339ee84e5a61..11641f924d08 100644
>>>>>>> --- a/arch/arm64/mm/init.c
>>>>>>> +++ b/arch/arm64/mm/init.c
>>>>>>> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
>>>>>>>                         "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>>>>>>>                         phys_initrd_size = 0;
>>>>>>>                 } else {
>>>>>>> +                     int flags, ret;
>>>>>>> +
>>>>>>> +                     ret = memblock_get_flags(base, &flags);
>>>>>>> +                     if (ret)
>>>>>>> +                             flags = 0;
>>>>>>> +
>>>>>>>                         memblock_remove(base, size); /* clear MEMBLOCK_ flags */
>>>>>>>                         memblock_add(base, size);
>>>>>>>                         memblock_reserve(base, size);
>>>>>>
>>>>>> Can you explain why we're removing+re-adding here exactly? Is it just to
>>>>>> clear flags as the comment indicates?
>>>>>>
>>>>>
>>>>> This should only happen if the placement of the initrd conflicts with
>>>>> a mem= command line parameter or it is not covered by memblock for
>>>>> some other reason.
>>>>>
>>>>> IOW, this should never happen, and if re-memblock_add'ing this memory
>>>>> unconditionally is causing problems, we should fix that instead of
>>>>> working around it.
>>>>
>>>> This will happen if we use initrdmem=3G,100M to reserve initrd memory below
>>>> the 4G limit to test this scenario(just for testing, I have trouble to boot
>>>> qemu with initrd enabled and memory below 4G are all mirror memory).
>>>>
>>>> Re-memblock_add'ing this memory unconditionally seems fine but clear all
>>>> flags(especially MEMBLOCK_MIRROR) may lead to some error log.
>>>>
>>>>>
>>>>>> If it's really just about clearing flags, I wonder if we rather want to
>>>>>> have an interface that does exactly that, and hides the way this is
>>>>>> actually implemented (obtain flags, remove, re-add ...), internally.
>>>>>>
>>>>>> But most probably there is more magic in the code and clearing flags
>>>>>> isn't all it ends up doing.
>>>>>>
>>>>>
>>>>> I don't remember exactly why we needed to clear the flags, but I think
>>>>> it had to do with some corner case we hit when the initrd was
>>>>> partially covered.
>>>> If "mem=" is set in command line, memblock_mem_limit_remove_map() will
>>>> remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
>>>> memory back if this initrd mem has the MEMBLOCK_NOMAP flag?
>>>>
>>>> The rfc version [1] introduce and use memblock_clear_nomap() to clear the
>>>> MEMBLOCK_NOMAP of this initrd memblock.
>>>> So maybe the usage of memblock_remove() is just to avoid introducing new
>>>> function(memblock_clear_nomap)?
>>>>
>>>> Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
>>>> introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
>>>> to solve this problem rather than bring flag MEMBLOCK_MIRROR back?
>>>
>>> AFAICT, there are two corner cases that re-adding initrd memory covers:
>>> * initrd memory is not a part of the memory reported to memblock, either
>>> because of firmware weirdness or because it was cut out with mem=
>>> * initrd memory overlaps a NOMAP region
>>>
>>> So to make sure initrd memory is mapped properly and retains
>>> MEMBLOCK_MIRROR I think the best we can do is
>>>
>>>        memblock_add();
>>>        memblock_clear_nomap();
>>>        memblock_reserve();
>>
>> Would simply detect+rejecting to boot on such setups be an option? The
>> replies so far indicate to me that this is rather a corner case than a
>> reasonable use case.
>>
> 
> The sad reality is that mem= is known to be used in production for
> limiting the amount of memory that the kernel takes control of, in
> order to allow the remainder to be used in platform specific ways.
> 
> Of course, there are much better ways to achieve that, but given that
> we currently support it, I don't think we can easily back that out.
> 
> I do think that there is no need to go out of our way to make this
> case work seamlessly with mirrored memory, though. So I'd prefer to
> make the remove+re-add conditional on there actually being a need to
> do so. That way, we don't break the old use case or mirrored memory,
> and whatever happens when the two are combined is DONTCARE.

Does that mean that we don't need to care about this scenario with
mirror memory?

Thanks for reviewing.

> .
Ard Biesheuvel June 10, 2022, 11:06 a.m. UTC | #7
On Thu, 9 Jun 2022 at 10:16, mawupeng <mawupeng1@huawei.com> wrote:
>
>
>
> 在 2022/6/8 18:12, Ard Biesheuvel 写道:
> > On Wed, 8 Jun 2022 at 12:08, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 08.06.22 12:02, Mike Rapoport wrote:
> >>> On Wed, Jun 08, 2022 at 03:27:09PM +0800, mawupeng wrote:
> >>>>
> >>>> 在 2022/6/7 22:49, Ard Biesheuvel 写道:
> >>>>> On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david@redhat.com> wrote:
> >>>>>>
> >>>>>> On 07.06.22 11:38, Wupeng Ma wrote:
> >>>>>>> From: Ma Wupeng <mawupeng1@huawei.com>
> >>>>>>>
> >>>>>>> Initrd memory will be removed and then added in arm64_memblock_init() and this
> >>>>>>> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
> >>>>>>> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
> >>>>>>> the lower 4G range has some non-mirrored memory.
> >>>>>>>
> >>>>>>> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
> >>>>>>> reinstalled if the origin memblock has this flag.
> >>>>>>>
> >>>>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> >>>>>>> ---
> >>>>>>>    arch/arm64/mm/init.c     |  9 +++++++++
> >>>>>>>    include/linux/memblock.h |  1 +
> >>>>>>>    mm/memblock.c            | 20 ++++++++++++++++++++
> >>>>>>>    3 files changed, 30 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >>>>>>> index 339ee84e5a61..11641f924d08 100644
> >>>>>>> --- a/arch/arm64/mm/init.c
> >>>>>>> +++ b/arch/arm64/mm/init.c
> >>>>>>> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
> >>>>>>>                         "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
> >>>>>>>                         phys_initrd_size = 0;
> >>>>>>>                 } else {
> >>>>>>> +                     int flags, ret;
> >>>>>>> +
> >>>>>>> +                     ret = memblock_get_flags(base, &flags);
> >>>>>>> +                     if (ret)
> >>>>>>> +                             flags = 0;
> >>>>>>> +
> >>>>>>>                         memblock_remove(base, size); /* clear MEMBLOCK_ flags */
> >>>>>>>                         memblock_add(base, size);
> >>>>>>>                         memblock_reserve(base, size);
> >>>>>>
> >>>>>> Can you explain why we're removing+re-adding here exactly? Is it just to
> >>>>>> clear flags as the comment indicates?
> >>>>>>
> >>>>>
> >>>>> This should only happen if the placement of the initrd conflicts with
> >>>>> a mem= command line parameter or it is not covered by memblock for
> >>>>> some other reason.
> >>>>>
> >>>>> IOW, this should never happen, and if re-memblock_add'ing this memory
> >>>>> unconditionally is causing problems, we should fix that instead of
> >>>>> working around it.
> >>>>
> >>>> This will happen if we use initrdmem=3G,100M to reserve initrd memory below
> >>>> the 4G limit to test this scenario(just for testing, I have trouble to boot
> >>>> qemu with initrd enabled and memory below 4G are all mirror memory).
> >>>>
> >>>> Re-memblock_add'ing this memory unconditionally seems fine but clear all
> >>>> flags(especially MEMBLOCK_MIRROR) may lead to some error log.
> >>>>
> >>>>>
> >>>>>> If it's really just about clearing flags, I wonder if we rather want to
> >>>>>> have an interface that does exactly that, and hides the way this is
> >>>>>> actually implemented (obtain flags, remove, re-add ...), internally.
> >>>>>>
> >>>>>> But most probably there is more magic in the code and clearing flags
> >>>>>> isn't all it ends up doing.
> >>>>>>
> >>>>>
> >>>>> I don't remember exactly why we needed to clear the flags, but I think
> >>>>> it had to do with some corner case we hit when the initrd was
> >>>>> partially covered.
> >>>> If "mem=" is set in command line, memblock_mem_limit_remove_map() will
> >>>> remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
> >>>> memory back if this initrd mem has the MEMBLOCK_NOMAP flag?
> >>>>
> >>>> The rfc version [1] introduce and use memblock_clear_nomap() to clear the
> >>>> MEMBLOCK_NOMAP of this initrd memblock.
> >>>> So maybe the usage of memblock_remove() is just to avoid introducing new
> >>>> function(memblock_clear_nomap)?
> >>>>
> >>>> Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
> >>>> introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
> >>>> to solve this problem rather than bring flag MEMBLOCK_MIRROR back?
> >>>
> >>> AFAICT, there are two corner cases that re-adding initrd memory covers:
> >>> * initrd memory is not a part of the memory reported to memblock, either
> >>> because of firmware weirdness or because it was cut out with mem=
> >>> * initrd memory overlaps a NOMAP region
> >>>
> >>> So to make sure initrd memory is mapped properly and retains
> >>> MEMBLOCK_MIRROR I think the best we can do is
> >>>
> >>>        memblock_add();
> >>>        memblock_clear_nomap();
> >>>        memblock_reserve();
> >>
> >> Would simply detect+rejecting to boot on such setups be an option? The
> >> replies so far indicate to me that this is rather a corner case than a
> >> reasonable use case.
> >>
> >
> > The sad reality is that mem= is known to be used in production for
> > limiting the amount of memory that the kernel takes control of, in
> > order to allow the remainder to be used in platform specific ways.
> >
> > Of course, there are much better ways to achieve that, but given that
> > we currently support it, I don't think we can easily back that out.
> >
> > I do think that there is no need to go out of our way to make this
> > case work seamlessly with mirrored memory, though. So I'd prefer to
> > make the remove+re-add conditional on there actually being a need to
> > do so. That way, we don't break the old use case or mirrored memory,
> > and whatever happens when the two are combined is DONTCARE.
>
> Does that mean that we don't need to care about this scenario with
> mirror memory?
>

We shouldn't, but we do, unfortunately.

So we should probably adopt the sequence suggested by Mike.
diff mbox series

Patch

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 339ee84e5a61..11641f924d08 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -350,9 +350,18 @@  void __init arm64_memblock_init(void)
 			"initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
 			phys_initrd_size = 0;
 		} else {
+			int flags, ret;
+
+			ret = memblock_get_flags(base, &flags);
+			if (ret)
+				flags = 0;
+
 			memblock_remove(base, size); /* clear MEMBLOCK_ flags */
 			memblock_add(base, size);
 			memblock_reserve(base, size);
+
+			if (flags & MEMBLOCK_MIRROR)
+				memblock_mark_mirror(base, size);
 		}
 	}
 
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 50ad19662a32..3d6a382ac9c8 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -487,6 +487,7 @@  bool memblock_is_map_memory(phys_addr_t addr);
 bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
 bool memblock_is_reserved(phys_addr_t addr);
 bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
+int memblock_get_flags(phys_addr_t base, int *flags);
 
 void memblock_dump_all(void);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index b1d2a0009733..0c5b5699af6e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1796,6 +1796,26 @@  int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
 	return memblock_get_region_node(&type->regions[mid]);
 }
 
+/**
+ * memblock_get_flags - get a single memblock flags
+ * @base: base of memeblock to get
+ *
+ * Get the flags of memeblock with base: @base
+ *
+ * Return:
+ * 0 if ok, non-zero if fail
+ */
+int __init_memblock memblock_get_flags(phys_addr_t base, int *flags)
+{
+	int idx = memblock_search(&memblock.memory, base);
+
+	if (idx == -1)
+		return -EINVAL;
+
+	*flags = memblock.memory.regions[idx].flags;
+	return 0;
+}
+
 /**
  * memblock_is_region_memory - check if a region is a subset of memory
  * @base: base of region to check