[2/3] arm64: efi: Ensure efi_create_mapping() does not map overlapping regions

Message ID 1464707672-21882-3-git-send-email-catalin.marinas@arm.com
State New

Commit Message

Catalin Marinas May 31, 2016, 3:14 p.m. UTC
Since the EFI page size is 4KB, it is possible for a !4KB page kernel to
align EFI runtime map boundaries in a way that they overlap within the
same kernel page. This requires the current create_pgd_mapping() code to
be able to split existing larger mappings when an overlapping region
needs to be mapped.

With this patch, efi_create_mapping() scans the EFI memory map for
overlapping regions and trims the length of the current map to avoid a
large block mapping and subsequent split.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

---
 arch/arm64/kernel/efi.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)



Comments

Catalin Marinas June 2, 2016, 4:56 p.m. UTC | #1
On Thu, Jun 02, 2016 at 03:52:46PM +0100, Matt Fleming wrote:
> On Tue, 31 May, at 04:14:31PM, Catalin Marinas wrote:
> > Since the EFI page size is 4KB, it is possible for a !4KB page kernel to
> > align EFI runtime map boundaries in a way that they overlap within the
> > same kernel page. This requires the current create_pgd_mapping() code to
> > be able to split existing larger mappings when an overlapping region
> > needs to be mapped.
> >
> > With this patch, efi_create_mapping() scans the EFI memory map for
> > overlapping regions and trims the length of the current map to avoid a
> > large block mapping and subsequent split.
> >
> > [...]
>
> Is this mapping in chunks scheme required because of the EFI
> Properties Table restriction whereby relative offsets between regions
> must be maintained?
>
> Because if that's not the reason, I'm wondering why you can't simply
> update efi_get_virtmap() to align the virtual addresses to 64K?

Ard to confirm, but I think the reason is the relative offset between
code and data regions that must be preserved. For example, on Juno I
get:

[    0.000000] efi:   0x0009fff6e000-0x0009fffaefff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] efi:   0x0009fffaf000-0x0009ffffefff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*

Since the code may assume relative loads from the data section, we need
to preserve this offset (which doesn't seem 64KB aligned).
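For reference, a quick standalone check (illustrative only; the addresses
are the Juno bases quoted above) shows that the code-to-data offset is
0x41000, not a multiple of 64KB, so rounding the virtual addresses down to
64KB boundaries would change it:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t code   = 0x0009fff6e000;	/* Runtime Code base */
	uint64_t data   = 0x0009fffaf000;	/* Runtime Data base */
	uint64_t offset = data - code;		/* 0x41000, i.e. 260KB */

	/* PC-relative references from code to data rely on this offset,
	 * so it must be identical in the virtual mapping. */
	printf("offset = %#llx, 64KB aligned: %s\n",
	       (unsigned long long)offset,
	       offset % 0x10000 ? "no" : "yes");
	return 0;
}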

-- 
Catalin

Ard Biesheuvel June 6, 2016, 9:43 a.m. UTC | #2
On 31 May 2016 at 17:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> Since the EFI page size is 4KB, it is possible for a !4KB page kernel to
> align EFI runtime map boundaries in a way that they overlap within the
> same kernel page. This requires the current create_pgd_mapping() code to
> be able to split existing larger mappings when an overlapping region
> needs to be mapped.
>
> With this patch, efi_create_mapping() scans the EFI memory map for
> overlapping regions and trims the length of the current map to avoid a
> large block mapping and subsequent split.

I remember the discussion, but IIRC this was about failure to do
break-before-make in general rather than a problem with splitting in
particular.

So first of all, I would like to point out that this failure to do
break-before-make is not a problem in practice, given that the EFI
page tables are not live until much later. The reason we want to fix
this is because the EFI page tables routines are shared with the
swapper, where it /is/ a concern, and there is a desire to make those
routines more strict when it comes to architectural rules regarding
page table manipulation.

Also, the reference to block mappings on !4K pages kernels is a bit
misleading here: such block mappings are at least 32 MB in size, and
we never map runtime code or data regions that big. The problem that
was flagged before is failure to do break-before-make, based on the
observation that a valid mapping (of the page that is shared between
two regions) is overwritten with another valid mapping. However, since
the 64K (or 16K) page that is shared between the two regions is mapped
with the exact same attributes (we ignore the permissions in that
case), the old and the new valid mappings are identical, which I don't
think should be a problem.

So, if it is allowed by the architecture to overwrite a page table
entry with the same value, perhaps we could educate the
break-before-make debug routines (if we merged those, I didn't check)
to allow that. And if we want to make absolutely sure that we don't
inadvertently map more than 32 MB worth of runtime services code or
data on a 16 KB pages kernel using block mappings, we could borrow the
PT debug logic to force mapping regions that are not aligned to
PAGE_SIZE down to pages in all cases. But I don't think we need the
logic in this patch.

Another comment below.

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/kernel/efi.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index 78f52488f9ff..0d5753c31c7f 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -62,10 +62,26 @@ struct screen_info screen_info __section(.data);
>  int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
>  {
>         pteval_t prot_val = create_mapping_protection(md);
> +       phys_addr_t length = md->num_pages << EFI_PAGE_SHIFT;
> +       efi_memory_desc_t *next = md;
>
> -       create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
> -                          md->num_pages << EFI_PAGE_SHIFT,
> -                          __pgprot(prot_val | PTE_NG));
> +       /*
> +        * Search for the next EFI runtime map and check for any overlap with
> +        * the current map when aligned to PAGE_SIZE. In such case, defer
> +        * mapping the end of the current range until the next
> +        * efi_create_mapping() call.
> +        */
> +       for_each_efi_memory_desc_continue(next) {
> +               if (!(next->attribute & EFI_MEMORY_RUNTIME))
> +                       continue;
> +               if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length))
> +                       length -= (md->phys_addr + length) & ~PAGE_MASK;

This looks fishy. We could have more than two runtime regions sharing
a 64 KB page frame, for instance, and the middle regions will have
unaligned start and end addresses, and sizes smaller than 64 KB. This
subtraction may wrap in that case, producing a bogus length.

> +               break;
> +       }
> +
> +       if (length)
> +               create_pgd_mapping(mm, md->phys_addr, md->virt_addr, length,
> +                                  __pgprot(prot_val | PTE_NG));
>         return 0;
>  }

Catalin Marinas June 6, 2016, 5:09 p.m. UTC | #3
On Mon, Jun 06, 2016 at 11:43:01AM +0200, Ard Biesheuvel wrote:
> On 31 May 2016 at 17:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Since the EFI page size is 4KB, it is possible for a !4KB page kernel to
> > align EFI runtime map boundaries in a way that they overlap within the
> > same kernel page. This requires the current create_pgd_mapping() code to
> > be able to split existing larger mappings when an overlapping region
> > needs to be mapped.
> >
> > With this patch, efi_create_mapping() scans the EFI memory map for
> > overlapping regions and trims the length of the current map to avoid a
> > large block mapping and subsequent split.
>
> I remember the discussion, but IIRC this was about failure to do
> break-before-make in general rather than a problem with splitting in
> particular.

In this instance, we don't have a break-before-make issue since the pgd
used for EFI runtime services is not active. So it's just about
splitting a larger block and simplifying the arch code.

> So first of all, I would like to point out that this failure to do
> break-before-make is not a problem in practice, given that the EFI
> page tables are not live until much later. The reason we want to fix
> this is because the EFI page tables routines are shared with the
> swapper, where it /is/ a concern, and there is a desire to make those
> routines more strict when it comes to architectural rules regarding
> page table manipulation.

Exactly.

> Also, the reference to block mappings on !4K pages kernels is a bit
> misleading here: such block mappings are at least 32 MB in size, and
> we never map runtime code or data regions that big.

Jeremy has some patches in flight to use the contiguous bit on such
mappings. With 32 x 64KB pages, we get a 2MB block. I want to avoid code
splitting such a contiguous block as well (which, with the contiguous
bit, means going back and clearing it).

> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > [...]
> > +       for_each_efi_memory_desc_continue(next) {
> > +               if (!(next->attribute & EFI_MEMORY_RUNTIME))
> > +                       continue;
> > +               if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length))
> > +                       length -= (md->phys_addr + length) & ~PAGE_MASK;
>
> This looks fishy. We could have more than two runtime regions sharing
> a 64 KB page frame, for instance, and the middle regions will have
> unaligned start and end addresses, and sizes smaller than 64 KB. This
> subtraction may wrap in that case, producing a bogus length.

I don't think I get it. This subtraction is meant to reduce the length
of the current md so that it is page aligned, deferring the mapping of
the last page to the next efi_create_mapping() call. Note that there is
a "break" just after the hunk you quoted, so we only care about the next
runtime map.

-- 
Catalin

Ard Biesheuvel June 6, 2016, 5:26 p.m. UTC | #4
> On 6 jun. 2016, at 19:09, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Mon, Jun 06, 2016 at 11:43:01AM +0200, Ard Biesheuvel wrote:
>>> On 31 May 2016 at 17:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> [...]
>>> +       for_each_efi_memory_desc_continue(next) {
>>> +               if (!(next->attribute & EFI_MEMORY_RUNTIME))
>>> +                       continue;
>>> +               if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length))
>>> +                       length -= (md->phys_addr + length) & ~PAGE_MASK;
>>
>> This looks fishy. We could have more than two runtime regions sharing
>> a 64 KB page frame, for instance, and the middle regions will have
>> unaligned start and end addresses, and sizes smaller than 64 KB. This
>> subtraction may wrap in that case, producing a bogus length.
>
> I don't think I get it. This subtraction is meant to reduce the length
> of the current md so that it is page aligned, deferring the mapping of
> the last page to the next efi_create_mapping() call. Note that there is
> a "break" just after the hunk you quoted, so we only care about the next
> runtime map.

If the current entry is only 4KB in size and starts 56KB into the 64KB
page frame, you subtract 60KB and end up with a length of -56KB.
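To make the failure mode concrete, here is a minimal standalone sketch of
that arithmetic (hypothetical addresses; the 64KB PAGE_MASK is hard-coded):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical md: a 4KB runtime region starting 56KB into a
	 * 64KB kernel page, with the next runtime region in the same
	 * page frame. */
	uint64_t page_mask = ~0xffffULL;	/* PAGE_MASK for 64KB pages */
	uint64_t phys = 0x4000e000;		/* frame base + 56KB */
	uint64_t length = 0x1000;		/* 4KB */

	/* Mirrors: length -= (md->phys_addr + length) & ~PAGE_MASK; */
	length -= (phys + length) & ~page_mask;	/* subtracts 60KB */

	/* Prints 0xffffffffffff2000, the unsigned form of -56KB. */
	printf("length = %#llx\n", (unsigned long long)length);
	return 0;
}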
Catalin Marinas June 6, 2016, 5:42 p.m. UTC | #5
On Mon, Jun 06, 2016 at 06:09:50PM +0100, Catalin Marinas wrote:
> On Mon, Jun 06, 2016 at 11:43:01AM +0200, Ard Biesheuvel wrote:
> > On 31 May 2016 at 17:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > [...]
> > > +       for_each_efi_memory_desc_continue(next) {
> > > +               if (!(next->attribute & EFI_MEMORY_RUNTIME))
> > > +                       continue;
> > > +               if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length))
> > > +                       length -= (md->phys_addr + length) & ~PAGE_MASK;
> >
> > This looks fishy. We could have more than two runtime regions sharing
> > a 64 KB page frame, for instance, and the middle regions will have
> > unaligned start and end addresses, and sizes smaller than 64 KB. This
> > subtraction may wrap in that case, producing a bogus length.
>
> I don't think I get it. This subtraction is meant to reduce the length
> of the current md so that it is page aligned, deferring the mapping of
> the last page to the next efi_create_mapping() call. Note that there is
> a "break" just after the hunk you quoted, so we only care about the next
> runtime map.

I think I get it now. If the md we are currently mapping is within a
64KB page *and* phys_addr is unaligned, length can wrap through 0. I
will update the patch.
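One possible shape for such a guard (a sketch only, reusing the variable
names from the patch above; not necessarily what the updated patch will
actually do):

	if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length)) {
		phys_addr_t tail = (md->phys_addr + length) & ~PAGE_MASK;

		/* Only trim when the md extends past the shared page;
		 * otherwise length would wrap below zero. */
		if (tail < length)
			length -= tail;
	}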

-- 
Catalin

Ard Biesheuvel June 6, 2016, 9:18 p.m. UTC | #6
On 6 June 2016 at 19:42, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Jun 06, 2016 at 06:09:50PM +0100, Catalin Marinas wrote:
> [...]
> I think I get it now. If the md we are currently mapping is within a
> 64KB page *and* phys_addr is unaligned, length can wrap through 0. I
> will update the patch.

Indeed.

Another thing I failed to mention is that the new Memory Attributes
table support may map all of the RuntimeServicesCode regions a second
time, but with a higher granularity, using RO for .text and .rodata
and NX for .data and .bss (and the PE/COFF header). Due to the higher
granularity, regions that were mapped using the contiguous bit the
first time around may be split into smaller regions. Your current code
does not address that case.

I wonder how the PT debug code deals with this, and whether there is
anything we can reuse to inhibit cont and block mappings.

Catalin Marinas June 28, 2016, 4:05 p.m. UTC | #7
(Restarting the thread before I forget the entire discussion)

On Mon, Jun 06, 2016 at 11:18:14PM +0200, Ard Biesheuvel wrote:
> >> > On 31 May 2016 at 17:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> > > [...]
> >> > > +       for_each_efi_memory_desc_continue(next) {
> >> > > +               if (!(next->attribute & EFI_MEMORY_RUNTIME))
> >> > > +                       continue;
> >> > > +               if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length))
> >> > > +                       length -= (md->phys_addr + length) & ~PAGE_MASK;
[...]
> Another thing I failed to mention is that the new Memory Attributes
> table support may map all of the RuntimeServicesCode regions a second
> time, but with a higher granularity, using RO for .text and .rodata
> and NX for .data and .bss (and the PE/COFF header).

Can this not be done in a single go without multiple passes? That's what
we did for the core arm64 code, the only one left being EFI run-time
mappings.

> Due to the higher
> granularity, regions that were mapped using the contiguous bit the
> first time around may be split into smaller regions. Your current code
> does not address that case.

If the above doesn't work, the only solution would be to permanently map
these ranges as individual pages, no large blocks.

> I wonder how the PT debug code deals with this, and whether there is
> anything we can reuse to inhibit cont and block mappings.

Do you mean DEBUG_PAGEALLOC? When enabled, it defaults to single page
mappings for the kernel, no sections.

-- 
Catalin

Ard Biesheuvel June 28, 2016, 4:12 p.m. UTC | #8
On 28 June 2016 at 18:05, Catalin Marinas <catalin.marinas@arm.com> wrote:
> (Restarting the thread before I forget the entire discussion)
>
> On Mon, Jun 06, 2016 at 11:18:14PM +0200, Ard Biesheuvel wrote:
> [...]
>> Another thing I failed to mention is that the new Memory Attributes
>> table support may map all of the RuntimeServicesCode regions a second
>> time, but with a higher granularity, using RO for .text and .rodata
>> and NX for .data and .bss (and the PE/COFF header).
>
> Can this not be done in a single go without multiple passes? That's what
> we did for the core arm64 code, the only one left being EFI run-time
> mappings.

Well, we probably could, but it is far from trivial.

>> Due to the higher
>> granularity, regions that were mapped using the contiguous bit the
>> first time around may be split into smaller regions. Your current code
>> does not address that case.
>
> If the above doesn't work, the only solution would be to permanently map
> these ranges as individual pages, no large blocks.

That is not unreasonable, since regions >2MB are unusual.

>> I wonder how the PT debug code deals with this, and whether there is
>> anything we can reuse to inhibit cont and block mappings.
>
> Do you mean DEBUG_PAGEALLOC? When enabled, it defaults to single page
> mappings for the kernel, no sections.

Indeed. So we could reuse the logic that imposes that to prevent
creating regions that are subject to splitting later on. That way,
there is no need to worry about the code running twice in case the
Memory Attributes table is exposed by the firmware.

Catalin Marinas June 29, 2016, 9:39 a.m. UTC | #9
On Tue, Jun 28, 2016 at 06:12:22PM +0200, Ard Biesheuvel wrote:
> On 28 June 2016 at 18:05, Catalin Marinas <catalin.marinas@arm.com> wrote:
> [...]
> >> Due to the higher
> >> granularity, regions that were mapped using the contiguous bit the
> >> first time around may be split into smaller regions. Your current code
> >> does not address that case.
> >
> > If the above doesn't work, the only solution would be to permanently map
> > these ranges as individual pages, no large blocks.
>
> That is not unreasonable, since regions >2MB are unusual.

We'll have the contiguous bit supported at some point and we won't be
able to use it for EFI run-time mappings. But I don't think that's
essential, minor improvement on a non-critical path.

I'll post some patches to always use PAGE_SIZE granularity for EFI
run-time mappings.

-- 
Catalin

Ard Biesheuvel June 29, 2016, 10:03 a.m. UTC | #10
On 29 June 2016 at 11:39, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Tue, Jun 28, 2016 at 06:12:22PM +0200, Ard Biesheuvel wrote:
> [...]
>> That is not unreasonable, since regions >2MB are unusual.
>
> We'll have the contiguous bit supported at some point and we won't be
> able to use it for EFI run-time mappings. But I don't think that's
> essential, minor improvement on a non-critical path.
>
> I'll post some patches to always use PAGE_SIZE granularity for EFI
> run-time mappings.

Given that contiguous bit mappings only affect the TLB footprint, I'd
be more concerned about not using block mappings for EfiMemoryMappedIo
regions (since they may cover fairly sizable NOR flashes like the 64
MB one QEMU mach-virt exposes).

So I would recommend to only use PAGE_SIZE granularity for
EfiRuntimeServicesCode and EfiRuntimeServicesData regions, since those
are the only ones that can be expected to appear in the Memory
Attributes table, and all other regions will only be mapped a single
time.
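As a sketch, that policy could look like the following inside
efi_create_mapping() (hypothetical: the actual patches may plumb the
decision into create_pgd_mapping() differently, e.g. via a new
parameter):

	/* Only code/data regions can appear in the Memory Attributes
	 * table, so only they risk being remapped and split later. */
	bool allow_block_mappings = md->type != EFI_RUNTIME_SERVICES_CODE &&
				    md->type != EFI_RUNTIME_SERVICES_DATA;

	create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
			   md->num_pages << EFI_PAGE_SHIFT,
			   __pgprot(prot_val | PTE_NG), allow_block_mappings);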

Catalin Marinas June 29, 2016, 10:50 a.m. UTC | #11
On Wed, Jun 29, 2016 at 12:03:16PM +0200, Ard Biesheuvel wrote:
> On 29 June 2016 at 11:39, Catalin Marinas <catalin.marinas@arm.com> wrote:
> [...]
>
> Given that contiguous bit mappings only affect the TLB footprint, I'd
> be more concerned about not using block mappings for EfiMemoryMappedIo
> regions (since they may cover fairly sizable NOR flashes like the 64
> MB one QEMU mach-virt exposes).

Good point.

> So I would recommend to only use PAGE_SIZE granularity for
> EfiRuntimeServicesCode and EfiRuntimeServicesData regions, since those
> are the only ones that can be expected to appear in the Memory
> Attributes table, and all other regions will only be mapped a single
> time.

Is there a possibility that EfiMemoryMappedIo shares the same 64K page
with EfiRuntimeServicesCode? If it does, it won't help much with
avoiding splitting. Unless I keep a combination of this series
(checking the end/start overlap) with a forced page-only mapping for
EfiRuntimeServicesCode/Data.

-- 
Catalin

Ard Biesheuvel June 29, 2016, 11:03 a.m. UTC | #12
On 29 June 2016 at 12:50, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Jun 29, 2016 at 12:03:16PM +0200, Ard Biesheuvel wrote:
> [...]
>
> Is there a possibility that EfiMemoryMappedIo shares the same 64K page
> with EfiRuntimeServicesCode? If it does, it won't help much with
> avoiding splitting.

The spec does not allow it, and it would also imply that memory and
!memory share a 64 KB page frame in the hardware, which seems highly
unlikely as well.

> Unless I keep a combination of this series
> (checking the end/start overlap) with a forced page-only mapping for
> EfiRuntimeServicesCode/Data.

If we get rid of the splitting, the only 'issue' that remains is that
the page frame shared between two adjacent unaligned regions is mapped
twice (but the current code will always map them with the same
attribute).

So back to my question I posed a couple of posts ago: if the UEFI page
tables were live at this time (which they are not), could it ever be a
problem that a page table entry is rewritten with the exact same value
it had before (but without break-before-make)? If not, I think we could
educate the debug routines to allow this case (since they need to read
the entry to check the valid bit anyway, if they need to be strict
about break-before-make).
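A sketch of such a relaxed check (hypothetical helper; the names are
invented for illustration):

/* Allow rewriting a live entry only if the entry is unchanged. */
static bool pte_rewrite_is_safe(pte_t old, pte_t new)
{
	/* Populating an invalid entry never needs break-before-make. */
	if (!pte_valid(old))
		return true;

	/* A bit-for-bit identical rewrite leaves the TLB consistent. */
	return pte_val(old) == pte_val(new);
}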

Catalin Marinas June 29, 2016, 12:03 p.m. UTC | #13
On Wed, Jun 29, 2016 at 01:03:34PM +0200, Ard Biesheuvel wrote:
> On 29 June 2016 at 12:50, Catalin Marinas <catalin.marinas@arm.com> wrote:
> [...]
> > Is there a possibility that EfiMemoryMappedIo shares the same 64K page
> > with EfiRuntimeServicesCode? If it does, it won't help much with
> > avoiding splitting.
>
> The spec does not allow it, and it would also imply that memory and
> !memory share a 64 KB page frame in the hardware, which seems highly
> unlikely as well.

I assume there isn't even a workaround if the EFI maps are broken in
this respect. But we still need to gracefully handle it and avoid a
potential kernel panic (like some BUG_ON in the arm64 page table
creation code).

> > Unless I keep a combination of this series
> > (checking the end/start overlap) with a forced page-only mapping for
> > EfiRuntimeServicesCode/Data.
>
> If we get rid of the splitting, the only 'issue' that remains is that
> the page frame shared between two adjacent unaligned regions is mapped
> twice (but the current code will always map them with the same
> attribute).
>
> So back to my question I posed a couple of posts ago: if the UEFI page
> tables were live at this time (which they are not), could it ever be a
> problem that a page table entry is rewritten with the exact same value
> it had before (but without break-before-make)? If not, I think we could
> educate the debug routines to allow this case (since they need to read
> the entry to check the valid bit anyway, if they need to be strict
> about break-before-make).

There wouldn't be any issue, we already do this in other cases like
mark_rodata_ro().

-- 
Catalin

Patch

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 78f52488f9ff..0d5753c31c7f 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -62,10 +62,26 @@  struct screen_info screen_info __section(.data);
 int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
 {
 	pteval_t prot_val = create_mapping_protection(md);
+	phys_addr_t length = md->num_pages << EFI_PAGE_SHIFT;
+	efi_memory_desc_t *next = md;
 
-	create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
-			   md->num_pages << EFI_PAGE_SHIFT,
-			   __pgprot(prot_val | PTE_NG));
+	/*
+	 * Search for the next EFI runtime map and check for any overlap with
+	 * the current map when aligned to PAGE_SIZE. In such case, defer
+	 * mapping the end of the current range until the next
+	 * efi_create_mapping() call.
+	 */
+	for_each_efi_memory_desc_continue(next) {
+		if (!(next->attribute & EFI_MEMORY_RUNTIME))
+			continue;
+		if (next->phys_addr < PAGE_ALIGN(md->phys_addr + length))
+			length -= (md->phys_addr + length) & ~PAGE_MASK;
+		break;
+	}
+
+	if (length)
+		create_pgd_mapping(mm, md->phys_addr, md->virt_addr, length,
+				   __pgprot(prot_val | PTE_NG));
 	return 0;
 }