diff mbox

arm64: mmu: set the contiguous for kernel mappings when appropriate

Message ID 1476123164-10532-1-git-send-email-ard.biesheuvel@linaro.org
State New
Headers show

Commit Message

Ard Biesheuvel Oct. 10, 2016, 6:12 p.m. UTC
Now that we no longer allow live kernel PMDs to be split, it is safe to
start using the contiguous bit for kernel mappings. So set the contiguous
bit in the kernel page mappings for regions whose size and alignment are
suitable for this.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 arch/arm64/mm/mmu.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Comments

Mark Rutland Oct. 11, 2016, 7:44 a.m. UTC | #1
Hi Ard,

On Mon, Oct 10, 2016 at 07:12:44PM +0100, Ard Biesheuvel wrote:
> Now that we no longer allow live kernel PMDs to be split, it is safe to

> start using the contiguous bit for kernel mappings. So set the contiguous

> bit in the kernel page mappings for regions whose size and alignment are

> suitable for this.

>

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>


Given the splitting is now gone, using the contiguous bit makes sense to me.

With 16K pages, we can have contiguous PMD entries. Should we handle those,
too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?

Otherwise, I have some comments below.

> ---

>  arch/arm64/mm/mmu.c | 23 ++++++++++++++++++++---

>  1 file changed, 20 insertions(+), 3 deletions(-)

> 

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c

> index 05615a3fdc6f..c491025c6a70 100644

> --- a/arch/arm64/mm/mmu.c

> +++ b/arch/arm64/mm/mmu.c

> @@ -98,8 +98,11 @@ static phys_addr_t __init early_pgtable_alloc(void)

>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,

>  				  unsigned long end, unsigned long pfn,

>  				  pgprot_t prot,

> -				  phys_addr_t (*pgtable_alloc)(void))

> +				  phys_addr_t (*pgtable_alloc)(void),

> +				  bool allow_block_mappings)


Not a big deal, but the 'block' part here and elsewhere is now arguably
misleading (given 'block' is an architectural term).

I haven't come up with a better term, so again, not a big deal. ;)

>  {

> +	pgprot_t prot_cont = __pgprot(pgprot_val(prot) | PTE_CONT);

> +	bool cont = false;

>  	pte_t *pte;

>  

>  	BUG_ON(pmd_sect(*pmd));

> @@ -115,7 +118,20 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,

>  

>  	pte = pte_set_fixmap_offset(pmd, addr);

>  	do {

> -		set_pte(pte, pfn_pte(pfn, prot));

> +		/*

> +		 * Set the contiguous bit for the subsequent group of PTEs if

> +		 * its size and alignment are suitable.

> +		 */

> +		if (((addr | PFN_PHYS(pfn)) & ~CONT_MASK) == 0)

> +			cont = allow_block_mappings && end - addr >= CONT_SIZE;


Given we increment addr by PAGE_SIZE in the loop, isn't this only true for the
first CONT_SIZE aligned entry, and not its (intended-to-be-contiguous)
siblings?

It be better to loop over CONT_SIZE increments, and then within that, pass the
prot (with the contiguous bit set as required) to a loop with a PAGE_SIZE
increment.

Or have I misunderstood something?

> +

> +		/*

> +		 * Ensure that we do not change the contiguous bit once this

> +		 * PTE has been assigned.

> +		 */

> +		BUG_ON(!pte_none(*pte) && (cont ^ !!(pte_val(*pte) & PTE_CONT)));


IIRC, we only ever intended to mess with the AP bits when remapping an existing region.

So we could mask those out and ensure everything else is identical, rather than
checking the cont bit specifically. Likewise at the {PMD,PUD,PGD} level.

> +

> +		set_pte(pte, pfn_pte(pfn, cont ? prot_cont : prot));


It would be clearer if we just assigned to a local pte_prot variable when
checking allow_block_mappings and so on above (or split the loop as above).

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Mark Rutland Oct. 11, 2016, 7:48 a.m. UTC | #2
On Tue, Oct 11, 2016 at 08:44:19AM +0100, Mark Rutland wrote:
> >  {

> > +	pgprot_t prot_cont = __pgprot(pgprot_val(prot) | PTE_CONT);

> > +	bool cont = false;

> >  	pte_t *pte;

> >  

> >  	BUG_ON(pmd_sect(*pmd));

> > @@ -115,7 +118,20 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,

> >  

> >  	pte = pte_set_fixmap_offset(pmd, addr);

> >  	do {

> > -		set_pte(pte, pfn_pte(pfn, prot));

> > +		/*

> > +		 * Set the contiguous bit for the subsequent group of PTEs if

> > +		 * its size and alignment are suitable.

> > +		 */

> > +		if (((addr | PFN_PHYS(pfn)) & ~CONT_MASK) == 0)

> > +			cont = allow_block_mappings && end - addr >= CONT_SIZE;

> 

> Given we increment addr by PAGE_SIZE in the loop, isn't this only true for the

> first CONT_SIZE aligned entry, and not its (intended-to-be-contiguous)

> siblings?


Looking again, I'd mis-read the above; it will work as expected.

Sorry for the noise.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Oct. 11, 2016, 8:21 a.m. UTC | #3
On 11 October 2016 at 08:44, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,

>

> On Mon, Oct 10, 2016 at 07:12:44PM +0100, Ard Biesheuvel wrote:

>> Now that we no longer allow live kernel PMDs to be split, it is safe to

>> start using the contiguous bit for kernel mappings. So set the contiguous

>> bit in the kernel page mappings for regions whose size and alignment are

>> suitable for this.

>>

>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

>

> Given the splitting is now gone, using the contiguous bit makes sense to me.

>

> With 16K pages, we can have contiguous PMD entries. Should we handle those,

> too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?

>


Amusingly, this was exactly the feedback I gave to Jeremy when he
proposed this functionality originally. Yes, I think it makes sense,
especially for the linear mapping of system RAM. However, I think it
makes sense for someone else (with access to actual 16k granule
capable hardware) to contribute this functionality on top of this
patch.

> Otherwise, I have some comments below.

>

>> ---

>>  arch/arm64/mm/mmu.c | 23 ++++++++++++++++++++---

>>  1 file changed, 20 insertions(+), 3 deletions(-)

>>

>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c

>> index 05615a3fdc6f..c491025c6a70 100644

>> --- a/arch/arm64/mm/mmu.c

>> +++ b/arch/arm64/mm/mmu.c

>> @@ -98,8 +98,11 @@ static phys_addr_t __init early_pgtable_alloc(void)

>>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,

>>                                 unsigned long end, unsigned long pfn,

>>                                 pgprot_t prot,

>> -                               phys_addr_t (*pgtable_alloc)(void))

>> +                               phys_addr_t (*pgtable_alloc)(void),

>> +                               bool allow_block_mappings)

>

> Not a big deal, but the 'block' part here and elsewhere is now arguably

> misleading (given 'block' is an architectural term).

>

> I haven't come up with a better term, so again, not a big deal. ;)

>


Indeed. I could simply call it 'allow_cont_mappings' in the context of
this function, and keep the call below as is.

>>  {

>> +     pgprot_t prot_cont = __pgprot(pgprot_val(prot) | PTE_CONT);

>> +     bool cont = false;

>>       pte_t *pte;

>>

>>       BUG_ON(pmd_sect(*pmd));

>> @@ -115,7 +118,20 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,

>>

>>       pte = pte_set_fixmap_offset(pmd, addr);

>>       do {

>> -             set_pte(pte, pfn_pte(pfn, prot));

>> +             /*

>> +              * Set the contiguous bit for the subsequent group of PTEs if

>> +              * its size and alignment are suitable.

>> +              */

>> +             if (((addr | PFN_PHYS(pfn)) & ~CONT_MASK) == 0)

>> +                     cont = allow_block_mappings && end - addr >= CONT_SIZE;

[...]
>> +

>> +             /*

>> +              * Ensure that we do not change the contiguous bit once this

>> +              * PTE has been assigned.

>> +              */

>> +             BUG_ON(!pte_none(*pte) && (cont ^ !!(pte_val(*pte) & PTE_CONT)));

>

> IIRC, we only ever intended to mess with the AP bits when remapping an existing region.

>

> So we could mask those out and ensure everything else is identical, rather than

> checking the cont bit specifically. Likewise at the {PMD,PUD,PGD} level.

>


Yes, that should be better, I can put that in a separate preparatory patch.

>> +

>> +             set_pte(pte, pfn_pte(pfn, cont ? prot_cont : prot));

>

> It would be clearer if we just assigned to a local pte_prot variable when

> checking allow_block_mappings and so on above (or split the loop as above).

>


Well, the local pte_prot variable's scope should still cover the
entire function, since cont does not change value at each iteration.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Oct. 11, 2016, 9:09 a.m. UTC | #4
On 11 October 2016 at 09:48, Steve Capper <steve.capper@linaro.org> wrote:
>

>

> On 11 October 2016 at 08:44, Mark Rutland <mark.rutland@arm.com> wrote:

>>

>> Hi Ard,

>>

>> On Mon, Oct 10, 2016 at 07:12:44PM +0100, Ard Biesheuvel wrote:

>> > Now that we no longer allow live kernel PMDs to be split, it is safe to

>> > start using the contiguous bit for kernel mappings. So set the

>> > contiguous

>> > bit in the kernel page mappings for regions whose size and alignment are

>> > suitable for this.

>> >

>> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

>>

>> Given the splitting is now gone, using the contiguous bit makes sense to

>> me.

>>

>> With 16K pages, we can have contiguous PMD entries. Should we handle

>> those,

>> too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?

>>

>> Otherwise, I have some comments below.

>

>

> Hi,

>

> So in arch/arm64/include/asm/pgtable-hwdef.h, we have:

> CONT_PTE_SHIFT

> CONT_PMD_SHIFT

> CONT_PTES

> CONT_PMDS

> CONT_PTE_SIZE

> CONT_PTE_MASK

> ...

>

> which are used by the contiguous hint HugeTLB code.

> Can those be adopted instead of CONT_MASK and CONT_SIZE?

>


That seems more appropriate, yes. I wonder why we have CONT_MASK and
CONT_SIZE in the first place then.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Oct. 11, 2016, 11:17 a.m. UTC | #5
On 11 October 2016 at 10:09, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 11 October 2016 at 09:48, Steve Capper <steve.capper@linaro.org> wrote:

>>

>>

>> On 11 October 2016 at 08:44, Mark Rutland <mark.rutland@arm.com> wrote:

>>>

>>> Hi Ard,

>>>

>>> On Mon, Oct 10, 2016 at 07:12:44PM +0100, Ard Biesheuvel wrote:

>>> > Now that we no longer allow live kernel PMDs to be split, it is safe to

>>> > start using the contiguous bit for kernel mappings. So set the

>>> > contiguous

>>> > bit in the kernel page mappings for regions whose size and alignment are

>>> > suitable for this.

>>> >

>>> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

>>>

>>> Given the splitting is now gone, using the contiguous bit makes sense to

>>> me.

>>>

>>> With 16K pages, we can have contiguous PMD entries. Should we handle

>>> those,

>>> too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?

>>>

>>> Otherwise, I have some comments below.

>>

>>

>> Hi,

>>

>> So in arch/arm64/include/asm/pgtable-hwdef.h, we have:

>> CONT_PTE_SHIFT

>> CONT_PMD_SHIFT

>> CONT_PTES

>> CONT_PMDS

>> CONT_PTE_SIZE

>> CONT_PTE_MASK

>> ...

>>

>> which are used by the contiguous hint HugeTLB code.

>> Can those be adopted instead of CONT_MASK and CONT_SIZE?

>>


Looking at the hugetlb code, it appears to support contiguous PMDs for
4k and 64k pages as well, while the ARM ARM only defines it for 16k
pages. I suppose the contiguous bit is simply ignored for level 2
entries when using 4k or 64k pages kernels, but I think it would be
better for the code to reflect this as well.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Will Deacon Oct. 11, 2016, 12:41 p.m. UTC | #6
On Tue, Oct 11, 2016 at 12:17:54PM +0100, Ard Biesheuvel wrote:
> On 11 October 2016 at 10:09, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> > On 11 October 2016 at 09:48, Steve Capper <steve.capper@linaro.org> wrote:

> >> So in arch/arm64/include/asm/pgtable-hwdef.h, we have:

> >> CONT_PTE_SHIFT

> >> CONT_PMD_SHIFT

> >> CONT_PTES

> >> CONT_PMDS

> >> CONT_PTE_SIZE

> >> CONT_PTE_MASK

> >> ...

> >>

> >> which are used by the contiguous hint HugeTLB code.

> >> Can those be adopted instead of CONT_MASK and CONT_SIZE?

> >>

> 

> Looking at the hugetlb code, it appears to support contiguous PMDs for

> 4k and 64k pages as well, while the ARM ARM only defines it for 16k

> pages. I suppose the contiguous bit is simply ignored for level 2

> entries when using 4k or 64k pages kernels, but I think it would be

> better for the code to reflect this as well.


Which bit in the ARM ARM says that you can't support contiguous PMDs for 4k
and 64k pages? I see that the number of contiguous entries changes between
levels for 16k pages, but that's it.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Oct. 11, 2016, 12:56 p.m. UTC | #7
On 11 October 2016 at 13:41, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Oct 11, 2016 at 12:17:54PM +0100, Ard Biesheuvel wrote:

>> On 11 October 2016 at 10:09, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

>> > On 11 October 2016 at 09:48, Steve Capper <steve.capper@linaro.org> wrote:

>> >> So in arch/arm64/include/asm/pgtable-hwdef.h, we have:

>> >> CONT_PTE_SHIFT

>> >> CONT_PMD_SHIFT

>> >> CONT_PTES

>> >> CONT_PMDS

>> >> CONT_PTE_SIZE

>> >> CONT_PTE_MASK

>> >> ...

>> >>

>> >> which are used by the contiguous hint HugeTLB code.

>> >> Can those be adopted instead of CONT_MASK and CONT_SIZE?

>> >>

>>

>> Looking at the hugetlb code, it appears to support contiguous PMDs for

>> 4k and 64k pages as well, while the ARM ARM only defines it for 16k

>> pages. I suppose the contiguous bit is simply ignored for level 2

>> entries when using 4k or 64k pages kernels, but I think it would be

>> better for the code to reflect this as well.

>

> Which bit in the ARM ARM says that you can't support contiguous PMDs for 4k

> and 64k pages? I see that the number of contiguous entries changes between

> levels for 16k pages, but that's it.

>


You are right, the ARM ARM does not say that at all. But given Mark's comment:

"""
With 16K pages, we can have contiguous PMD entries. Should we handle those,
too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?
"""

it seems I am not the only one who is confused about this. In any
case, the fact that the ARM ARM documents levels 2 and 3 explicitly
for 16k pages does very little to clarify at which levels this bit is
defined, and if it is defined at levels < 2, what the granularity is
for 16k pages.

So the v2 I just sent out could be modified to allow contiguous PMDs
(32 MB blocks) on 4 KB kernels, which seems useful. I will take that
into account when I prepare the v3.

-- 
Ard.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Mark Rutland Oct. 11, 2016, 3:10 p.m. UTC | #8
On Tue, Oct 11, 2016 at 09:21:14AM +0100, Ard Biesheuvel wrote:
> On 11 October 2016 at 08:44, Mark Rutland <mark.rutland@arm.com> wrote:

> > On Mon, Oct 10, 2016 at 07:12:44PM +0100, Ard Biesheuvel wrote:

> > Not a big deal, but the 'block' part here and elsewhere is now arguably

> > misleading (given 'block' is an architectural term).

> >

> > I haven't come up with a better term, so again, not a big deal. ;)

> 

> Indeed. I could simply call it 'allow_cont_mappings' in the context of

> this function, and keep the call below as is.


I'd prefer that the naming is consistent across functions, even if it's left
as-is (and arguably cont makes it sounds like it doesn't cover block mappings).
So unless we find a suitably-equipped thesaurus, as-is is probably fine.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Will Deacon Oct. 11, 2016, 4:29 p.m. UTC | #9
On Tue, Oct 11, 2016 at 01:56:26PM +0100, Ard Biesheuvel wrote:
> On 11 October 2016 at 13:41, Will Deacon <will.deacon@arm.com> wrote:

> > On Tue, Oct 11, 2016 at 12:17:54PM +0100, Ard Biesheuvel wrote:

> >> On 11 October 2016 at 10:09, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> >> > On 11 October 2016 at 09:48, Steve Capper <steve.capper@linaro.org> wrote:

> >> >> So in arch/arm64/include/asm/pgtable-hwdef.h, we have:

> >> >> CONT_PTE_SHIFT

> >> >> CONT_PMD_SHIFT

> >> >> CONT_PTES

> >> >> CONT_PMDS

> >> >> CONT_PTE_SIZE

> >> >> CONT_PTE_MASK

> >> >> ...

> >> >>

> >> >> which are used by the contiguous hint HugeTLB code.

> >> >> Can those be adopted instead of CONT_MASK and CONT_SIZE?

> >> >>

> >>

> >> Looking at the hugetlb code, it appears to support contiguous PMDs for

> >> 4k and 64k pages as well, while the ARM ARM only defines it for 16k

> >> pages. I suppose the contiguous bit is simply ignored for level 2

> >> entries when using 4k or 64k pages kernels, but I think it would be

> >> better for the code to reflect this as well.

> >

> > Which bit in the ARM ARM says that you can't support contiguous PMDs for 4k

> > and 64k pages? I see that the number of contiguous entries changes between

> > levels for 16k pages, but that's it.

> >

> 

> You are right, the ARM ARM does not say that at all. But given Mark's comment:

> 

> """

> With 16K pages, we can have contiguous PMD entries. Should we handle those,

> too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?

> """

> 

> it seems I am not the only one who is confused about this. In any

> case, the fact that the ARM ARM documents levels 2 and 3 explicitly

> for 16k pages does very little to clarify at which levels this bit is

> defined, and if it is defined at levels < 2, what the granularity is

> for 16k pages.


I see you're going to work on a more comprehensive v3 (thanks!), but just
to help clarify this: the contiguous bit is valid whenever a block or page
(i.e. a leaf) entry is valid. The only complication with 16k pages is that
the number of contiguous entries changes between level 2 and level 3,
which makes sense if you think about the TLB entries supported due to
non-contiguous block mappings in other regimes anyway (and brings into
question whether people bother with 16G in practice).

That means you can use the contiguous bit as:

4k: levels 1,2,3 (16G, 32M, 64K)
16k: levels 2,3 (1G, 2M)
64k: levels 2,3 (16G, 2M)

Hopefully my maths is correct and that clears things up,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Oct. 11, 2016, 4:38 p.m. UTC | #10
On 11 October 2016 at 17:29, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Oct 11, 2016 at 01:56:26PM +0100, Ard Biesheuvel wrote:

>> On 11 October 2016 at 13:41, Will Deacon <will.deacon@arm.com> wrote:

>> > On Tue, Oct 11, 2016 at 12:17:54PM +0100, Ard Biesheuvel wrote:

>> >> On 11 October 2016 at 10:09, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

>> >> > On 11 October 2016 at 09:48, Steve Capper <steve.capper@linaro.org> wrote:

>> >> >> So in arch/arm64/include/asm/pgtable-hwdef.h, we have:

>> >> >> CONT_PTE_SHIFT

>> >> >> CONT_PMD_SHIFT

>> >> >> CONT_PTES

>> >> >> CONT_PMDS

>> >> >> CONT_PTE_SIZE

>> >> >> CONT_PTE_MASK

>> >> >> ...

>> >> >>

>> >> >> which are used by the contiguous hint HugeTLB code.

>> >> >> Can those be adopted instead of CONT_MASK and CONT_SIZE?

>> >> >>

>> >>

>> >> Looking at the hugetlb code, it appears to support contiguous PMDs for

>> >> 4k and 64k pages as well, while the ARM ARM only defines it for 16k

>> >> pages. I suppose the contiguous bit is simply ignored for level 2

>> >> entries when using 4k or 64k pages kernels, but I think it would be

>> >> better for the code to reflect this as well.

>> >

>> > Which bit in the ARM ARM says that you can't support contiguous PMDs for 4k

>> > and 64k pages? I see that the number of contiguous entries changes between

>> > levels for 16k pages, but that's it.

>> >

>>

>> You are right, the ARM ARM does not say that at all. But given Mark's comment:

>>

>> """

>> With 16K pages, we can have contiguous PMD entries. Should we handle those,

>> too? e.g. have separate {PMD,PTE}_CONT{,_SIZE}?

>> """

>>

>> it seems I am not the only one who is confused about this. In any

>> case, the fact that the ARM ARM documents levels 2 and 3 explicitly

>> for 16k pages does very little to clarify at which levels this bit is

>> defined, and if it is defined at levels < 2, what the granularity is

>> for 16k pages.

>

> I see you're going to work on a more comprehensive v3 (thanks!), but just

> to help clarify this: the contiguous bit is valid whenever a block or page

> (i.e. a leaf) entry is valid. The only complication with 16k pages is that

> the number of contiguous entries changes between level 2 and level 3,

> which makes sense if you think about the TLB entries supported due to

> non-contiguous block mappings in other regimes anyway (and brings into

> question whether people bother with 16G in practice).

>

> That means you can use the contiguous bit as:

>

> 4k: levels 1,2,3 (16G, 32M, 64K)

> 16k: levels 2,3 (1G, 2M)

> 64k: levels 2,3 (16G, 2M)

>

> Hopefully my maths is correct and that clears things up,

>


Yes, that resembles my own calculations. Another complication is that
folded PUDs and PMDs need to be dealt with at the PGD level, but I
think I have worked it out now.

Re 16 GB: are you saying don't bother? Because supporting this would
require ARM64_MEMSTART_SHIFT to be increased to (CONT_PUD_SHIFT +
PUD_SHIFT) or (CONT_PMD_SHIFT + PMD_SHIFT) [for 4k and 16k/64k,
respectively] in order to guarantee that the physical and virtual
addresses are always equal modulo 16 GB (for granules that support
it). It's a nice idea that the linear mapping can be covered by fewer
TLB entries if you have huge amounts of RAM, but if the hardware is
unlikely to honour it, it may not be worth the trouble.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff mbox

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 05615a3fdc6f..c491025c6a70 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -98,8 +98,11 @@  static phys_addr_t __init early_pgtable_alloc(void)
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
 				  pgprot_t prot,
-				  phys_addr_t (*pgtable_alloc)(void))
+				  phys_addr_t (*pgtable_alloc)(void),
+				  bool allow_block_mappings)
 {
+	pgprot_t prot_cont = __pgprot(pgprot_val(prot) | PTE_CONT);
+	bool cont = false;
 	pte_t *pte;
 
 	BUG_ON(pmd_sect(*pmd));
@@ -115,7 +118,20 @@  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 
 	pte = pte_set_fixmap_offset(pmd, addr);
 	do {
-		set_pte(pte, pfn_pte(pfn, prot));
+		/*
+		 * Set the contiguous bit for the subsequent group of PTEs if
+		 * its size and alignment are suitable.
+		 */
+		if (((addr | PFN_PHYS(pfn)) & ~CONT_MASK) == 0)
+			cont = allow_block_mappings && end - addr >= CONT_SIZE;
+
+		/*
+		 * Ensure that we do not change the contiguous bit once this
+		 * PTE has been assigned.
+		 */
+		BUG_ON(!pte_none(*pte) && (cont ^ !!(pte_val(*pte) & PTE_CONT)));
+
+		set_pte(pte, pfn_pte(pfn, cont ? prot_cont : prot));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
@@ -166,7 +182,8 @@  static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 			}
 		} else {
 			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
-				       prot, pgtable_alloc);
+				       prot, pgtable_alloc,
+				       allow_block_mappings);
 		}
 		phys += next - addr;
 	} while (pmd++, addr = next, addr != end);