
arm64: allow vmalloc regions to be set with set_memory_*

Message ID 1453125665-26627-1-git-send-email-ard.biesheuvel@linaro.org
State Superseded

Commit Message

Ard Biesheuvel Jan. 18, 2016, 2:01 p.m. UTC
The range of set_memory_* is currently restricted to the module address
range because of difficulties in breaking down larger block sizes.
vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
function ranges and add a comment explaining why the range is restricted
the way it is.

Suggested-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

-- 
2.5.0



Comments

Mark Rutland Jan. 18, 2016, 3:05 p.m. UTC | #1
On Mon, Jan 18, 2016 at 03:01:05PM +0100, Ard Biesheuvel wrote:
> The range of set_memory_* is currently restricted to the module address
> range because of difficulties in breaking down larger block sizes.
> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
> function ranges and add a comment explaining why the range is restricted
> the way it is.
> 
> Suggested-by: Laura Abbott <labbott@fedoraproject.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Previously we allowed set_memory_* calls on any range in the modules
area (even if that covered multiple VMAs). However, I believe that given
the way the vmap area is allocated, no caller can rely on mappings being
adjacent, and thus such calls would be erroneous.

Given that, this looks good to me (with one minor nit below). FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>
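
As a minimal illustration of the non-adjacency point (a hypothetical sketch,
not part of the patch, with a made-up function name): two back-to-back
vmalloc() allocations are separated by a guard page, so a caller could never
rely on their virtual ranges being adjacent when passing a single range to
set_memory_*().

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Hypothetical demo only; names and sizes are illustrative. */
static void vmalloc_adjacency_demo(void)
{
	void *a = vmalloc(PAGE_SIZE);
	void *b = vmalloc(PAGE_SIZE);

	/*
	 * vmalloc() places a guard page after each allocation, so in
	 * general b != a + PAGE_SIZE, and [a, b + PAGE_SIZE) is not one
	 * contiguous mapping that a set_memory_*() call could cover.
	 */
	pr_info("a=%p b=%p\n", a, b);

	vfree(b);
	vfree(a);
}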


> ---
>  arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 3571c7309c5e..1360a02d88b7 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -13,6 +13,7 @@
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/module.h>
> +#include <linux/vmalloc.h>
>  #include <linux/sched.h>

Nit: please keep alphabetical order here.

Mark.

>  
>  #include <asm/pgtable.h>
> @@ -44,6 +45,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>  	unsigned long end = start + size;
>  	int ret;
>  	struct page_change_data data;
> +	struct vm_struct *area;
>  
>  	if (!PAGE_ALIGNED(addr)) {
>  		start &= PAGE_MASK;
> @@ -51,10 +53,23 @@ static int change_memory_common(unsigned long addr, int numpages,
>  		WARN_ON_ONCE(1);
>  	}
>  
> -	if (start < MODULES_VADDR || start >= MODULES_END)
> -		return -EINVAL;
> -
> -	if (end < MODULES_VADDR || end >= MODULES_END)
> +	/*
> +	 * Kernel VA mappings are always live, and splitting live section
> +	 * mappings into page mappings may cause TLB conflicts. This means
> +	 * we have to ensure that changing the permission bits of the range
> +	 * we are operating on does not result in such splitting.
> +	 *
> +	 * Let's restrict ourselves to mappings created by vmalloc (or vmap).
> +	 * Those are guaranteed to consist entirely of page mappings, and
> +	 * splitting is never needed.
> +	 *
> +	 * So check whether the [addr, addr + size) interval is entirely
> +	 * covered by precisely one VM area that has the VM_ALLOC flag set.
> +	 */
> +	area = find_vm_area((void *)addr);
> +	if (!area ||
> +	    end > (unsigned long)area->addr + area->size ||
> +	    !(area->flags & VM_ALLOC))
>  		return -EINVAL;
>  
>  	data.set_mask = set_mask;
> -- 
> 2.5.0
> 


Will Deacon Jan. 28, 2016, 3:08 p.m. UTC | #2
On Mon, Jan 18, 2016 at 03:01:05PM +0100, Ard Biesheuvel wrote:
> The range of set_memory_* is currently restricted to the module address
> range because of difficulties in breaking down larger block sizes.
> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
> function ranges and add a comment explaining why the range is restricted
> the way it is.
> 
> Suggested-by: Laura Abbott <labbott@fedoraproject.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)

What's the user for this? It would be better to apply along with something
that actually needs to change permission for arbitrary vmalloc mappings.

Anyway, with that (and a rebase to -rc2):

Acked-by: Will Deacon <will.deacon@arm.com>


Will

Ard Biesheuvel Jan. 28, 2016, 4:40 p.m. UTC | #3
On 28 January 2016 at 16:08, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Jan 18, 2016 at 03:01:05PM +0100, Ard Biesheuvel wrote:
>> The range of set_memory_* is currently restricted to the module address
>> range because of difficulties in breaking down larger block sizes.
>> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>> function ranges and add a comment explaining why the range is restricted
>> the way it is.
>>
>> Suggested-by: Laura Abbott <labbott@fedoraproject.org>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> What's the user for this? It would be better to apply along with something
> that actually needs to change permission for arbitrary vmalloc mappings.
>

BPF already uses set_memory_ro() but doesn't bother to check the return value.

Before the patch, I get:

> Anyway, with that (and a rebase to -rc2):
>
> Acked-by: Will Deacon <will.deacon@arm.com>
>
> Will

Ard Biesheuvel Jan. 28, 2016, 4:43 p.m. UTC | #4
On 28 January 2016 at 16:08, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Jan 18, 2016 at 03:01:05PM +0100, Ard Biesheuvel wrote:
>> The range of set_memory_* is currently restricted to the module address
>> range because of difficulties in breaking down larger block sizes.
>> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>> function ranges and add a comment explaining why the range is restricted
>> the way it is.
>>
>> Suggested-by: Laura Abbott <labbott@fedoraproject.org>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> What's the user for this? It would be better to apply along with something
> that actually needs to change permission for arbitrary vmalloc mappings.
>

BPF already uses set_memory_ro() but doesn't bother to check the return value.

Without the patch, I get:

root@ubuntu-arm64:~# grep bpf /proc/vmallocinfo
0xffff000000014000-0xffff000000016000    8192 bpf_prog_alloc+0x3c/0xb8
pages=1 vmalloc
0xffff00000017a000-0xffff00000017c000    8192 bpf_prog_alloc+0x3c/0xb8
pages=1 vmalloc
root@ubuntu-arm64:~# grep -E 0xffff000000014000\|0xffff00000017a000
/sys/kernel/debug/kernel_page_tables
0xffff000000014000-0xffff000000015000           4K     RW NX SHD AF
        UXN MEM/NORMAL
0xffff00000017a000-0xffff00000017b000           4K     RW NX SHD AF
        UXN MEM/NORMAL

and with:

root@ubuntu-arm64:~# grep bpf /proc/vmallocinfo
0xffff000000014000-0xffff000000016000    8192 bpf_prog_alloc+0x3c/0xb8
pages=1 vmalloc
0xffff00000017a000-0xffff00000017c000    8192 bpf_prog_alloc+0x3c/0xb8
pages=1 vmalloc
root@ubuntu-arm64:~# grep -E 0xffff000000014000\|0xffff00000017a000
/sys/kernel/debug/kernel_page_tables
0xffff000000014000-0xffff000000015000           4K     ro NX SHD AF
        UXN MEM/NORMAL
0xffff00000017a000-0xffff00000017b000           4K     ro NX SHD AF
        UXN MEM/NORMAL
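
As a hedged sketch of the kind of caller this enables (a hypothetical helper,
not the actual BPF code), a vmalloc'ed image can now be made read-only and the
return value of set_memory_ro() is worth checking:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/cacheflush.h>	/* set_memory_ro()/set_memory_rw() on arm64 */

/* Hypothetical caller; 'image' is assumed to come from vmalloc(). */
static int protect_image(void *image, size_t size)
{
	int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	int ret;

	ret = set_memory_ro((unsigned long)image, nr_pages);
	if (ret)
		/* e.g. -EINVAL if the range is not a single VM_ALLOC area */
		pr_err("failed to make image read-only: %d\n", ret);

	return ret;
}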


> Anyway, with that (and a rebase to -rc2):
>
> Acked-by: Will Deacon <will.deacon@arm.com>
>

Thanks

Will Deacon Jan. 28, 2016, 6:10 p.m. UTC | #5
On Thu, Jan 28, 2016 at 05:43:41PM +0100, Ard Biesheuvel wrote:
> On 28 January 2016 at 16:08, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Jan 18, 2016 at 03:01:05PM +0100, Ard Biesheuvel wrote:
> >> The range of set_memory_* is currently restricted to the module address
> >> range because of difficulties in breaking down larger block sizes.
> >> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
> >> function ranges and add a comment explaining why the range is restricted
> >> the way it is.
> >>
> >> Suggested-by: Laura Abbott <labbott@fedoraproject.org>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
> >>  1 file changed, 19 insertions(+), 4 deletions(-)
> >
> > What's the user for this? It would be better to apply along with something
> > that actually needs to change permission for arbitrary vmalloc mappings.
> >
> 
> BPF already uses set_memory_ro() but doesn't bother to check the return value.

Urgh :(

Then I can take it as a fix, if you like.

Will

Ard Biesheuvel Jan. 28, 2016, 7:07 p.m. UTC | #6
On 28 January 2016 at 19:10, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Jan 28, 2016 at 05:43:41PM +0100, Ard Biesheuvel wrote:
>> On 28 January 2016 at 16:08, Will Deacon <will.deacon@arm.com> wrote:
>> > On Mon, Jan 18, 2016 at 03:01:05PM +0100, Ard Biesheuvel wrote:
>> >> The range of set_memory_* is currently restricted to the module address
>> >> range because of difficulties in breaking down larger block sizes.
>> >> vmalloc maps PAGE_SIZE pages so it is safe to use as well. Update the
>> >> function ranges and add a comment explaining why the range is restricted
>> >> the way it is.
>> >>
>> >> Suggested-by: Laura Abbott <labbott@fedoraproject.org>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >> ---
>> >>  arch/arm64/mm/pageattr.c | 23 +++++++++++++++++++----
>> >>  1 file changed, 19 insertions(+), 4 deletions(-)
>> >
>> > What's the user for this? It would be better to apply along with something
>> > that actually needs to change permission for arbitrary vmalloc mappings.
>> >
>>
>> BPF already uses set_memory_ro() but doesn't bother to check the return value.
>
> Urgh :(
>
> Then I can take it as a fix, if you like.
>

I will let you decide whether it should be merged now or in v4.6. I
don't think there's any urgency here, though.

BTW I already sent a v2 based on -rc2 here
http://article.gmane.org/gmane.linux.ports.arm.kernel/472555

Thanks,
Ard.


Patch

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 3571c7309c5e..1360a02d88b7 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -13,6 +13,7 @@ 
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/vmalloc.h>
 #include <linux/sched.h>
 
 #include <asm/pgtable.h>
@@ -44,6 +45,7 @@  static int change_memory_common(unsigned long addr, int numpages,
 	unsigned long end = start + size;
 	int ret;
 	struct page_change_data data;
+	struct vm_struct *area;
 
 	if (!PAGE_ALIGNED(addr)) {
 		start &= PAGE_MASK;
@@ -51,10 +53,23 @@  static int change_memory_common(unsigned long addr, int numpages,
 		WARN_ON_ONCE(1);
 	}
 
-	if (start < MODULES_VADDR || start >= MODULES_END)
-		return -EINVAL;
-
-	if (end < MODULES_VADDR || end >= MODULES_END)
+	/*
+	 * Kernel VA mappings are always live, and splitting live section
+	 * mappings into page mappings may cause TLB conflicts. This means
+	 * we have to ensure that changing the permission bits of the range
+	 * we are operating on does not result in such splitting.
+	 *
+	 * Let's restrict ourselves to mappings created by vmalloc (or vmap).
+	 * Those are guaranteed to consist entirely of page mappings, and
+	 * splitting is never needed.
+	 *
+	 * So check whether the [addr, addr + size) interval is entirely
+	 * covered by precisely one VM area that has the VM_ALLOC flag set.
+	 */
+	area = find_vm_area((void *)addr);
+	if (!area ||
+	    end > (unsigned long)area->addr + area->size ||
+	    !(area->flags & VM_ALLOC))
 		return -EINVAL;
 
 	data.set_mask = set_mask;
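
As a closing illustration of the caller-visible contract above (an untested,
hypothetical sketch with made-up sizes), a range that lies entirely within a
single vmalloc area can have its permissions changed, while a range extending
beyond that area is rejected with -EINVAL before any change is made:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/cacheflush.h>

static void pageattr_contract_demo(void)
{
	void *buf = vmalloc(4 * PAGE_SIZE);	/* one VM_ALLOC-backed area */

	if (!buf)
		return;

	/* Entirely inside a single vmalloc area: allowed. */
	WARN_ON(set_memory_ro((unsigned long)buf, 4) != 0);
	WARN_ON(set_memory_rw((unsigned long)buf, 4) != 0);

	/* Extends far past the area: change_memory_common() returns -EINVAL. */
	WARN_ON(set_memory_ro((unsigned long)buf, 64) != -EINVAL);

	vfree(buf);
}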