arm64: mm: Use READ_ONCE when dereferencing pointer to pte table

Message ID 1506680995-9000-1-git-send-email-will.deacon@arm.com
State Accepted
Commit f069faba688701c4d56b6c3452a130f97bf02e95
Series arm64: mm: Use READ_ONCE when dereferencing pointer to pte table

Commit Message

Will Deacon Sept. 29, 2017, 10:29 a.m. UTC
On kernels built with support for transparent huge pages, different CPUs
can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
and they must take care to use READ_ONCE to avoid value tearing or caching
of stale values by the compiler. Unfortunately, these functions call into
our pgtable macros, which don't use READ_ONCE, and compiler caching has
been observed to cause the following crash during ext4 writeback:

PC is at check_pte+0x20/0x170
LR is at page_vma_mapped_walk+0x2e0/0x540
[...]
Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
Call trace:
[<ffff000008233328>] check_pte+0x20/0x170
[<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
[<ffff000008234adc>] page_mkclean_one+0xac/0x278
[<ffff000008234d98>] rmap_walk_file+0xf0/0x238
[<ffff000008236e74>] rmap_walk+0x64/0xa0
[<ffff0000082370c8>] page_mkclean+0x90/0xa8
[<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
[<ffff00000832f984>] mpage_submit_page+0x34/0x98
[<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
[<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
[<ffff00000833530c>] ext4_writepages+0x484/0xe30
[<ffff0000081f6ab4>] do_writepages+0x44/0xe8
[<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
[<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
[<ffff000008324310>] ext4_sync_file+0x80/0x4b8
[<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
[<ffff0000082332b4>] SyS_msync+0x194/0x1e8

This is because page_vma_mapped_walk loads the PMD twice before calling
pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
(where it sees a valid table pointer due to a concurrent pmd_populate).
However, the compiler inlines everything and caches the first value in
a register, which is subsequently used in pte_offset_phys which returns
a junk pointer that is later dereferenced when attempting to access the
relevant pte.
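
The difference between a plain dereference and a READ_ONCE load can be sketched in userspace C. This is an illustration only, not the kernel code: the READ_ONCE macro below is a simplified stand-in (the kernel's definition is more elaborate), and pmd_t, PMD_ADDR_MASK and both helper functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE (assumption): a volatile
 * access forces the compiler to emit a real load at this point instead of
 * reusing a value it already holds in a register. Uses the GNU __typeof__
 * extension, so it needs gcc or clang. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

typedef uint64_t pmd_t;                       /* hypothetical: a bare 64-bit entry */
#define PMD_ADDR_MASK 0x0000fffffffff000ULL   /* hypothetical address-bits mask */

/* Plain dereference: once inlined, the compiler is free to reuse an earlier
 * load of *pmdp, so a concurrent update by another CPU may be missed. */
static inline uint64_t table_paddr_plain(pmd_t *pmdp)
{
	return *pmdp & PMD_ADDR_MASK;
}

/* READ_ONCE dereference: exactly one fresh load, taken here, and every
 * later use works on that single snapshot. */
static inline uint64_t table_paddr_once(pmd_t *pmdp)
{
	pmd_t pmd = READ_ONCE(*pmdp);

	return pmd & PMD_ADDR_MASK;
}

/* Single-threaded sanity check: without a concurrent writer both variants
 * agree; they can only differ when another CPU modifies the entry. */
static inline void read_once_self_check(void)
{
	pmd_t pmd = 0x0000000080001003ULL;

	assert(table_paddr_plain(&pmd) == table_paddr_once(&pmd));
}
```

Run single-threaded, the two helpers return the same value. The sketch only shows where the compiler is, and is not, allowed to substitute a cached value.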

This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
that a stale value is not used. Whilst this is a point fix for a known
failure (and simple to backport), a full fix moving all of our page table
accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
page_vma_mapped_walk is in the works for a future kernel release.

Cc: Jon Masters <jcm@redhat.com>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: <stable@vger.kernel.org>
Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>

Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 arch/arm64/include/asm/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.1.4

Comments

Catalin Marinas Sept. 29, 2017, 3:47 p.m. UTC | #1
On Fri, Sep 29, 2017 at 11:29:55AM +0100, Will Deacon wrote:
> On kernels built with support for transparent huge pages, different CPUs
> can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
> and they must take care to use READ_ONCE to avoid value tearing or caching
> of stale values by the compiler. Unfortunately, these functions call into
> our pgtable macros, which don't use READ_ONCE, and compiler caching has
> been observed to cause the following crash during ext4 writeback:
>
> PC is at check_pte+0x20/0x170
> LR is at page_vma_mapped_walk+0x2e0/0x540
> [...]
> Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
> Call trace:
> [<ffff000008233328>] check_pte+0x20/0x170
> [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
> [<ffff000008234adc>] page_mkclean_one+0xac/0x278
> [<ffff000008234d98>] rmap_walk_file+0xf0/0x238
> [<ffff000008236e74>] rmap_walk+0x64/0xa0
> [<ffff0000082370c8>] page_mkclean+0x90/0xa8
> [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
> [<ffff00000832f984>] mpage_submit_page+0x34/0x98
> [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
> [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
> [<ffff00000833530c>] ext4_writepages+0x484/0xe30
> [<ffff0000081f6ab4>] do_writepages+0x44/0xe8
> [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
> [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
> [<ffff000008324310>] ext4_sync_file+0x80/0x4b8
> [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
> [<ffff0000082332b4>] SyS_msync+0x194/0x1e8
>
> This is because page_vma_mapped_walk loads the PMD twice before calling
> pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
> due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
> (where it sees a valid table pointer due to a concurrent pmd_populate).
> However, the compiler inlines everything and caches the first value in
> a register, which is subsequently used in pte_offset_phys which returns
> a junk pointer that is later dereferenced when attempting to access the
> relevant pte.
>
> This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
> that a stale value is not used. Whilst this is a point fix for a known
> failure (and simple to backport), a full fix moving all of our page table
> accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
> page_vma_mapped_walk is in the works for a future kernel release.
>
> Cc: Jon Masters <jcm@redhat.com>
> Cc: Timur Tabi <timur@codeaurora.org>
> Cc: <stable@vger.kernel.org>
> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
> Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>


Applied to arm64 fixes/core. Thanks.

-- 
Catalin

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bc4e92337d16..b46e54c2399b 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -401,7 +401,7 @@  static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 /* Find an entry in the third-level page table. */
 #define pte_index(addr)		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
 
-#define pte_offset_phys(dir,addr)	(pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
+#define pte_offset_phys(dir,addr)	(pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
 #define pte_offset_kernel(dir,addr)	((pte_t *)__va(pte_offset_phys((dir), (addr))))
 
 #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
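
For readers unfamiliar with the macros touched above, the address arithmetic they perform can be sketched as standalone C. The constants are assumptions for illustration (4K pages, 512 entries per table, 8-byte entries), and the functions take plain integers rather than the kernel's pmd_t and pte_t types; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration for this sketch: 4K pages, 512 PTEs per table,
 * 8-byte entries. Real values depend on the kernel configuration. */
#define PAGE_SHIFT	12
#define PTRS_PER_PTE	512
#define PTE_SIZE	8	/* stands in for sizeof(pte_t) */

/* Index of the pte within its table: the virtual-address bits directly
 * above the page offset, wrapped to the number of entries per table. */
static inline uint64_t pte_index(uint64_t addr)
{
	return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Physical address of the pte: table base plus index times entry size.
 * table_paddr is what pmd_page_paddr() would extract from the PMD value,
 * which is why a stale PMD value produces a junk pointer. */
static inline uint64_t pte_offset_phys(uint64_t table_paddr, uint64_t addr)
{
	return table_paddr + pte_index(addr) * PTE_SIZE;
}

/* Worked example: 0x12345678 >> 12 = 0x12345, and 0x12345 & 0x1ff = 0x145. */
static inline void pte_offset_self_check(void)
{
	assert(pte_index(0x0000ffff12345678ULL) == 0x145);
	assert(pte_offset_phys(0x80001000ULL, 0x0000ffff12345678ULL) == 0x80001a28ULL);
}
```

With a table base of zero, as left behind by a stale all-zeroes PMD, the same arithmetic yields only the small offset 0xa28, which does not refer to any real page table.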