From patchwork Thu Feb  6 16:18:51 2014
X-Patchwork-Submitter: Steve Capper
X-Patchwork-Id: 24265
From: Steve Capper <steve.capper@linaro.org>
To: linux-arm-kernel@lists.infradead.org
Cc: will.deacon@arm.com, catalin.marinas@arm.com, linux@arm.linux.org.uk,
	chanho61.park@samsung.com, zishen.lim@linaro.org, patches@linaro.org,
	gary.robertson@linaro.org, michael.hudson@linaro.org,
	christoffer.dall@linaro.org, Steve Capper <steve.capper@linaro.org>
Subject: [RFC PATCH V2 4/4] arm64: mm: implement get_user_pages_fast
Date: Thu, 6 Feb 2014 16:18:51 +0000
Message-Id: <1391703531-12845-5-git-send-email-steve.capper@linaro.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1391703531-12845-1-git-send-email-steve.capper@linaro.org>
References: <1391703531-12845-1-git-send-email-steve.capper@linaro.org>

An implementation of get_user_pages_fast for arm64. It is based on the
arm implementation (with the added ability to walk huge puds), which is
in turn loosely based on the PowerPC implementation.

We disable interrupts in the walker to prevent the call_rcu_sched page
table freeing code from running under us.

We also explicitly fire an IPI in the Transparent HugePage splitting
case to prevent splits from interfering with the fast_gup walker. As
THP splits are relatively rare, this should not have a noticeable
overhead.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/include/asm/pgtable.h |   4 +
 arch/arm64/mm/Makefile           |   2 +-
 arch/arm64/mm/gup.c              | 297 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 302 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/mm/gup.c

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7f2b60a..8cfa1aa 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -212,6 +212,10 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_trans_splitting(pmd)	(pmd_val(pmd) & PMD_SECT_SPLITTING)
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+struct vm_area_struct;
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
 #endif
 
 #define PMD_BIT_FUNC(fn,op) \
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index b51d364..44b2148 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -1,5 +1,5 @@
 obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   cache.o copypage.o flush.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
-				   context.o tlb.o proc.o
+				   context.o tlb.o proc.o gup.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
diff --git a/arch/arm64/mm/gup.c b/arch/arm64/mm/gup.c
new file mode 100644
index 0000000..45dd908
--- /dev/null
+++ b/arch/arm64/mm/gup.c
@@ -0,0 +1,297 @@
+/*
+ * arch/arm64/mm/gup.c
+ *
+ * Copyright (C) 2014 Linaro Ltd.
+ *
+ * Based on arch/powerpc/mm/gup.c which is:
+ * Copyright (C) 2008 Nick Piggin
+ * Copyright (C) 2008 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/rwsem.h>
+#include <linux/hugetlb.h>
+#include <asm/pgtable.h>
+
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, *ptem;
+	int ret = 0;
+
+	ptem = ptep = pte_offset_map(&pmd, addr);
+	do {
+		pte_t pte = ACCESS_ONCE(*ptep);
+		struct page *page;
+
+		if (!pte_valid_user(pte) || (write && !pte_write(pte)))
+			goto pte_unmap;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			goto pte_unmap;
+
+		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+			put_page(page);
+			goto pte_unmap;
+		}
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	ret = 1;
+
+pte_unmap:
+	pte_unmap(ptem);
+	return ret;
+}
+
+static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (!pmd_present(orig) || (write && !pmd_write(orig)))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(orig);
+	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	/*
+	 * Any tail pages need their mapcount reference taken before we
+	 * return. (This allows the THP code to bump their ref count when
+	 * they are split into base pages).
+	 */
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	pmd_t origpmd = __pmd(pud_val(orig));
+	int refs;
+
+	if (!pmd_present(origpmd) || (write && !pmd_write(origpmd)))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(origpmd);
+	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(&pud, addr);
+	do {
+		pmd_t pmd = ACCESS_ONCE(*pmdp);
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+			return 0;
+
+		if (unlikely(pmd_huge(pmd) || pmd_trans_huge(pmd))) {
+			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+					  pages, nr))
+				return 0;
+		} else {
+			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+				return 0;
+		}
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		pud_t pud = ACCESS_ONCE(*pudp);
+		next = pud_addr_end(addr, end);
+		if (pud_none(pud))
+			return 0;
+		if (pud_huge(pud)) {
+			if (!gup_huge_pud(pud, pudp, addr, next, write,
+					  pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+/*
+ * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
+ * back to the regular GUP.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			  struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next, flags;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		return 0;
+
+	/*
+	 * Disable interrupts, we use the nested form as we can already
+	 * have interrupts disabled by get_futex_key.
+	 *
+	 * With interrupts disabled, we block page table pages from being
+	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
+	 * for more details.
+	 */
+
+	local_irq_save(flags);
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			break;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			break;
+	} while (pgdp++, addr = next, addr != end);
+	local_irq_restore(flags);
+
+	return nr;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	int nr, ret;
+
+	start &= PAGE_MASK;
+	nr = __get_user_pages_fast(start, nr_pages, write, pages);
+	ret = nr;
+
+	if (nr < nr_pages) {
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+				     nr_pages - nr, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+	}
+
+	return ret;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void thp_splitting_flush_sync(void *arg)
+{
+}
+
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	smp_call_function(thp_splitting_flush_sync, NULL, 1);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
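
The heart of the walker is the lockless pin in gup_pte_range(): snapshot
the PTE, take a speculative reference on the page, then recheck the PTE
and back out if it changed underneath us. A minimal sketch of just that
pattern, lifted out of the surrounding loop (speculative_grab() is a
hypothetical name for illustration, not part of the patch):

/*
 * Illustrative sketch only: the "snapshot, speculative ref, recheck"
 * pattern used per-PTE by gup_pte_range() above.
 */
static int speculative_grab(pte_t *ptep, struct page **pagep)
{
	/* 1. Snapshot the PTE; it may change under us at any time. */
	pte_t pte = ACCESS_ONCE(*ptep);
	struct page *page = pte_page(pte);

	/* 2. Try to take a reference without holding any locks. */
	if (!page_cache_get_speculative(page))
		return 0;

	/* 3. Recheck: if the PTE changed while we took the ref, back out. */
	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
		put_page(page);
		return 0;
	}

	*pagep = page;
	return 1;
}

With interrupts disabled across the walk, any page table page freed via
call_rcu_sched cannot actually be released until we re-enable them, so
step 1's snapshot always points at memory that is still a page table.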
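
The calling convention matches other architectures' get_user_pages_fast():
the return value is the number of pages pinned, possibly fewer than
requested (or a negative errno from the slow path), and each pinned page
must eventually be released with put_page(). A hypothetical caller, for
illustration only (the function name, buffer address and page count are
invented):

/* Hypothetical example: pin up to 16 pages of a user buffer for
 * writing, use them, then drop the references.
 */
static int example_pin_user_buffer(unsigned long uaddr)
{
	struct page *pages[16];
	int i, got;

	got = get_user_pages_fast(uaddr, 16, 1, pages);	/* write == 1 */
	if (got <= 0)
		return got ? got : -EFAULT;

	/* ... the first "got" pages are now safe to access ... */

	for (i = 0; i < got; i++)
		put_page(pages[i]);

	return 0;
}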