From patchwork Fri Mar 28 15:01:26 2014
X-Patchwork-Submitter: Steve Capper <steve.capper@linaro.org>
X-Patchwork-Id: 27298
From: Steve Capper <steve.capper@linaro.org>
To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com,
	linux@arm.linux.org.uk, linux-mm@kvack.org, linux-arch@vger.kernel.org
Cc: peterz@infradead.org, gary.robertson@linaro.org,
	akpm@linux-foundation.org, anders.roxell@linaro.org,
	Steve Capper <steve.capper@linaro.org>
Subject: [RFC PATCH V4 1/7] mm: Introduce a general RCU get_user_pages_fast.
Date: Fri, 28 Mar 2014 15:01:26 +0000
Message-Id: <1396018892-6773-2-git-send-email-steve.capper@linaro.org>
In-Reply-To: <1396018892-6773-1-git-send-email-steve.capper@linaro.org>
References: <1396018892-6773-1-git-send-email-steve.capper@linaro.org>

A general RCU implementation of get_user_pages_fast.
It is based heavily on the PowerPC implementation. The lockless page
cache protocols are used, as this implementation assumes that TLB
invalidations do not necessarily need to be broadcast via IPI. This
implementation does, however, assume that THP splits will broadcast an
IPI, and this is why interrupts are disabled in the fast_gup walker
(otherwise calls to rcu_read_(un)lock would suffice).

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
This is my first attempt to generalise fast_gup to core code. At the
moment there are two implicit assumptions that I know about:
 o) 64-bit ptes can be atomically read.
 o) hugetlb pages and thps have a similar bit layout.

Any feedback from other architecture maintainers on how this could be
tweaked to accommodate them would be greatly appreciated! Especially as
there is a lot of similarity between each architecture's fast_gup.
---
 mm/Kconfig  |   3 +
 mm/Makefile |   1 +
 mm/gup.c    | 297 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 301 insertions(+)
 create mode 100644 mm/gup.c

diff --git a/mm/Kconfig b/mm/Kconfig
index 2888024..0151e17 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -134,6 +134,9 @@ config HAVE_MEMBLOCK
 config HAVE_MEMBLOCK_NODE_MAP
 	boolean
 
+config HAVE_RCU_GUP
+	boolean
+
 config ARCH_DISCARD_MEMBLOCK
 	boolean
 
diff --git a/mm/Makefile b/mm/Makefile
index 310c90a..0f19c5f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -28,6 +28,7 @@ else
 endif
 
 obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o
+obj-$(CONFIG_HAVE_RCU_GUP) += gup.o
 obj-$(CONFIG_BOUNCE) += bounce.o
 obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o
 
diff --git a/mm/gup.c b/mm/gup.c
new file mode 100644
index 0000000..b35296f
--- /dev/null
+++ b/mm/gup.c
@@ -0,0 +1,297 @@
+/*
+ * mm/gup.c
+ *
+ * Copyright (C) 2014 Linaro Ltd.
+ *
+ * Based on arch/powerpc/mm/gup.c which is:
+ * Copyright (C) 2008 Nick Piggin
+ * Copyright (C) 2008 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/hugetlb.h>
+#include <asm/pgtable.h>
+
+#ifdef __HAVE_ARCH_PTE_SPECIAL
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, *ptem;
+	int ret = 0;
+
+	ptem = ptep = pte_offset_map(&pmd, addr);
+	do {
+		pte_t pte = ACCESS_ONCE(*ptep);
+		struct page *page;
+
+		if (!pte_present(pte) || pte_special(pte)
+			|| (write && !pte_write(pte)))
+			goto pte_unmap;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			goto pte_unmap;
+
+		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+			put_page(page);
+			goto pte_unmap;
+		}
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	ret = 1;
+
+pte_unmap:
+	pte_unmap(ptem);
+	return ret;
+}
+#else
+
+/*
+ * If we can't determine whether or not a pte is special, then fail immediately
+ * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
+ * to be special.
+ */
+static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	return 0;
+}
+#endif /* __HAVE_ARCH_PTE_SPECIAL */
+
+static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (!pmd_present(orig) || (write && !pmd_write(orig)))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(orig);
+	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	/*
+	 * Any tail pages need their mapcount reference taken before we
+	 * return. (This allows the THP code to bump their ref count when
+	 * they are split into base pages).
+	 */
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	pmd_t origpmd = __pmd(pud_val(orig));
+	int refs;
+
+	if (!pmd_present(origpmd) || (write && !pmd_write(origpmd)))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(origpmd);
+	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(&pud, addr);
+	do {
+		pmd_t pmd = ACCESS_ONCE(*pmdp);
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+			return 0;
+
+		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+					  pages, nr))
+				return 0;
+		} else {
+			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+				return 0;
+		}
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		pud_t pud = ACCESS_ONCE(*pudp);
+		next = pud_addr_end(addr, end);
+		if (pud_none(pud))
+			return 0;
+		if (pud_huge(pud)) {
+			if (!gup_huge_pud(pud, pudp, addr, next, write,
+					  pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+/*
+ * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
+ * back to the regular GUP.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			  struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next, flags;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		return 0;
+
+	/*
+	 * Disable interrupts, we use the nested form as we can already
+	 * have interrupts disabled by get_futex_key.
+	 *
+	 * With interrupts disabled, we block page table pages from being
+	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
+	 * for more details.
+	 *
+	 * We do not adopt an rcu_read_lock(.) here as we also want to
+	 * block IPIs that come from THPs splitting.
+	 */
+
+	local_irq_save(flags);
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			break;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			break;
+	} while (pgdp++, addr = next, addr != end);
+	local_irq_restore(flags);
+
+	return nr;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	int nr, ret;
+
+	start &= PAGE_MASK;
+	nr = __get_user_pages_fast(start, nr_pages, write, pages);
+	ret = nr;
+
+	if (nr < nr_pages) {
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+				     nr_pages - nr, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+	}
+
+	return ret;
+}
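
For illustration only (not part of the patch): a minimal sketch of how a
caller might use the interface this file provides. The pin_user_buffer()
helper, its arguments and its error handling are hypothetical; the only
calls taken from the patch / existing kernel API are get_user_pages_fast()
and put_page().

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Hypothetical helper: pin nr_pages of a writable userspace buffer,
 * do some work with the pages, then drop the references again.
 */
static int pin_user_buffer(unsigned long uaddr, int nr_pages,
			   struct page **pages)
{
	int pinned;

	/*
	 * Try the lockless walker. It may pin fewer pages than requested;
	 * get_user_pages_fast() itself falls back to the regular GUP for
	 * whatever the fast path could not pin.
	 */
	pinned = get_user_pages_fast(uaddr, nr_pages, 1 /* write */, pages);
	if (pinned < 0)
		return pinned;

	/* ... use pages[0..pinned-1], e.g. to build a scatterlist ... */

	/* Release the references taken on our behalf. */
	while (pinned--)
		put_page(pages[pinned]);

	return 0;
}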
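
Likewise, a rough sketch of how an architecture might opt in: it selects
the new HAVE_RCU_GUP symbol so that mm/Makefile builds mm/gup.c (the
MYARCH entry below is made up for illustration):

config MYARCH
	def_bool y
	# Build the generic RCU get_user_pages_fast (mm/gup.c).
	select HAVE_RCU_GUP

Per the notes above, such an architecture would also need to guarantee
that 64-bit ptes can be read atomically and that THP splits broadcast an
IPI, and it should define __HAVE_ARCH_PTE_SPECIAL if it wants the
pte-level fast path rather than only the huge-page paths.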