From patchwork Thu Aug 21 15:43:27 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Capper X-Patchwork-Id: 35768 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-pd0-f199.google.com (mail-pd0-f199.google.com [209.85.192.199]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 909612055F for ; Thu, 21 Aug 2014 15:45:33 +0000 (UTC) Received: by mail-pd0-f199.google.com with SMTP id v10sf70038559pde.2 for ; Thu, 21 Aug 2014 08:45:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:from:to:subject:date:message-id :in-reply-to:references:cc:precedence:list-id:list-unsubscribe :list-archive:list-post:list-help:list-subscribe:mime-version:sender :errors-to:x-original-sender:x-original-authentication-results :mailing-list:content-type:content-transfer-encoding; bh=A7m0gX60UgUMhBDf25li6i2tAQjnoICZ23Evbzlr+OA=; b=EuZls2et3Zo8V0cPIQiaxB7RZ1xQ/k7iUZtXYPqPOhlj+SR+Z5qfhYYASUSDcaj77K G4DFDKY5frWmb9HopvEX2suw85fYbsuNgflHiREuyUHVKpqBVvaE/B+NPWAwCgKfQexa PZuG/3hCbAbKfAlfKF6n+sqOH1j0MDRV0CDgche6cvFmMYiCNIFYDQXWvcR4Wo2XT0+a 9IKxhBvWAuxSWKVFJ2FUZKQrO8+gBvzMjcBFxw+OpZuGEOJT6N+yQkLeTiO8s51F0kuJ pPvCzCN6Y3sCRAeOjYWtz8G7GRWkrqFDt34vL1ITlF8vi3C83RKvCOibobKZD/wSS5le hMQw== X-Gm-Message-State: ALoCoQl7lRmqMa4Znds12wf8Ah91kKwHyV9HmvuTQpcCuTGyCrFxtmZ6SGV9ue9IgcMSLjMwFjCs X-Received: by 10.66.141.70 with SMTP id rm6mr20748487pab.37.1408635932880; Thu, 21 Aug 2014 08:45:32 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.37.72 with SMTP id q66ls750841qgq.49.gmail; Thu, 21 Aug 2014 08:45:32 -0700 (PDT) X-Received: by 10.220.112.143 with SMTP id w15mr769569vcp.41.1408635932682; Thu, 21 Aug 2014 08:45:32 -0700 (PDT) Received: from mail-vc0-f178.google.com (mail-vc0-f178.google.com [209.85.220.178]) by mx.google.com with ESMTPS id tc7si12481138vcb.24.2014.08.21.08.45.32 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 21 Aug 2014 08:45:32 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.220.178 as permitted sender) client-ip=209.85.220.178; Received: by mail-vc0-f178.google.com with SMTP id la4so10977398vcb.37 for ; Thu, 21 Aug 2014 08:45:32 -0700 (PDT) X-Received: by 10.221.21.8 with SMTP id qq8mr560399vcb.57.1408635932581; Thu, 21 Aug 2014 08:45:32 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.221.45.67 with SMTP id uj3csp146016vcb; Thu, 21 Aug 2014 08:45:31 -0700 (PDT) X-Received: by 10.70.89.3 with SMTP id bk3mr67090023pdb.41.1408635931150; Thu, 21 Aug 2014 08:45:31 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [2001:1868:205::9]) by mx.google.com with ESMTPS id u10si37007761pds.179.2014.08.21.08.45.30 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Aug 2014 08:45:31 -0700 (PDT) Received-SPF: none (google.com: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org does not designate permitted sender hosts) client-ip=2001:1868:205::9; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1XKUXA-0000ch-B8; Thu, 21 Aug 2014 15:44:16 +0000 Received: from mail-wg0-f52.google.com ([74.125.82.52]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1XKUWz-0000NG-E1 for linux-arm-kernel@lists.infradead.org; Thu, 21 Aug 2014 15:44:06 +0000 Received: by mail-wg0-f52.google.com with SMTP id a1so9292598wgh.23 for ; Thu, 21 Aug 2014 08:43:43 -0700 (PDT) X-Received: by 10.180.84.165 with SMTP id a5mr1783127wiz.40.1408635823004; Thu, 21 Aug 2014 08:43:43 -0700 (PDT) Received: from marmot.wormnet.eu (marmot.wormnet.eu. [188.246.204.87]) by mx.google.com with ESMTPSA id wi9sm67591585wjc.23.2014.08.21.08.43.41 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Aug 2014 08:43:42 -0700 (PDT) From: Steve Capper To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Subject: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast. Date: Thu, 21 Aug 2014 16:43:27 +0100 Message-Id: <1408635812-31584-2-git-send-email-steve.capper@linaro.org> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1408635812-31584-1-git-send-email-steve.capper@linaro.org> References: <1408635812-31584-1-git-send-email-steve.capper@linaro.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20140821_084405_789137_293F09F0 X-CRM114-Status: GOOD ( 19.74 ) X-Spam-Score: -0.7 (/) X-Spam-Report: SpamAssassin version 3.4.0 on bombadil.infradead.org summary: Content analysis details: (-0.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [74.125.82.52 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [74.125.82.52 listed in wl.mailspike.net] -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders Cc: mark.rutland@arm.com, anders.roxell@linaro.org, peterz@infradead.org, gary.robertson@linaro.org, will.deacon@arm.com, mgorman@suse.de, dann.frazier@canonical.com, Steve Capper , akpm@linux-foundation.org, christoffer.dall@linaro.org X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: steve.capper@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.220.178 as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 get_user_pages_fast attempts to pin user pages by walking the page tables directly and avoids taking locks. Thus the walker needs to be protected from page table pages being freed from under it, and needs to block any THP splits. One way to achieve this is to have the walker disable interrupts, and rely on IPIs from the TLB flushing code blocking before the page table pages are freed. On some platforms we have hardware broadcast of TLB invalidations, thus the TLB flushing code doesn't necessarily need to broadcast IPIs; and spuriously broadcasting IPIs can hurt system performance if done too often. This problem has been solved on PowerPC and Sparc by batching up page table pages belonging to more than one mm_user, then scheduling an rcu_sched callback to free the pages. This RCU page table free logic has been promoted to core code and is activated when one enables HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement their own get_user_pages_fast routines. The RCU page table free logic coupled with a an IPI broadcast on THP split (which is a rare event), allows one to protect a page table walker by merely disabling the interrupts during the walk. This patch provides a general RCU implementation of get_user_pages_fast that can be used by architectures that perform hardware broadcast of TLB invalidations. It is based heavily on the PowerPC implementation by Nick Piggin. Signed-off-by: Steve Capper --- mm/Kconfig | 3 + mm/gup.c | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 281 insertions(+) diff --git a/mm/Kconfig b/mm/Kconfig index 886db21..6a4d764 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP config HAVE_MEMBLOCK_PHYS_MAP boolean +config HAVE_RCU_GUP + boolean + config ARCH_DISCARD_MEMBLOCK boolean diff --git a/mm/gup.c b/mm/gup.c index 91d044b..2f684fa 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -10,6 +10,10 @@ #include #include +#include +#include +#include + #include "internal.h" static struct page *no_page_table(struct vm_area_struct *vma, @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr) return page; } #endif /* CONFIG_ELF_CORE */ + +#ifdef CONFIG_HAVE_RCU_GUP + +#ifdef __HAVE_ARCH_PTE_SPECIAL +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + pte_t *ptep, *ptem; + int ret = 0; + + ptem = ptep = pte_offset_map(&pmd, addr); + do { + pte_t pte = ACCESS_ONCE(*ptep); + struct page *page; + + if (!pte_present(pte) || pte_special(pte) + || (write && !pte_write(pte))) + goto pte_unmap; + + VM_BUG_ON(!pfn_valid(pte_pfn(pte))); + page = pte_page(pte); + + if (!page_cache_get_speculative(page)) + goto pte_unmap; + + if (unlikely(pte_val(pte) != pte_val(*ptep))) { + put_page(page); + goto pte_unmap; + } + + pages[*nr] = page; + (*nr)++; + + } while (ptep++, addr += PAGE_SIZE, addr != end); + + ret = 1; + +pte_unmap: + pte_unmap(ptem); + return ret; +} +#else + +/* + * If we can't determine whether or not a pte is special, then fail immediately + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not + * to be special. + */ +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + return 0; +} +#endif /* __HAVE_ARCH_PTE_SPECIAL */ + +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + int refs; + + if (write && !pmd_write(orig)) + return 0; + + refs = 0; + head = pmd_page(orig); + page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + /* + * Any tail pages need their mapcount reference taken before we + * return. (This allows the THP code to bump their ref count when + * they are split into base pages). + */ + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + int refs; + + if (write && !pud_write(orig)) + return 0; + + refs = 0; + head = pud_page(orig); + page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pud_val(orig) != pud_val(*pudp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pmd_t *pmdp; + + pmdp = pmd_offset(&pud, addr); + do { + pmd_t pmd = ACCESS_ONCE(*pmdp); + next = pmd_addr_end(addr, end); + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) + return 0; + + if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) { + if (!gup_huge_pmd(pmd, pmdp, addr, next, write, + pages, nr)) + return 0; + } else { + if (!gup_pte_range(pmd, addr, next, write, pages, nr)) + return 0; + } + } while (pmdp++, addr = next, addr != end); + + return 1; +} + +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pud_t *pudp; + + pudp = pud_offset(pgdp, addr); + do { + pud_t pud = ACCESS_ONCE(*pudp); + next = pud_addr_end(addr, end); + if (pud_none(pud)) + return 0; + if (pud_huge(pud)) { + if (!gup_huge_pud(pud, pudp, addr, next, write, + pages, nr)) + return 0; + } else if (!gup_pmd_range(pud, addr, next, write, pages, nr)) + return 0; + } while (pudp++, addr = next, addr != end); + + return 1; +} + +/* + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall + * back to the regular GUP. + */ +int __get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, len, end; + unsigned long next, flags; + pgd_t *pgdp; + int nr = 0; + + start &= PAGE_MASK; + addr = start; + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + + if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ, + start, len))) + return 0; + + /* + * Disable interrupts, we use the nested form as we can already + * have interrupts disabled by get_futex_key. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h + * for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. + */ + + local_irq_save(flags); + pgdp = pgd_offset(mm, addr); + do { + next = pgd_addr_end(addr, end); + if (pgd_none(*pgdp)) + break; + else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr)) + break; + } while (pgdp++, addr = next, addr != end); + local_irq_restore(flags); + + return nr; +} + +int get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + int nr, ret; + + start &= PAGE_MASK; + nr = __get_user_pages_fast(start, nr_pages, write, pages); + ret = nr; + + if (nr < nr_pages) { + /* Try to get the remaining pages with get_user_pages */ + start += nr << PAGE_SHIFT; + pages += nr; + + down_read(&mm->mmap_sem); + ret = get_user_pages(current, mm, start, + nr_pages - nr, write, 0, pages, NULL); + up_read(&mm->mmap_sem); + + /* Have to be a bit careful with return values */ + if (nr > 0) { + if (ret < 0) + ret = nr; + else + ret += nr; + } + } + + return ret; +} + +#endif /* CONFIG_HAVE_RCU_GUP */