From patchwork Thu Jul 28 14:51:41 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julien Grall X-Patchwork-Id: 72982 Delivered-To: patch@linaro.org Received: by 10.140.29.52 with SMTP id a49csp894763qga; Thu, 28 Jul 2016 07:54:39 -0700 (PDT) X-Received: by 10.107.56.70 with SMTP id f67mr40320530ioa.162.1469717677773; Thu, 28 Jul 2016 07:54:37 -0700 (PDT) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org. [192.237.175.120]) by mx.google.com with ESMTPS id o9si33613537itd.68.2016.07.28.07.54.37 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 Jul 2016 07:54:37 -0700 (PDT) Received-SPF: neutral (google.com: 192.237.175.120 is neither permitted nor denied by best guess record for domain of xen-devel-bounces@lists.xen.org) client-ip=192.237.175.120; Authentication-Results: mx.google.com; spf=neutral (google.com: 192.237.175.120 is neither permitted nor denied by best guess record for domain of xen-devel-bounces@lists.xen.org) smtp.mailfrom=xen-devel-bounces@lists.xen.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSmfc-0003I6-NM; Thu, 28 Jul 2016 14:52:20 +0000 Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSmfa-0003E6-UI for xen-devel@lists.xen.org; Thu, 28 Jul 2016 14:52:19 +0000 Received: from [85.158.137.68] by server-6.bemta-3.messagelabs.com id 3D/D9-28813-22C1A975; Thu, 28 Jul 2016 14:52:18 +0000 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrOLMWRWlGSWpSXmKPExsVysyfVTVdRZla 4wYrr8hZLPi5mcWD0OLr7N1MAYxRrZl5SfkUCa8an5/8ZCw6nVPy6tp+9gXG9excjF4eQwCZG iaW/VzBDOKcZJc50LmTsYuTkYBPQlLjz+RMTiC0iIC1x7fNlRpAiZoF2Rom1/b3MIAlhAT+Jt m+LwGwWAVWJztYLYM28Ai4Sn2/uYAexJQTkJE4em8zaxcjBwQkUf90tBBIWEnCWmD5zKeMERu 4FjAyrGDWKU4vKUot0DQ31kooy0zNKchMzc3QNDYz1clOLixPTU3MSk4r1kvNzNzECPcwABDs YV/92OsQoycGkJMobFjozXIgvKT+lMiOxOCO+qDQntfgQowwHh5IE7y2pWeFCgkWp6akVaZk5 wFCDSUtw8CiJ8MpJA6V5iwsSc4sz0yFSpxgVpcR5a0D6BEASGaV5cG2w8L7EKCslzMsIdIgQT 0FqUW5mCar8K0ZxDkYlYV5mkPE8mXklcNNfAS1mAlpcHDsDZHFJIkJKqoFReCvP7fDpdpL6Pz 4lHD+l9OZZYum2c4G6opvVHf94yX5++qWS+7Pivimyi7fPv91UIbDn7d40pemtGXbTpPYqOLb /Lz54gNft6vz+tydm7fyZu7LSad6ktyov/5zfkGbYtl66fOmr6pSo312b7kbV6CY1dB1c9+HD /qxpfQ0FfhV/3GZ+vVUrpcRSnJFoqMVcVJwIAHLIi7xqAgAA X-Env-Sender: julien.grall@arm.com X-Msg-Ref: server-3.tower-31.messagelabs.com!1469717535!52886556!2 X-Originating-IP: [217.140.101.70] X-SpamReason: No, hits=0.0 required=7.0 tests= X-StarScan-Received: X-StarScan-Version: 8.77; banners=-,-,- X-VirusChecked: Checked Received: (qmail 54700 invoked from network); 28 Jul 2016 14:52:17 -0000 Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by server-3.tower-31.messagelabs.com with SMTP; 28 Jul 2016 14:52:17 -0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 15859BFD; Thu, 28 Jul 2016 07:53:34 -0700 (PDT) Received: from e108454-lin.cambridge.arm.com (e108454-lin.cambridge.arm.com [10.1.218.32]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D372E3F21A; Thu, 28 Jul 2016 07:52:15 -0700 (PDT) From: Julien Grall To: xen-devel@lists.xen.org Date: Thu, 28 Jul 2016 15:51:41 +0100 Message-Id: <1469717505-8026-19-git-send-email-julien.grall@arm.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1469717505-8026-1-git-send-email-julien.grall@arm.com> References: <1469717505-8026-1-git-send-email-julien.grall@arm.com> Cc: proskurin@sec.in.tum.de, Julien Grall , sstabellini@kernel.org, steve.capper@arm.com, wei.chen@linaro.org Subject: [Xen-devel] [RFC 18/22] xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" The ARM architecture mandates to use of a break-before-make sequence when changing translation entries if the page table is shared between multiple CPUs whenever a valid entry is replaced by another valid entry (see D4.7.1 in ARM DDI 0487A.j for more details). The break-before-make sequence can be divided in the following steps: 1) Invalidate the old entry in the page table 2) Issue a TLB invalidation instruction for the address associated to this entry 3) Write the new entry The current P2M code implemented in apply_one_level does not respect this sequence and may result to break coherency on some processors. Adapting the current implementation to use the break-before-make sequence would imply some code duplication and more TLBs invalidation than necessary. For instance, if we are replacing a 4KB page and the current mapping in the P2M is using a 1GB superpage, the following steps will happen: 1) Shatter the 1GB superpage into a series of 2MB superpages 2) Shatter the 2MB superpage into a series of 4KB pages 3) Replace the 4KB page As the current implementation is shattering while descending and install the mapping, Xen would need to issue 3 TLB invalidation instructions which is clearly inefficient. Furthermore, all the operations which modify the page table are using the same skeleton. It is more complicated to maintain different code paths than having a generic function that set an entry and take care of the break-before-make sequence. The new implementation is based on the x86 EPT one which, I think, fits quite well for the break-before-make sequence whilst keeping the code simple. The main function of the new implementation is __p2m_get_entry. It will only work on mapping that are aligned to a block entry in the page table (i.e 1GB, 2MB, 4KB when using a 4KB granularity). Another function, p2m_get_entry, is provided to break down is region into mapping that is aligned to a block entry. Note that to keep this patch "small", there are no caller of those functions added in this patch (they will be added in follow-up patches). Signed-off-by: Julien Grall --- I need to find the impact of this new implementation on ARM32 because the domheap is not always mapped. This means that Xen needs to map/unmap everytime the page associated to the page table. It might be possible to re-use some caching structure as the current implementation does or rework the way a domheap is mapped/unmapped. Also, this code still contain few TODOs mostly to add sanity checks and few optimization. The IOMMU is not yet supported. --- xen/arch/arm/p2m.c | 335 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 335 insertions(+) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index c93e554..297b176 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -750,6 +750,341 @@ static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type) } } +#if 0 +/* Free lpae sub-tree behind an entry */ +static void p2m_free_entry(struct p2m_domain *p2m, + lpae_t entry, unsigned int level) +{ + unsigned int i; + lpae_t *table; + mfn_t mfn; + + /* Nothing to do if the entry is invalid or a super-page */ + if ( !p2m_valid(entry) || p2m_is_superpage(entry, level) ) + return; + + if ( level == 3 ) + { + p2m_put_l3_page(_mfn(entry.p2m.base), entry.p2m.type); + return; + } + + table = map_domain_page(_mfn(entry.p2m.base)); + for ( i = 0; i < LPAE_ENTRIES; i++ ) + p2m_free_entry(p2m, *(table + i), level + 1); + + unmap_domain_page(table); + + /* + * Make sure all the references in the TLB have been removed before + * freing the intermediate page table. + * XXX: Should we defer the free of the page table to avoid the + * flush? + */ + if ( p2m->need_flush ) + p2m_flush_tlb_sync(p2m); + + mfn = _mfn(entry.p2m.base); + ASSERT(mfn_valid(mfn_x(mfn))); + + free_domheap_page(mfn_to_page(mfn_x(mfn))); +} + +static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry, + unsigned int level, unsigned int target, + const unsigned int *offsets) +{ + struct page_info *page; + unsigned int i; + lpae_t pte, *table; + bool rv = true; + + /* Convenience aliases */ + p2m_type_t t = entry->p2m.type; + mfn_t mfn = _mfn(entry->p2m.base); + + /* Convenience aliases */ + unsigned int next_level = level + 1; + unsigned int level_order = level_orders[next_level]; + + /* + * This should only be called with target != level and the entry is + * a superpage. + */ + ASSERT(level < target); + ASSERT(p2m_is_superpage(*entry, level)); + + page = alloc_domheap_page(NULL, 0); + if ( !page ) + return false; + + page_list_add(page, &p2m->pages); + table = __map_domain_page(page); + + /* + * We are either splitting a first level 1G page into 512 second level + * 2M pages, or a second level 2M page into 512 third level 4K pages. + */ + for ( i = 0; i < LPAE_ENTRIES; i++ ) + { + lpae_t *new_entry = table + i; + + pte = mfn_to_p2m_entry(mfn, t, p2m->default_access); + + mfn = mfn_add(mfn, (1UL << level_order)); + + /* + * First and second level pages set p2m.table = 0, but third + * level entries set p2m.table = 1. + */ + if ( next_level < 3 ) + pte.p2m.table = 0; + + write_pte(new_entry, pte); + } + + /* + * Shatter superpage in the page to the level we want to make the + * changes. + * This is done outside the loop to avoid checking the offset to + * know whether the entry should be shattered for every entry. + */ + if ( next_level != target ) + rv = p2m_split_superpage(p2m, table + offsets[next_level], + level + 1, target, offsets); + + if ( p2m->clean_pte ) + clean_dcache_va_range(table, PAGE_SIZE); + + unmap_domain_page(table); + + pte = mfn_to_p2m_entry(_mfn(page_to_mfn(page)), p2m_invalid, + p2m->default_access); + + p2m_write_pte(entry, pte, p2m->clean_pte); + + /* + * Even if we failed, we should install the newly allocated LPAE + * entry. The caller will be in charge to free the sub-tree. + * XXX: See if we can free entry here. + */ + *entry = pte; + + return rv; +} + +/* + * Insert an entry in the p2m. This should be called with a mapping + * equal to a page/superpage (4K, 2M, 1G). + */ +static int __p2m_set_entry(struct p2m_domain *p2m, + gfn_t sgfn, + unsigned int page_order, + mfn_t smfn, + p2m_type_t t, + p2m_access_t a) +{ + paddr_t addr = pfn_to_paddr(gfn_x(sgfn)); + unsigned int level = 0; + unsigned int target = 3 - (page_order / LPAE_SHIFT); + lpae_t *entry, *table, orig_pte; + int rc; + + /* Convenience aliases */ + const unsigned int offsets[4] = { + zeroeth_table_offset(addr), + first_table_offset(addr), + second_table_offset(addr), + third_table_offset(addr) + }; + + /* TODO: Check the validity for the address */ + + ASSERT(p2m_is_write_locked(p2m)); + + /* + * Check if the level target is valid: we only support + * 4K - 2M - 1G mapping. + */ + ASSERT(target > 0 && target <= 3); + + table = p2m_get_root_pointer(p2m, sgfn); + if ( !table ) + return -EINVAL; + + for ( level = P2M_ROOT_LEVEL; level < target; level++ ) + { + rc = p2m_next_level(p2m, false, &table, offsets[level]); + if ( rc == GUEST_TABLE_MAP_FAILED ) + { + rc = -ENOENT; + goto out; + } + else if ( rc != GUEST_TABLE_NORMAL_PAGE ) + break; + } + + entry = table + offsets[level]; + + /* + * If we are here with level < target, we must be at a leaf node, + * and we need to break up the superpage. + */ + if ( level < target ) + { + /* We need to split the original page. */ + lpae_t split_pte = *entry; + + ASSERT(p2m_is_superpage(*entry, level)); + + if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) ) + { + p2m_free_entry(p2m, split_pte, level); + rc = -ENOMEM; + goto out; + } + + /* + * Follow the break-before-sequence to update the entry. + * For more details see (D4.7.1 in ARM DDI 0487A.j). + * XXX: Can we flush by address? + */ + p2m_remove_pte(entry, p2m->clean_pte); + p2m_flush_tlb_sync(p2m); + + p2m_write_pte(entry, split_pte, p2m->clean_pte); + + /* then move to the level we want to make real changes */ + for ( ; level < target; level++ ) + { + rc = p2m_next_level(p2m, true, &table, offsets[level]); + + /* + * The entry should be found and either be a table + * or a superpage if level 3 is not targeted + */ + ASSERT(rc == GUEST_TABLE_NORMAL_PAGE || + (rc == GUEST_TABLE_SUPER_PAGE && target < 3)); + } + + entry = table + offsets[level]; + } + + /* + * We should always be there with the correct level because + * all the intermediate tables have been installed if necessary. + */ + ASSERT(level == target); + + orig_pte = *entry; + + /* + * The radix-tree can only work on 4KB. This is only used when + * memaccess is enabled. + */ + ASSERT(!p2m->mem_access_enabled || page_order == 0); + /* + * The access type should always be p2m_access_rwx when the mapping + * is removed. + */ + ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx)); + /* + * Update the mem access permission before update the P2M. So we + * don't have to revert the mapping if it has failed. + */ + rc = p2m_mem_access_radix_set(p2m, sgfn, a); + if ( rc ) + goto out; + + /* + * Always remove the entry in order to follow the break-before-make + * sequence when updating the translation table (D4.7.1 in ARM DDI + * 0487A.j). + */ + if ( p2m_valid(orig_pte) ) + p2m_remove_pte(entry, p2m->clean_pte); + + if ( mfn_eq(smfn, INVALID_MFN) ) + /* Flush can be deferred if the entry is removed */ + p2m->need_flush |= !!p2m_valid(orig_pte); + else + { + lpae_t pte; + + /* + * Flush the TLB before write the new one to keep coherency. + * XXX: Can we flush by address? + */ + if ( p2m_valid(orig_pte) ) + p2m_flush_tlb_sync(p2m); + + pte = mfn_to_p2m_entry(smfn, t, a); + if ( level < 3 ) + pte.p2m.table = 0; /* Superpage entry */ + + p2m_write_pte(entry, pte, p2m->clean_pte); + + p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn, + gfn_add(sgfn, 1 << page_order)); + p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn); + } + + /* + * Free the entry only if the original pte was valid and the base + * is different (to avoid freeing when permission is changed). + */ + if ( p2m_valid(orig_pte) && entry->p2m.base != orig_pte.p2m.base ) + p2m_free_entry(p2m, orig_pte, level); + + /* XXX: Flush iommu */ + + rc = 0; + +out: + unmap_domain_page(table); + + return rc; +} + +static int p2m_set_entry(struct p2m_domain *p2m, + gfn_t sgfn, + unsigned long todo, + mfn_t smfn, + p2m_type_t t, + p2m_access_t a) +{ + int rc = 0; + + while ( todo ) + { + unsigned long mask = gfn_x(sgfn) | mfn_x(smfn) | todo; + unsigned long order; + + /* Always map 4k by 4k when memaccess is enabled */ + if ( unlikely(p2m->mem_access_enabled) ) + order = THIRD_ORDER; + if ( !(mask & ((1UL << FIRST_ORDER) - 1)) ) + order = FIRST_ORDER; + else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) ) + order = SECOND_ORDER; + else + order = THIRD_ORDER; + + rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a); + if ( rc ) + break; + + sgfn = gfn_add(sgfn, (1 << order)); + if ( !mfn_eq(smfn, INVALID_MFN) ) + smfn = mfn_add(smfn, (1 << order)); + + todo -= (1 << order); + } + + return rc; +} +#endif + /* * Returns true if start_gpaddr..end_gpaddr contains at least one * suitably aligned level_size mappping of maddr.