
[1/5] ioremap: Rework pXd_free_pYd_page() API

Message ID 1536747974-25875-2-git-send-email-will.deacon@arm.com
State Superseded
Series Clean up huge vmap and ioremap code

Commit Message

Will Deacon Sept. 12, 2018, 10:26 a.m. UTC
The recently merged API for ensuring break-before-make on page-table
entries when installing huge mappings in the vmalloc/ioremap region is
fairly counter-intuitive, resulting in the arch freeing functions
(e.g. pmd_free_pte_page()) being called even on entries that aren't
present. This led to a minor bug in the arm64 implementation, giving
rise to spurious VM_WARN messages.

This patch moves the pXd_present() checks out into the core code,
refactoring the callsites at the same time so that we avoid the complex
conjunctions when determining whether or not we can put down a huge
mapping.
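
Concretely, each callsite previously open-coded the checks as one
conjunction around the freeing and mapping calls:

	if (ioremap_pmd_enabled() &&
	    ((next - addr) == PMD_SIZE) &&
	    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
	    pmd_free_pte_page(pmd, addr)) {
		if (pmd_set_huge(pmd, phys_addr + addr, prot))
			continue;
	}

With this patch the callsite reduces to a single helper call, and the
helper (see the diff below) tests each precondition with an early
return, calling pmd_free_pte_page() only on present entries:

	if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
		continue;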

Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 lib/ioremap.c | 56 ++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 14 deletions(-)

-- 
2.1.4

Comments

Kani, Toshi Sept. 14, 2018, 8:36 p.m. UTC | #1
On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> [...]

Yes, this looks nicer.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>


Thanks,
-Toshi
Kani, Toshi Sept. 14, 2018, 9:10 p.m. UTC | #2
On Fri, 2018-09-14 at 14:36 -0600, Toshi Kani wrote:
> On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > [...]
> 
> Yes, this looks nicer.
> 
> Reviewed-by: Toshi Kani <toshi.kani@hpe.com>


Sorry, I take it back since I got a question...

> +static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
> +				unsigned long end, phys_addr_t phys_addr,
> +				pgprot_t prot)
> +{
> +	if (!ioremap_pmd_enabled())
> +		return 0;
> +
> +	if ((end - addr) != PMD_SIZE)
> +		return 0;
> +
> +	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
> +		return 0;
> +
> +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> +		return 0;
> +
> +	return pmd_set_huge(pmd, phys_addr, prot);
> +}

Is pmd_present() a proper check here?  We probably do not have this case
for ioremap, but I wonder if one can drop the p-bit while it has a pte page
underneath.

Thanks,
-Toshi
Will Deacon Sept. 17, 2018, 11:33 a.m. UTC | #3
On Fri, Sep 14, 2018 at 09:10:49PM +0000, Kani, Toshi wrote:
> On Fri, 2018-09-14 at 14:36 -0600, Toshi Kani wrote:
> > On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > > [...]
> > 
> > Yes, this looks nicer.
> > 
> > Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
> 
> Sorry, I take it back since I got a question...
> 
> > +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> > +		return 0;
> 
> Is pmd_present() a proper check here?  We probably do not have this case
> for ioremap, but I wonder if one can drop the p-bit while it has a pte page
> underneath.


For ioremap/vunmap the pXd_present() check is correct, yes. The vunmap()
code only ever clears leaf entries, leaving table entries intact. If it
did clear table entries, you'd be stuck here because you wouldn't have
the address of the table to free.

If somebody called pmd_mknotpresent() on a table entry, we may run into
problems, but it's only used for huge mappings afaict.
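
To make the leaf/table distinction concrete, here is a toy userspace
model of that invariant (all names are made up for illustration; this
is not kernel code):

	#include <assert.h>
	#include <stdlib.h>

	/* A pmd slot is non-present, a leaf (huge mapping), or a
	 * table (pointer to a page of ptes). */
	enum pmd_kind { PMD_NONE, PMD_LEAF, PMD_TABLE };

	struct toy_pmd {
		enum pmd_kind kind;
		void *pte_page;		/* meaningful only for PMD_TABLE */
	};

	/* vunmap()-style teardown: only leaf entries are ever cleared.
	 * A table entry stays intact, since clearing it would lose the
	 * only pointer to the pte page. */
	static void toy_vunmap(struct toy_pmd *pmd)
	{
		if (pmd->kind == PMD_LEAF)
			pmd->kind = PMD_NONE;
	}

	/* The gating this patch moves into the core code, in toy form:
	 * only a present entry can have a pte page underneath to free. */
	static int toy_try_huge(struct toy_pmd *pmd)
	{
		if (pmd->kind != PMD_NONE) {		/* pmd_present() */
			if (pmd->kind == PMD_TABLE) {	/* pmd_free_pte_page() */
				free(pmd->pte_page);
				pmd->pte_page = NULL;
			}
		}
		pmd->kind = PMD_LEAF;			/* pmd_set_huge() */
		return 1;
	}

	int main(void)
	{
		struct toy_pmd pmd = { PMD_TABLE, malloc(4096) };

		toy_vunmap(&pmd);		/* table survives vunmap */
		assert(pmd.kind == PMD_TABLE);
		toy_try_huge(&pmd);		/* pte page freed, leaf set */
		assert(pmd.kind == PMD_LEAF);
		return 0;
	}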

Will
Kani, Toshi Sept. 17, 2018, 6:38 p.m. UTC | #4
On Mon, 2018-09-17 at 12:33 +0100, Will Deacon wrote:
> On Fri, Sep 14, 2018 at 09:10:49PM +0000, Kani, Toshi wrote:
> > [...]
> > 
> > Is pmd_present() a proper check here?  We probably do not have this case
> > for ioremap, but I wonder if one can drop the p-bit while it has a pte page
> > underneath.
> 
> For ioremap/vunmap the pXd_present() check is correct, yes. The vunmap()
> code only ever clears leaf entries, leaving table entries intact.


Right. I was thinking about whether such a case could happen in the future.

> If it
> did clear table entries, you'd be stuck here because you wouldn't have
> the address of the table to free.
> 
> If somebody called pmd_mknotpresent() on a table entry, we may run into
> problems, but it's only used for huge mappings afaict.


Treating a table entry as valid when the p-bit is off is risky as well.
So, I agree with the pXd_present() check.
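
To spell out that risk (a hypothetical variant of the toy model from
Will's reply above, with the present bit modelled separately; again
made-up names, not kernel code):

	/* Hypothetical entry whose present bit could be cleared while
	 * a pte page is still attached -- the case being ruled out. */
	struct toy_pmd2 {
		int present;
		int is_table;
		void *pte_page;
	};

	static int toy_try_huge2(struct toy_pmd2 *pmd)
	{
		/* The pmd_present() gate: if 'present' were cleared on
		 * a live table entry, this free would be skipped ... */
		if (pmd->present && pmd->is_table) {
			free(pmd->pte_page);	/* <stdlib.h>, as above */
			pmd->pte_page = NULL;
		}
		/* ... and overwriting the slot with a huge mapping
		 * would then leak the still-attached pte page. Hence:
		 * never clear the p-bit on a live table entry. */
		pmd->present = 1;
		pmd->is_table = 0;
		return 1;
	}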

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>


Thanks,
-Toshi

Patch

diff --git a/lib/ioremap.c b/lib/ioremap.c
index 517f5853ffed..6c72764af19c 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -76,6 +76,25 @@  static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
+static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_pmd_enabled())
+		return 0;
+
+	if ((end - addr) != PMD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
+		return 0;
+
+	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+		return 0;
+
+	return pmd_set_huge(pmd, phys_addr, prot);
+}
+
 static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
@@ -89,13 +108,8 @@  static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 	do {
 		next = pmd_addr_end(addr, end);
 
-		if (ioremap_pmd_enabled() &&
-		    ((next - addr) == PMD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-		    pmd_free_pte_page(pmd, addr)) {
-			if (pmd_set_huge(pmd, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
+			continue;
 
 		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
@@ -103,6 +117,25 @@  static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 	return 0;
 }
 
+static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_pud_enabled())
+		return 0;
+
+	if ((end - addr) != PUD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, PUD_SIZE))
+		return 0;
+
+	if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
+		return 0;
+
+	return pud_set_huge(pud, phys_addr, prot);
+}
+
 static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
@@ -116,13 +149,8 @@  static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 	do {
 		next = pud_addr_end(addr, end);
 
-		if (ioremap_pud_enabled() &&
-		    ((next - addr) == PUD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-		    pud_free_pmd_page(pud, addr)) {
-			if (pud_set_huge(pud, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_pud(pud, addr, next, phys_addr + addr, prot))
+			continue;
 
 		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;