diff mbox

arm64: Increase the max granular size

Message ID 20151104123640.GK7637@e104818-lin.cambridge.arm.com
State New
Headers show

Commit Message

Catalin Marinas Nov. 4, 2015, 12:36 p.m. UTC
(+ linux-mm)

On Tue, Nov 03, 2015 at 05:33:25PM -0600, Christoph Lameter wrote:
> On Tue, 3 Nov 2015, Catalin Marinas wrote:

> > (cc'ing Jonsoo and Christoph; summary: slab failure with L1_CACHE_BYTES

> > of 128 and sizeof(kmem_cache_node) of 152)

> 

> Hmmm... Yes that would mean use the 196 sized kmalloc array which is not a

> power of two slab. But the code looks fine to me.


I'm not entirely sure that gets used (or even created).
kmalloc_index(152) returns 8 (INDEX_NODE==8) since KMALLOC_MIN_SIZE==128
and the "kmalloc-node" cache size is 256.

> > If I revert commit 8fc9cf420b36 ("slab: make more slab management

> > structure off the slab") it works but I still need to figure out how

> > slab indices are calculated. The size_index[] array is overridden so

> > that 0..15 are 7 and 16..23 are 8. But the kmalloc_caches[7] has never

> > been populated, hence the BUG_ON. Another option may be to change

> > kmalloc_size and kmalloc_index to cope with KMALLOC_MIN_SIZE of 128.

> >

> > I'll do some more investigation tomorrow.

> 

> The commit allows off slab management for PAGE_SIZE >> 5 that is 128.


This means that the first kmalloc cache to be created, "kmalloc-128", is
off slab.

> After that commit kmem_cache_create would try to allocate an off slab

> management structure which is not available during early boot.

> But the slab_early_init is set which should prevent the use of an off slab

> management infrastructure in kmem_cache_create().

> 

> However, the failure in line 2283 shows that the OFF SLAB flag was

> mistakenly set anyways!!!! Something must havee cleared slab_early_init?


slab_early_init is cleared after "kmem_cache" and "kmalloc-node" caches
are successfully created. Following this, the minimum kmalloc allocation
goes off-slab when KMALLOC_MIN_SIZE == 128.

When trying to create "kmalloc-128" (via create_kmalloc_caches(),
slab_early_init is already 0), __kmem_cache_create() requires an
allocation of 32 bytes (freelist_size) which has index 7, hence exactly
the kmalloc_caches[7] we are trying to create.

The simplest option would be to make sure that off slab isn't allowed
for caches of KMALLOC_MIN_SIZE or smaller, with the drawback that not
only "kmalloc-128" but any other such caches will be on slab.

I think a better option would be to first check that there is a
kmalloc_caches[] entry for freelist_size before deciding to go off-slab.
See below:

-----8<------------------------------

From ce27c5c6d156522ceaff20de8a7af281cf079b6f Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>

Date: Wed, 4 Nov 2015 12:19:00 +0000
Subject: [PATCH] mm: slab: Avoid BUG when KMALLOC_MIN_SIZE == (PAGE_SIZE >> 5)

The slab allocator, following commit 8fc9cf420b36 ("slab: make more slab
management structure off the slab"), tries to place slab management
off-slab when the object size is PAGE_SIZE >> 5 or larger. On arm64 with
KMALLOC_MIN_SIZE = L1_CACHE_BYTES = 128, "kmalloc-128" is the smallest
cache to be created after slab_early_init = 0. The current
__kmem_cache_create() implementation aims to place the management
structure off-slab. However, the kmalloc_slab(freelist_size) has not
been populated yet, triggering a bug on !cachep->freelist_cache.

This patch addresses the problem by keeping the management structure
on-slab if the corresponding kmalloc_caches[] is not populated yet.

Fixes: 8fc9cf420b36 ("slab: make more slab management structure off the slab")
Cc: <stable@vger.kernel.org> # 3.15+
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

---
 mm/slab.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Catalin Marinas Nov. 4, 2015, 2:54 p.m. UTC | #1
On Wed, Nov 04, 2015 at 07:53:50AM -0600, Christoph Lameter wrote:
> On Wed, 4 Nov 2015, Catalin Marinas wrote:

> 

> > The simplest option would be to make sure that off slab isn't allowed

> > for caches of KMALLOC_MIN_SIZE or smaller, with the drawback that not

> > only "kmalloc-128" but any other such caches will be on slab.

> 

> The reason for an off slab configuration is denser object packing.

> 

> > I think a better option would be to first check that there is a

> > kmalloc_caches[] entry for freelist_size before deciding to go off-slab.

> 

> Hmmm.. Yes seems to be an option.

> 

> Maybe we simply revert commit 8fc9cf420b36 instead?


I'm fine with this. Also note that the arm64 commit changing
L1_CACHE_BYTES to 128 hasn't been pushed yet (it's queued for 4.4).

> That does not seem to make too much sense to me and the goal of the

> commit cannot be accomplished on ARM. Your patch essentially reverts

> the effect anyways.


In theory it only reverts the effect for the first kmalloc_cache
("kmalloc-128" in the arm64 case). Any other bigger cache which would
not be mergeable with an existing one still has the potential of
off-slab management.

> Smaller slabs really do not need off slab management anyways since they

> will only loose a few objects per slab page.


IIUC, starting with 128 slab size for a 4KB page, you have 32 objects
per page. The freelist takes 32 bytes (or 31), therefore you waste a
single slab object. However, only 1/4 of it is used for freelist and the
waste gets bigger with 256 slab size, hence the original commit.

BTW, assuming L1_CACHE_BYTES is 512 (I don't ever see this happening but
just in theory), we potentially have the same issue. What would save us
is that INDEX_NODE would match the first "kmalloc-512" cache, so we have
it pre-populated.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/mm/slab.c b/mm/slab.c
index 4fcc5dd8d5a6..d4a21736eb5d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2246,16 +2246,33 @@  __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		freelist_size = calculate_freelist_size(cachep->num, 0);
+		size_t off_freelist_size = calculate_freelist_size(cachep->num, 0);
+
+		cachep->freelist_cache = kmalloc_slab(off_freelist_size, 0u);
+		if (ZERO_OR_NULL_PTR(cachep->freelist_cache)) {
+			/*
+			 * We don't have kmalloc_caches[] populated for
+			 * off_freelist_size yet. This can happen during
+			 * create_kmalloc_caches() when KMALLOC_MIN_SIZE >=
+			 * (PAGE_SHIFT >> 5) and CFLGS_OFF_SLAB is set. Move
+			 * the cache on-slab.
+			 */
+			flags &= ~CFLGS_OFF_SLAB;
+			left_over = calculate_slab_order(cachep, size, cachep->align, flags);
+		} else {
+			freelist_size = off_freelist_size;
 
 #ifdef CONFIG_PAGE_POISONING
-		/* If we're going to use the generic kernel_map_pages()
-		 * poisoning, then it's going to smash the contents of
-		 * the redzone and userword anyhow, so switch them off.
-		 */
-		if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
-			flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
+			/*
+			 * If we're going to use the generic kernel_map_pages()
+			 * poisoning, then it's going to smash the contents of
+			 * the redzone and userword anyhow, so switch them off.
+			 */
+			if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
+				flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
 #endif
+		}
+
 	}
 
 	cachep->colour_off = cache_line_size();
@@ -2271,18 +2288,6 @@  __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 	cachep->size = size;
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
-	if (flags & CFLGS_OFF_SLAB) {
-		cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
-		/*
-		 * This is a possibility for one of the kmalloc_{dma,}_caches.
-		 * But since we go off slab only for object size greater than
-		 * PAGE_SIZE/8, and kmalloc_{dma,}_caches get created
-		 * in ascending order,this should not happen at all.
-		 * But leave a BUG_ON for some lucky dude.
-		 */
-		BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
-	}
-
 	err = setup_cpu_cache(cachep, gfp);
 	if (err) {
 		__kmem_cache_shutdown(cachep);