diff mbox series

[1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

Message ID 20181025012745.20884-1-rafael.tinoco@linaro.org
State New
Headers show
Series [1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

Commit Message

Rafael David Tinoco Oct. 25, 2018, 1:27 a.m. UTC
On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
physical frame number might be so big that the zsmalloc object encoding
(obj-to-location) will break if the architecture does not re-define
MAX_PHYSMEM_BITS, causing:

[  497.097843] ==================================================================
[  497.102365] BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc
[  497.105933] Read of size 4 at addr 00000000 by task mkfs.ext4/623
[  497.109684]
[  497.110722] CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15
[  497.116098] Hardware name: Generic DT based system
[  497.119094] [<c0418f7c>] (unwind_backtrace) from [<c0410ca4>] (show_stack+0x20/0x24)
[  497.123819] [<c0410ca4>] (show_stack) from [<c16bd540>] (dump_stack+0xbc/0xe8)
[  497.128299] [<c16bd540>] (dump_stack) from [<c06cab74>] (kasan_report+0x248/0x390)
[  497.132928] [<c06cab74>] (kasan_report) from [<c06c94e8>] (__asan_load4+0x78/0xb4)
[  497.137530] [<c06c94e8>] (__asan_load4) from [<c06ddc24>] (zs_map_object+0xa4/0x2bc)
[  497.142335] [<c06ddc24>] (zs_map_object) from [<bf0bbbd8>] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram])
[  497.148294] [<bf0bbbd8>] (zram_bvec_rw.constprop.2 [zram]) from [<bf0bc3cc>] (zram_make_request+0x234/0x46c [zram])
[  497.154653] [<bf0bc3cc>] (zram_make_request [zram]) from [<c09aff9c>] (generic_make_request+0x304/0x63c)
[  497.160413] [<c09aff9c>] (generic_make_request) from [<c09b0320>] (submit_bio+0x4c/0x1c8)
[  497.165379] [<c09b0320>] (submit_bio) from [<c0743570>] (submit_bh_wbc.constprop.15+0x238/0x26c)
[  497.170775] [<c0743570>] (submit_bh_wbc.constprop.15) from [<c0746cf8>] (__block_write_full_page+0x524/0x76c)
[  497.176776] [<c0746cf8>] (__block_write_full_page) from [<c07472c4>] (block_write_full_page+0x1bc/0x1d4)
[  497.182549] [<c07472c4>] (block_write_full_page) from [<c0748eb4>] (blkdev_writepage+0x24/0x28)
[  497.187849] [<c0748eb4>] (blkdev_writepage) from [<c064a780>] (__writepage+0x44/0x78)
[  497.192633] [<c064a780>] (__writepage) from [<c064b50c>] (write_cache_pages+0x3b8/0x800)
[  497.197616] [<c064b50c>] (write_cache_pages) from [<c064dd78>] (generic_writepages+0x74/0xa0)
[  497.202807] [<c064dd78>] (generic_writepages) from [<c0748e64>] (blkdev_writepages+0x18/0x1c)
[  497.208022] [<c0748e64>] (blkdev_writepages) from [<c064e890>] (do_writepages+0x68/0x134)
[  497.213002] [<c064e890>] (do_writepages) from [<c06368a4>] (__filemap_fdatawrite_range+0xb0/0xf4)
[  497.218447] [<c06368a4>] (__filemap_fdatawrite_range) from [<c0636b68>] (file_write_and_wait_range+0x64/0xd0)
[  497.224416] [<c0636b68>] (file_write_and_wait_range) from [<c0747af8>] (blkdev_fsync+0x54/0x84)
[  497.229749] [<c0747af8>] (blkdev_fsync) from [<c0739dac>] (vfs_fsync_range+0x70/0xd4)
[  497.234549] [<c0739dac>] (vfs_fsync_range) from [<c0739e98>] (do_fsync+0x4c/0x80)
[  497.239159] [<c0739e98>] (do_fsync) from [<c073a26c>] (sys_fsync+0x1c/0x20)
[  497.243407] [<c073a26c>] (sys_fsync) from [<c0401000>] (ret_fast_syscall+0x0/0x2c)

as in this ARM 32-bit (LPAE enabled) example.

That occurs because, if not set, MAX_POSSIBLE_PHYSMEM_BITS defaults to
BITS_PER_LONG (32 in most of these cases), so for PAE _PFN_BITS ends up
too small, which may cause the obj variable to overflow when the PFN
refers to a HIGHMEM page above the 4GB mark.

This commit exposes the BUG if the architecture supports PAE and does
not explicitly set MAX_POSSIBLE_PHYSMEM_BITS to a supported value.
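
For illustration only (not part of the patch): a minimal user-space
sketch of the encoding done by location_to_obj(), with constants assumed
to match a 32-bit build that hits the BITS_PER_LONG fallback (e.g. ARM
LPAE without SPARSEMEM), showing how a PFN above the 4GB mark wraps:

#include <inttypes.h>
#include <stdio.h>

/* Assumed values for a 32-bit build using the BITS_PER_LONG fallback. */
#define BITS_PER_LONG              32
#define PAGE_SHIFT                 12
#define MAX_POSSIBLE_PHYSMEM_BITS  BITS_PER_LONG
#define _PFN_BITS                  (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
#define OBJ_TAG_BITS               1
#define OBJ_INDEX_BITS             (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_INDEX_MASK             ((UINT32_C(1) << OBJ_INDEX_BITS) - 1)

int main(void)
{
	/* First PFN above 4GB: it needs 21 bits, but _PFN_BITS is only 20. */
	uint32_t pfn = UINT32_C(1) << (32 - PAGE_SHIFT);
	uint32_t obj_idx = 5;

	/* Same steps as location_to_obj(), with a 32-bit "unsigned long". */
	uint32_t obj = pfn << OBJ_INDEX_BITS;
	obj |= obj_idx & OBJ_INDEX_MASK;
	obj <<= OBJ_TAG_BITS;

	/* The PFN's top bit has been shifted out: obj decodes to PFN 0. */
	printf("pfn=0x%" PRIx32 ", obj=0x%" PRIx32 "\n", pfn, obj);
	return 0;
}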

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org>

---
 mm/zsmalloc.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

-- 
2.19.1

Comments

Sergey Senozhatsky Oct. 25, 2018, 5:29 a.m. UTC | #1
On (10/24/18 22:27), Rafael David Tinoco wrote:
>  static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
>  {
> -	unsigned long obj;
> +	unsigned long obj, pfn;
> +
> +	pfn = page_to_pfn(page);
> +
> +	if (unlikely(OBJ_OVERFLOW(pfn)))
> +		BUG();

The trend these days is to have less BUG/BUG_ON-s in the kernel.

	-ss
Rafael David Tinoco Oct. 25, 2018, 11:03 a.m. UTC | #2
On Thu, Oct 25, 2018 at 2:29 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (10/24/18 22:27), Rafael David Tinoco wrote:
>>  static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
>>  {
>> -     unsigned long obj;
>> +     unsigned long obj, pfn;
>> +
>> +     pfn = page_to_pfn(page);
>> +
>> +     if (unlikely(OBJ_OVERFLOW(pfn)))
>> +             BUG();
>
> The trend these days is to have less BUG/BUG_ON-s in the kernel.
>
>         -ss

For this case, IMHO, it is worth it.

It will avoid an investigation like
https://bugs.linaro.org/show_bug.cgi?id=3765#c7 and #c8, where I had to
poison slab allocations - to force the zs_handle and zspage slabs not to
be merged - to make sure the zspage slab had a good magic number, AND to
identify why the bad paging request happened.

If this happens again, for any other arch supporting PAE that does not
declare MAX_POSSIBLE_PHYSMEM_BITS or MAX_PHYSMEM_BITS appropriately, the
kernel will panic, no matter what, by the time it reaches
obj_to_location(). Declarations for PAE can get more complicated if we
consider that ARM can declare MAX_PHYSMEM_BITS differently in
arch/arm/mach-XXX and/or, as in this case, depending on whether
SPARSEMEM is set (if I had SPARSEMEM set I would not have faced this,
for example).

Since the kernel will panic anyway by the time it reaches
obj_to_location(), why not BUG() here and let the user know exactly
where it panicked and why? The other option would be to WARN() here and
let it panic naturally because of the bad paging request in the very
near future... please advise.

Thanks,
Best Rgds
-Rafael
Russell King (Oracle) Oct. 25, 2018, noon UTC | #3
On Wed, Oct 24, 2018 at 10:27:44PM -0300, Rafael David Tinoco wrote:
> On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
> physical frame number might be so big that the zsmalloc object encoding
> (obj-to-location) will break if the architecture does not re-define
> MAX_PHYSMEM_BITS, causing:

I think there's a deeper problem here - a misunderstanding of what
MAX_PHYSMEM_BITS is.

MAX_PHYSMEM_BITS is a definition for sparsemem, and is only visible
when sparsemem is enabled.  When sparsemem is disabled, asm/sparsemem.h
is not included (and should not be included) which means there is no
MAX_PHYSMEM_BITS definition.

I don't think zsmalloc.c should be (ab)using MAX_PHYSMEM_BITS, and
your description above makes it sound like you expect it to always be
defined.

If we want to have a definition for this, we shouldn't be playing
fragile games like:

#ifndef MAX_POSSIBLE_PHYSMEM_BITS
#ifdef MAX_PHYSMEM_BITS
#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
#else
/*
 * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
 * be PAGE_SHIFT
 */
#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif

but instead insist that MAX_PHYSMEM_BITS is defined _everywhere_.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
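
A hedged sketch of what such an arch-side definition could look like for
32-bit ARM; the header location and the use of 40 bits (the LPAE physical
address width) are assumptions for illustration, not the actual fix:

/* Hypothetical snippet in an ARM header: make the maximum possible
 * physical address width visible even when sparsemem is disabled. */
#ifdef CONFIG_ARM_LPAE
#define MAX_POSSIBLE_PHYSMEM_BITS	40
#else
#define MAX_POSSIBLE_PHYSMEM_BITS	32
#endif
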
Rafael David Tinoco Oct. 25, 2018, 12:37 p.m. UTC | #4
> MAX_PHYSMEM_BITS is a definition for sparsemem, and is only visible
> when sparsemem is enabled.  When sparsemem is disabled, asm/sparsemem.h
> is not included (and should not be included) which means there is no
> MAX_PHYSMEM_BITS definition.

Missed that part :\, tks.

> I don't think zsmalloc.c should be (ab)using MAX_PHYSMEM_BITS, and
> your description above makes it sound like you expect it to always be
> defined.
>
> If we want to have a definition for this, we shouldn't be playing
> fragile games like:
>
> #ifndef MAX_POSSIBLE_PHYSMEM_BITS
> #ifdef MAX_PHYSMEM_BITS
> #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
> #else
> /*
>  * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
>  * be PAGE_SHIFT
>  */
> #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
> #endif
> #endif
>
> but instead insist that MAX_PHYSMEM_BITS is defined _everywhere_.

Is it okay to propose using only MAX_PHYSMEM_BITS for zsmalloc (like
it was before commit 02390b87) instead, and make sure *at least* ARM
32/64 and x86/x64, for now, have it defined outside sparsemem headers
as well? This way I can WARN_ONCE(), instead of BUG(), when a specific
arch does not define it - enforcing the behavior - showing that
BITS_PER_LONG is being used instead of MAX_PHYSMEM_BITS (warning, at
least once, about the possibility of an overflow, like the issue shown
here).
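
One possible reading of that proposal, as a fragment of
location_to_obj() (illustrative only; the message text is made up):

	pfn = page_to_pfn(page);
	WARN_ONCE(OBJ_OVERFLOW(pfn),
		  "zsmalloc: pfn %lu does not fit in %d PFN bits; MAX_POSSIBLE_PHYSMEM_BITS fell back to BITS_PER_LONG\n",
		  pfn, _PFN_BITS);

	obj = pfn << OBJ_INDEX_BITS;
	obj |= obj_idx & OBJ_INDEX_MASK;
	obj <<= OBJ_TAG_BITS;
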
kernel test robot Oct. 25, 2018, 12:42 p.m. UTC | #5
Hi Rafael,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux-sof-driver/master]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Rafael-David-Tinoco/mm-zsmalloc-c-check-encoded-object-value-overflow-for-PAE/20181025-110258
base:   https://github.com/thesofproject/linux master
config: um-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/module.h:9,
                    from mm/zsmalloc.c:33:
   mm/zsmalloc.c: In function 'location_to_obj':
>> mm/zsmalloc.c:129:17: warning: left shift count >= width of type [-Wshift-count-overflow]
     ((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
                    ^
   include/linux/compiler.h:77:42: note: in definition of macro 'unlikely'
    # define unlikely(x) __builtin_expect(!!(x), 0)
                                             ^
>> mm/zsmalloc.c:886:15: note: in expansion of macro 'OBJ_OVERFLOW'
     if (unlikely(OBJ_OVERFLOW(pfn)))
                  ^~~~~~~~~~~~
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_read
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:set_bit
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:clear_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:clear_bit_unlock
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:test_and_set_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:test_and_set_bit_lock
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 1 include/linux/kernel.h:___might_sleep
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 2 include/linux/list.h:__list_add
   Cyclomatic Complexity 1 include/linux/list.h:list_add
   Cyclomatic Complexity 1 include/linux/list.h:__list_del
   Cyclomatic Complexity 2 include/linux/list.h:__list_del_entry
   Cyclomatic Complexity 1 include/linux/list.h:list_del
   Cyclomatic Complexity 1 include/linux/list.h:list_del_init
   Cyclomatic Complexity 1 include/linux/list.h:list_empty
   Cyclomatic Complexity 1 include/linux/list.h:__list_splice
   Cyclomatic Complexity 2 include/linux/list.h:list_splice_init
   Cyclomatic Complexity 1 arch/um/include/shared/mem.h:to_virt
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 1 arch/um/include/asm/thread_info.h:current_thread_info
   Cyclomatic Complexity 1 include/asm-generic/preempt.h:preempt_count
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_set
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_inc
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_dec_and_test
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_add
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_sub
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_inc
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_dec
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_read
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_read
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_set
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_inc
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_inc
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_dec
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_add
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_sub
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_dec_and_test
   Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_read
   Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_inc
   Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_dec
   Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_add
   Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_sub
   Cyclomatic Complexity 1 arch/x86/um/asm/processor.h:rep_nop
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock
   Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_count
   Cyclomatic Complexity 2 include/linux/jump_label.h:static_key_false
   Cyclomatic Complexity 1 include/linux/nodemask.h:node_state
   Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR
   Cyclomatic Complexity 1 include/linux/err.h:IS_ERR
   Cyclomatic Complexity 1 include/linux/workqueue.h:queue_work
   Cyclomatic Complexity 1 include/linux/workqueue.h:schedule_work
   Cyclomatic Complexity 1 include/linux/topology.h:numa_node_id
   Cyclomatic Complexity 1 include/linux/topology.h:numa_mem_id
   Cyclomatic Complexity 1 include/linux/gfp.h:__alloc_pages
   Cyclomatic Complexity 4 include/linux/gfp.h:__alloc_pages_node
   Cyclomatic Complexity 2 include/linux/gfp.h:alloc_pages_node
   Cyclomatic Complexity 4 include/linux/bit_spinlock.h:bit_spin_lock
   Cyclomatic Complexity 2 include/linux/bit_spinlock.h:bit_spin_trylock
   Cyclomatic Complexity 2 include/linux/bit_spinlock.h:bit_spin_unlock
   Cyclomatic Complexity 2 include/linux/bit_spinlock.h:bit_spin_is_locked
   Cyclomatic Complexity 1 include/linux/fs.h:mount_pseudo
   Cyclomatic Complexity 2 include/linux/page-flags.h:compound_head
   Cyclomatic Complexity 1 include/linux/page-flags.h:PagePoisoned
   Cyclomatic Complexity 1 include/linux/page-flags.h:PageLocked
   Cyclomatic Complexity 1 include/linux/page-flags.h:PagePrivate
   Cyclomatic Complexity 1 include/linux/page-flags.h:SetPagePrivate
   Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPagePrivate
   Cyclomatic Complexity 1 include/linux/page-flags.h:PageOwnerPriv1
   Cyclomatic Complexity 1 include/linux/page-flags.h:SetPageOwnerPriv1
   Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPageOwnerPriv1
   Cyclomatic Complexity 1 include/linux/page-flags.h:PageIsolated
   Cyclomatic Complexity 1 include/linux/page_ref.h:page_ref_count
   Cyclomatic Complexity 2 include/linux/page_ref.h:page_ref_inc
   Cyclomatic Complexity 2 include/linux/page_ref.h:page_ref_dec_and_test
   Cyclomatic Complexity 1 include/linux/mm.h:put_page_testzero
   Cyclomatic Complexity 1 include/linux/mm.h:page_mapcount_reset
   Cyclomatic Complexity 1 include/linux/mm.h:page_zonenum
   Cyclomatic Complexity 1 include/linux/mm.h:get_page
   Cyclomatic Complexity 2 include/linux/mm.h:put_page
   Cyclomatic Complexity 1 include/linux/mm.h:page_zone
   Cyclomatic Complexity 1 include/linux/vmstat.h:__inc_zone_state
   Cyclomatic Complexity 1 include/linux/vmstat.h:__dec_zone_state
   Cyclomatic Complexity 1 include/linux/vmstat.h:__inc_zone_page_state
   Cyclomatic Complexity 1 include/linux/vmstat.h:__dec_zone_page_state
   Cyclomatic Complexity 1 include/linux/mm.h:lowmem_page_address
   Cyclomatic Complexity 1 include/linux/uaccess.h:pagefault_disabled_inc
   Cyclomatic Complexity 1 include/linux/uaccess.h:pagefault_disabled_dec

vim +129 mm/zsmalloc.c

    32	
  > 33	#include <linux/module.h>
    34	#include <linux/kernel.h>
    35	#include <linux/sched.h>
    36	#include <linux/magic.h>
    37	#include <linux/bitops.h>
    38	#include <linux/errno.h>
    39	#include <linux/highmem.h>
    40	#include <linux/string.h>
    41	#include <linux/slab.h>
    42	#include <asm/tlbflush.h>
    43	#include <asm/pgtable.h>
    44	#include <linux/cpumask.h>
    45	#include <linux/cpu.h>
    46	#include <linux/vmalloc.h>
    47	#include <linux/preempt.h>
    48	#include <linux/spinlock.h>
    49	#include <linux/shrinker.h>
    50	#include <linux/types.h>
    51	#include <linux/debugfs.h>
    52	#include <linux/zsmalloc.h>
    53	#include <linux/zpool.h>
    54	#include <linux/mount.h>
    55	#include <linux/migrate.h>
    56	#include <linux/pagemap.h>
    57	#include <linux/fs.h>
    58	
    59	#define ZSPAGE_MAGIC	0x58
    60	
    61	/*
    62	 * This must be power of 2 and greater than of equal to sizeof(link_free).
    63	 * These two conditions ensure that any 'struct link_free' itself doesn't
    64	 * span more than 1 page which avoids complex case of mapping 2 pages simply
    65	 * to restore link_free pointer values.
    66	 */
    67	#define ZS_ALIGN		8
    68	
    69	/*
    70	 * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
    71	 * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
    72	 */
    73	#define ZS_MAX_ZSPAGE_ORDER 2
    74	#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
    75	
    76	#define ZS_HANDLE_SIZE (sizeof(unsigned long))
    77	
    78	/*
    79	 * Object location (<PFN>, <obj_idx>) is encoded as
    80	 * as single (unsigned long) handle value.
    81	 *
    82	 * Note that object index <obj_idx> starts from 0.
    83	 *
    84	 * This is made more complicated by various memory models and PAE.
    85	 */
    86	
    87	#ifndef MAX_POSSIBLE_PHYSMEM_BITS
    88	#ifdef MAX_PHYSMEM_BITS
    89	#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
    90	#else
    91	/*
    92	 * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
    93	 * be PAGE_SHIFT
    94	 */
    95	#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
    96	#endif
    97	#endif
    98	
    99	#define _PFN_BITS		(MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
   100	
   101	/*
   102	 * Memory for allocating for handle keeps object position by
   103	 * encoding <page, obj_idx> and the encoded value has a room
   104	 * in least bit(ie, look at obj_to_location).
   105	 * We use the bit to synchronize between object access by
   106	 * user and migration.
   107	 */
   108	#define HANDLE_PIN_BIT	0
   109	
   110	/*
   111	 * Head in allocated object should have OBJ_ALLOCATED_TAG
   112	 * to identify the object was allocated or not.
   113	 * It's okay to add the status bit in the least bit because
   114	 * header keeps handle which is 4byte-aligned address so we
   115	 * have room for two bit at least.
   116	 */
   117	#define OBJ_ALLOCATED_TAG 1
   118	#define OBJ_TAG_BITS 1
   119	#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
   120	#define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
   121	
   122	/*
   123	 * When using PAE, the obj encoding might overflow if arch does
   124	 * not re-define MAX_PHYSMEM_BITS, since zsmalloc uses HIGHMEM.
   125	 * This checks for a future bad page access, when de-coding obj.
   126	 */
   127	#define OBJ_OVERFLOW(_pfn) \
   128		(((unsigned long long) _pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >= \
  > 129		((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
   130	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
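
The warning above presumably fires because um-allyesconfig has no
MAX_PHYSMEM_BITS, so MAX_POSSIBLE_PHYSMEM_BITS falls back to
BITS_PER_LONG (64 there), and shifting a ULL by 64 is out of range. One
shift-safe alternative, sketched here only as an assumption of intent
(not what the patch does), is to check that the PFN fits in the
_PFN_BITS field of the handle:

/* Sketch: overflow iff the PFN does not fit in _PFN_BITS. _PFN_BITS is
 * always smaller than the width of unsigned long long, so this right
 * shift cannot trigger -Wshift-count-overflow. */
#define OBJ_OVERFLOW(_pfn) \
	(((unsigned long long)(_pfn) >> _PFN_BITS) != 0)
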
Russell King (Oracle) Oct. 25, 2018, 1:43 p.m. UTC | #6
On Thu, Oct 25, 2018 at 09:37:59AM -0300, Rafael David Tinoco wrote:
> Is it okay to propose using only MAX_PHYSMEM_BITS for zsmalloc (like
> it was before commit 02390b87) instead, and make sure *at least* ARM
> 32/64 and x86/x64, for now, have it defined outside sparsemem headers
> as well?

It looks to me like this has been broken on ARM for quite some time,
predating that commit.  The original was:

#ifndef MAX_PHYSMEM_BITS
#ifdef CONFIG_HIGHMEM64G
#define MAX_PHYSMEM_BITS 36
#else /* !CONFIG_HIGHMEM64G */
#define MAX_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif
#define _PFN_BITS              (MAX_PHYSMEM_BITS - PAGE_SHIFT)

On ARM, CONFIG_HIGHMEM64G is never defined (it's an x86 private symbol)
which means that the above sets MAX_PHYSMEM_BITS to 32 on non-sparsemem
ARM LPAE platforms.  So commit 02390b87 hasn't really changed anything
as far as ARM LPAE is concerned - and this looks to be a bug that goes
all the way back to when zsmalloc.c was moved out of staging in 2014.

Digging further back, it seems this brokenness was introduced with:

commit 6e00ec00b1a76a199b8c0acae401757b795daf57
Author: Seth Jennings <sjenning@linux.vnet.ibm.com>
Date:   Mon Mar 5 11:33:22 2012 -0600

    staging: zsmalloc: calculate MAX_PHYSMEM_BITS if not defined

    This patch provides a way to determine or "set a
    reasonable value for" MAX_PHYSMEM_BITS in the case that
    it is not defined (i.e. !SPARSEMEM)

    Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
    Acked-by: Nitin Gupta <ngupta@vflare.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


which, at the time, realised the problem with SPARSEMEM, but decided
that, in the absence of SPARSEMEM, MAX_PHYSMEM_BITS shall be
BITS_PER_LONG, which seems absurd (see below).

> This way I can WARN_ONCE(), instead of BUG(), when a specific
> arch does not define it - enforcing the behavior - showing that
> BITS_PER_LONG is being used instead of MAX_PHYSMEM_BITS (warning, at
> least once, about the possibility of an overflow, like the issue shown
> here).

Assuming that the maximum number of physical memory bits is
BITS_PER_LONG in the absence of MAX_POSSIBLE_PHYSMEM_BITS is a nonsense
- we have had the potential for PAE systems for a long time, and
introducing new code that makes this assumption was plainly wrong.

We know when there's the potential for PAE, and thus more than
BITS_PER_LONG bits of physical memory address, through
CONFIG_PHYS_ADDR_T_64BIT.  So in the situation where
MAX_POSSIBLE_PHYSMEM_BITS (or the older MAX_PHYSMEM_BITS) is not
defined but CONFIG_PHYS_ADDR_T_64BIT is set, we should have been
erroring out, or something similar, based on not knowing how many
physical memory bits are possible - it would be more than BITS_PER_LONG
but less than some unknown number of bits.

This is why I think any fallback here to BITS_PER_LONG is wrong.
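
A sketch of the kind of refusal being described (hypothetical
preprocessor guard, not an actual patch):

#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(MAX_PHYSMEM_BITS) && \
	defined(CONFIG_PHYS_ADDR_T_64BIT) && BITS_PER_LONG == 32
#error "PAE-capable configuration but no MAX_POSSIBLE_PHYSMEM_BITS: refusing to guess"
#endif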

What I suggested is to not fall back to BITS_PER_LONG in any case, but
always define MAX_PHYSMEM_BITS.  However, I now see that won't work for
x86 because MAX_PHYSMEM_BITS is not a constant anymore.

So I suggest everything that uses zsmalloc.c should instead define
MAX_POSSIBLE_PHYSMEM_BITS.

Note that there should _also_ be some protection in zsmalloc.c against
MAX_POSSIBLE_PHYSMEM_BITS being too large:

#define OBJ_INDEX_BITS  (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_TAG_BITS 1
#define _PFN_BITS               (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)

which means there's an implicit limitation that _PFN_BITS be less than
BITS_PER_LONG - OBJ_TAG_BITS (if it is equal to this, OBJ_INDEX_BITS
will be zero).  This implies that MAX_POSSIBLE_PHYSMEM_BITS must be
smaller than BITS_PER_LONG + PAGE_SHIFT - OBJ_TAG_BITS, or 43 bits on a
32-bit system.  If you want to guarantee a minimum number of objects,
then that limitation needs to be reduced further.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
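
A sketch of such a protection for zsmalloc.c, guaranteeing at least one
object-index bit (the exact minimum is a design choice, per the last
paragraph above):

/* Hypothetical build-time guard: matches the limit derived above. */
#if MAX_POSSIBLE_PHYSMEM_BITS >= (BITS_PER_LONG + PAGE_SHIFT - OBJ_TAG_BITS)
#error "MAX_POSSIBLE_PHYSMEM_BITS too large: no room left for the object index"
#endif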

Patch

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9da65552e7ca..9c3ff8c2ccbc 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -119,6 +119,15 @@ 
 #define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
+/*
+ * When using PAE, the obj encoding might overflow if arch does
+ * not re-define MAX_PHYSMEM_BITS, since zsmalloc uses HIGHMEM.
+ * This checks for a future bad page access, when de-coding obj.
+ */
+#define OBJ_OVERFLOW(_pfn) \
+	(((unsigned long long) _pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >= \
+	((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
+
 #define FULLNESS_BITS	2
 #define CLASS_BITS	8
 #define ISOLATED_BITS	3
@@ -871,9 +880,14 @@  static void obj_to_location(unsigned long obj, struct page **page,
  */
 static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
 {
-	unsigned long obj;
+	unsigned long obj, pfn;
+
+	pfn = page_to_pfn(page);
+
+	if (unlikely(OBJ_OVERFLOW(pfn)))
+		BUG();
 
-	obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+	obj = pfn << OBJ_INDEX_BITS;
 	obj |= obj_idx & OBJ_INDEX_MASK;
 	obj <<= OBJ_TAG_BITS;