From patchwork Sat Jun 5 03:01:11 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 455048
Date: Fri, 04 Jun 2021 20:01:11 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, David.Laight@ACULAB.COM, dvyukov@google.com,
 elver@google.com, glider@google.com, hdanton@sina.com, linux-mm@kvack.org,
 mgorman@suse.de, mm-commits@vger.kernel.org, stable@vger.kernel.org,
 torvalds@linux-foundation.org
Subject: [patch 02/13] kfence: use TASK_IDLE when awaiting allocation
Message-ID: <20210605030111.zE60Ybm3T%akpm@linux-foundation.org>
In-Reply-To: <20210604200040.d8d0406caf195525620c0f3d@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Marco Elver
Subject: kfence: use TASK_IDLE when awaiting allocation

Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
allocation counts towards load.  However, for KFENCE, this does not make
any sense, since there is no busy work we're awaiting.

Instead, use TASK_IDLE via wait_event_idle() to not count towards load.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
Fixes: 407f1d8c1b5f ("kfence: await for allocation using wait_event")
Signed-off-by: Marco Elver
Cc: Mel Gorman
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: David Laight
Cc: Hillf Danton
Cc: [5.12+]
Signed-off-by: Andrew Morton
---

 mm/kfence/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/kfence/core.c~kfence-use-task_idle-when-awaiting-allocation
+++ a/mm/kfence/core.c
@@ -627,10 +627,10 @@ static void toggle_allocation_gate(struc
 		 * During low activity with no allocations we might wait a
 		 * while; let's avoid the hung task warning.
 		 */
-		wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
-				   sysctl_hung_task_timeout_secs * HZ / 2);
+		wait_event_idle_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
+					sysctl_hung_task_timeout_secs * HZ / 2);
 	} else {
-		wait_event(allocation_wait, atomic_read(&kfence_allocation_gate));
+		wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate));
 	}
 
 	/* Disable static key and reset timer. */

From patchwork Sat Jun 5 03:01:14 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 454686
Date: Fri, 04 Jun 2021 20:01:14 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, christian.brauner@ubuntu.com,
 christian@brauner.io, clg@fr.ibm.com, ebiederm@xmission.com,
 keescook@chromium.org, linux-mm@kvack.org, mark.rutland@arm.com,
 mm-commits@vger.kernel.org, paulus@samba.org, schwidefsky@de.ibm.com,
 stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 03/13] pid: take a reference when initializing `cad_pid`
Message-ID: <20210605030114.E2boaQwOv%akpm@linux-foundation.org>
In-Reply-To: <20210604200040.d8d0406caf195525620c0f3d@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
X-Mailing-List:
 stable@vger.kernel.org

From: Mark Rutland
Subject: pid: take a reference when initializing `cad_pid`

During boot, kernel_init_freeable() initializes `cad_pid` to the init
task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
when this happens proc_do_cad_pid() will increment the refcount on the
new pid via get_pid(), and will decrement the refcount on the old pid via
put_pid().  As we never called get_pid() when we initialized `cad_pid`,
we decrement a reference we never incremented, and can therefore free the
init task's struct pid early.  As there can be dangling references to the
struct pid, we can later encounter a use-after-free (e.g. when delivering
signals).

This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have
been around since the conversion of `cad_pid` to struct pid in commit:

  9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")

... from the pre-KASAN stone age of v2.6.19.

Fix this by getting a reference to the init task's struct pid when we
assign it to `cad_pid`.

Full KASAN splat below.
==================================================================
BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273

CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x4a8 arch/arm64/kernel/stacktrace.c:105
 show_stack+0x34/0x48 arch/arm64/kernel/stacktrace.c:191
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x1d4/0x2a0 lib/dump_stack.c:120
 print_address_description.constprop.11+0x60/0x3a8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report+0x1e8/0x200 mm/kasan/report.c:416
 __asan_report_load4_noabort+0x30/0x48 mm/kasan/report_generic.c:308
 ns_of_pid include/linux/pid.h:153 [inline]
 task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
 do_notify_parent+0x308/0xe60 kernel/signal.c:1950
 exit_notify kernel/exit.c:682 [inline]
 do_exit+0x2334/0x2bd0 kernel/exit.c:845
 do_group_exit+0x108/0x2c8 kernel/exit.c:922
 get_signal+0x4e4/0x2a88 kernel/signal.c:2781
 do_signal arch/arm64/kernel/signal.c:882 [inline]
 do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
 work_pending+0xc/0x2dc

Allocated by task 0:
 kasan_save_stack+0x28/0x58 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 __kasan_slab_alloc+0x88/0xa8 mm/kasan/common.c:460
 kasan_slab_alloc include/linux/kasan.h:223 [inline]
 slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
 slab_alloc_node mm/slub.c:2907 [inline]
 slab_alloc mm/slub.c:2915 [inline]
 kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
 alloc_pid+0xdc/0xc00 kernel/pid.c:180
 copy_process+0x2794/0x5e18 kernel/fork.c:2129
 kernel_clone+0x194/0x13c8 kernel/fork.c:2500
 kernel_thread+0xd4/0x110 kernel/fork.c:2552
 rest_init+0x44/0x4a0 init/main.c:687
 arch_call_rest_init+0x1c/0x28
 start_kernel+0x520/0x554 init/main.c:1064
 0x0

Freed by task 270:
 kasan_save_stack+0x28/0x58 mm/kasan/common.c:38
 kasan_set_track+0x28/0x40 mm/kasan/common.c:46
 kasan_set_free_info+0x28/0x50 mm/kasan/generic.c:357
 ____kasan_slab_free mm/kasan/common.c:360 [inline]
 ____kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xf4/0x148 mm/kasan/common.c:367
 kasan_slab_free include/linux/kasan.h:199 [inline]
 slab_free_hook mm/slub.c:1562 [inline]
 slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
 slab_free mm/slub.c:3161 [inline]
 kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
 put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
 put_pid+0x30/0x48 kernel/pid.c:109
 proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
 proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
 proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
 call_write_iter include/linux/fs.h:1977 [inline]
 new_sync_write+0x3ac/0x510 fs/read_write.c:518
 vfs_write fs/read_write.c:605 [inline]
 vfs_write+0x9c4/0x1018 fs/read_write.c:585
 ksys_write+0x124/0x240 fs/read_write.c:658
 __do_sys_write fs/read_write.c:670 [inline]
 __se_sys_write fs/read_write.c:667 [inline]
 __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
 el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
 do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
 el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
 el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
 el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701

The buggy address belongs to the object at ffff23794dda0000
 which belongs to the cache pid of size 224
The buggy address is located 4 bytes inside of
 224-byte region [ffff23794dda0000, ffff23794dda00e0)
The buggy address belongs to the page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
head:(____ptrval____) order:1 compound_mapcount:0
flags: 0x3fffc0000010200(slab|head)
raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
==================================================================

Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")
Signed-off-by: Mark Rutland
Acked-by: Christian Brauner
Cc: Cedric Le Goater
Cc: Christian Brauner
Cc: Eric W. Biederman
Cc: Kees Cook
Cc: Paul Mackerras
Cc:
Signed-off-by: Andrew Morton
---

 init/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/init/main.c~pid-take-a-reference-when-initializing-cad_pid
+++ a/init/main.c
@@ -1537,7 +1537,7 @@ static noinline void __init kernel_init_
 	 */
 	set_mems_allowed(node_states[N_MEMORY]);
 
-	cad_pid = task_pid(current);
+	cad_pid = get_pid(task_pid(current));
 
 	smp_prepare_cpus(setup_max_cpus);
 

From patchwork Sat Jun 5 03:01:42 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 455046
Date: Fri, 04 Jun 2021 20:01:42 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, gechangwei@live.cn, ghe@suse.com,
 jack@suse.cz, jlbec@evilplan.org, joseph.qi@linux.alibaba.com,
 junxiao.bi@oracle.com, linux-mm@kvack.org, mark@fasheh.com,
 mm-commits@vger.kernel.org, piaojun@huawei.com, stable@vger.kernel.org,
 torvalds@linux-foundation.org
Subject: [patch 12/13] ocfs2: fix data corruption by fallocate
Message-ID: <20210605030142.pXgHl9E4K%akpm@linux-foundation.org>
In-Reply-To: <20210604200040.d8d0406caf195525620c0f3d@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Junxiao Bi
Subject: ocfs2: fix data corruption by fallocate

When fallocate punches holes out of inode size, if original isize is in
the middle of last cluster, then the part from isize to the end of the
cluster will be zeroed with buffer write, at that time isize is not yet
updated to match the new size, if writeback is kicked
in, it will invoke ocfs2_writepage()->block_write_full_page() where the
pages out of inode size will be dropped.  That will cause file corruption.
Fix this by zeroing out eof blocks when extending the inode size.

Running the following command with qemu-img 4.2.1 can get a corrupted
converted image file easily.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

The usage of fallocate in qemu is like this, it first punches holes out
of inode size, then extends the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/

Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi
Reviewed-by: Joseph Qi
Cc: Jan Kara
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
---

 fs/ocfs2/file.c |   55 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 5 deletions(-)

--- a/fs/ocfs2/file.c~ocfs2-fix-data-corruption-by-fallocate
+++ a/fs/ocfs2/file.c
@@ -1856,6 +1856,45 @@ out:
 }
 
 /*
+ * zero out partial blocks of one cluster.
+ *
+ * start: file offset where zero starts, will be made upper block aligned.
+ * len: it will be trimmed to the end of current cluster if "start + len"
+ *      is bigger than it.
+ */
+static int ocfs2_zeroout_partial_cluster(struct inode *inode,
+					u64 start, u64 len)
+{
+	int ret;
+	u64 start_block, end_block, nr_blocks;
+	u64 p_block, offset;
+	u32 cluster, p_cluster, nr_clusters;
+	struct super_block *sb = inode->i_sb;
+	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
+
+	if (start + len < end)
+		end = start + len;
+
+	start_block = ocfs2_blocks_for_bytes(sb, start);
+	end_block = ocfs2_blocks_for_bytes(sb, end);
+	nr_blocks = end_block - start_block;
+	if (!nr_blocks)
+		return 0;
+
+	cluster = ocfs2_bytes_to_clusters(sb, start);
+	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
+				&nr_clusters, NULL);
+	if (ret)
+		return ret;
+	if (!p_cluster)
+		return 0;
+
+	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
+	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
+	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
+}
+
+/*
  * Parts of this function taken from xfs_change_file_space()
  */
 static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
@@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(str
 {
 	int ret;
 	s64 llen;
-	loff_t size;
+	loff_t size, orig_isize;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	struct buffer_head *di_bh = NULL;
 	handle_t *handle;
@@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(str
 		goto out_inode_unlock;
 	}
 
+	orig_isize = i_size_read(inode);
 	switch (sr->l_whence) {
 	case 0: /*SEEK_SET*/
 		break;
@@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(str
 		sr->l_start += f_pos;
 		break;
 	case 2: /*SEEK_END*/
-		sr->l_start += i_size_read(inode);
+		sr->l_start += orig_isize;
 		break;
 	default:
 		ret = -EINVAL;
@@ -1957,6 +1997,14 @@ static int __ocfs2_change_file_space(str
 	default:
 		ret = -EINVAL;
 	}
+
+	/* zeroout eof blocks in the cluster. */
+	if (!ret && change_size && orig_isize < size) {
+		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
+					size - orig_isize);
+		if (!ret)
+			i_size_write(inode, size);
+	}
 	up_write(&OCFS2_I(inode)->ip_alloc_sem);
 	if (ret) {
 		mlog_errno(ret);
@@ -1973,9 +2021,6 @@ static int __ocfs2_change_file_space(str
 		goto out_inode_unlock;
 	}
 
-	if (change_size && i_size_read(inode) < size)
-		i_size_write(inode, size);
-
 	inode->i_ctime = inode->i_mtime = current_time(inode);
 	ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
 	if (ret < 0)