mm: Fix a NULL dereference crash while accessing bdev->bd_disk

Message ID 20161130095104.GB20030@quack2.suse.cz
State New

Commit Message

Jan Kara Nov. 30, 2016, 9:51 a.m. UTC
On Tue 29-11-16 09:58:31, Wei Fang wrote:
> Hi, Jan,
>
> On 2016/11/28 18:07, Jan Kara wrote:
> > Good catch but I don't like sprinkling checks like this into the writeback
> > code and furthermore we don't want to call into writeback code when block
> > device is in the process of being destroyed which is what would happen with
> > your patch. That is a bug waiting to happen...
>
> Agreed. Need another way to fix this problem. I looked through the
> writeback cgroup code in __filemap_fdatawrite_range(), found if we
> turn on CONFIG_CGROUP_WRITEBACK, a new crash will happen.

OK, can you test with attached patch please? Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Comments

Wei Fang Dec. 1, 2016, 2:30 a.m. UTC | #1
Hi, Jan,

On 2016/11/30 17:51, Jan Kara wrote:
> On Tue 29-11-16 09:58:31, Wei Fang wrote:
>> Hi, Jan,
>>
>> On 2016/11/28 18:07, Jan Kara wrote:
>>> Good catch but I don't like sprinkling checks like this into the writeback
>>> code and furthermore we don't want to call into writeback code when block
>>> device is in the process of being destroyed which is what would happen with
>>> your patch. That is a bug waiting to happen...
>>
>> Agreed. Need another way to fix this problem. I looked through the
>> writeback cgroup code in __filemap_fdatawrite_range(), found if we
>> turn on CONFIG_CGROUP_WRITEBACK, a new crash will happen.
>
> OK, can you test with attached patch please? Thanks!

I've tested this patch with linux-next about 2 hours, and all goes well.
Without this patch, kernel crashes in minutes.

Thanks,
Wei

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Kara Dec. 1, 2016, 8:18 a.m. UTC | #2
On Thu 01-12-16 10:30:05, Wei Fang wrote:
> On 2016/11/30 17:51, Jan Kara wrote:
> > On Tue 29-11-16 09:58:31, Wei Fang wrote:
> >> Hi, Jan,
> >>
> >> On 2016/11/28 18:07, Jan Kara wrote:
> >>> Good catch but I don't like sprinkling checks like this into the writeback
> >>> code and furthermore we don't want to call into writeback code when block
> >>> device is in the process of being destroyed which is what would happen with
> >>> your patch. That is a bug waiting to happen...
> >>
> >> Agreed. Need another way to fix this problem. I looked through the
> >> writeback cgroup code in __filemap_fdatawrite_range(), found if we
> >> turn on CONFIG_CGROUP_WRITEBACK, a new crash will happen.
> >
> > OK, can you test with attached patch please? Thanks!
>
> I've tested this patch with linux-next about 2 hours, and all goes well.
> Without this patch, kernel crashes in minutes.

Good. Thanks for testing! I'll send the patch for inclusion.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Patch

From ef10e4f52d2d05982fbeba09e48a4253b5fd1119 Mon Sep 17 00:00:00 2001
From: Rabin Vincent <rabinv@axis.com>
Date: Thu, 10 Mar 2016 13:26:03 +0100
Subject: [PATCH] block: protect iterate_bdevs() against concurrent close

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->bd_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
 PGD 9e62067 PUD 9ee8067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
 RIP: 0010:[<ffffffff81314790>]  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
 RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
 FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
 Stack:
  ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
  0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
  ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
 Call Trace:
  [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
  [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
  [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
  [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130
  [<ffffffff811b2763>] sys_sync+0x63/0x90
  [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
 RIP  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
  RSP <ffff880009f5fe68>
 CR2: 0000000000000508
 ---[ end trace 2487336ceb3de62d ]---

The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_bdevs():

 while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done

Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.

Cc: stable@vger.kernel.org
Fixes: 5c0d6b60a0ba46d45020547eacf7199171920935
Reported-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 05b553368bb4..899fa8ccc347 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1950,6 +1950,7 @@  void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 	spin_lock(&blockdev_superblock->s_inode_list_lock);
 	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
 		struct address_space *mapping = inode->i_mapping;
+		struct block_device *bdev;
 
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
@@ -1970,8 +1971,12 @@  void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 		 */
 		iput(old_inode);
 		old_inode = inode;
+		bdev = I_BDEV(inode);
 
-		func(I_BDEV(inode), arg);
+		mutex_lock(&bdev->bd_mutex);
+		if (bdev->bd_openers)
+			func(bdev, arg);
+		mutex_unlock(&bdev->bd_mutex);
 
 		spin_lock(&blockdev_superblock->s_inode_list_lock);
 	}
-- 
2.6.6
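
For readers without a kernel tree at hand, the locking pattern in the hunk
above can be modelled in userspace. This is only a sketch under stated
assumptions, not the kernel code: struct fake_bdev, iterate_fake_bdevs(),
sum_disk() and demo_sum() are hypothetical names invented for illustration,
and a pthread mutex stands in for bd_mutex. It shows the same idea as the
patch: take the per-device mutex, call the callback only if the device still
has openers, then drop the mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct block_device. */
struct fake_bdev {
	pthread_mutex_t mutex;	/* plays the role of bd_mutex */
	int openers;		/* plays the role of bd_openers */
	int *disk;		/* plays the role of bd_disk; NULL once closed */
};

/* Visit every device, but only call func() on devices that are still open. */
static void iterate_fake_bdevs(struct fake_bdev *devs, size_t n,
			       void (*func)(struct fake_bdev *, void *),
			       void *arg)
{
	for (size_t i = 0; i < n; i++) {
		struct fake_bdev *bdev = &devs[i];

		/* Holding the mutex keeps a concurrent close from racing with func(). */
		pthread_mutex_lock(&bdev->mutex);
		if (bdev->openers)
			func(bdev, arg);
		pthread_mutex_unlock(&bdev->mutex);
	}
}

/* Example callback: dereferences disk, which is only safe while the device is open. */
static void sum_disk(struct fake_bdev *bdev, void *arg)
{
	*(int *)arg += *bdev->disk;
}

/* Build one open and one closed device and return the sum the callback saw. */
static int demo_sum(void)
{
	int disk_value = 7;
	struct fake_bdev devs[2] = {
		{ PTHREAD_MUTEX_INITIALIZER, 1, &disk_value },	/* open */
		{ PTHREAD_MUTEX_INITIALIZER, 0, NULL },		/* closed, disk is NULL */
	};
	int sum = 0;

	iterate_fake_bdevs(devs, 2, sum_disk, &sum);
	return sum;
}
```

In this model the closed device is skipped, so sum_disk() never sees the NULL
disk pointer. Without the openers check, the second device would be
dereferenced through NULL, which loosely mirrors the crash in the oops above.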