diff mbox series

radix-tree: avoid NULL dereference

Message ID 20180706134144.48446-1-mark.rutland@arm.com
State New
Headers show
Series radix-tree: avoid NULL dereference | expand

Commit Message

Mark Rutland July 6, 2018, 1:41 p.m. UTC
When idr_alloc() is called for the first time on an IDR (which has no
nodes in its radix tree), we end up with calculate_count() calling
get_slot_offset() with a NULL node, leading to a NULL pointer
dereference caught by UBSAN:

-- 
2.11.0

Comments

Matthew Wilcox (Oracle) July 6, 2018, 2:25 p.m. UTC | #1
On Fri, Jul 06, 2018 at 02:41:44PM +0100, Mark Rutland wrote:
> When idr_alloc() is called for the first time on an IDR (which has no

> nodes in its radix tree), we end up with calculate_count() calling

> get_slot_offset() with a NULL node, leading to a NULL pointer

> dereference caught by UBSAN:


Thanks!

> The result of the load is passed into node_tag_get(), which ignores the

> value when node is NULL. Typically, the compiler inlines both

> get_slot_offset() and node_tag_get() into calculate_count(), optimizing

> away the NULL-pointer dereference, and hence this doesn't typically

> result in a boot failure.

> 

> We can't rely on the compiler always doing this, and must avoid

> dereferencing fields from node when it is potentially NULL.

> 

> To do so, this patch folds the generation of offset into tag_get(), such

> that this only happens when node is not NULL. Callers are updated to

> pass the relevant slot, rather than the offset derived from it.


I did think about this ... honest!  My reasoning clearly glitched at some
point.  I think this is the right way to fix the problem.

Acked-by: Matthew Wilcox <willy@infradead.org>
Mark Rutland July 6, 2018, 2:36 p.m. UTC | #2
On Fri, Jul 06, 2018 at 07:25:40AM -0700, Matthew Wilcox wrote:
> On Fri, Jul 06, 2018 at 02:41:44PM +0100, Mark Rutland wrote:

> > When idr_alloc() is called for the first time on an IDR (which has no

> > nodes in its radix tree), we end up with calculate_count() calling

> > get_slot_offset() with a NULL node, leading to a NULL pointer

> > dereference caught by UBSAN:

> 

> Thanks!

> 

> > The result of the load is passed into node_tag_get(), which ignores the

> > value when node is NULL. Typically, the compiler inlines both

> > get_slot_offset() and node_tag_get() into calculate_count(), optimizing

> > away the NULL-pointer dereference, and hence this doesn't typically

> > result in a boot failure.

> > 

> > We can't rely on the compiler always doing this, and must avoid

> > dereferencing fields from node when it is potentially NULL.

> > 

> > To do so, this patch folds the generation of offset into tag_get(), such

> > that this only happens when node is not NULL. Callers are updated to

> > pass the relevant slot, rather than the offset derived from it.

> 

> I did think about this ... honest!  My reasoning clearly glitched at some

> point.  I think this is the right way to fix the problem.

> 

> Acked-by: Matthew Wilcox <willy@infradead.org>


Cheers!

I assume that Andrew will pick this up, if he's also happy with it.

Thanks,
Mark.
Valdis Kl=?utf-8?Q?=c4=93?=tnieks July 6, 2018, 2:51 p.m. UTC | #3
On Fri, 06 Jul 2018 14:41:44 +0100, Mark Rutland said:

> I beleive this is what Valdis hit [1] back in March. I spotted this while

> booting an arm64 machine.


Yes, the stack trace is the same.  The odd part is that I was consistently
seeing it until next-20180626, but it evaporated in sometime between that and
next-20180703. I suspect it has to do with the fact that -0626 was compiled
with gcc 8.1.1-3.fc29, but -0703 with 8.1.1.-4.fc29. As such, I probably won't
be able to test the patch without downgrading gcc....
Mark Rutland July 6, 2018, 3:36 p.m. UTC | #4
On Fri, Jul 06, 2018 at 03:36:04PM +0100, Mark Rutland wrote:
> On Fri, Jul 06, 2018 at 07:25:40AM -0700, Matthew Wilcox wrote:

> > On Fri, Jul 06, 2018 at 02:41:44PM +0100, Mark Rutland wrote:

> > > When idr_alloc() is called for the first time on an IDR (which has no

> > > nodes in its radix tree), we end up with calculate_count() calling

> > > get_slot_offset() with a NULL node, leading to a NULL pointer

> > > dereference caught by UBSAN:

> > 

> > Thanks!

> > 

> > > The result of the load is passed into node_tag_get(), which ignores the

> > > value when node is NULL. Typically, the compiler inlines both

> > > get_slot_offset() and node_tag_get() into calculate_count(), optimizing

> > > away the NULL-pointer dereference, and hence this doesn't typically

> > > result in a boot failure.

> > > 

> > > We can't rely on the compiler always doing this, and must avoid

> > > dereferencing fields from node when it is potentially NULL.

> > > 

> > > To do so, this patch folds the generation of offset into tag_get(), such

> > > that this only happens when node is not NULL. Callers are updated to

> > > pass the relevant slot, rather than the offset derived from it.

> > 

> > I did think about this ... honest!  My reasoning clearly glitched at some

> > point.  I think this is the right way to fix the problem.

> > 

> > Acked-by: Matthew Wilcox <willy@infradead.org>

> 

> Cheers!

> 

> I assume that Andrew will pick this up, if he's also happy with it.


I've just started fuzzing, and found this also applies with
node_tag_set(). I'll spin a v2 with that fixed up, too.

Thanks,
Mark.
Andrew Morton July 6, 2018, 9:45 p.m. UTC | #5
On Fri, 6 Jul 2018 16:36:41 +0100 Mark Rutland <mark.rutland@arm.com> wrote:

> > > 

> > > Acked-by: Matthew Wilcox <willy@infradead.org>

> > 

> > Cheers!

> > 

> > I assume that Andrew will pick this up, if he's also happy with it.

> 

> I've just started fuzzing, and found this also applies with

> node_tag_set(). I'll spin a v2 with that fixed up, too.


Thanks.  Please also give some thought to whether the fix should be
backported and if so, which patch(es) it Fixes:.
Matthew Wilcox (Oracle) July 7, 2018, 12:14 a.m. UTC | #6
On Fri, Jul 06, 2018 at 02:45:34PM -0700, Andrew Morton wrote:
> On Fri, 6 Jul 2018 16:36:41 +0100 Mark Rutland <mark.rutland@arm.com> wrote:

> 

> > > > 

> > > > Acked-by: Matthew Wilcox <willy@infradead.org>

> > > 

> > > Cheers!

> > > 

> > > I assume that Andrew will pick this up, if he's also happy with it.

> > 

> > I've just started fuzzing, and found this also applies with

> > node_tag_set(). I'll spin a v2 with that fixed up, too.

> 

> Thanks.  Please also give some thought to whether the fix should be

> backported and if so, which patch(es) it Fixes:.


I would say this goes all the way back to
0a835c4f090 ("Reimplement IDR and IDA using the radix tree")

I don't know if it's worth backporting though.  If the compiler does
the normal optimisation, nobody will hit it.  If it doesn't, the first
use of an IDR will trip the bug.  So it's not like a user-exploitable bug.
diff mbox series

Patch

================================================================================
UBSAN: Undefined behaviour in lib/radix-tree.c:123:14
member access within null pointer of type 'const struct radix_tree_node'
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc3+ #14
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x458
 show_stack+0x20/0x30
 dump_stack+0x18c/0x248
 ubsan_epilogue+0x18/0x94
 handle_null_ptr_deref+0x1d4/0x228
 __ubsan_handle_type_mismatch_v1+0x188/0x1e0
 __radix_tree_replace+0x29c/0x2a0
 radix_tree_iter_replace+0x58/0x88
 idr_alloc_u32+0x240/0x340
 idr_alloc+0xe8/0x188
 worker_pool_assign_id+0x74/0x128
 workqueue_init_early+0x588/0xae4
 start_kernel+0x54c/0xb9c
================================================================================

The result of the load is passed into node_tag_get(), which ignores the
value when node is NULL. Typically, the compiler inlines both
get_slot_offset() and node_tag_get() into calculate_count(), optimizing
away the NULL-pointer dereference, and hence this doesn't typically
result in a boot failure.

We can't rely on the compiler always doing this, and must avoid
dereferencing fields from node when it is potentially NULL.

To do so, this patch folds the generation of offset into tag_get(), such
that this only happens when node is not NULL. Callers are updated to
pass the relevant slot, rather than the offset derived from it.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: valdis.kletnieks@vt.edu
---
 lib/radix-tree.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

I beleive this is what Valdis hit [1] back in March. I spotted this while
booting an arm64 machine.

Mark.

[1] https://lkml.kernel.org/r/22378.1520883323@turing-police.cc.vt.edu

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index a9e41aed6de4..4c6f409582cb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1137,10 +1137,10 @@  static void replace_slot(void __rcu **slot, void *item,
 
 static bool node_tag_get(const struct radix_tree_root *root,
 				const struct radix_tree_node *node,
-				unsigned int tag, unsigned int offset)
+				unsigned int tag, void __rcu **slot)
 {
 	if (node)
-		return tag_get(node, tag, offset);
+		return tag_get(node, tag, get_slot_offset(node, slot));
 	return root_tag_get(root, tag);
 }
 
@@ -1156,8 +1156,7 @@  static int calculate_count(struct radix_tree_root *root,
 				void *item, void *old)
 {
 	if (is_idr(root)) {
-		unsigned offset = get_slot_offset(node, slot);
-		bool free = node_tag_get(root, node, IDR_FREE, offset);
+		bool free = node_tag_get(root, node, IDR_FREE, slot);
 		if (!free)
 			return 0;
 		if (!old)
@@ -2041,7 +2040,7 @@  void *radix_tree_delete_item(struct radix_tree_root *root,
 	if (!slot)
 		return NULL;
 	if (!entry && (!is_idr(root) || node_tag_get(root, node, IDR_FREE,
-						get_slot_offset(node, slot))))
+						     slot)))
 		return NULL;
 
 	if (item && entry != item)