diff mbox series

[5.4,5.10] btrfs: tree-checker: do not error out if extent ref hash doesn't match

Message ID 20210604155304.24315-1-dsterba@suse.com
State New
Headers show
Series [5.4,5.10] btrfs: tree-checker: do not error out if extent ref hash doesn't match | expand

Commit Message

David Sterba June 4, 2021, 3:53 p.m. UTC
From: Josef Bacik <josef@toxicpanda.com>

commit 1119a72e223f3073a604f8fccb3a470ccd8a4416 upstream.

The tree checker checks the extent ref hash at read and write time to
make sure we do not corrupt the file system.  Generally extent
references go inline, but if we have enough of them we need to make an
item, which looks like

key.objectid	= <bytenr>
key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
key.offset	= hash(tree, owner, offset)

However if key.offset collide with an unrelated extent reference we'll
simply key.offset++ until we get something that doesn't collide.
Obviously this doesn't match at tree checker time, and thus we error
while writing out the transaction.  This is relatively easy to
reproduce, simply do something like the following

  xfs_io -f -c "pwrite 0 1M" file
  offset=2

  for i in {0..10000}
  do
	  xfs_io -c "reflink file 0 ${offset}M 1M" file
	  offset=$(( offset + 2 ))
  done

  xfs_io -c "reflink file 0 17999258914816 1M" file
  xfs_io -c "reflink file 0 35998517829632 1M" file
  xfs_io -c "reflink file 0 53752752058368 1M" file

  btrfs filesystem sync

And the sync will error out because we'll abort the transaction.  The
magic values above are used because they generate hash collisions with
the first file in the main subvol.

The fix for this is to remove the hash value check from tree checker, as
we have no idea which offset ours should belong to.

Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment]
Signed-off-by: David Sterba <dsterba@suse.com>
---

Generated on 5.4 where it applies cleanly and on 5.10 with line offset
but otherwise clean.

 fs/btrfs/tree-checker.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

Comments

Sasha Levin June 4, 2021, 4:35 p.m. UTC | #1
On Fri, Jun 04, 2021 at 05:53:04PM +0200, David Sterba wrote:
>From: Josef Bacik <josef@toxicpanda.com>
>
>commit 1119a72e223f3073a604f8fccb3a470ccd8a4416 upstream.
>
>The tree checker checks the extent ref hash at read and write time to
>make sure we do not corrupt the file system.  Generally extent
>references go inline, but if we have enough of them we need to make an
>item, which looks like
>
>key.objectid	= <bytenr>
>key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
>key.offset	= hash(tree, owner, offset)
>
>However if key.offset collide with an unrelated extent reference we'll
>simply key.offset++ until we get something that doesn't collide.
>Obviously this doesn't match at tree checker time, and thus we error
>while writing out the transaction.  This is relatively easy to
>reproduce, simply do something like the following
>
>  xfs_io -f -c "pwrite 0 1M" file
>  offset=2
>
>  for i in {0..10000}
>  do
>	  xfs_io -c "reflink file 0 ${offset}M 1M" file
>	  offset=$(( offset + 2 ))
>  done
>
>  xfs_io -c "reflink file 0 17999258914816 1M" file
>  xfs_io -c "reflink file 0 35998517829632 1M" file
>  xfs_io -c "reflink file 0 53752752058368 1M" file
>
>  btrfs filesystem sync
>
>And the sync will error out because we'll abort the transaction.  The
>magic values above are used because they generate hash collisions with
>the first file in the main subvol.
>
>The fix for this is to remove the hash value check from tree checker, as
>we have no idea which offset ours should belong to.
>
>Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
>Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
>CC: stable@vger.kernel.org # 5.4+
>Reviewed-by: Filipe Manana <fdmanana@suse.com>
>Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>Reviewed-by: David Sterba <dsterba@suse.com>
>[ add comment]
>Signed-off-by: David Sterba <dsterba@suse.com>
>---
>
>Generated on 5.4 where it applies cleanly and on 5.10 with line offset
>but otherwise clean.

Queued up, thanks!
diff mbox series

Patch

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 7d06842a3d74..368c43c6cbd0 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1285,22 +1285,14 @@  static int check_extent_data_ref(struct extent_buffer *leaf,
 		return -EUCLEAN;
 	}
 	for (; ptr < end; ptr += sizeof(*dref)) {
-		u64 root_objectid;
-		u64 owner;
 		u64 offset;
-		u64 hash;
 
+		/*
+		 * We cannot check the extent_data_ref hash due to possible
+		 * overflow from the leaf due to hash collisions.
+		 */
 		dref = (struct btrfs_extent_data_ref *)ptr;
-		root_objectid = btrfs_extent_data_ref_root(leaf, dref);
-		owner = btrfs_extent_data_ref_objectid(leaf, dref);
 		offset = btrfs_extent_data_ref_offset(leaf, dref);
-		hash = hash_extent_data_ref(root_objectid, owner, offset);
-		if (hash != key->offset) {
-			extent_err(leaf, slot,
-	"invalid extent data ref hash, item has 0x%016llx key has 0x%016llx",
-				   hash, key->offset);
-			return -EUCLEAN;
-		}
 		if (!IS_ALIGNED(offset, leaf->fs_info->sectorsize)) {
 			extent_err(leaf, slot,
 	"invalid extent data backref offset, have %llu expect aligned to %u",