diff mbox series

ceph: fail the open_by_handle_at() if the dentry is being unlinked

Message ID 20220804080624.14768-1-xiubli@redhat.com
State New
Headers show
Series ceph: fail the open_by_handle_at() if the dentry is being unlinked | expand

Commit Message

Xiubo Li Aug. 4, 2022, 8:06 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

When unlinking a file the kclient will send a unlink request to MDS
by holding the dentry reference, and then the MDS will return 2 replies,
which are unsafe reply and a deferred safe reply.

After the unsafe reply received the kernel will return and succeed
the unlink request to user space apps.

Only when the safe reply received the dentry's reference will be
released. Or the dentry will only be unhashed from dcache. But when
the open_by_handle_at() begins to open the unlinked files it will
succeed.

URL: https://tracker.ceph.com/issues/56524
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/export.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Jeff Layton Aug. 4, 2022, 4:25 p.m. UTC | #1
On Thu, 2022-08-04 at 16:06 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> When unlinking a file the kclient will send a unlink request to MDS
> by holding the dentry reference, and then the MDS will return 2 replies,
> which are unsafe reply and a deferred safe reply.
> 
> After the unsafe reply received the kernel will return and succeed
> the unlink request to user space apps.
> 
> Only when the safe reply received the dentry's reference will be
> released. Or the dentry will only be unhashed from dcache. But when
> the open_by_handle_at() begins to open the unlinked files it will
> succeed.
> 
> URL: https://tracker.ceph.com/issues/56524
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/export.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> index 0ebf2bd93055..7d2ae977b8c9 100644
> --- a/fs/ceph/export.c
> +++ b/fs/ceph/export.c
> @@ -182,6 +182,7 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
>  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
>  {
>  	struct inode *inode = __lookup_inode(sb, ino);
> +	struct dentry *dentry;
>  	int err;
>  
>  	if (IS_ERR(inode))
> @@ -197,7 +198,15 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
>  		iput(inode);
>  		return ERR_PTR(-ESTALE);
>  	}
> -	return d_obtain_alias(inode);
> +
> +	/* -ESTALE if the dentry is unhashed, which should being released */
> +	dentry = d_obtain_alias(inode);
> +	if (d_unhashed(dentry)) {
> +		dput(dentry);
> +		return ERR_PTR(-ESTALE);
> +	}
> +
> +	return dentry;
>  }
>  
>  static struct dentry *__snapfh_to_dentry(struct super_block *sb,

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Sebastian Hasler Sept. 6, 2023, 4:58 p.m. UTC | #2
While reviewing the implementation of __fh_to_dentry (in the CephFS 
client), I noticed a possible race condition.

Linux has a syscall linkat(2) which allows, given an open file 
descriptor, to create a link for the file. So an inode that is unlinked 
can become linked.

Now the problem: The line ((inode->i_nlink == 0) && 
!__ceph_is_file_opened(ci)) performs two checks. If, in between those 
checks, the file goes from the unlinked and open state to the linked and 
closed state, then we return -ESTALE even though the inode is linked. I 
don't think this is the intended behavior. I guess this (going from 
unlinked and open to linked and closed) can happen when a concurrent 
process calls linkat() and then close().
Xiubo Li Sept. 7, 2023, 2:05 a.m. UTC | #3
On 9/7/23 00:58, Sebastian Hasler wrote:
> While reviewing the implementation of __fh_to_dentry (in the CephFS 
> client), I noticed a possible race condition.
>
> Linux has a syscall linkat(2) which allows, given an open file 
> descriptor, to create a link for the file. So an inode that is 
> unlinked can become linked.
>
> Now the problem: The line ((inode->i_nlink == 0) && 
> !__ceph_is_file_opened(ci)) performs two checks. If, in between those 
> checks, the file goes from the unlinked and open state to the linked 
> and closed state, then we return -ESTALE even though the inode is 
> linked. I don't think this is the intended behavior. I guess this 
> (going from unlinked and open to linked and closed) can happen when a 
> concurrent process calls linkat() and then close().
>
Hi Sebastian,

Thanks for your reporting.

int linkat(int olddirfd, const char *oldpath, int newdirfd, const char 
*newpath, int flags);

BTW, for "an open file descripter", do you mean "olddirfd" ? Because 
"olddirfd" is a dir's open file descripter, how is that possible it can 
become linked again ?

Correct me if I'm misreading it.

Thanks

- Xiubo
Sebastian Hasler Sept. 7, 2023, 11:57 a.m. UTC | #4
On 07/09/2023 04:05, Xiubo Li wrote:
>
> int linkat(int olddirfd, const char *oldpath, int newdirfd, const char 
> *newpath, int flags);
>
> BTW, for "an open file descripter", do you mean "olddirfd" ? Because 
> "olddirfd" is a dir's open file descripter, how is that possible it 
> can become linked again ?

Yes, I mean olddirfd, and the manual says: "If oldpath is an empty 
string, create a link to the file referenced by olddirfd".
Sebastian Hasler Sept. 11, 2023, 10:07 a.m. UTC | #5
On 11/09/2023 05:21, Xiubo Li wrote:

> "This will generally not work if the file  has  a  link  count of 
> zero  (files  created  with  O_TMPFILE  and without O_EXCL are an 
> exception)."
In this sentence, "generally" probably means "typically", so it depends 
on the specific file system's behavior. I don't know the CephFS behavior 
in this case. However, at least files created with O_TMPFILE are an 
exception. Does CephFS support creating files with O_TMPFILE? Such files 
can (and often will) go from the unlinked and open state to the linked 
and closed state.
Xiubo Li Sept. 12, 2023, 12:51 p.m. UTC | #6
On 9/11/23 18:07, Sebastian Hasler wrote:
> On 11/09/2023 05:21, Xiubo Li wrote:
>
>> "This will generally not work if the file has  a  link  count of 
>> zero  (files  created  with  O_TMPFILE and without O_EXCL are an 
>> exception)."
> In this sentence, "generally" probably means "typically", so it 
> depends on the specific file system's behavior. I don't know the 
> CephFS behavior in this case. However, at least files created with 
> O_TMPFILE are an exception. Does CephFS support creating files with 
> O_TMPFILE? Such files can (and often will) go from the unlinked and 
> open state to the linked and closed state.
>
Hmm, yeah. I didn't see anywhere is disallowing to create with 
O_TMPFILE. Let me have a look at this case carefully later.

Thanks very much for your reporting Sebastian.

- Xiubo
diff mbox series

Patch

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 0ebf2bd93055..7d2ae977b8c9 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -182,6 +182,7 @@  struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
 static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
 {
 	struct inode *inode = __lookup_inode(sb, ino);
+	struct dentry *dentry;
 	int err;
 
 	if (IS_ERR(inode))
@@ -197,7 +198,15 @@  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
 		iput(inode);
 		return ERR_PTR(-ESTALE);
 	}
-	return d_obtain_alias(inode);
+
+	/* -ESTALE if the dentry is unhashed, which should being released */
+	dentry = d_obtain_alias(inode);
+	if (d_unhashed(dentry)) {
+		dput(dentry);
+		return ERR_PTR(-ESTALE);
+	}
+
+	return dentry;
 }
 
 static struct dentry *__snapfh_to_dentry(struct super_block *sb,