Message ID | 20230227032813.337906-26-xiubli@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | ceph+fscrypt: full support | expand |
xiubli@redhat.com writes: > From: Jeff Layton <jlayton@kernel.org> > > If we have a dentry which represents a no-key name, then we need to test > whether the parent directory's encryption key has since been added. Do > that before we test anything else about the dentry. > > Reviewed-by: Xiubo Li <xiubli@redhat.com> > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/ceph/dir.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c > index d3c2853bb0f1..5ead9f59e693 100644 > --- a/fs/ceph/dir.c > +++ b/fs/ceph/dir.c > @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) > struct inode *dir, *inode; > struct ceph_mds_client *mdsc; > > + valid = fscrypt_d_revalidate(dentry, flags); > + if (valid <= 0) > + return valid; > + This patch has confused me in the past, and today I found myself scratching my head again looking at it. So, I've started seeing generic/123 test failing when running it with test_dummy_encryption. I was almost sure that this test used to run fine before, but I couldn't find any evidence (somehow I lost my old testing logs...). Anyway, the test is quite simple: 1. Creates a directory with write permissions for root only 2. Writes into a file in that directory 3. Uses 'su' to try to modify that file as a different user, and gets -EPERM All these steps run fine, and the test should pass. *However*, in the test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' and then -ENOTEMPTY for the directory. Some digging allowed me to figure out that running commands with 'su' will drop caches (I see 'su (874): drop_caches: 2' in the log). And this is how I ended up looking at this patch. fscrypt_d_revalidate() will return '0' if the parent directory does has a key (fscrypt_has_encryption_key()). Can we really say here that the dentry is *not* valid in that case? Or should that '<= 0' be a '< 0'? (But again, this patch has confused me before...) Cheers,
On 08/03/2023 02:53, Luís Henriques wrote: > xiubli@redhat.com writes: > >> From: Jeff Layton <jlayton@kernel.org> >> >> If we have a dentry which represents a no-key name, then we need to test >> whether the parent directory's encryption key has since been added. Do >> that before we test anything else about the dentry. >> >> Reviewed-by: Xiubo Li <xiubli@redhat.com> >> Signed-off-by: Jeff Layton <jlayton@kernel.org> >> --- >> fs/ceph/dir.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >> index d3c2853bb0f1..5ead9f59e693 100644 >> --- a/fs/ceph/dir.c >> +++ b/fs/ceph/dir.c >> @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >> struct inode *dir, *inode; >> struct ceph_mds_client *mdsc; >> >> + valid = fscrypt_d_revalidate(dentry, flags); >> + if (valid <= 0) >> + return valid; >> + > This patch has confused me in the past, and today I found myself > scratching my head again looking at it. > > So, I've started seeing generic/123 test failing when running it with > test_dummy_encryption. I was almost sure that this test used to run fine > before, but I couldn't find any evidence (somehow I lost my old testing > logs...). > > Anyway, the test is quite simple: > > 1. Creates a directory with write permissions for root only > 2. Writes into a file in that directory > 3. Uses 'su' to try to modify that file as a different user, and > gets -EPERM > > All these steps run fine, and the test should pass. *However*, in the > test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. > 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' > and then -ENOTEMPTY for the directory. > > Some digging allowed me to figure out that running commands with 'su' will > drop caches (I see 'su (874): drop_caches: 2' in the log). And this is > how I ended up looking at this patch. fscrypt_d_revalidate() will return > '0' if the parent directory does has a key (fscrypt_has_encryption_key()). > Can we really say here that the dentry is *not* valid in that case? Or > should that '<= 0' be a '< 0'? > > (But again, this patch has confused me before...) Luis, Could you reproduce it with the latest testing branch ? I never seen the generic/123 failure yet. And just now I ran the test for many times locally it worked fine. From the generic/123 test code it will never touch the key while testing, that means the dentries under the test dir will always have the keyed name. And then the 'fscrypt_d_revalidate()' should return 1 always. Only when we remove the key will it trigger evicting the inodes and then when we add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the 'fscrypt_has_encryption_key()'. As I remembered we have one or more fixes about this those days, not sure whether you were hitting those bugs we have already fixed ? Thanks - Xiubo > Cheers,
Xiubo Li <xiubli@redhat.com> writes: > On 08/03/2023 02:53, Luís Henriques wrote: >> xiubli@redhat.com writes: >> >>> From: Jeff Layton <jlayton@kernel.org> >>> >>> If we have a dentry which represents a no-key name, then we need to test >>> whether the parent directory's encryption key has since been added. Do >>> that before we test anything else about the dentry. >>> >>> Reviewed-by: Xiubo Li <xiubli@redhat.com> >>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>> --- >>> fs/ceph/dir.c | 8 ++++++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >>> index d3c2853bb0f1..5ead9f59e693 100644 >>> --- a/fs/ceph/dir.c >>> +++ b/fs/ceph/dir.c >>> @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >>> struct inode *dir, *inode; >>> struct ceph_mds_client *mdsc; >>> + valid = fscrypt_d_revalidate(dentry, flags); >>> + if (valid <= 0) >>> + return valid; >>> + >> This patch has confused me in the past, and today I found myself >> scratching my head again looking at it. >> >> So, I've started seeing generic/123 test failing when running it with >> test_dummy_encryption. I was almost sure that this test used to run fine >> before, but I couldn't find any evidence (somehow I lost my old testing >> logs...). >> >> Anyway, the test is quite simple: >> >> 1. Creates a directory with write permissions for root only >> 2. Writes into a file in that directory >> 3. Uses 'su' to try to modify that file as a different user, and >> gets -EPERM >> >> All these steps run fine, and the test should pass. *However*, in the >> test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >> 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >> and then -ENOTEMPTY for the directory. >> >> Some digging allowed me to figure out that running commands with 'su' will >> drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >> how I ended up looking at this patch. fscrypt_d_revalidate() will return >> '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >> Can we really say here that the dentry is *not* valid in that case? Or >> should that '<= 0' be a '< 0'? >> >> (But again, this patch has confused me before...) > > Luis, > > Could you reproduce it with the latest testing branch ? Yes, I'm seeing this with the latest code. > I never seen the generic/123 failure yet. And just now I ran the test for many > times locally it worked fine. That's odd. With 'test_dummy_encryption' mount option I can reproduce it every time. > From the generic/123 test code it will never touch the key while testing, that > means the dentries under the test dir will always have the keyed name. And then > the 'fscrypt_d_revalidate()' should return 1 always. > > Only when we remove the key will it trigger evicting the inodes and then when we > add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the > 'fscrypt_has_encryption_key()'. > > As I remembered we have one or more fixes about this those days, not sure > whether you were hitting those bugs we have already fixed ? Yeah, I remember now, and I guess there's yet another one here! I'll look closer into this and see if I can find out something else. I'm definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the bug is in the error paths, when the 'fsgqa' user tries to write into the file. Thanks for your feedback, Xiubo. Cheers,
On 08/03/2023 17:29, Luís Henriques wrote: > Xiubo Li <xiubli@redhat.com> writes: > >> On 08/03/2023 02:53, Luís Henriques wrote: >>> xiubli@redhat.com writes: >>> >>>> From: Jeff Layton <jlayton@kernel.org> >>>> >>>> If we have a dentry which represents a no-key name, then we need to test >>>> whether the parent directory's encryption key has since been added. Do >>>> that before we test anything else about the dentry. >>>> >>>> Reviewed-by: Xiubo Li <xiubli@redhat.com> >>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>>> --- >>>> fs/ceph/dir.c | 8 ++++++-- >>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >>>> index d3c2853bb0f1..5ead9f59e693 100644 >>>> --- a/fs/ceph/dir.c >>>> +++ b/fs/ceph/dir.c >>>> @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >>>> struct inode *dir, *inode; >>>> struct ceph_mds_client *mdsc; >>>> + valid = fscrypt_d_revalidate(dentry, flags); >>>> + if (valid <= 0) >>>> + return valid; >>>> + >>> This patch has confused me in the past, and today I found myself >>> scratching my head again looking at it. >>> >>> So, I've started seeing generic/123 test failing when running it with >>> test_dummy_encryption. I was almost sure that this test used to run fine >>> before, but I couldn't find any evidence (somehow I lost my old testing >>> logs...). >>> >>> Anyway, the test is quite simple: >>> >>> 1. Creates a directory with write permissions for root only >>> 2. Writes into a file in that directory >>> 3. Uses 'su' to try to modify that file as a different user, and >>> gets -EPERM >>> >>> All these steps run fine, and the test should pass. *However*, in the >>> test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >>> 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >>> and then -ENOTEMPTY for the directory. >>> >>> Some digging allowed me to figure out that running commands with 'su' will >>> drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >>> how I ended up looking at this patch. fscrypt_d_revalidate() will return >>> '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >>> Can we really say here that the dentry is *not* valid in that case? Or >>> should that '<= 0' be a '< 0'? >>> >>> (But again, this patch has confused me before...) >> Luis, >> >> Could you reproduce it with the latest testing branch ? > Yes, I'm seeing this with the latest code. Okay. That's odd. BTW, are you using the non-root user to run the test ? Locally I am using the root user and still couldn't reproduce it. > >> I never seen the generic/123 failure yet. And just now I ran the test for many >> times locally it worked fine. > That's odd. With 'test_dummy_encryption' mount option I can reproduce it > every time. > >> From the generic/123 test code it will never touch the key while testing, that >> means the dentries under the test dir will always have the keyed name. And then >> the 'fscrypt_d_revalidate()' should return 1 always. >> >> Only when we remove the key will it trigger evicting the inodes and then when we >> add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the >> 'fscrypt_has_encryption_key()'. >> >> As I remembered we have one or more fixes about this those days, not sure >> whether you were hitting those bugs we have already fixed ? > Yeah, I remember now, and I guess there's yet another one here! > > I'll look closer into this and see if I can find out something else. I'm > definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the > bug is in the error paths, when the 'fsgqa' user tries to write into the > file. Please add some debug logs in the code. Thanks - Xiubo > Thanks for your feedback, Xiubo. > > Cheers,
Xiubo Li <xiubli@redhat.com> writes: > On 08/03/2023 17:29, Luís Henriques wrote: >> Xiubo Li <xiubli@redhat.com> writes: >> >>> On 08/03/2023 02:53, Luís Henriques wrote: >>>> xiubli@redhat.com writes: >>>> >>>>> From: Jeff Layton <jlayton@kernel.org> >>>>> >>>>> If we have a dentry which represents a no-key name, then we need to test >>>>> whether the parent directory's encryption key has since been added. Do >>>>> that before we test anything else about the dentry. >>>>> >>>>> Reviewed-by: Xiubo Li <xiubli@redhat.com> >>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>>>> --- >>>>> fs/ceph/dir.c | 8 ++++++-- >>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >>>>> index d3c2853bb0f1..5ead9f59e693 100644 >>>>> --- a/fs/ceph/dir.c >>>>> +++ b/fs/ceph/dir.c >>>>> @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >>>>> struct inode *dir, *inode; >>>>> struct ceph_mds_client *mdsc; >>>>> + valid = fscrypt_d_revalidate(dentry, flags); >>>>> + if (valid <= 0) >>>>> + return valid; >>>>> + >>>> This patch has confused me in the past, and today I found myself >>>> scratching my head again looking at it. >>>> >>>> So, I've started seeing generic/123 test failing when running it with >>>> test_dummy_encryption. I was almost sure that this test used to run fine >>>> before, but I couldn't find any evidence (somehow I lost my old testing >>>> logs...). >>>> >>>> Anyway, the test is quite simple: >>>> >>>> 1. Creates a directory with write permissions for root only >>>> 2. Writes into a file in that directory >>>> 3. Uses 'su' to try to modify that file as a different user, and >>>> gets -EPERM >>>> >>>> All these steps run fine, and the test should pass. *However*, in the >>>> test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >>>> 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >>>> and then -ENOTEMPTY for the directory. >>>> >>>> Some digging allowed me to figure out that running commands with 'su' will >>>> drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >>>> how I ended up looking at this patch. fscrypt_d_revalidate() will return >>>> '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >>>> Can we really say here that the dentry is *not* valid in that case? Or >>>> should that '<= 0' be a '< 0'? >>>> >>>> (But again, this patch has confused me before...) >>> Luis, >>> >>> Could you reproduce it with the latest testing branch ? >> Yes, I'm seeing this with the latest code. > > Okay. That's odd. > > BTW, are you using the non-root user to run the test ? > > Locally I am using the root user and still couldn't reproduce it. Yes, I'm running the tests as root but I've also 'fsgqa' user in the system (which is used by this test. Anyway, for reference, here's what I'm using in my fstests configuration: TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> >>> I never seen the generic/123 failure yet. And just now I ran the test for many >>> times locally it worked fine. >> That's odd. With 'test_dummy_encryption' mount option I can reproduce it >> every time. >> >>> From the generic/123 test code it will never touch the key while testing, that >>> means the dentries under the test dir will always have the keyed name. And then >>> the 'fscrypt_d_revalidate()' should return 1 always. >>> >>> Only when we remove the key will it trigger evicting the inodes and then when we >>> add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the >>> 'fscrypt_has_encryption_key()'. >>> >>> As I remembered we have one or more fixes about this those days, not sure >>> whether you were hitting those bugs we have already fixed ? >> Yeah, I remember now, and I guess there's yet another one here! >> >> I'll look closer into this and see if I can find out something else. I'm >> definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the >> bug is in the error paths, when the 'fsgqa' user tries to write into the >> file. > > Please add some debug logs in the code. I *think* I've something. The problem seems to be that, after the drop_caches, the test directory is evicted and ceph_evict_inode() will call fscrypt_put_encryption_info(). This last function will clear the inode fscrypt info. Later on, when the test tries to write to the file with: _user_do "echo goo >> $my_test_subdir/data_coherency.txt" function ceph_atomic_open() will correctly identify that '$my_test_subdir' is encrypted, but the key isn't set because the inode was evicted. This means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME will be *incorrectly* added to the 'data_coherency.txt' dentry flags. Later on, ceph_d_revalidate() will see the problem I initially described. The (RFC) patch bellow seems to fix the issue. Basically, it will force the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() and the fscrypt_has_encryption_key() will then return 'true'. Cheers
On Wed, 2023-03-08 at 17:14 +0000, Luís Henriques wrote: > Xiubo Li <xiubli@redhat.com> writes: > > > On 08/03/2023 17:29, Luís Henriques wrote: > > > Xiubo Li <xiubli@redhat.com> writes: > > > > > > > On 08/03/2023 02:53, Luís Henriques wrote: > > > > > xiubli@redhat.com writes: > > > > > > > > > > > From: Jeff Layton <jlayton@kernel.org> > > > > > > > > > > > > If we have a dentry which represents a no-key name, then we need to test > > > > > > whether the parent directory's encryption key has since been added. Do > > > > > > that before we test anything else about the dentry. > > > > > > > > > > > > Reviewed-by: Xiubo Li <xiubli@redhat.com> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > --- > > > > > > fs/ceph/dir.c | 8 ++++++-- > > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c > > > > > > index d3c2853bb0f1..5ead9f59e693 100644 > > > > > > --- a/fs/ceph/dir.c > > > > > > +++ b/fs/ceph/dir.c > > > > > > @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) > > > > > > struct inode *dir, *inode; > > > > > > struct ceph_mds_client *mdsc; > > > > > > + valid = fscrypt_d_revalidate(dentry, flags); > > > > > > + if (valid <= 0) > > > > > > + return valid; > > > > > > + > > > > > This patch has confused me in the past, and today I found myself > > > > > scratching my head again looking at it. > > > > > > > > > > So, I've started seeing generic/123 test failing when running it with > > > > > test_dummy_encryption. I was almost sure that this test used to run fine > > > > > before, but I couldn't find any evidence (somehow I lost my old testing > > > > > logs...). > > > > > > > > > > Anyway, the test is quite simple: > > > > > > > > > > 1. Creates a directory with write permissions for root only > > > > > 2. Writes into a file in that directory > > > > > 3. Uses 'su' to try to modify that file as a different user, and > > > > > gets -EPERM > > > > > > > > > > All these steps run fine, and the test should pass. *However*, in the > > > > > test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. > > > > > 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' > > > > > and then -ENOTEMPTY for the directory. > > > > > > > > > > Some digging allowed me to figure out that running commands with 'su' will > > > > > drop caches (I see 'su (874): drop_caches: 2' in the log). And this is > > > > > how I ended up looking at this patch. fscrypt_d_revalidate() will return > > > > > '0' if the parent directory does has a key (fscrypt_has_encryption_key()). > > > > > Can we really say here that the dentry is *not* valid in that case? Or > > > > > should that '<= 0' be a '< 0'? > > > > > > > > > > (But again, this patch has confused me before...) > > > > Luis, > > > > > > > > Could you reproduce it with the latest testing branch ? > > > Yes, I'm seeing this with the latest code. > > > > Okay. That's odd. > > > > BTW, are you using the non-root user to run the test ? > > > > Locally I am using the root user and still couldn't reproduce it. > > Yes, I'm running the tests as root but I've also 'fsgqa' user in the > system (which is used by this test. Anyway, for reference, here's what > I'm using in my fstests configuration: > > TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > > > > > > > > I never seen the generic/123 failure yet. And just now I ran the test for many > > > > times locally it worked fine. > > > That's odd. With 'test_dummy_encryption' mount option I can reproduce it > > > every time. > > > > > > > From the generic/123 test code it will never touch the key while testing, that > > > > means the dentries under the test dir will always have the keyed name. And then > > > > the 'fscrypt_d_revalidate()' should return 1 always. > > > > > > > > Only when we remove the key will it trigger evicting the inodes and then when we > > > > add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the > > > > 'fscrypt_has_encryption_key()'. > > > > > > > > As I remembered we have one or more fixes about this those days, not sure > > > > whether you were hitting those bugs we have already fixed ? > > > Yeah, I remember now, and I guess there's yet another one here! > > > > > > I'll look closer into this and see if I can find out something else. I'm > > > definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the > > > bug is in the error paths, when the 'fsgqa' user tries to write into the > > > file. > > > > Please add some debug logs in the code. > > I *think* I've something. The problem seems to be that, after the > drop_caches, the test directory is evicted and ceph_evict_inode() will > call fscrypt_put_encryption_info(). This last function will clear the > inode fscrypt info. Later on, when the test tries to write to the file > with: > > _user_do "echo goo >> $my_test_subdir/data_coherency.txt" > > function ceph_atomic_open() will correctly identify that '$my_test_subdir' > is encrypted, but the key isn't set because the inode was evicted. This > means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME > will be *incorrectly* added to the 'data_coherency.txt' dentry flags. > > Later on, ceph_d_revalidate() will see the problem I initially described. > > The (RFC) patch bellow seems to fix the issue. Basically, it will force > the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() > and the fscrypt_has_encryption_key() will then return 'true'. > > Cheers > -- > Luís > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index dee3b445f415..3f2df84a6323 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -795,7 +795,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, > ihold(dir); > if (IS_ENCRYPTED(dir)) { > set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); > - if (!fscrypt_has_encryption_key(dir)) { > + err = __fscrypt_prepare_readdir(dir); I want to say that i had something like this in place during an earlier version of this series, but for different reasons. I think I convinced myself later though that it wasn't needed? Oh well... > + if (err || (!err && !fscrypt_has_encryption_key(dir))) { > spin_lock(&dentry->d_lock); > dentry->d_flags |= DCACHE_NOKEY_NAME; > spin_unlock(&dentry->d_lock); Once an inode is evicted, my understanding was that it won't end up being used anymore. It's on its way out of the cache and it's not hashed anymore at that point. How does a new atomic open after drop_caches end up with the inode struct that existed before it?
Jeff Layton <jlayton@kernel.org> writes: > On Wed, 2023-03-08 at 17:14 +0000, Luís Henriques wrote: >> Xiubo Li <xiubli@redhat.com> writes: >> >> > On 08/03/2023 17:29, Luís Henriques wrote: >> > > Xiubo Li <xiubli@redhat.com> writes: >> > > >> > > > On 08/03/2023 02:53, Luís Henriques wrote: >> > > > > xiubli@redhat.com writes: >> > > > > >> > > > > > From: Jeff Layton <jlayton@kernel.org> >> > > > > > >> > > > > > If we have a dentry which represents a no-key name, then we need to test >> > > > > > whether the parent directory's encryption key has since been added. Do >> > > > > > that before we test anything else about the dentry. >> > > > > > >> > > > > > Reviewed-by: Xiubo Li <xiubli@redhat.com> >> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> >> > > > > > --- >> > > > > > fs/ceph/dir.c | 8 ++++++-- >> > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) >> > > > > > >> > > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >> > > > > > index d3c2853bb0f1..5ead9f59e693 100644 >> > > > > > --- a/fs/ceph/dir.c >> > > > > > +++ b/fs/ceph/dir.c >> > > > > > @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >> > > > > > struct inode *dir, *inode; >> > > > > > struct ceph_mds_client *mdsc; >> > > > > > + valid = fscrypt_d_revalidate(dentry, flags); >> > > > > > + if (valid <= 0) >> > > > > > + return valid; >> > > > > > + >> > > > > This patch has confused me in the past, and today I found myself >> > > > > scratching my head again looking at it. >> > > > > >> > > > > So, I've started seeing generic/123 test failing when running it with >> > > > > test_dummy_encryption. I was almost sure that this test used to run fine >> > > > > before, but I couldn't find any evidence (somehow I lost my old testing >> > > > > logs...). >> > > > > >> > > > > Anyway, the test is quite simple: >> > > > > >> > > > > 1. Creates a directory with write permissions for root only >> > > > > 2. Writes into a file in that directory >> > > > > 3. Uses 'su' to try to modify that file as a different user, and >> > > > > gets -EPERM >> > > > > >> > > > > All these steps run fine, and the test should pass. *However*, in the >> > > > > test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >> > > > > 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >> > > > > and then -ENOTEMPTY for the directory. >> > > > > >> > > > > Some digging allowed me to figure out that running commands with 'su' will >> > > > > drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >> > > > > how I ended up looking at this patch. fscrypt_d_revalidate() will return >> > > > > '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >> > > > > Can we really say here that the dentry is *not* valid in that case? Or >> > > > > should that '<= 0' be a '< 0'? >> > > > > >> > > > > (But again, this patch has confused me before...) >> > > > Luis, >> > > > >> > > > Could you reproduce it with the latest testing branch ? >> > > Yes, I'm seeing this with the latest code. >> > >> > Okay. That's odd. >> > >> > BTW, are you using the non-root user to run the test ? >> > >> > Locally I am using the root user and still couldn't reproduce it. >> >> Yes, I'm running the tests as root but I've also 'fsgqa' user in the >> system (which is used by this test. Anyway, for reference, here's what >> I'm using in my fstests configuration: >> >> TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> >> > > >> > > > I never seen the generic/123 failure yet. And just now I ran the test for many >> > > > times locally it worked fine. >> > > That's odd. With 'test_dummy_encryption' mount option I can reproduce it >> > > every time. >> > > >> > > > From the generic/123 test code it will never touch the key while testing, that >> > > > means the dentries under the test dir will always have the keyed name. And then >> > > > the 'fscrypt_d_revalidate()' should return 1 always. >> > > > >> > > > Only when we remove the key will it trigger evicting the inodes and then when we >> > > > add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the >> > > > 'fscrypt_has_encryption_key()'. >> > > > >> > > > As I remembered we have one or more fixes about this those days, not sure >> > > > whether you were hitting those bugs we have already fixed ? >> > > Yeah, I remember now, and I guess there's yet another one here! >> > > >> > > I'll look closer into this and see if I can find out something else. I'm >> > > definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the >> > > bug is in the error paths, when the 'fsgqa' user tries to write into the >> > > file. >> > >> > Please add some debug logs in the code. >> >> I *think* I've something. The problem seems to be that, after the >> drop_caches, the test directory is evicted and ceph_evict_inode() will >> call fscrypt_put_encryption_info(). This last function will clear the >> inode fscrypt info. Later on, when the test tries to write to the file >> with: >> >> _user_do "echo goo >> $my_test_subdir/data_coherency.txt" >> >> function ceph_atomic_open() will correctly identify that '$my_test_subdir' >> is encrypted, but the key isn't set because the inode was evicted. This >> means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME >> will be *incorrectly* added to the 'data_coherency.txt' dentry flags. >> >> Later on, ceph_d_revalidate() will see the problem I initially described. >> >> The (RFC) patch bellow seems to fix the issue. Basically, it will force >> the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() >> and the fscrypt_has_encryption_key() will then return 'true'. >> > > >> Cheers >> -- >> Luís >> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> index dee3b445f415..3f2df84a6323 100644 >> --- a/fs/ceph/file.c >> +++ b/fs/ceph/file.c >> @@ -795,7 +795,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, >> ihold(dir); >> if (IS_ENCRYPTED(dir)) { >> set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); >> - if (!fscrypt_has_encryption_key(dir)) { >> + err = __fscrypt_prepare_readdir(dir); > > I want to say that i had something like this in place during an earlier > version of this series, but for different reasons. I think I convinced > myself later though that it wasn't needed? Oh well... Ah, good to know it _may_ make sense :-) >> + if (err || (!err && !fscrypt_has_encryption_key(dir))) { >> spin_lock(&dentry->d_lock); >> dentry->d_flags |= DCACHE_NOKEY_NAME; >> spin_unlock(&dentry->d_lock); > > Once an inode is evicted, my understanding was that it won't end up > being used anymore. It's on its way out of the cache and it's not hashed > anymore at that point. > > How does a new atomic open after drop_caches end up with the inode > struct that existed before it? Hmm... so, I *think* that what's happening is that it is a new inode but the key is still available. Looking at the code it seems that fscrypt will get the context (->get_context()) from ceph code and then fscrypt_setup_encryption_info() should initialize everything in the inode. And at that point fscrypt_has_encryption_key() will finally return 'true'. Does this make sense? Cheers,
On Wed, 2023-03-08 at 18:30 +0000, Luís Henriques wrote: > Jeff Layton <jlayton@kernel.org> writes: > > > On Wed, 2023-03-08 at 17:14 +0000, Luís Henriques wrote: > > > Xiubo Li <xiubli@redhat.com> writes: > > > > > > > On 08/03/2023 17:29, Luís Henriques wrote: > > > > > Xiubo Li <xiubli@redhat.com> writes: > > > > > > > > > > > On 08/03/2023 02:53, Luís Henriques wrote: > > > > > > > xiubli@redhat.com writes: > > > > > > > > > > > > > > > From: Jeff Layton <jlayton@kernel.org> > > > > > > > > > > > > > > > > If we have a dentry which represents a no-key name, then we need to test > > > > > > > > whether the parent directory's encryption key has since been added. Do > > > > > > > > that before we test anything else about the dentry. > > > > > > > > > > > > > > > > Reviewed-by: Xiubo Li <xiubli@redhat.com> > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > > > --- > > > > > > > > fs/ceph/dir.c | 8 ++++++-- > > > > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > > > > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c > > > > > > > > index d3c2853bb0f1..5ead9f59e693 100644 > > > > > > > > --- a/fs/ceph/dir.c > > > > > > > > +++ b/fs/ceph/dir.c > > > > > > > > @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) > > > > > > > > struct inode *dir, *inode; > > > > > > > > struct ceph_mds_client *mdsc; > > > > > > > > + valid = fscrypt_d_revalidate(dentry, flags); > > > > > > > > + if (valid <= 0) > > > > > > > > + return valid; > > > > > > > > + > > > > > > > This patch has confused me in the past, and today I found myself > > > > > > > scratching my head again looking at it. > > > > > > > > > > > > > > So, I've started seeing generic/123 test failing when running it with > > > > > > > test_dummy_encryption. I was almost sure that this test used to run fine > > > > > > > before, but I couldn't find any evidence (somehow I lost my old testing > > > > > > > logs...). > > > > > > > > > > > > > > Anyway, the test is quite simple: > > > > > > > > > > > > > > 1. Creates a directory with write permissions for root only > > > > > > > 2. Writes into a file in that directory > > > > > > > 3. Uses 'su' to try to modify that file as a different user, and > > > > > > > gets -EPERM > > > > > > > > > > > > > > All these steps run fine, and the test should pass. *However*, in the > > > > > > > test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. > > > > > > > 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' > > > > > > > and then -ENOTEMPTY for the directory. > > > > > > > > > > > > > > Some digging allowed me to figure out that running commands with 'su' will > > > > > > > drop caches (I see 'su (874): drop_caches: 2' in the log). And this is > > > > > > > how I ended up looking at this patch. fscrypt_d_revalidate() will return > > > > > > > '0' if the parent directory does has a key (fscrypt_has_encryption_key()). > > > > > > > Can we really say here that the dentry is *not* valid in that case? Or > > > > > > > should that '<= 0' be a '< 0'? > > > > > > > > > > > > > > (But again, this patch has confused me before...) > > > > > > Luis, > > > > > > > > > > > > Could you reproduce it with the latest testing branch ? > > > > > Yes, I'm seeing this with the latest code. > > > > > > > > Okay. That's odd. > > > > > > > > BTW, are you using the non-root user to run the test ? > > > > > > > > Locally I am using the root user and still couldn't reproduce it. > > > > > > Yes, I'm running the tests as root but I've also 'fsgqa' user in the > > > system (which is used by this test. Anyway, for reference, here's what > > > I'm using in my fstests configuration: > > > > > > TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > > > MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > > > > > > > > > > > > > > I never seen the generic/123 failure yet. And just now I ran the test for many > > > > > > times locally it worked fine. > > > > > That's odd. With 'test_dummy_encryption' mount option I can reproduce it > > > > > every time. > > > > > > > > > > > From the generic/123 test code it will never touch the key while testing, that > > > > > > means the dentries under the test dir will always have the keyed name. And then > > > > > > the 'fscrypt_d_revalidate()' should return 1 always. > > > > > > > > > > > > Only when we remove the key will it trigger evicting the inodes and then when we > > > > > > add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the > > > > > > 'fscrypt_has_encryption_key()'. > > > > > > > > > > > > As I remembered we have one or more fixes about this those days, not sure > > > > > > whether you were hitting those bugs we have already fixed ? > > > > > Yeah, I remember now, and I guess there's yet another one here! > > > > > > > > > > I'll look closer into this and see if I can find out something else. I'm > > > > > definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the > > > > > bug is in the error paths, when the 'fsgqa' user tries to write into the > > > > > file. > > > > > > > > Please add some debug logs in the code. > > > > > > I *think* I've something. The problem seems to be that, after the > > > drop_caches, the test directory is evicted and ceph_evict_inode() will > > > call fscrypt_put_encryption_info(). This last function will clear the > > > inode fscrypt info. Later on, when the test tries to write to the file > > > with: > > > > > > _user_do "echo goo >> $my_test_subdir/data_coherency.txt" > > > > > > function ceph_atomic_open() will correctly identify that '$my_test_subdir' > > > is encrypted, but the key isn't set because the inode was evicted. This > > > means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME > > > will be *incorrectly* added to the 'data_coherency.txt' dentry flags. > > > > > > Later on, ceph_d_revalidate() will see the problem I initially described. > > > > > > The (RFC) patch bellow seems to fix the issue. Basically, it will force > > > the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() > > > and the fscrypt_has_encryption_key() will then return 'true'. > > > > > > > > > > Cheers > > > -- > > > Luís > > > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > > index dee3b445f415..3f2df84a6323 100644 > > > --- a/fs/ceph/file.c > > > +++ b/fs/ceph/file.c > > > @@ -795,7 +795,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, > > > ihold(dir); > > > if (IS_ENCRYPTED(dir)) { > > > set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); > > > - if (!fscrypt_has_encryption_key(dir)) { > > > + err = __fscrypt_prepare_readdir(dir); > > > > I want to say that i had something like this in place during an earlier > > version of this series, but for different reasons. I think I convinced > > myself later though that it wasn't needed? Oh well... > > Ah, good to know it _may_ make sense :-) > > > > + if (err || (!err && !fscrypt_has_encryption_key(dir))) { > > > spin_lock(&dentry->d_lock); > > > dentry->d_flags |= DCACHE_NOKEY_NAME; > > > spin_unlock(&dentry->d_lock); > > > > Once an inode is evicted, my understanding was that it won't end up > > being used anymore. It's on its way out of the cache and it's not hashed > > anymore at that point. > > > > How does a new atomic open after drop_caches end up with the inode > > struct that existed before it? > > Hmm... so, I *think* that what's happening is that it is a new inode but > the key is still available. Looking at the code it seems that fscrypt > will get the context (->get_context()) from ceph code and then > fscrypt_setup_encryption_info() should initialize everything in the > inode. And at that point fscrypt_has_encryption_key() will finally return > 'true'. > > Does this make sense? > Yeah, I think so. This is also coming back to me a bit too... Basically none of the existing fscrypt-supporting filesystems deal with atomic_open, so we need to do *something* in this codepath to ensure that the key is available if the parent is encrypted. The regular open path, we call fscrypt_file_open to ensure that, but we don't have the inode for the thing yet at this point. __fscrypt_preapre_readdir is what we need here (though that really needs a new name since it's not just for readdir). Reviewed-by: Jeff Layton <jlayton@kernel.org>
On 09/03/2023 01:14, Luís Henriques wrote: > Xiubo Li <xiubli@redhat.com> writes: > >> On 08/03/2023 17:29, Luís Henriques wrote: >>> Xiubo Li <xiubli@redhat.com> writes: >>> >>>> On 08/03/2023 02:53, Luís Henriques wrote: >>>>> xiubli@redhat.com writes: >>>>> >>>>>> From: Jeff Layton <jlayton@kernel.org> >>>>>> >>>>>> If we have a dentry which represents a no-key name, then we need to test >>>>>> whether the parent directory's encryption key has since been added. Do >>>>>> that before we test anything else about the dentry. >>>>>> >>>>>> Reviewed-by: Xiubo Li <xiubli@redhat.com> >>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>>>>> --- >>>>>> fs/ceph/dir.c | 8 ++++++-- >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >>>>>> index d3c2853bb0f1..5ead9f59e693 100644 >>>>>> --- a/fs/ceph/dir.c >>>>>> +++ b/fs/ceph/dir.c >>>>>> @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >>>>>> struct inode *dir, *inode; >>>>>> struct ceph_mds_client *mdsc; >>>>>> + valid = fscrypt_d_revalidate(dentry, flags); >>>>>> + if (valid <= 0) >>>>>> + return valid; >>>>>> + >>>>> This patch has confused me in the past, and today I found myself >>>>> scratching my head again looking at it. >>>>> >>>>> So, I've started seeing generic/123 test failing when running it with >>>>> test_dummy_encryption. I was almost sure that this test used to run fine >>>>> before, but I couldn't find any evidence (somehow I lost my old testing >>>>> logs...). >>>>> >>>>> Anyway, the test is quite simple: >>>>> >>>>> 1. Creates a directory with write permissions for root only >>>>> 2. Writes into a file in that directory >>>>> 3. Uses 'su' to try to modify that file as a different user, and >>>>> gets -EPERM >>>>> >>>>> All these steps run fine, and the test should pass. *However*, in the >>>>> test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >>>>> 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >>>>> and then -ENOTEMPTY for the directory. >>>>> >>>>> Some digging allowed me to figure out that running commands with 'su' will >>>>> drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >>>>> how I ended up looking at this patch. fscrypt_d_revalidate() will return >>>>> '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >>>>> Can we really say here that the dentry is *not* valid in that case? Or >>>>> should that '<= 0' be a '< 0'? >>>>> >>>>> (But again, this patch has confused me before...) >>>> Luis, >>>> >>>> Could you reproduce it with the latest testing branch ? >>> Yes, I'm seeing this with the latest code. >> Okay. That's odd. >> >> BTW, are you using the non-root user to run the test ? >> >> Locally I am using the root user and still couldn't reproduce it. > Yes, I'm running the tests as root but I've also 'fsgqa' user in the > system (which is used by this test. Anyway, for reference, here's what > I'm using in my fstests configuration: > > TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > >>>> I never seen the generic/123 failure yet. And just now I ran the test for many >>>> times locally it worked fine. >>> That's odd. With 'test_dummy_encryption' mount option I can reproduce it >>> every time. >>> >>>> From the generic/123 test code it will never touch the key while testing, that >>>> means the dentries under the test dir will always have the keyed name. And then >>>> the 'fscrypt_d_revalidate()' should return 1 always. >>>> >>>> Only when we remove the key will it trigger evicting the inodes and then when we >>>> add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the >>>> 'fscrypt_has_encryption_key()'. >>>> >>>> As I remembered we have one or more fixes about this those days, not sure >>>> whether you were hitting those bugs we have already fixed ? >>> Yeah, I remember now, and I guess there's yet another one here! >>> >>> I'll look closer into this and see if I can find out something else. I'm >>> definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the >>> bug is in the error paths, when the 'fsgqa' user tries to write into the >>> file. >> Please add some debug logs in the code. > I *think* I've something. The problem seems to be that, after the > drop_caches, the test directory is evicted and ceph_evict_inode() will > call fscrypt_put_encryption_info(). This last function will clear the > inode fscrypt info. Later on, when the test tries to write to the file > with: > > _user_do "echo goo >> $my_test_subdir/data_coherency.txt" > > function ceph_atomic_open() will correctly identify that '$my_test_subdir' > is encrypted, but the key isn't set because the inode was evicted. This > means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME > will be *incorrectly* added to the 'data_coherency.txt' dentry flags. > > Later on, ceph_d_revalidate() will see the problem I initially described. > > The (RFC) patch bellow seems to fix the issue. Basically, it will force > the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() > and the fscrypt_has_encryption_key() will then return 'true'. Interesting. It's worth to add one separated commit to fix this. Luis, could you send one patch to the mail list ? And please add the detail comments in the code to explain it. This will help us to under stand the code and to debug potential similar bugs in future. Thank @Jeff for your confirm. - Xiubo > Cheers
Xiubo Li <xiubli@redhat.com> writes: > On 09/03/2023 01:14, Luís Henriques wrote: >> Xiubo Li <xiubli@redhat.com> writes: >> >>> On 08/03/2023 17:29, Luís Henriques wrote: >>>> Xiubo Li <xiubli@redhat.com> writes: >>>> >>>>> On 08/03/2023 02:53, Luís Henriques wrote: >>>>>> xiubli@redhat.com writes: >>>>>> >>>>>>> From: Jeff Layton <jlayton@kernel.org> >>>>>>> >>>>>>> If we have a dentry which represents a no-key name, then we need to test >>>>>>> whether the parent directory's encryption key has since been added. Do >>>>>>> that before we test anything else about the dentry. >>>>>>> >>>>>>> Reviewed-by: Xiubo Li <xiubli@redhat.com> >>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>>>>>> --- >>>>>>> fs/ceph/dir.c | 8 ++++++-- >>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >>>>>>> index d3c2853bb0f1..5ead9f59e693 100644 >>>>>>> --- a/fs/ceph/dir.c >>>>>>> +++ b/fs/ceph/dir.c >>>>>>> @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >>>>>>> struct inode *dir, *inode; >>>>>>> struct ceph_mds_client *mdsc; >>>>>>> + valid = fscrypt_d_revalidate(dentry, flags); >>>>>>> + if (valid <= 0) >>>>>>> + return valid; >>>>>>> + >>>>>> This patch has confused me in the past, and today I found myself >>>>>> scratching my head again looking at it. >>>>>> >>>>>> So, I've started seeing generic/123 test failing when running it with >>>>>> test_dummy_encryption. I was almost sure that this test used to run fine >>>>>> before, but I couldn't find any evidence (somehow I lost my old testing >>>>>> logs...). >>>>>> >>>>>> Anyway, the test is quite simple: >>>>>> >>>>>> 1. Creates a directory with write permissions for root only >>>>>> 2. Writes into a file in that directory >>>>>> 3. Uses 'su' to try to modify that file as a different user, and >>>>>> gets -EPERM >>>>>> >>>>>> All these steps run fine, and the test should pass. *However*, in the >>>>>> test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >>>>>> 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >>>>>> and then -ENOTEMPTY for the directory. >>>>>> >>>>>> Some digging allowed me to figure out that running commands with 'su' will >>>>>> drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >>>>>> how I ended up looking at this patch. fscrypt_d_revalidate() will return >>>>>> '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >>>>>> Can we really say here that the dentry is *not* valid in that case? Or >>>>>> should that '<= 0' be a '< 0'? >>>>>> >>>>>> (But again, this patch has confused me before...) >>>>> Luis, >>>>> >>>>> Could you reproduce it with the latest testing branch ? >>>> Yes, I'm seeing this with the latest code. >>> Okay. That's odd. >>> >>> BTW, are you using the non-root user to run the test ? >>> >>> Locally I am using the root user and still couldn't reproduce it. >> Yes, I'm running the tests as root but I've also 'fsgqa' user in the >> system (which is used by this test. Anyway, for reference, here's what >> I'm using in my fstests configuration: >> >> TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> >>>>> I never seen the generic/123 failure yet. And just now I ran the test for many >>>>> times locally it worked fine. >>>> That's odd. With 'test_dummy_encryption' mount option I can reproduce it >>>> every time. >>>> >>>>> From the generic/123 test code it will never touch the key while testing, that >>>>> means the dentries under the test dir will always have the keyed name. And then >>>>> the 'fscrypt_d_revalidate()' should return 1 always. >>>>> >>>>> Only when we remove the key will it trigger evicting the inodes and then when we >>>>> add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the >>>>> 'fscrypt_has_encryption_key()'. >>>>> >>>>> As I remembered we have one or more fixes about this those days, not sure >>>>> whether you were hitting those bugs we have already fixed ? >>>> Yeah, I remember now, and I guess there's yet another one here! >>>> >>>> I'll look closer into this and see if I can find out something else. I'm >>>> definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the >>>> bug is in the error paths, when the 'fsgqa' user tries to write into the >>>> file. >>> Please add some debug logs in the code. >> I *think* I've something. The problem seems to be that, after the >> drop_caches, the test directory is evicted and ceph_evict_inode() will >> call fscrypt_put_encryption_info(). This last function will clear the >> inode fscrypt info. Later on, when the test tries to write to the file >> with: >> >> _user_do "echo goo >> $my_test_subdir/data_coherency.txt" >> >> function ceph_atomic_open() will correctly identify that '$my_test_subdir' >> is encrypted, but the key isn't set because the inode was evicted. This >> means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME >> will be *incorrectly* added to the 'data_coherency.txt' dentry flags. >> >> Later on, ceph_d_revalidate() will see the problem I initially described. >> >> The (RFC) patch bellow seems to fix the issue. Basically, it will force >> the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() >> and the fscrypt_has_encryption_key() will then return 'true'. > > Interesting. > > It's worth to add one separated commit to fix this. > > Luis, could you send one patch to the mail list ? And please add the detail > comments in the code to explain it. > > This will help us to under stand the code and to debug potential similar bugs in > future. Sure, I'll do that. In fact, I'll probably send out two patches as Jeff also suggested a better name for __fscrypt_prepare_readdir(). Not sure Eric will take it, but it's worth trying. Cheers,
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index d3c2853bb0f1..5ead9f59e693 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) struct inode *dir, *inode; struct ceph_mds_client *mdsc; + valid = fscrypt_d_revalidate(dentry, flags); + if (valid <= 0) + return valid; + if (flags & LOOKUP_RCU) { parent = READ_ONCE(dentry->d_parent); dir = d_inode_rcu(parent); @@ -1782,8 +1786,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) inode = d_inode(dentry); } - dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry, - dentry, inode, ceph_dentry(dentry)->offset); + dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry, + dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME)); mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;