
[3/3] ceph: do not update snapshot context when there is no new snapshot

Message ID 20220215122316.7625-4-xiubli@redhat.com
State New
Series ceph: fix cephfs rsync kworker high load issue

Commit Message

Xiubo Li Feb. 15, 2022, 12:23 p.m. UTC
From: Xiubo Li <xiubli@redhat.com>

No need to update the snapshot context in either of the following
two cases:
1: if my context seq matches the realm's seq and the realm has no parent.
2: if my context seq equals or is larger than my parent's. This
   works because rebuild_snap_realms() works _downward_ in the
   hierarchy after each update.

This fix avoids pointlessly calling ceph_queue_cap_snap() for
unrelated inodes, for example:

There are 6 directories like:

/dir_X1/dir_X2/dir_X3/
/dir_Y1/dir_Y2/dir_Y3/

First, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then
make a root snapshot under /.snap/root_snap. From then on, every
time we make a snapshot under /dir_Y1/..., the kclient will rebuild
the snap context for the snap_X2 realm and then try to queue cap
snaps for dir_Y2 and dir_Y3, which makes no sense.

That's because snap_X2's seq is 2 and root_snap's seq is 3. So
when creating a new snapshot under /dir_Y1/..., the new seq will
be 4, and the MDS will send the kclient a snapshot backtrace going
_downward_ in the hierarchy: seqs 4, 3. Then ceph_update_snap_trace()
will always rebuild from the last realm in the trace, which is
root_snap. So when rebuilding the snap context it will always
rebuild the snap_X2 realm as well and try to queue cap snaps for
all the inodes in the snap_X2 realm, and we see logs like:

"ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"

URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/snap.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
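
To make the two cases concrete, here is a minimal userspace sketch of
the new check (hypothetical struct and values; the seqs follow the
example above, where an earlier rebuild has already bumped snap_X2's
cached seq to its parent's 3, and the real check in build_snap_context()
additionally requires a non-NULL realm->cached_context):

#include <stdbool.h>
#include <stdio.h>

struct realm {
	const char *name;
	long long seq;        /* seq of the realm's own latest snapshot */
	long long cached_seq; /* seq of the realm's cached snap context */
	struct realm *parent;
};

/* Mirrors the patched condition: the cached context can be reused in
 * case #1 (no parent and cached seq matches the realm seq) or in
 * case #2 (cached seq is not older than the parent's cached seq). */
static bool context_unchanged(const struct realm *r)
{
	if (!r->parent)
		return r->cached_seq == r->seq;          /* case #1 */
	return r->cached_seq >= r->parent->cached_seq;   /* case #2 */
}

int main(void)
{
	struct realm root    = { "root_snap", 3, 3, NULL };
	struct realm snap_x2 = { "snap_X2",   2, 3, &root };

	/* The old check also demanded cached_seq == seq for snap_X2
	 * (3 != 2), so snap_X2 was rebuilt on every update; the new
	 * check leaves it alone since 3 >= its parent's 3. */
	printf("root_snap unchanged: %d\n", context_unchanged(&root));   /* 1 */
	printf("snap_X2 unchanged: %d\n", context_unchanged(&snap_x2));  /* 1 */
	return 0;
}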

Comments

Yan, Zheng Feb. 17, 2022, 3:28 p.m. UTC | #1
On Thu, Feb 17, 2022 at 6:55 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2022-02-17 at 11:03 +0800, Yan, Zheng wrote:
> > On Tue, Feb 15, 2022 at 11:04 PM <xiubli@redhat.com> wrote:
> > >
> > > From: Xiubo Li <xiubli@redhat.com>
> > >
> > > No need to update the snapshot context in either of the following
> > > two cases:
> > > 1: if my context seq matches the realm's seq and the realm has no parent.
> > > 2: if my context seq equals or is larger than my parent's. This
> > >    works because rebuild_snap_realms() works _downward_ in the
> > >    hierarchy after each update.
> > >
> > > This fix avoids pointlessly calling ceph_queue_cap_snap() for
> > > unrelated inodes, for example:
> > >
> > > There are 6 directories like:
> > >
> > > /dir_X1/dir_X2/dir_X3/
> > > /dir_Y1/dir_Y2/dir_Y3/
> > >
> > > First, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then
> > > make a root snapshot under /.snap/root_snap. From then on, every
> > > time we make a snapshot under /dir_Y1/..., the kclient will rebuild
> > > the snap context for the snap_X2 realm and then try to queue cap
> > > snaps for dir_Y2 and dir_Y3, which makes no sense.
> > >
> > > That's because snap_X2's seq is 2 and root_snap's seq is 3. So
> > > when creating a new snapshot under /dir_Y1/..., the new seq will
> > > be 4, and the MDS will send the kclient a snapshot backtrace going
> > > _downward_ in the hierarchy: seqs 4, 3. Then ceph_update_snap_trace()
> > > will always rebuild from the last realm in the trace, which is
> > > root_snap. So when rebuilding the snap context it will always
> > > rebuild the snap_X2 realm as well and try to queue cap snaps for
> > > all the inodes in the snap_X2 realm, and we see logs like:
> > >
> > > "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
> > >
> > > URL: https://tracker.ceph.com/issues/44100
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >  fs/ceph/snap.c | 16 +++++++++-------
> > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > index d075d3ce5f6d..1f24a5de81e7 100644
> > > --- a/fs/ceph/snap.c
> > > +++ b/fs/ceph/snap.c
> > > @@ -341,14 +341,16 @@ static int build_snap_context(struct ceph_snap_realm *realm,
> > >                 num += parent->cached_context->num_snaps;
> > >         }
> > >
> > > -       /* do i actually need to update?  not if my context seq
> > > -          matches realm seq, and my parents' does to.  (this works
> > > -          because we rebuild_snap_realms() works _downward_ in
> > > -          hierarchy after each update.) */
> > > +       /* do i actually need to update? No need when any of the following
> > > +        * two cases:
> > > +        * #1: if my context seq matches realm's seq and realm has no parent.
> > > +        * #2: if my context seq equals or is larger than my parent's, this
> > > +        *     works because we rebuild_snap_realms() works _downward_ in
> > > +        *     hierarchy after each update.
> > > +        */
> > >         if (realm->cached_context &&
> > > -           realm->cached_context->seq == realm->seq &&
> > > -           (!parent ||
> > > -            realm->cached_context->seq >= parent->cached_context->seq)) {
> > > +           ((realm->cached_context->seq == realm->seq && !parent) ||
> > > +            (parent && realm->cached_context->seq >= parent->cached_context->seq))) {
> >
> > With this change, when you mksnap on /dir_Y1/, its snap context stays
> > unchanged. In ceph_update_snap_trace(), resetting the 'invalidate'
> > variable for each realm should fix this issue.
> >
>
> This comment is terribly vague. "invalidate" is a local variable in that
> function and isn't set on a per-realm basis.
>
> Could you suggest a patch on top of Xiubo's patch instead?
>

something like this (not tested)

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index af502a8245f0..6ef41764008b 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -704,7 +704,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
        __le64 *prior_parent_snaps;        /* encoded */
        struct ceph_snap_realm *realm = NULL;
        struct ceph_snap_realm *first_realm = NULL;
-       int invalidate = 0;
+       struct ceph_snap_realm *realm_to_inval = NULL;
+       int invalidate;
        int err = -ENOMEM;
        LIST_HEAD(dirty_realms);

@@ -712,6 +713,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,

        dout("update_snap_trace deletion=%d\n", deletion);
 more:
+       invalidate = 0;
        ceph_decode_need(&p, e, sizeof(*ri), bad);
        ri = p;
        p += sizeof(*ri);
@@ -774,8 +776,10 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
             realm, invalidate, p, e);

        /* invalidate when we reach the _end_ (root) of the trace */
-       if (invalidate && p >= e)
-               rebuild_snap_realms(realm, &dirty_realms);
+       if (invalidate)
+               realm_to_inval = realm;
+       if (realm_to_inval && p >= e)
+               rebuild_snap_realms(realm_to_inval, &dirty_realms);

        if (!first_realm)
                first_realm = realm;
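
For the record, a hypothetical walk-through of the loop above for the
two-realm trace in Xiubo's example (the root realm arrives at the end
of the trace; assume the root itself is unchanged):

  1st pass: dir_Y1's realm carries the new seq-4 snapshot
            -> invalidate = 1, realm_to_inval = dir_Y1's realm
  2nd pass: root_snap's realm, seq still 3, nothing changed
            -> invalidate is reset to 0, realm_to_inval is kept
  p >= e:   rebuild_snap_realms(realm_to_inval, &dirty_realms)
            rebuilds only the dir_Y1 subtree, so the unrelated
            snap_X2 realm is never revisited.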



>
> > >                 dout("build_snap_context %llx %p: %p seq %lld (%u snaps)"
> > >                      " (unchanged)\n",
> > >                      realm->ino, realm, realm->cached_context,
> > > --
> > > 2.27.0
> > >
>
> --
> Jeff Layton <jlayton@kernel.org>
Xiubo Li Feb. 18, 2022, 1:46 a.m. UTC | #2
On 2/17/22 11:28 PM, Yan, Zheng wrote:
> On Thu, Feb 17, 2022 at 6:55 PM Jeff Layton <jlayton@kernel.org> wrote:
>> On Thu, 2022-02-17 at 11:03 +0800, Yan, Zheng wrote:
>>> On Tue, Feb 15, 2022 at 11:04 PM <xiubli@redhat.com> wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> No need to update the snapshot context in either of the following
>>>> two cases:
>>>> 1: if my context seq matches the realm's seq and the realm has no parent.
>>>> 2: if my context seq equals or is larger than my parent's. This
>>>>     works because rebuild_snap_realms() works _downward_ in the
>>>>     hierarchy after each update.
>>>>
>>>> This fix avoids pointlessly calling ceph_queue_cap_snap() for
>>>> unrelated inodes, for example:
>>>>
>>>> There are 6 directories like:
>>>>
>>>> /dir_X1/dir_X2/dir_X3/
>>>> /dir_Y1/dir_Y2/dir_Y3/
>>>>
>>>> First, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then
>>>> make a root snapshot under /.snap/root_snap. From then on, every
>>>> time we make a snapshot under /dir_Y1/..., the kclient will rebuild
>>>> the snap context for the snap_X2 realm and then try to queue cap
>>>> snaps for dir_Y2 and dir_Y3, which makes no sense.
>>>>
>>>> That's because snap_X2's seq is 2 and root_snap's seq is 3. So
>>>> when creating a new snapshot under /dir_Y1/..., the new seq will
>>>> be 4, and the MDS will send the kclient a snapshot backtrace going
>>>> _downward_ in the hierarchy: seqs 4, 3. Then ceph_update_snap_trace()
>>>> will always rebuild from the last realm in the trace, which is
>>>> root_snap. So when rebuilding the snap context it will always
>>>> rebuild the snap_X2 realm as well and try to queue cap snaps for
>>>> all the inodes in the snap_X2 realm, and we see logs like:
>>>>
>>>> "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
>>>>
>>>> URL: https://tracker.ceph.com/issues/44100
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>   fs/ceph/snap.c | 16 +++++++++-------
>>>>   1 file changed, 9 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>>>> index d075d3ce5f6d..1f24a5de81e7 100644
>>>> --- a/fs/ceph/snap.c
>>>> +++ b/fs/ceph/snap.c
>>>> @@ -341,14 +341,16 @@ static int build_snap_context(struct ceph_snap_realm *realm,
>>>>                  num += parent->cached_context->num_snaps;
>>>>          }
>>>>
>>>> -       /* do i actually need to update?  not if my context seq
>>>> -          matches realm seq, and my parents' does to.  (this works
>>>> -          because we rebuild_snap_realms() works _downward_ in
>>>> -          hierarchy after each update.) */
>>>> +       /* do i actually need to update? No need when any of the following
>>>> +        * two cases:
>>>> +        * #1: if my context seq matches realm's seq and realm has no parent.
>>>> +        * #2: if my context seq equals or is larger than my parent's, this
>>>> +        *     works because we rebuild_snap_realms() works _downward_ in
>>>> +        *     hierarchy after each update.
>>>> +        */
>>>>          if (realm->cached_context &&
>>>> -           realm->cached_context->seq == realm->seq &&
>>>> -           (!parent ||
>>>> -            realm->cached_context->seq >= parent->cached_context->seq)) {
>>>> +           ((realm->cached_context->seq == realm->seq && !parent) ||
>>>> +            (parent && realm->cached_context->seq >= parent->cached_context->seq))) {
>>> With this change, when you mksnap on /dir_Y1/, its snap context stays
>>> unchanged. In ceph_update_snap_trace(), resetting the 'invalidate'
>>> variable for each realm should fix this issue.
>>>
Thanks Zheng for your feedback.

Yeah, there is one case where this will happen. Your approach is
simpler; I will post a V2 for this.

-- Xiubo

>> This comment is terribly vague. "invalidate" is a local variable in that
>> function and isn't set on a per-realm basis.
>>
>> Could you suggest a patch on top of Xiubo's patch instead?
>>
> something like this (not tested)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index af502a8245f0..6ef41764008b 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -704,7 +704,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>          __le64 *prior_parent_snaps;        /* encoded */
>          struct ceph_snap_realm *realm = NULL;
>          struct ceph_snap_realm *first_realm = NULL;
> -       int invalidate = 0;
> +       struct ceph_snap_realm *realm_to_inval = NULL;
> +       int invalidate;
>          int err = -ENOMEM;
>          LIST_HEAD(dirty_realms);
>
> @@ -712,6 +713,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>
>          dout("update_snap_trace deletion=%d\n", deletion);
>   more:
> +       invalidate = 0;
>          ceph_decode_need(&p, e, sizeof(*ri), bad);
>          ri = p;
>          p += sizeof(*ri);
> @@ -774,8 +776,10 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>               realm, invalidate, p, e);
>
>          /* invalidate when we reach the _end_ (root) of the trace */
> -       if (invalidate && p >= e)
> -               rebuild_snap_realms(realm, &dirty_realms);
> +       if (invalidate)
> +               realm_to_inval = realm;
> +       if (realm_to_inval && p >= e)
> +               rebuild_snap_realms(realm_to_inval, &dirty_realms);
>
>          if (!first_realm)
>                  first_realm = realm;
>
>
>
>>>>                  dout("build_snap_context %llx %p: %p seq %lld (%u snaps)"
>>>>                       " (unchanged)\n",
>>>>                       realm->ino, realm, realm->cached_context,
>>>> --
>>>> 2.27.0
>>>>
>> --
>> Jeff Layton <jlayton@kernel.org>

Patch

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index d075d3ce5f6d..1f24a5de81e7 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -341,14 +341,16 @@  static int build_snap_context(struct ceph_snap_realm *realm,
 		num += parent->cached_context->num_snaps;
 	}
 
-	/* do i actually need to update?  not if my context seq
-	   matches realm seq, and my parents' does to.  (this works
-	   because we rebuild_snap_realms() works _downward_ in
-	   hierarchy after each update.) */
+	/* do i actually need to update? No need when any of the following
+	 * two cases:
+	 * #1: if my context seq matches realm's seq and realm has no parent.
+	 * #2: if my context seq equals or is larger than my parent's, this
+	 *     works because we rebuild_snap_realms() works _downward_ in
+	 *     hierarchy after each update.
+	 */
 	if (realm->cached_context &&
-	    realm->cached_context->seq == realm->seq &&
-	    (!parent ||
-	     realm->cached_context->seq >= parent->cached_context->seq)) {
+	    ((realm->cached_context->seq == realm->seq && !parent) ||
+	     (parent && realm->cached_context->seq >= parent->cached_context->seq))) {
 		dout("build_snap_context %llx %p: %p seq %lld (%u snaps)"
 		     " (unchanged)\n",
 		     realm->ino, realm, realm->cached_context,