diff mbox series

ceph: wait on async create before checking caps for syncfs

Message ID 20220606233142.150457-1-jlayton@kernel.org
State New
Headers show
Series ceph: wait on async create before checking caps for syncfs | expand

Commit Message

Jeff Layton June 6, 2022, 11:31 p.m. UTC
Currently, we'll call ceph_check_caps, but if we're still waiting on the
reply, we'll end up spinning around on the same inode in
flush_dirty_session_caps. Wait for the async create reply before
flushing caps.

Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
URL: https://tracker.ceph.com/issues/55823
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 1 +
 1 file changed, 1 insertion(+)

I don't know if this will fix the tx queue stalls completely, but I
haven't seen one with this patch in place. I think it makes sense on its
own, either way.

Comments

Yan, Zheng June 9, 2022, 3:20 p.m. UTC | #1
On Thu, Jun 9, 2022 at 12:18 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/9/22 12:02 PM, Yan, Zheng wrote:
> > On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 6/9/22 11:29 AM, Yan, Zheng wrote:
> >>> On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>> On 6/9/22 10:15 AM, Yan, Zheng wrote:
> >>>>> The recent series of patches that add "wait on async xxxx" at various
> >>>>> places do not seem correct. The correct fix should make mds avoid any
> >>>>> wait when handling async requests.
> >>>>>
> >>>> In this case I am thinking what will happen if the async create request
> >>>> is deferred, then the cap flush related request should fail to find the
> >>>> ino.
> >>>>
> >>>> Should we wait ? Then how to distinguish from migrating a subtree and a
> >>>> deferred async create cases ?
> >>>>
> >>> async op caps are revoked at freezingtree stage of subtree migration.
> >>> see Locker::invalidate_lock_caches
> >>>
> >> Sorry I may not totally understand this issue.
> >>
> >> I think you mean in case of migration and then the MDS will revoke caps
> >> for the async create files and then the kclient will send a MclientCap
> >> request to mds, right ?
> >>
> >> If my understanding is correct, there is another case that:
> >>
> >> 1, async create a fileA
> >>
> >> 2, then write a lot of data to it and then release the Fw cap ref, and
> >> if we should report the size to MDS, it will send a MclientCap request
> >> to MDS too.
> >>
> >> 3, what if the async create of fileA was deferred due to some reason,
> >> then the MclientCap request will fail to find the ino ?
> >>
> > Async op should not be deferred in any case.
>
> Recently we have hit a similar bug, caused by deferring a request and
> requeuing it and the following request was executed before it.
>
that bug is mds bug.

> >>>>> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> >>>>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> >>>>>> reply, we'll end up spinning around on the same inode in
> >>>>>> flush_dirty_session_caps. Wait for the async create reply before
> >>>>>> flushing caps.
> >>>>>>
> >>>>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> >>>>>> URL: https://tracker.ceph.com/issues/55823
> >>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> >>>>>> ---
> >>>>>>     fs/ceph/caps.c | 1 +
> >>>>>>     1 file changed, 1 insertion(+)
> >>>>>>
> >>>>>> I don't know if this will fix the tx queue stalls completely, but I
> >>>>>> haven't seen one with this patch in place. I think it makes sense on its
> >>>>>> own, either way.
> >>>>>>
> >>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >>>>>> index 0a48bf829671..5ecfff4b37c9 100644
> >>>>>> --- a/fs/ceph/caps.c
> >>>>>> +++ b/fs/ceph/caps.c
> >>>>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >>>>>>                    ihold(inode);
> >>>>>>                    dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >>>>>>                    spin_unlock(&mdsc->cap_dirty_lock);
> >>>>>> +               ceph_wait_on_async_create(inode);
> >>>>>>                    ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >>>>>>                    iput(inode);
> >>>>>>                    spin_lock(&mdsc->cap_dirty_lock);
> >>>>>> --
> >>>>>> 2.36.1
> >>>>>>
>
Jeff Layton June 29, 2022, 12:08 p.m. UTC | #2
Are you suggesting that the MDS ought to hold a cap message for an inode
before its create request is processed? Note that the MDS won't even be
aware that the inode even _exists_ at that point. As far as the MDS
knows, it's just be a delegated inode number to the client. At what
point does the MDS give up on holding such a cap request if the create
request never comes in for some reason?

I don't see the harm in making the client wait until it gets a create
reply before sending a cap message. If we want to revert fbed7045f552
instead, we can do that, but it'll cause a regression until the MDS is
fixed [1]. Regardless, we need to either take this patch or revert that
one. 

I move that we take this patch for now to address the softlockups. Once
the MDS is fixed we could revert this and fbed7045f552 without causing a
regression.

[1]: https://tracker.ceph.com/issues/54107


On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> The recent series of patches that add "wait on async xxxx" at various
> places do not seem correct. The correct fix should make mds avoid any
> wait when handling async requests.
> 
> 
> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > reply, we'll end up spinning around on the same inode in
> > flush_dirty_session_caps. Wait for the async create reply before
> > flushing caps.
> > 
> > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > URL: https://tracker.ceph.com/issues/55823
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > I don't know if this will fix the tx queue stalls completely, but I
> > haven't seen one with this patch in place. I think it makes sense on its
> > own, either way.
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 0a48bf829671..5ecfff4b37c9 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >                 ihold(inode);
> >                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >                 spin_unlock(&mdsc->cap_dirty_lock);
> > +               ceph_wait_on_async_create(inode);
> >                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >                 iput(inode);
> >                 spin_lock(&mdsc->cap_dirty_lock);
> > --
> > 2.36.1
> >
Yan, Zheng June 30, 2022, 2:33 a.m. UTC | #3
On Wed, Jun 29, 2022 at 8:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Are you suggesting that the MDS ought to hold a cap message for an inode
> before its create request is processed? Note that the MDS won't even be
> aware that the inode even _exists_ at that point. As far as the MDS
> knows, it's just be a delegated inode number to the client. At what
> point does the MDS give up on holding such a cap request if the create
> request never comes in for some reason?
>
For an async request, MDS should not process it immediately.  If there
is any wait when handling async request, it's mds bug. I suggest
tracking down any wait, and fix it.


> I don't see the harm in making the client wait until it gets a create
> reply before sending a cap message. If we want to revert fbed7045f552
> instead, we can do that, but it'll cause a regression until the MDS is
> fixed [1]. Regardless, we need to either take this patch or revert that
> one.
>
> I move that we take this patch for now to address the softlockups. Once
> the MDS is fixed we could revert this and fbed7045f552 without causing a
> regression.
>
> [1]: https://tracker.ceph.com/issues/54107
>
>
> On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> > The recent series of patches that add "wait on async xxxx" at various
> > places do not seem correct. The correct fix should make mds avoid any
> > wait when handling async requests.
> >
> >
> > On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > > reply, we'll end up spinning around on the same inode in
> > > flush_dirty_session_caps. Wait for the async create reply before
> > > flushing caps.
> > >
> > > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > > URL: https://tracker.ceph.com/issues/55823
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/caps.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > I don't know if this will fix the tx queue stalls completely, but I
> > > haven't seen one with this patch in place. I think it makes sense on its
> > > own, either way.
> > >
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index 0a48bf829671..5ecfff4b37c9 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> > >                 ihold(inode);
> > >                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> > >                 spin_unlock(&mdsc->cap_dirty_lock);
> > > +               ceph_wait_on_async_create(inode);
> > >                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> > >                 iput(inode);
> > >                 spin_lock(&mdsc->cap_dirty_lock);
> > > --
> > > 2.36.1
> > >
>
> --
> Jeff Layton <jlayton@kernel.org>
Yan, Zheng June 30, 2022, 3:32 a.m. UTC | #4
On Thu, Jun 30, 2022 at 10:33 AM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Wed, Jun 29, 2022 at 8:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > Are you suggesting that the MDS ought to hold a cap message for an inode
> > before its create request is processed? Note that the MDS won't even be
> > aware that the inode even _exists_ at that point. As far as the MDS
> > knows, it's just be a delegated inode number to the client. At what
> > point does the MDS give up on holding such a cap request if the create
> > request never comes in for some reason?
> >
> For an async request, MDS should not process it immediately.  If there
> is any wait when handling async request, it's mds bug. I suggest
> tracking down any wait, and fix it.
>

I mean: For an async request, MDS should process it immediately. This
is important for preserving request ordering.

>
> > I don't see the harm in making the client wait until it gets a create
> > reply before sending a cap message. If we want to revert fbed7045f552
> > instead, we can do that, but it'll cause a regression until the MDS is
> > fixed [1]. Regardless, we need to either take this patch or revert that
> > one.
> >
> > I move that we take this patch for now to address the softlockups. Once
> > the MDS is fixed we could revert this and fbed7045f552 without causing a
> > regression.
> >
> > [1]: https://tracker.ceph.com/issues/54107
> >
> >
> > On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> > > The recent series of patches that add "wait on async xxxx" at various
> > > places do not seem correct. The correct fix should make mds avoid any
> > > wait when handling async requests.
> > >
> > >
> > > On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > > > reply, we'll end up spinning around on the same inode in
> > > > flush_dirty_session_caps. Wait for the async create reply before
> > > > flushing caps.
> > > >
> > > > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > > > URL: https://tracker.ceph.com/issues/55823
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/caps.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > I don't know if this will fix the tx queue stalls completely, but I
> > > > haven't seen one with this patch in place. I think it makes sense on its
> > > > own, either way.
> > > >
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index 0a48bf829671..5ecfff4b37c9 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > > > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> > > >                 ihold(inode);
> > > >                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> > > >                 spin_unlock(&mdsc->cap_dirty_lock);
> > > > +               ceph_wait_on_async_create(inode);
> > > >                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> > > >                 iput(inode);
> > > >                 spin_lock(&mdsc->cap_dirty_lock);
> > > > --
> > > > 2.36.1
> > > >
> >
> > --
> > Jeff Layton <jlayton@kernel.org>
diff mbox series

Patch

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 0a48bf829671..5ecfff4b37c9 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4389,6 +4389,7 @@  static void flush_dirty_session_caps(struct ceph_mds_session *s)
 		ihold(inode);
 		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
 		spin_unlock(&mdsc->cap_dirty_lock);
+		ceph_wait_on_async_create(inode);
 		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
 		iput(inode);
 		spin_lock(&mdsc->cap_dirty_lock);