diff mbox series

fs/ceph/file: fix memory leaks in __ceph_sync_read()

Message ID 20241127212027.2704515-1-max.kellermann@ionos.com
State New
Series fs/ceph/file: fix memory leaks in __ceph_sync_read()

Commit Message

Max Kellermann Nov. 27, 2024, 9:20 p.m. UTC
In two `break` statements, the call to ceph_release_page_vector() was
missing, leaking the allocation from ceph_alloc_page_vector().

Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/file.c | 2 ++
 1 file changed, 2 insertions(+)
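
The bug shape the commit message describes can be sketched in a small user-space C analogue (illustrative names only, not the real Ceph code): an allocation made inside a loop body must be released on every exit path, including early `break`s, which is exactly what the two missing ceph_release_page_vector() calls were.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocations;  /* instrumentation: counts unreleased buffers */

/* Stand-in for the per-iteration work; fails on chunk 2. */
static int process_chunk(int i)
{
    return i == 2 ? -1 : 0;
}

static int read_loop(int chunks)
{
    int ret = 0;

    for (int i = 0; i < chunks; i++) {
        char *buf = malloc(4096);  /* like ceph_alloc_page_vector() */
        live_allocations++;

        ret = process_chunk(i);
        if (ret < 0) {
            /* The fix: release on the early-exit path too.  Dropping
             * these two lines reproduces the leak the patch plugs. */
            free(buf);
            live_allocations--;
            break;
        }

        free(buf);  /* normal-path release, like ceph_release_page_vector() */
        live_allocations--;
    }

    return ret;
}
```

The patch below adds the equivalent of the two lines in the error branch, once per affected `break`.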

Comments

Max Kellermann Dec. 5, 2024, 8:32 a.m. UTC | #1
On Fri, Nov 29, 2024 at 9:06 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze <amarkuze@redhat.com> wrote:
> > Pages are freed in `ceph_osdc_put_request`, trying to release them
> > this way will end badly.
>
> Is there anybody else who can explain this to me?
> I believe Alex is wrong and my patch is correct, but maybe I'm missing
> something.

It's been a week. Is there really nobody who understands this piece of
code? I believe I do understand it, but my understanding conflicts
with Alex's, and he's the expert (and I'm not).
Ilya Dryomov Dec. 5, 2024, 11:24 a.m. UTC | #2
On Thu, Dec 5, 2024 at 9:32 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Fri, Nov 29, 2024 at 9:06 AM Max Kellermann <max.kellermann@ionos.com> wrote:
> >
> > On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze <amarkuze@redhat.com> wrote:
> > > Pages are freed in `ceph_osdc_put_request`, trying to release them
> > > this way will end badly.
> >
> > Is there anybody else who can explain this to me?
> > I believe Alex is wrong and my patch is correct, but maybe I'm missing
> > something.
>
> It's been a week. Is there really nobody who understands this piece of
> code? I believe I do understand it, but my understanding conflicts
> with Alex's, and he's the expert (and I'm not).

Hi Max,

Your understanding is correct.  Pages would be freed automatically
together with the request only if the ownership is transferred by
passing true for own_pages to osd_req_op_extent_osd_data_pages(), which
__ceph_sync_read() doesn't do.
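
The ownership rule can be sketched as a user-space analogue (hypothetical names, not the real libceph API): the request frees its pages on put only when ownership was transferred at setup time, so with own_pages=false every exit path must release them itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct request {
    void **pages;
    size_t num_pages;
    bool own_pages;   /* mirrors own_pages in osd_req_op_extent_osd_data_pages() */
};

static size_t pages_freed;   /* instrumentation: counts released pages */

static void **alloc_page_vector(size_t n)
{
    void **pages = calloc(n, sizeof(*pages));
    for (size_t i = 0; i < n; i++)
        pages[i] = malloc(64);   /* stand-in for a real page */
    return pages;
}

static void release_page_vector(void **pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(pages[i]);
        pages_freed++;
    }
    free(pages);
}

static void req_set_data_pages(struct request *req, void **pages,
                               size_t n, bool own_pages)
{
    req->pages = pages;
    req->num_pages = n;
    req->own_pages = own_pages;
}

/* Like ceph_osdc_put_request(): frees the pages only if ownership
 * was transferred to the request. */
static void req_put(struct request *req)
{
    if (req->own_pages)
        release_page_vector(req->pages, req->num_pages);
}

/* own_pages=false: every exit path must call release_page_vector() itself. */
static size_t run_caller_owned_demo(void)
{
    struct request req;
    void **pages = alloc_page_vector(3);

    pages_freed = 0;
    req_set_data_pages(&req, pages, 3, false);
    release_page_vector(pages, 3);   /* forgetting this on a break path leaks */
    req_put(&req);                   /* does not touch the pages */
    return pages_freed;
}

/* own_pages=true: putting the request is enough. */
static size_t run_request_owned_demo(void)
{
    struct request req;

    pages_freed = 0;
    req_set_data_pages(&req, alloc_page_vector(3), 3, true);
    req_put(&req);
    return pages_freed;
}
```

Before the fix, __ceph_sync_read() was in the caller-owned case but its `break` paths skipped the release; passing true for own_pages (as the v2 patch later does) moves it to the request-owned case.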

These error path leaks were introduced in commits 03bc06c7b0bd ("ceph:
add new mount option to enable sparse reads") and f0fe1e54cfcf ("ceph:
plumb in decryption during reads") with support for fscrypt.  Looking
at the former commit, it looks like a similar leak was introduced in
ceph_direct_read_write() too -- on bvecs instead of pages.

I have applied this patch and will take care of the leak on bvecs
myself because I think I see other issues there.

Thanks,

                Ilya
Alex Markuze Dec. 5, 2024, 11:31 a.m. UTC | #3
This is a bad patch, I don't appreciate partial fixes that introduce
unnecessary complications to the code, and it conflicts with the
complete fix in the other thread.
Please coordinate with me in the future.

On Thu, Dec 5, 2024 at 1:25 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Thu, Dec 5, 2024 at 9:32 AM Max Kellermann <max.kellermann@ionos.com> wrote:
> >
> > On Fri, Nov 29, 2024 at 9:06 AM Max Kellermann <max.kellermann@ionos.com> wrote:
> > >
> > > On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze <amarkuze@redhat.com> wrote:
> > > > Pages are freed in `ceph_osdc_put_request`, trying to release them
> > > > this way will end badly.
> > >
> > > Is there anybody else who can explain this to me?
> > > I believe Alex is wrong and my patch is correct, but maybe I'm missing
> > > something.
> >
> > It's been a week. Is there really nobody who understands this piece of
> > code? I believe I do understand it, but my understanding conflicts
> > with Alex's, and he's the expert (and I'm not).
>
> Hi Max,
>
> Your understanding is correct.  Pages would be freed automatically
> together with the request only if the ownership is transferred by
> passing true for own_pages to osd_req_op_extent_osd_data_pages(), which
> __ceph_sync_read() doesn't do.
>
> These error path leaks were introduced in commits 03bc06c7b0bd ("ceph:
> add new mount option to enable sparse reads") and f0fe1e54cfcf ("ceph:
> plumb in decryption during reads") with support for fscrypt.  Looking
> at the former commit, it looks like a similar leak was introduced in
> ceph_direct_read_write() too -- on bvecs instead of pages.
>
> I have applied this patch and will take care of the leak on bvecs
> myself because I think I see other issues there.
>
> Thanks,
>
>                 Ilya
>
Max Kellermann Dec. 5, 2024, 12:02 p.m. UTC | #4
On Thu, Dec 5, 2024 at 12:31 PM Alex Markuze <amarkuze@redhat.com> wrote:
> This is a bad patch, I don't appreciate partial fixes that introduce
> unnecessary complications to the code, and it conflicts with the
> complete fix in the other thread.

Alex, and I don't appreciate the unnecessary complications you
introduce to the Ceph contribution process!

The mistake you made in your first review ("will end badly") is not a
big deal; happens to everybody - but you still don't admit the mistake
and you ghosted me for a week. But then saying you don't appreciate
the work of somebody who found a bug and posted a simple fix is not
good communication. You can say you prefer a different patch and
explain the technical reasons; but saying you don't appreciate it is
quite condescending.

Now back to the technical facts:

- What exactly about my patch is "bad"?
- Do you believe my patch is not strictly an improvement?
- Why do you believe my fix is only "partial"?
- What unnecessary complications are introduced by my two-line patch
in your opinion?
- What "other thread"? I can't find anything on LKML and ceph-devel.

Max
Alex Markuze Dec. 5, 2024, 12:17 p.m. UTC | #5
The full fix is now in the testing branch.

Max, please follow the mailing list, I posted the patch last week on
the initial thread regarding this issue. Please, comment on the
correct thread, having two threads regarding the same issue introduces
unnecessary confusion.

The fix resolves the following tracker issue:

https://tracker.ceph.com/issues/67524

Additionally, these issues are no longer recreated after the fix.
https://tracker.ceph.com/issues/68981
https://tracker.ceph.com/issues/68980

I will make a couple of runs with KASAN and its peers, as it's not
immediately clear why these two issues are affected.

On Thu, Dec 5, 2024 at 2:02 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Thu, Dec 5, 2024 at 12:31 PM Alex Markuze <amarkuze@redhat.com> wrote:
> > This is a bad patch, I don't appreciate partial fixes that introduce
> > unnecessary complications to the code, and it conflicts with the
> > complete fix in the other thread.
>
> Alex, and I don't appreciate the unnecessary complications you
> introduce to the Ceph contribution process!
>
> The mistake you made in your first review ("will end badly") is not a
> big deal; happens to everybody - but you still don't admit the mistake
> and you ghosted me for a week. But then saying you don't appreciate
> the work of somebody who found a bug and posted a simple fix is not
> good communication. You can say you prefer a different patch and
> explain the technical reasons; but saying you don't appreciate it is
> quite condescending.
>
> Now back to the technical facts:
>
> - What exactly about my patch is "bad"?
> - Do you believe my patch is not strictly an improvement?
> - Why do you believe my fix is only "partial"?
> - What unnecessary complications are introduced by my two-line patch
> in your opinion?
> - What "other thread"? I can't find anything on LKML and ceph-devel.
>
> Max
>
Max Kellermann Dec. 5, 2024, 12:22 p.m. UTC | #6
On Thu, Dec 5, 2024 at 1:02 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> - What "other thread"? I can't find anything on LKML and ceph-devel.

Just in case you mean this patch authored by you:
https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae492dbfcd82ff41272abab1

I don't think that's a good patch, and if I had the power, I would
certainly reject it, because:

- it's big and confusing; hard to review
- it's badly documented; the commit message is just "fixing a race
condition when a file shrinks" but other than that, doesn't explain
anything; a proper explanation is necessary for such a complicated
diff
- the patch changes many distinct things in one patch, but it should
really be split
- this patch is about the buffer overflow for which my patch is much
simpler: https://lore.kernel.org/lkml/20241127212130.2704804-1-max.kellermann@ionos.com/
which I suggested merging instead of all the other candidate patches
https://lore.kernel.org/lkml/CAKPOu+9kdcjMf36bF3HAW4K8v0mHxXQX3_oQfGSshmXBKtS43A@mail.gmail.com/
but you did not reply (as usual, sigh!)
- deeply hidden in this patch is also a fix for the memory leak, but
instead of making one large patch which fixes everything, you should
first merge my trivial leak bug fix and then the fix for the buffer
overflow on top
Max Kellermann Dec. 5, 2024, 12:32 p.m. UTC | #7
On Thu, Dec 5, 2024 at 1:17 PM Alex Markuze <amarkuze@redhat.com> wrote:
>
> The full fix is now in the testing branch.

Alex, yet again, you did not reply to any of my questions! This is tiring.

The full fix may be a "full" fix for something else, and just happens
to mix in several other unrelated things, e.g. a fix for the memory
leak bug I found. That is what I call "bad".

My patch is not a "partial" fix. It is a full fix for the memory leak
bug. It just doesn't fix anything else, but that's how it's supposed
to be (and certainly not "bad"): a tiny patch that can be reviewed
easily that addresses just one issue. That's the opposite of
"complications".

> Max, please follow the mailing list, I posted the patch last week on
> the initial thread regarding this issue. Please, comment on the
> correct thread, having two threads regarding the same issue introduces
> unnecessary confusion.

But... THIS is the initial thread regarding this issue (= memory leak).
It is you who creates unnecessary confusion: with your emails and with
your code.
Alex Markuze Dec. 5, 2024, 12:57 p.m. UTC | #8
I will explain the process for ceph client patches. It's important to
note: The process itself and part of the automation is still evolving
and so many things have to be done manually.

1. A patch is created and pushed into a local testing branch and
tested with xfstests on a kernel compiled with KASAN; several static
analysis tools are run as well.
2. The local testing branch is merged into the `testing` branch, which
implies that all Sepia lab teuthology tests run on this new kernel --
so care must be taken before pushing.
3. The patch is reviewed on the mailing list by the community; when
acked, Ilya takes it to the main branch and eventually to the upstream
kernel.

I will break it up into three separate patches, as it addresses three
different issues. So that's a good comment. Any other constructive
comments will be appreciated, both regarding the patch and the
process.
I didn't send an RFC yet; I need to understand how the other two
issues were fixed as well. I will send an RFC when I'm done, as I need
to make sure all root causes are fixed. I prefer fixing things once.

On Thu, Dec 5, 2024 at 2:22 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Thu, Dec 5, 2024 at 1:02 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> > - What "other thread"? I can't find anything on LKML and ceph-devel.
>
> Just in case you mean this patch authored by you:
> https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae492dbfcd82ff41272abab1
>
> I don't think that's a good patch, and if I had the power, I would
> certainly reject it, because:
>
> - it's big and confusing; hard to review
> - it's badly documented; the commit message is just "fixing a race
> condition when a file shrinks" but other than that, doesn't explain
> anything; a proper explanation is necessary for such a complicated
> diff
> - the patch changes many distinct things in one patch, but it should
> really be split
> - this patch is about the buffer overflow for which my patch is much
> simpler: https://lore.kernel.org/lkml/20241127212130.2704804-1-max.kellermann@ionos.com/
> which I suggested merging instead of all the other candidate patches
> https://lore.kernel.org/lkml/CAKPOu+9kdcjMf36bF3HAW4K8v0mHxXQX3_oQfGSshmXBKtS43A@mail.gmail.com/
> but you did not reply (as usual, sigh!)
> - deeply hidden in this patch is also a fix for the memory leak, but
> instead of making one large patch which fixes everything, you should
> first merge my trivial leak bug fix and then the fix for the buffer
> overflow on top
>
Max Kellermann Dec. 5, 2024, 1:08 p.m. UTC | #9
On Thu, Dec 5, 2024 at 1:57 PM Alex Markuze <amarkuze@redhat.com> wrote:
>
> I will explain the process for ceph client patches. It's important to
> note: The process itself and part of the automation is still evolving
> and so many things have to be done manually.

None of this answers any of my questions on your negative review comments.

> I will break it up into three separate patches, as it addresses three
> different issues.

... one of which will be my patch.
Ilya Dryomov Dec. 5, 2024, 3:12 p.m. UTC | #10
On Thu, Dec 5, 2024 at 2:08 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Thu, Dec 5, 2024 at 1:57 PM Alex Markuze <amarkuze@redhat.com> wrote:
> >
> > I will explain the process for ceph client patches. It's important to
> > note: The process itself and part of the automation is still evolving
> > and so many things have to be done manually.
>
> None of this answers any of my questions on your negative review comments.
>
> > I will break it up into three separate patches, as it addresses three
> > different issues.
>
> ... one of which will be my patch.

It looks like Alex's preference is to address these memory leaks by
passing true for own_pages to osd_req_op_extent_osd_data_pages() and
adjusting where the OSD request is put accordingly -- this is what he
folded in his WIP patch which was posted in Luis' thread [1] the day
after this patch and pushed to the testing branch earlier today [2].
Unfortunately an update to this thread was missed, leaving everyone
including myself confused.

Max, would you be willing to redo this patch to pass true for own_pages
and post a v2?  There is nothing "bad", "partial" or otherwise wrong
with this version, but having the pages be taken care of automatically
is a bit nicer and a conflict with Alex's ongoing work would be avoided.

[1] https://lore.kernel.org/ceph-devel/CAO8a2ShzHuTizjY+T+RNr28XLGOJPNoXJf1rQUburMBVwJywMA@mail.gmail.com/
[2] https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae492dbfcd82ff41272abab1

Thanks,

                Ilya
Max Kellermann Dec. 5, 2024, 3:37 p.m. UTC | #11
On Thu, Dec 5, 2024 at 4:12 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> Max, would you be willing to redo this patch to pass true for own_pages
> and post a v2?  There is nothing "bad", "partial" or otherwise wrong
> with this version, but having the pages be taken care of automatically
> is a bit nicer and a conflict with Alex's ongoing work would be avoided.

Yes, I will send a patch for this. Even though I don't agree that this
is the best way forward; I'd prefer to fix the leak with a minimal
patch adding only the necessary calls, and not mix this with code
refactoring. A minimal fix is easier to review. Mixing a bug fix with
refactoring is more dangerous, which is important to avoid, because
this patch will be put in all stable branches. But ... if you want to,
I'll do that.

btw. Alex's patch
(https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae4)
introduces another memory leak: by postponing the
ceph_osdc_put_request() call, the -EFAULT branch now leaks the
ceph_osd_request object. I'll take care not to make the same mistake
in my v2.
Max Kellermann Dec. 5, 2024, 3:42 p.m. UTC | #12
On Thu, Dec 5, 2024 at 4:37 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> btw. Alex's patch
> (https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae4)
> introduces another memory leak

Sorry, that was wrong. The "break" only exits the inner "while" loop.
I got confused by nested loops.
Ilya Dryomov Dec. 6, 2024, 12:50 p.m. UTC | #13
On Thu, Dec 5, 2024 at 5:30 PM Alex Markuze <amarkuze@redhat.com> wrote:
>
> Good.
> This sequence has not been tested independently, but it should be fine.

Applied.

Thanks,

                Ilya

>
> On Thu, Dec 5, 2024 at 5:49 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> >
> > In two `break` statements, the call to ceph_release_page_vector() was
> > missing, leaking the allocation from ceph_alloc_page_vector().
> >
> > Instead of adding the missing ceph_release_page_vector() calls, the
> > Ceph maintainers preferred to transfer page ownership to the
> > `ceph_osd_request` by passing `own_pages=true` to
> > osd_req_op_extent_osd_data_pages().  This requires postponing the
> > ceph_osdc_put_request() call until after the block that accesses the
> > `pages`.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> > ---
> >  fs/ceph/file.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 4b8d59ebda00..ce342a5d4b8b 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1127,7 +1127,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >
> >                 osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
> >                                                  offset_in_page(read_off),
> > -                                                false, false);
> > +                                                false, true);
> >
> >                 op = &req->r_ops[0];
> >                 if (sparse) {
> > @@ -1186,8 +1186,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >                         ret = min_t(ssize_t, fret, len);
> >                 }
> >
> > -               ceph_osdc_put_request(req);
> > -
> >                 /* Short read but not EOF? Zero out the remainder. */
> >                 if (ret >= 0 && ret < len && (off + ret < i_size)) {
> >                         int zlen = min(len - ret, i_size - off - ret);
> > @@ -1221,7 +1219,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >                                 break;
> >                         }
> >                 }
> > -               ceph_release_page_vector(pages, num_pages);
> > +
> > +               ceph_osdc_put_request(req);
> >
> >                 if (ret < 0) {
> >                         if (ret == -EBLOCKLISTED)
> > --
> > 2.45.2
> >
>

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..24d0f1cc9aac 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1134,6 +1134,7 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			extent_cnt = __ceph_sparse_read_ext_count(inode, read_len);
 			ret = ceph_alloc_sparse_ext_map(op, extent_cnt);
 			if (ret) {
+				ceph_release_page_vector(pages, num_pages);
 				ceph_osdc_put_request(req);
 				break;
 			}
@@ -1168,6 +1169,7 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 					op->extent.sparse_ext_cnt);
 			if (fret < 0) {
 				ret = fret;
+				ceph_release_page_vector(pages, num_pages);
 				ceph_osdc_put_request(req);
 				break;
 			}