diff mbox series

[1/3] libceph: do not decrease the data length more than once

Message ID 20231024050039.231143-2-xiubli@redhat.com
State New
Headers show
Series libceph: sparse-read misc fixes | expand

Commit Message

Xiubo Li Oct. 24, 2023, 5 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

No need to decrease the data length again if we need to read the
left data.

URL: https://tracker.ceph.com/issues/62081
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 net/ceph/messenger_v2.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Jeff Layton Oct. 30, 2023, 10:21 a.m. UTC | #1
On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> No need to decrease the data length again if we need to read the
> left data.
> 
> URL: https://tracker.ceph.com/issues/62081
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  net/ceph/messenger_v2.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index d09a39ff2cf0..9e3f95d5e425 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>  				bv.bv_offset = 0;
>  			}
>  			set_in_bvec(con, &bv);
> -			con->v2.data_len_remain -= bv.bv_len;
>  			return 0;
>  		}
>  	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {

It's been a while since I was in this code, but where does this get
decremented if you're removing it here?

Thanks,
Jeff Layton Oct. 30, 2023, 12:30 p.m. UTC | #2
On Mon, 2023-10-30 at 06:21 -0400, Jeff Layton wrote:
> On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
> > From: Xiubo Li <xiubli@redhat.com>
> > 
> > No need to decrease the data length again if we need to read the
> > left data.
> > 
> > URL: https://tracker.ceph.com/issues/62081
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >  net/ceph/messenger_v2.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> > index d09a39ff2cf0..9e3f95d5e425 100644
> > --- a/net/ceph/messenger_v2.c
> > +++ b/net/ceph/messenger_v2.c
> > @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
> >  				bv.bv_offset = 0;
> >  			}
> >  			set_in_bvec(con, &bv);
> > -			con->v2.data_len_remain -= bv.bv_len;
> >  			return 0;
> >  		}
> >  	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {
> 
> It's been a while since I was in this code, but where does this get
> decremented if you're removing it here?
> 

My question was a bit vague, so let me elaborate a bit:

data_len_remain should be how much unconsumed data is in the message
(IIRC). As we call prepare_sparse_read_cont multiple times, we're
consuming the message data and this gets decremented as we go.

In the above case, we're consuming the message data into the bvec, so
why shouldn't we be decrementing the remaining data by that amount?
Xiubo Li Oct. 31, 2023, 12:17 a.m. UTC | #3
On 10/30/23 20:30, Jeff Layton wrote:
> On Mon, 2023-10-30 at 06:21 -0400, Jeff Layton wrote:
>> On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> No need to decrease the data length again if we need to read the
>>> left data.
>>>
>>> URL: https://tracker.ceph.com/issues/62081
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>   net/ceph/messenger_v2.c | 1 -
>>>   1 file changed, 1 deletion(-)
>>>
>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>> index d09a39ff2cf0..9e3f95d5e425 100644
>>> --- a/net/ceph/messenger_v2.c
>>> +++ b/net/ceph/messenger_v2.c
>>> @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>>   				bv.bv_offset = 0;
>>>   			}
>>>   			set_in_bvec(con, &bv);
>>> -			con->v2.data_len_remain -= bv.bv_len;
>>>   			return 0;
>>>   		}
>>>   	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {
>> It's been a while since I was in this code, but where does this get
>> decremented if you're removing it here?
>>
> My question was a bit vague, so let me elaborate a bit:
>
> data_len_remain should be how much unconsumed data is in the message
> (IIRC). As we call prepare_sparse_read_cont multiple times, we're
> consuming the message data and this gets decremented as we go.
>
> In the above case, we're consuming the message data into the bvec, so
> why shouldn't we be decrementing the remaining data by that amount?

Hi Jeff,

If I didn't miss something about this. IMO we have already decreased it 
in the following two cases:

[1] 
https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2000

[2] 
https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2025

And here won't we decrease them twice ?

Thanks

- Xiubo
Jeff Layton Oct. 31, 2023, 12:23 a.m. UTC | #4
On Tue, 2023-10-31 at 08:17 +0800, Xiubo Li wrote:
> On 10/30/23 20:30, Jeff Layton wrote:
> > On Mon, 2023-10-30 at 06:21 -0400, Jeff Layton wrote:
> > > On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
> > > > From: Xiubo Li <xiubli@redhat.com>
> > > > 
> > > > No need to decrease the data length again if we need to read the
> > > > left data.
> > > > 
> > > > URL: https://tracker.ceph.com/issues/62081
> > > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > > ---
> > > >   net/ceph/messenger_v2.c | 1 -
> > > >   1 file changed, 1 deletion(-)
> > > > 
> > > > diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> > > > index d09a39ff2cf0..9e3f95d5e425 100644
> > > > --- a/net/ceph/messenger_v2.c
> > > > +++ b/net/ceph/messenger_v2.c
> > > > @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
> > > >   				bv.bv_offset = 0;
> > > >   			}
> > > >   			set_in_bvec(con, &bv);
> > > > -			con->v2.data_len_remain -= bv.bv_len;
> > > >   			return 0;
> > > >   		}
> > > >   	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {
> > > It's been a while since I was in this code, but where does this get
> > > decremented if you're removing it here?
> > > 
> > My question was a bit vague, so let me elaborate a bit:
> > 
> > data_len_remain should be how much unconsumed data is in the message
> > (IIRC). As we call prepare_sparse_read_cont multiple times, we're
> > consuming the message data and this gets decremented as we go.
> > 
> > In the above case, we're consuming the message data into the bvec, so
> > why shouldn't we be decrementing the remaining data by that amount?
> 
> Hi Jeff,
> 
> If I didn't miss something about this. IMO we have already decreased it 
> in the following two cases:
> 
> [1] 
> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2000
> 
> [2] 
> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2025
> 
> And here won't we decrease them twice ?
> 
> 

I don't get it. The functions returns in both of those cases just after
decrementing data_len_remain, so how can it have already decremented it?

Maybe I don't understand the bug you're trying to fix. data_len_remain
only comes into play when we need to revert. Does the problem involve a
trip through revoke_at_prepare_sparse_data()?
Xiubo Li Oct. 31, 2023, 2:04 a.m. UTC | #5
On 10/31/23 08:23, Jeff Layton wrote:
> On Tue, 2023-10-31 at 08:17 +0800, Xiubo Li wrote:
>> On 10/30/23 20:30, Jeff Layton wrote:
>>> On Mon, 2023-10-30 at 06:21 -0400, Jeff Layton wrote:
>>>> On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> No need to decrease the data length again if we need to read the
>>>>> left data.
>>>>>
>>>>> URL: https://tracker.ceph.com/issues/62081
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>    net/ceph/messenger_v2.c | 1 -
>>>>>    1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>>>> index d09a39ff2cf0..9e3f95d5e425 100644
>>>>> --- a/net/ceph/messenger_v2.c
>>>>> +++ b/net/ceph/messenger_v2.c
>>>>> @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>>>>    				bv.bv_offset = 0;
>>>>>    			}
>>>>>    			set_in_bvec(con, &bv);
>>>>> -			con->v2.data_len_remain -= bv.bv_len;
>>>>>    			return 0;
>>>>>    		}
>>>>>    	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {
>>>> It's been a while since I was in this code, but where does this get
>>>> decremented if you're removing it here?
>>>>
>>> My question was a bit vague, so let me elaborate a bit:
>>>
>>> data_len_remain should be how much unconsumed data is in the message
>>> (IIRC). As we call prepare_sparse_read_cont multiple times, we're
>>> consuming the message data and this gets decremented as we go.
>>>
>>> In the above case, we're consuming the message data into the bvec, so
>>> why shouldn't we be decrementing the remaining data by that amount?
>> Hi Jeff,
>>
>> If I didn't miss something about this. IMO we have already decreased it
>> in the following two cases:
>>
>> [1]
>> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2000
>>
>> [2]
>> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2025
>>
>> And here won't we decrease them twice ?
>>
>>
> I don't get it. The functions returns in both of those cases just after
> decrementing data_len_remain, so how can it have already decremented it?
>
> Maybe I don't understand the bug you're trying to fix. data_len_remain
> only comes into play when we need to revert. Does the problem involve a
> trip through revoke_at_prepare_sparse_data()?

Such as for the first time to read the data it will trigger:

prepare_sparse_read_cont()

   --> ret = con->ops->sparse_read()

        --> cursor->sr_resid = elen;

   --> if (buf) {con->v2.data_len_remain -= ret;}   // After calling 
->sparse_read() it will decrease 'ret', which is 'elen'.

Then the msg will try to read data from the socket buffer, and if the 
data read is less than expected 'elen' then it will go to the code:

https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L1960-L1971

And then won't it decrease 'data_len_remain' twice ?

Did I misreading it the sparse read state machine ?

Thanks

- Xiubo
Jeff Layton Oct. 31, 2023, 10:17 a.m. UTC | #6
On Tue, 2023-10-31 at 10:04 +0800, Xiubo Li wrote:
> On 10/31/23 08:23, Jeff Layton wrote:
> > On Tue, 2023-10-31 at 08:17 +0800, Xiubo Li wrote:
> > > On 10/30/23 20:30, Jeff Layton wrote:
> > > > On Mon, 2023-10-30 at 06:21 -0400, Jeff Layton wrote:
> > > > > On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
> > > > > > From: Xiubo Li <xiubli@redhat.com>
> > > > > > 
> > > > > > No need to decrease the data length again if we need to read the
> > > > > > left data.
> > > > > > 
> > > > > > URL: https://tracker.ceph.com/issues/62081
> > > > > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > > > > ---
> > > > > >    net/ceph/messenger_v2.c | 1 -
> > > > > >    1 file changed, 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> > > > > > index d09a39ff2cf0..9e3f95d5e425 100644
> > > > > > --- a/net/ceph/messenger_v2.c
> > > > > > +++ b/net/ceph/messenger_v2.c
> > > > > > @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
> > > > > >    				bv.bv_offset = 0;
> > > > > >    			}
> > > > > >    			set_in_bvec(con, &bv);
> > > > > > -			con->v2.data_len_remain -= bv.bv_len;
> > > > > >    			return 0;
> > > > > >    		}
> > > > > >    	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {
> > > > > It's been a while since I was in this code, but where does this get
> > > > > decremented if you're removing it here?
> > > > > 
> > > > My question was a bit vague, so let me elaborate a bit:
> > > > 
> > > > data_len_remain should be how much unconsumed data is in the message
> > > > (IIRC). As we call prepare_sparse_read_cont multiple times, we're
> > > > consuming the message data and this gets decremented as we go.
> > > > 
> > > > In the above case, we're consuming the message data into the bvec, so
> > > > why shouldn't we be decrementing the remaining data by that amount?
> > > Hi Jeff,
> > > 
> > > If I didn't miss something about this. IMO we have already decreased it
> > > in the following two cases:
> > > 
> > > [1]
> > > https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2000
> > > 
> > > [2]
> > > https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2025
> > > 
> > > And here won't we decrease them twice ?
> > > 
> > > 
> > I don't get it. The functions returns in both of those cases just after
> > decrementing data_len_remain, so how can it have already decremented it?
> > 
> > Maybe I don't understand the bug you're trying to fix. data_len_remain
> > only comes into play when we need to revert. Does the problem involve a
> > trip through revoke_at_prepare_sparse_data()?
> 
> Such as for the first time to read the data it will trigger:
> 
> prepare_sparse_read_cont()
> 
>    --> ret = con->ops->sparse_read()

>         --> cursor->sr_resid = elen;
> 
> 
>    --> if (buf) {con->v2.data_len_remain -= ret;}   // After calling 
> ->sparse_read() it will decrease 'ret', which is 'elen'.
> 

> Then the msg will try to read data from the socket buffer, and if the 
> data read is less than expected 'elen' then it will go to the code:
> 
> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L1960-L1971
> 
> And then won't it decrease 'data_len_remain' twice ?
> 
> Did I misreading it the sparse read state machine ?
> 


Here's the full snippet of code around that area. In this code, we've
just received the data from "in" iter into the current bvec and have
either copied it from the bounce buffer or done the CRC for the last
bvec. Now, we're advancing the iter by the amount we've just read, and
reducing the sr_resid value (which is the residual data in the  current
extent):

                ceph_msg_data_advance(cursor, con->v2.in_bvec.bv_len);
                cursor->sr_resid -= con->v2.in_bvec.bv_len;
                dout("%s: advance by 0x%x sr_resid 0x%x\n", __func__,
                     con->v2.in_bvec.bv_len, cursor->sr_resid);
                WARN_ON_ONCE(cursor->sr_resid > cursor->total_resid);
                if (cursor->sr_resid) {

There's still some more data in this extent? Set up the next bvec for
the next receive:

                        get_bvec_at(cursor, &bv);
                        if (bv.bv_len > cursor->sr_resid)
                                bv.bv_len = cursor->sr_resid;
                        if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) {
                                bv.bv_page = con->bounce_page;
                                bv.bv_offset = 0;
                        }
                        set_in_bvec(con, &bv);
                        con->v2.data_len_remain -= bv.bv_len;

...and reduce the data_len_remain for the amount of that next bvec.

                        return 0;
                }

So yeah, I think this is not being decremented twice. The code is
dealing with a different bvec at the point where data_len_remain is
reduced above and it looks correct to me.

What problem are you trying to solve with this patch?
Xiubo Li Nov. 1, 2023, 12:40 a.m. UTC | #7
On 10/31/23 18:17, Jeff Layton wrote:
> On Tue, 2023-10-31 at 10:04 +0800, Xiubo Li wrote:
>> On 10/31/23 08:23, Jeff Layton wrote:
>>> On Tue, 2023-10-31 at 08:17 +0800, Xiubo Li wrote:
>>>> On 10/30/23 20:30, Jeff Layton wrote:
>>>>> On Mon, 2023-10-30 at 06:21 -0400, Jeff Layton wrote:
>>>>>> On Tue, 2023-10-24 at 13:00 +0800, xiubli@redhat.com wrote:
>>>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>>>
>>>>>>> No need to decrease the data length again if we need to read the
>>>>>>> left data.
>>>>>>>
>>>>>>> URL: https://tracker.ceph.com/issues/62081
>>>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>>>> ---
>>>>>>>     net/ceph/messenger_v2.c | 1 -
>>>>>>>     1 file changed, 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>>>>>> index d09a39ff2cf0..9e3f95d5e425 100644
>>>>>>> --- a/net/ceph/messenger_v2.c
>>>>>>> +++ b/net/ceph/messenger_v2.c
>>>>>>> @@ -1966,7 +1966,6 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>>>>>>     				bv.bv_offset = 0;
>>>>>>>     			}
>>>>>>>     			set_in_bvec(con, &bv);
>>>>>>> -			con->v2.data_len_remain -= bv.bv_len;
>>>>>>>     			return 0;
>>>>>>>     		}
>>>>>>>     	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {
>>>>>> It's been a while since I was in this code, but where does this get
>>>>>> decremented if you're removing it here?
>>>>>>
>>>>> My question was a bit vague, so let me elaborate a bit:
>>>>>
>>>>> data_len_remain should be how much unconsumed data is in the message
>>>>> (IIRC). As we call prepare_sparse_read_cont multiple times, we're
>>>>> consuming the message data and this gets decremented as we go.
>>>>>
>>>>> In the above case, we're consuming the message data into the bvec, so
>>>>> why shouldn't we be decrementing the remaining data by that amount?
>>>> Hi Jeff,
>>>>
>>>> If I didn't miss something about this. IMO we have already decreased it
>>>> in the following two cases:
>>>>
>>>> [1]
>>>> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2000
>>>>
>>>> [2]
>>>> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L2025
>>>>
>>>> And here won't we decrease them twice ?
>>>>
>>>>
>>> I don't get it. The functions returns in both of those cases just after
>>> decrementing data_len_remain, so how can it have already decremented it?
>>>
>>> Maybe I don't understand the bug you're trying to fix. data_len_remain
>>> only comes into play when we need to revert. Does the problem involve a
>>> trip through revoke_at_prepare_sparse_data()?
>> Such as for the first time to read the data it will trigger:
>>
>> prepare_sparse_read_cont()
>>
>>     --> ret = con->ops->sparse_read()
>>          --> cursor->sr_resid = elen;
>>
>>
>>     --> if (buf) {con->v2.data_len_remain -= ret;}   // After calling
>> ->sparse_read() it will decrease 'ret', which is 'elen'.
>>
>> Then the msg will try to read data from the socket buffer, and if the
>> data read is less than expected 'elen' then it will go to the code:
>>
>> https://github.com/ceph/ceph-client/blob/for-linus/net/ceph/messenger_v2.c#L1960-L1971
>>
>> And then won't it decrease 'data_len_remain' twice ?
>>
>> Did I misreading it the sparse read state machine ?
>>
>
> Here's the full snippet of code around that area. In this code, we've
> just received the data from "in" iter into the current bvec and have
> either copied it from the bounce buffer or done the CRC for the last
> bvec. Now, we're advancing the iter by the amount we've just read, and
> reducing the sr_resid value (which is the residual data in the  current
> extent):
>
>                  ceph_msg_data_advance(cursor, con->v2.in_bvec.bv_len);
>                  cursor->sr_resid -= con->v2.in_bvec.bv_len;
>                  dout("%s: advance by 0x%x sr_resid 0x%x\n", __func__,
>                       con->v2.in_bvec.bv_len, cursor->sr_resid);
>                  WARN_ON_ONCE(cursor->sr_resid > cursor->total_resid);
>                  if (cursor->sr_resid) {
>
> There's still some more data in this extent? Set up the next bvec for
> the next receive:
>
>                          get_bvec_at(cursor, &bv);
>                          if (bv.bv_len > cursor->sr_resid)
>                                  bv.bv_len = cursor->sr_resid;
>                          if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) {
>                                  bv.bv_page = con->bounce_page;
>                                  bv.bv_offset = 0;
>                          }
>                          set_in_bvec(con, &bv);
>                          con->v2.data_len_remain -= bv.bv_len;
>
> ...and reduce the data_len_remain for the amount of that next bvec.
>
>                          return 0;
>                  }
>
> So yeah, I think this is not being decremented twice. The code is
> dealing with a different bvec at the point where data_len_remain is
> reduced above and it looks correct to me.

Okay, I think I misreading the sparse-read state machine.

> What problem are you trying to solve with this patch?

Currently no any issue from this. I just added many debug logs to 
reproduce and debug the issue https://tracker.ceph.com/issues/62081, and 
found odd logs. Such as the improvment for the other two patches:

  90279 <6>[ 5483.137183] libceph: osd_sparse_read:5923 ---> 
sr->sr_datalen: 251723776, sr->sr_index: 0, count: 0

The sr_datalen is not zero when the extents array is empty.

The following is a normal log, and the extent array size is 1.

  90275 <6>[ 5477.553113] libceph: osd_sparse_read:5919 ---> 
sr->sr_datalen: 16384, sr->sr_index: 0, count: 1

For this patch I just found it when going through the code. I will 
remove it for next version.

Thanks Jeff for your reviewing.

- Xiubo
diff mbox series

Patch

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index d09a39ff2cf0..9e3f95d5e425 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1966,7 +1966,6 @@  static int prepare_sparse_read_cont(struct ceph_connection *con)
 				bv.bv_offset = 0;
 			}
 			set_in_bvec(con, &bv);
-			con->v2.data_len_remain -= bv.bv_len;
 			return 0;
 		}
 	} else if (iov_iter_is_kvec(&con->v2.in_iter)) {