linux-generic: odp_ticketlock.c: performance regression

Message ID	1417440841-8666-1-git-send-email-ola.liljedahl@linaro.org
State	New
Headers	show Return-Path: <patchwork-forward+bncBDK5FOHVXUMBBWO46GRQKGQECXMJC4I@linaro.org> Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.215.49 as permitted sender) client-ip=209.85.215.49; Received-SPF: none (google.com: lng-odp-bounces@lists.linaro.org does not designate permitted sender hosts) client-ip=54.225.227.206; From: Ola Liljedahl <ola.liljedahl@linaro.org> To: lng-odp@lists.linaro.org Date: Mon, 1 Dec 2014 14:34:01 +0100 Message-Id: <1417440841-8666-1-git-send-email-ola.liljedahl@linaro.org> Subject: [lng-odp] [PATCH] linux-generic: odp_ticketlock.c: performance regression Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: lng-odp-bounces@lists.linaro.org Sender: lng-odp-bounces@lists.linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org

Ola Liljedahl Dec. 1, 2014, 1:34 p.m. UTC

Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
---
Replaced an atomic RMW add with separate load, add and store operations.
This avoids generating a "locked" instruction on x86 which implies unnecessary
strong memory ordering and improves performance. This change could also prove
beneficial on other architectures.

 platform/linux-generic/odp_ticketlock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Ola Liljedahl Dec. 3, 2014, 2:20 p.m. UTC | #1

Ping!

On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> ---
> Replaced an atomic RMW add with separate load, add and store operations.
> This avoids generating a "locked" instruction on x86 which implies unnecessary
> strong memory ordering and improves performance. This change could also prove
> beneficial on other architectures.
>
>  platform/linux-generic/odp_ticketlock.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/platform/linux-generic/odp_ticketlock.c b/platform/linux-generic/odp_ticketlock.c
> index 6c5e74e..1e67ff5 100644
> --- a/platform/linux-generic/odp_ticketlock.c
> +++ b/platform/linux-generic/odp_ticketlock.c
> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t *ticketlock)
>
>  void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>  {
> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1, _ODP_MEMMODEL_RLS);
> +       uint32_t cur = _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
> +                                              _ODP_MEMMODEL_RLX);
> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
> +                                _ODP_MEMMODEL_RLS);
>
>  #if defined __OCTEON__
>         odp_sync_stores(); /* SYNCW to flush write buffer */
> --
> 1.9.1
>

Maxim Uvarov Dec. 3, 2014, 5:16 p.m. UTC | #2

On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
> Ping!

Needed more review for this.
>
> On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> ---
>> Replaced an atomic RMW add with separate load, add and store operations.
>> This avoids generating a "locked" instruction on x86 which implies unnecessary
>> strong memory ordering and improves performance. This change could also prove
>> beneficial on other architectures.
>>
>>   platform/linux-generic/odp_ticketlock.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/platform/linux-generic/odp_ticketlock.c b/platform/linux-generic/odp_ticketlock.c
>> index 6c5e74e..1e67ff5 100644
>> --- a/platform/linux-generic/odp_ticketlock.c
>> +++ b/platform/linux-generic/odp_ticketlock.c
>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t *ticketlock)
>>
>>   void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>>   {
>> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1, _ODP_MEMMODEL_RLS);
>> +       uint32_t cur = _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
>> +                                              _ODP_MEMMODEL_RLX);
>> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
>> +                                _ODP_MEMMODEL_RLS);
>>
Isn't this code racy? Some threads can read cur at the same time, then 
delay (interrupt or whatever) then do store.
And as result store will have wrong cut_ticket value.

Maxim.

>>   #if defined __OCTEON__
>>          odp_sync_stores(); /* SYNCW to flush write buffer */
>> --
>> 1.9.1
>>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp

Ola Liljedahl Dec. 3, 2014, 8:07 p.m. UTC | #3

On 3 December 2014 at 18:29, Mario Torrecillas Rodriguez
<Mario.TorrecillasRodriguez@arm.com> wrote:
> I don¹t think cur_ticket will be read/written by more than one thread
> simultaneously, only the one which owns the lock will attempt to update it.
Exactly.

Do I take that as a review Mario?

>
> Mario.
>
> On 03/12/2014 17:16, "Maxim Uvarov" <maxim.uvarov@linaro.org> wrote:
>
>>On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
>>> Ping!
>>
>>Needed more review for this.
>>>
>>> On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>wrote:
>>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>> ---
>>>> Replaced an atomic RMW add with separate load, add and store
>>>>operations.
>>>> This avoids generating a "locked" instruction on x86 which implies
>>>>unnecessary
>>>> strong memory ordering and improves performance. This change could
>>>>also prove
>>>> beneficial on other architectures.
>>>>
>>>>   platform/linux-generic/odp_ticketlock.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/platform/linux-generic/odp_ticketlock.c
>>>>b/platform/linux-generic/odp_ticketlock.c
>>>> index 6c5e74e..1e67ff5 100644
>>>> --- a/platform/linux-generic/odp_ticketlock.c
>>>> +++ b/platform/linux-generic/odp_ticketlock.c
>>>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t
>>>>*ticketlock)
>>>>
>>>>   void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>>>>   {
>>>> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1,
>>>>_ODP_MEMMODEL_RLS);
>>>> +       uint32_t cur = _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
>>>> +                                              _ODP_MEMMODEL_RLX);
>>>> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
>>>> +                                _ODP_MEMMODEL_RLS);
>>>>
>>Isn't this code racy? Some threads can read cur at the same time, then
>>delay (interrupt or whatever) then do store.
>>And as result store will have wrong cut_ticket value.
>>
>>Maxim.
>>
>>>>   #if defined __OCTEON__
>>>>          odp_sync_stores(); /* SYNCW to flush write buffer */
>>>> --
>>>> 1.9.1
>>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>>
>>_______________________________________________
>>lng-odp mailing list
>>lng-odp@lists.linaro.org
>>http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp

Bill Fischofer Dec. 3, 2014, 8:12 p.m. UTC | #4

Agreed, that correct programming would only have the owner attempt to
unlock, but what about other threads simultaneously attempting to lock
while the owner is unlocking?  Maybe I don't fully appreciate the memmodel
semantics, but there can be arbitrary delays between any two instructions
executed by a single core due to interrupts.

On Wed, Dec 3, 2014 at 2:07 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
wrote:

> On 3 December 2014 at 18:29, Mario Torrecillas Rodriguez
> <Mario.TorrecillasRodriguez@arm.com> wrote:
> > I don¹t think cur_ticket will be read/written by more than one thread
> > simultaneously, only the one which owns the lock will attempt to update
> it.
> Exactly.
>
> Do I take that as a review Mario?
>
> >
> > Mario.
> >
> > On 03/12/2014 17:16, "Maxim Uvarov" <maxim.uvarov@linaro.org> wrote:
> >
> >>On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
> >>> Ping!
> >>
> >>Needed more review for this.
> >>>
> >>> On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org>
> >>>wrote:
> >>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> >>>> ---
> >>>> Replaced an atomic RMW add with separate load, add and store
> >>>>operations.
> >>>> This avoids generating a "locked" instruction on x86 which implies
> >>>>unnecessary
> >>>> strong memory ordering and improves performance. This change could
> >>>>also prove
> >>>> beneficial on other architectures.
> >>>>
> >>>>   platform/linux-generic/odp_ticketlock.c | 5 ++++-
> >>>>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/platform/linux-generic/odp_ticketlock.c
> >>>>b/platform/linux-generic/odp_ticketlock.c
> >>>> index 6c5e74e..1e67ff5 100644
> >>>> --- a/platform/linux-generic/odp_ticketlock.c
> >>>> +++ b/platform/linux-generic/odp_ticketlock.c
> >>>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t
> >>>>*ticketlock)
> >>>>
> >>>>   void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
> >>>>   {
> >>>> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1,
> >>>>_ODP_MEMMODEL_RLS);
> >>>> +       uint32_t cur =
> _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
> >>>> +                                              _ODP_MEMMODEL_RLX);
> >>>> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
> >>>> +                                _ODP_MEMMODEL_RLS);
> >>>>
> >>Isn't this code racy? Some threads can read cur at the same time, then
> >>delay (interrupt or whatever) then do store.
> >>And as result store will have wrong cut_ticket value.
> >>
> >>Maxim.
> >>
> >>>>   #if defined __OCTEON__
> >>>>          odp_sync_stores(); /* SYNCW to flush write buffer */
> >>>> --
> >>>> 1.9.1
> >>>>
> >>> _______________________________________________
> >>> lng-odp mailing list
> >>> lng-odp@lists.linaro.org
> >>> http://lists.linaro.org/mailman/listinfo/lng-odp
> >>
> >>
> >>_______________________________________________
> >>lng-odp mailing list
> >>lng-odp@lists.linaro.org
> >>http://lists.linaro.org/mailman/listinfo/lng-odp
> >>
> >
> >
> >
> >
> > _______________________________________________
> > lng-odp mailing list
> > lng-odp@lists.linaro.org
> > http://lists.linaro.org/mailman/listinfo/lng-odp
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp
>

Ola Liljedahl Dec. 3, 2014, 10:45 p.m. UTC | #5

Thanks Mario,

I actually intend to post an updated patch because Maxim's questions
are an indication that it is not possible to immediately understand
the code and why it works. The comments could be better (they were
pretty non-existent to start with).

-- Ola

On 3 December 2014 at 21:42, Mario Torrecillas Rodríguez
<Mario.TorrecillasRodriguez@arm.com> wrote:
> Hi Bill,
>
> cur_ticket, in this implementation in particular at least, is only updated
> during the unlock operation (this, plus the fact that there is only one
> owner, is the reason atomicity is not required for cur_ticket).
> Threads attempting to acquire the lock will (atomically) increment
> next_ticket, and then spin on cur_ticket waiting for it to be equal to the
> ticket they grabbed.
> The only thing that will happen if an interrupt occurs between those two
> lines of code is that threads spinning will have to wait a bit longer, but
> this is true for any type of lock.
>
> I don’t see a potential risk of races myself so, Ola, you can take this as a
> review, unless there’s something I’m missing!
>
> Mario.
>
> From: Bill Fischofer <bill.fischofer@linaro.org>
> Date: Wednesday 3 December 2014 20:12
> To: Ola Liljedahl <ola.liljedahl@linaro.org>
> Cc: Mario Torrecillas Rodríguez <Mario.TorrecillasRodriguez@arm.com>,
> "lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
> Subject: Re: [lng-odp] [PATCH] linux-generic: odp_ticketlock.c: performance
> regression
>
> Agreed, that correct programming would only have the owner attempt to
> unlock, but what about other threads simultaneously attempting to lock while
> the owner is unlocking?  Maybe I don't fully appreciate the memmodel
> semantics, but there can be arbitrary delays between any two instructions
> executed by a single core due to interrupts.
>
> On Wed, Dec 3, 2014 at 2:07 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
> wrote:
>>
>> On 3 December 2014 at 18:29, Mario Torrecillas Rodriguez
>> <Mario.TorrecillasRodriguez@arm.com> wrote:
>> > I don¹t think cur_ticket will be read/written by more than one thread
>> > simultaneously, only the one which owns the lock will attempt to update
>> > it.
>> Exactly.
>>
>> Do I take that as a review Mario?
>>
>> >
>> > Mario.
>> >
>> > On 03/12/2014 17:16, "Maxim Uvarov" <maxim.uvarov@linaro.org> wrote:
>> >
>> >>On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
>> >>> Ping!
>> >>
>> >>Needed more review for this.
>> >>>
>> >>> On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org>
>> >>>wrote:
>> >>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> >>>> ---
>> >>>> Replaced an atomic RMW add with separate load, add and store
>> >>>>operations.
>> >>>> This avoids generating a "locked" instruction on x86 which implies
>> >>>>unnecessary
>> >>>> strong memory ordering and improves performance. This change could
>> >>>>also prove
>> >>>> beneficial on other architectures.
>> >>>>
>> >>>>   platform/linux-generic/odp_ticketlock.c | 5 ++++-
>> >>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>> >>>>
>> >>>> diff --git a/platform/linux-generic/odp_ticketlock.c
>> >>>>b/platform/linux-generic/odp_ticketlock.c
>> >>>> index 6c5e74e..1e67ff5 100644
>> >>>> --- a/platform/linux-generic/odp_ticketlock.c
>> >>>> +++ b/platform/linux-generic/odp_ticketlock.c
>> >>>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t
>> >>>>*ticketlock)
>> >>>>
>> >>>>   void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>> >>>>   {
>> >>>> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1,
>> >>>>_ODP_MEMMODEL_RLS);
>> >>>> +       uint32_t cur =
>> >>>> _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
>> >>>> +                                              _ODP_MEMMODEL_RLX);
>> >>>> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
>> >>>> +                                _ODP_MEMMODEL_RLS);
>> >>>>
>> >>Isn't this code racy? Some threads can read cur at the same time, then
>> >>delay (interrupt or whatever) then do store.
>> >>And as result store will have wrong cut_ticket value.
>> >>
>> >>Maxim.
>> >>
>> >>>>   #if defined __OCTEON__
>> >>>>          odp_sync_stores(); /* SYNCW to flush write buffer */
>> >>>> --
>> >>>> 1.9.1
>> >>>>
>> >>> _______________________________________________
>> >>> lng-odp mailing list
>> >>> lng-odp@lists.linaro.org
>> >>> http://lists.linaro.org/mailman/listinfo/lng-odp
>> >>
>> >>
>> >>_______________________________________________
>> >>lng-odp mailing list
>> >>lng-odp@lists.linaro.org
>> >>http://lists.linaro.org/mailman/listinfo/lng-odp
>> >>
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > lng-odp mailing list
>> > lng-odp@lists.linaro.org
>> > http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>
>

Bill Fischofer Dec. 3, 2014, 10:53 p.m. UTC | #6

Thanks for the clear explanation.

On Wed, Dec 3, 2014 at 2:42 PM, Mario Torrecillas Rodríguez <
Mario.TorrecillasRodriguez@arm.com> wrote:

> Hi Bill,
>
> *cur_ticket*, in this implementation in particular at least, is only
> updated during the unlock operation (this, plus the fact that there is only
> one owner, is the reason atomicity is not required for *cur_ticket*).
> Threads attempting to acquire the lock will (atomically) increment
> *next_ticket*, and then spin on *cur_ticket* waiting for it to be equal
> to the ticket they grabbed.
> The only thing that will happen if an interrupt occurs between those two
> lines of code is that threads spinning will have to wait a bit longer, but
> this is true for any type of lock.
>
> I don’t see a potential risk of races myself so, *Ola*, you can take this
> as a review, unless there’s something I’m missing!
>
> Mario.
>
> From: Bill Fischofer <bill.fischofer@linaro.org>
> Date: Wednesday 3 December 2014 20:12
> To: Ola Liljedahl <ola.liljedahl@linaro.org>
> Cc: Mario Torrecillas Rodríguez <Mario.TorrecillasRodriguez@arm.com>, "
> lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
> Subject: Re: [lng-odp] [PATCH] linux-generic: odp_ticketlock.c:
> performance regression
>
> Agreed, that correct programming would only have the owner attempt to
> unlock, but what about other threads simultaneously attempting to lock
> while the owner is unlocking?  Maybe I don't fully appreciate the memmodel
> semantics, but there can be arbitrary delays between any two instructions
> executed by a single core due to interrupts.
>
> On Wed, Dec 3, 2014 at 2:07 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
> wrote:
>
>> On 3 December 2014 at 18:29, Mario Torrecillas Rodriguez
>> <Mario.TorrecillasRodriguez@arm.com> wrote:
>> > I don¹t think cur_ticket will be read/written by more than one thread
>> > simultaneously, only the one which owns the lock will attempt to update
>> it.
>> Exactly.
>>
>> Do I take that as a review Mario?
>>
>> >
>> > Mario.
>> >
>> > On 03/12/2014 17:16, "Maxim Uvarov" <maxim.uvarov@linaro.org> wrote:
>> >
>> >>On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
>> >>> Ping!
>> >>
>> >>Needed more review for this.
>> >>>
>> >>> On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org>
>> >>>wrote:
>> >>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> >>>> ---
>> >>>> Replaced an atomic RMW add with separate load, add and store
>> >>>>operations.
>> >>>> This avoids generating a "locked" instruction on x86 which implies
>> >>>>unnecessary
>> >>>> strong memory ordering and improves performance. This change could
>> >>>>also prove
>> >>>> beneficial on other architectures.
>> >>>>
>> >>>>   platform/linux-generic/odp_ticketlock.c | 5 ++++-
>> >>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>> >>>>
>> >>>> diff --git a/platform/linux-generic/odp_ticketlock.c
>> >>>>b/platform/linux-generic/odp_ticketlock.c
>> >>>> index 6c5e74e..1e67ff5 100644
>> >>>> --- a/platform/linux-generic/odp_ticketlock.c
>> >>>> +++ b/platform/linux-generic/odp_ticketlock.c
>> >>>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t
>> >>>>*ticketlock)
>> >>>>
>> >>>>   void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>> >>>>   {
>> >>>> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1,
>> >>>>_ODP_MEMMODEL_RLS);
>> >>>> +       uint32_t cur =
>> _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
>> >>>> +                                              _ODP_MEMMODEL_RLX);
>> >>>> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
>> >>>> +                                _ODP_MEMMODEL_RLS);
>> >>>>
>> >>Isn't this code racy? Some threads can read cur at the same time, then
>> >>delay (interrupt or whatever) then do store.
>> >>And as result store will have wrong cut_ticket value.
>> >>
>> >>Maxim.
>> >>
>> >>>>   #if defined __OCTEON__
>> >>>>          odp_sync_stores(); /* SYNCW to flush write buffer */
>> >>>> --
>> >>>> 1.9.1
>> >>>>
>> >>> _______________________________________________
>> >>> lng-odp mailing list
>> >>> lng-odp@lists.linaro.org
>> >>> http://lists.linaro.org/mailman/listinfo/lng-odp
>> >>
>> >>
>> >>_______________________________________________
>> >>lng-odp mailing list
>> >>lng-odp@lists.linaro.org
>> >>http://lists.linaro.org/mailman/listinfo/lng-odp
>> >>
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > lng-odp mailing list
>> > lng-odp@lists.linaro.org
>> > http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>

Maxim Uvarov Dec. 4, 2014, 12:05 p.m. UTC | #7

Ok, agree it only in _unlock. Should be safe code.

Please include this explanation to git log for record.

Maxim.

On 12/04/2014 01:53 AM, Bill Fischofer wrote:
> Thanks for the clear explanation.
>
> On Wed, Dec 3, 2014 at 2:42 PM, Mario Torrecillas Rodríguez 
> <Mario.TorrecillasRodriguez@arm.com 
> <mailto:Mario.TorrecillasRodriguez@arm.com>> wrote:
>
>     Hi Bill,
>
>     /cur_ticket/, in this implementation in particular at least, is
>     only updated during the unlock operation (this, plus the fact that
>     there is only one owner, is the reason atomicity is not required
>     for /cur_ticket/).
>     Threads attempting to acquire the lock will (atomically) increment
>     /next_ticket/, and then spin on /cur_ticket/ waiting for it to be
>     equal to the ticket they grabbed.
>     The only thing that will happen if an interrupt occurs between
>     those two lines of code is that threads spinning will have to wait
>     a bit longer, but this is true for any type of lock.
>
>     I don’t see a potential risk of races myself so, *Ola*, you can
>     take this as a review, unless there’s something I’m missing!
>
>     Mario.
>
>     From: Bill Fischofer <bill.fischofer@linaro.org
>     <mailto:bill.fischofer@linaro.org>>
>     Date: Wednesday 3 December 2014 20:12
>     To: Ola Liljedahl <ola.liljedahl@linaro.org
>     <mailto:ola.liljedahl@linaro.org>>
>     Cc: Mario Torrecillas Rodríguez
>     <Mario.TorrecillasRodriguez@arm.com
>     <mailto:Mario.TorrecillasRodriguez@arm.com>>,
>     "lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>"
>     <lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>>
>     Subject: Re: [lng-odp] [PATCH] linux-generic: odp_ticketlock.c:
>     performance regression
>
>     Agreed, that correct programming would only have the owner attempt
>     to unlock, but what about other threads simultaneously attempting
>     to lock while the owner is unlocking? Maybe I don't fully
>     appreciate the memmodel semantics, but there can be arbitrary
>     delays between any two instructions executed by a single core due
>     to interrupts.
>
>     On Wed, Dec 3, 2014 at 2:07 PM, Ola Liljedahl
>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote:
>
>         On 3 December 2014 at 18:29, Mario Torrecillas Rodriguez
>         <Mario.TorrecillasRodriguez@arm.com
>         <mailto:Mario.TorrecillasRodriguez@arm.com>> wrote:
>         > I don¹t think cur_ticket will be read/written by more than
>         one thread
>         > simultaneously, only the one which owns the lock will
>         attempt to update it.
>         Exactly.
>
>         Do I take that as a review Mario?
>
>         >
>         > Mario.
>         >
>         > On 03/12/2014 17:16, "Maxim Uvarov" <maxim.uvarov@linaro.org
>         <mailto:maxim.uvarov@linaro.org>> wrote:
>         >
>         >>On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
>         >>> Ping!
>         >>
>         >>Needed more review for this.
>         >>>
>         >>> On 1 December 2014 at 14:34, Ola Liljedahl
>         <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>>
>         >>>wrote:
>         >>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org
>         <mailto:ola.liljedahl@linaro.org>>
>         >>>> ---
>         >>>> Replaced an atomic RMW add with separate load, add and store
>         >>>>operations.
>         >>>> This avoids generating a "locked" instruction on x86
>         which implies
>         >>>>unnecessary
>         >>>> strong memory ordering and improves performance. This
>         change could
>         >>>>also prove
>         >>>> beneficial on other architectures.
>         >>>>
>         >>>> platform/linux-generic/odp_ticketlock.c | 5 ++++-
>         >>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>         >>>>
>         >>>> diff --git a/platform/linux-generic/odp_ticketlock.c
>         >>>>b/platform/linux-generic/odp_ticketlock.c
>         >>>> index 6c5e74e..1e67ff5 100644
>         >>>> --- a/platform/linux-generic/odp_ticketlock.c
>         >>>> +++ b/platform/linux-generic/odp_ticketlock.c
>         >>>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t
>         >>>>*ticketlock)
>         >>>>
>         >>>> void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>         >>>> {
>         >>>> - _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1,
>         >>>>_ODP_MEMMODEL_RLS);
>         >>>> + uint32_t cur =
>         _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
>         >>>> + _ODP_MEMMODEL_RLX);
>         >>>> + _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
>         >>>> + _ODP_MEMMODEL_RLS);
>         >>>>
>         >>Isn't this code racy? Some threads can read cur at the same
>         time, then
>         >>delay (interrupt or whatever) then do store.
>         >>And as result store will have wrong cut_ticket value.
>         >>
>         >>Maxim.
>         >>
>         >>>> #if defined __OCTEON__
>         >>>> odp_sync_stores(); /* SYNCW to flush write buffer */
>         >>>> --
>         >>>> 1.9.1
>         >>>>
>         >>> _______________________________________________
>         >>> lng-odp mailing list
>         >>> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>         >>> http://lists.linaro.org/mailman/listinfo/lng-odp
>         >>
>         >>
>         >>_______________________________________________
>         >>lng-odp mailing list
>         >>lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>         >>http://lists.linaro.org/mailman/listinfo/lng-odp
>         >>
>         >
>         >
>         >
>         >
>         > _______________________________________________
>         > lng-odp mailing list
>         > lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>         > http://lists.linaro.org/mailman/listinfo/lng-odp
>
>         _______________________________________________
>         lng-odp mailing list
>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>         http://lists.linaro.org/mailman/listinfo/lng-odp
>
>
>
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp

Maxim Uvarov Dec. 4, 2014, 12:06 p.m. UTC | #8

On 12/04/2014 01:45 AM, Ola Liljedahl wrote:
> Thanks Mario,
>
> I actually intend to post an updated patch because Maxim's questions
> are an indication that it is not possible to immediately understand
> the code and why it works. The comments could be better (they were
> pretty non-existent to start with).
>
> -- Ola

thanks, red this after send previous email about updating git comment.

Maxim.
>
> On 3 December 2014 at 21:42, Mario Torrecillas Rodríguez
> <Mario.TorrecillasRodriguez@arm.com> wrote:
>> Hi Bill,
>>
>> cur_ticket, in this implementation in particular at least, is only updated
>> during the unlock operation (this, plus the fact that there is only one
>> owner, is the reason atomicity is not required for cur_ticket).
>> Threads attempting to acquire the lock will (atomically) increment
>> next_ticket, and then spin on cur_ticket waiting for it to be equal to the
>> ticket they grabbed.
>> The only thing that will happen if an interrupt occurs between those two
>> lines of code is that threads spinning will have to wait a bit longer, but
>> this is true for any type of lock.
>>
>> I don’t see a potential risk of races myself so, Ola, you can take this as a
>> review, unless there’s something I’m missing!
>>
>> Mario.
>>
>> From: Bill Fischofer <bill.fischofer@linaro.org>
>> Date: Wednesday 3 December 2014 20:12
>> To: Ola Liljedahl <ola.liljedahl@linaro.org>
>> Cc: Mario Torrecillas Rodríguez <Mario.TorrecillasRodriguez@arm.com>,
>> "lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
>> Subject: Re: [lng-odp] [PATCH] linux-generic: odp_ticketlock.c: performance
>> regression
>>
>> Agreed, that correct programming would only have the owner attempt to
>> unlock, but what about other threads simultaneously attempting to lock while
>> the owner is unlocking?  Maybe I don't fully appreciate the memmodel
>> semantics, but there can be arbitrary delays between any two instructions
>> executed by a single core due to interrupts.
>>
>> On Wed, Dec 3, 2014 at 2:07 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
>> wrote:
>>> On 3 December 2014 at 18:29, Mario Torrecillas Rodriguez
>>> <Mario.TorrecillasRodriguez@arm.com> wrote:
>>>> I don¹t think cur_ticket will be read/written by more than one thread
>>>> simultaneously, only the one which owns the lock will attempt to update
>>>> it.
>>> Exactly.
>>>
>>> Do I take that as a review Mario?
>>>
>>>> Mario.
>>>>
>>>> On 03/12/2014 17:16, "Maxim Uvarov" <maxim.uvarov@linaro.org> wrote:
>>>>
>>>>> On 12/03/2014 05:20 PM, Ola Liljedahl wrote:
>>>>>> Ping!
>>>>> Needed more review for this.
>>>>>> On 1 December 2014 at 14:34, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>> wrote:
>>>>>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>> ---
>>>>>>> Replaced an atomic RMW add with separate load, add and store
>>>>>>> operations.
>>>>>>> This avoids generating a "locked" instruction on x86 which implies
>>>>>>> unnecessary
>>>>>>> strong memory ordering and improves performance. This change could
>>>>>>> also prove
>>>>>>> beneficial on other architectures.
>>>>>>>
>>>>>>>    platform/linux-generic/odp_ticketlock.c | 5 ++++-
>>>>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/platform/linux-generic/odp_ticketlock.c
>>>>>>> b/platform/linux-generic/odp_ticketlock.c
>>>>>>> index 6c5e74e..1e67ff5 100644
>>>>>>> --- a/platform/linux-generic/odp_ticketlock.c
>>>>>>> +++ b/platform/linux-generic/odp_ticketlock.c
>>>>>>> @@ -32,7 +32,10 @@ void odp_ticketlock_lock(odp_ticketlock_t
>>>>>>> *ticketlock)
>>>>>>>
>>>>>>>    void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
>>>>>>>    {
>>>>>>> -       _odp_atomic_u32_add_mm(&ticketlock->cur_ticket, 1,
>>>>>>> _ODP_MEMMODEL_RLS);
>>>>>>> +       uint32_t cur =
>>>>>>> _odp_atomic_u32_load_mm(&ticketlock->cur_ticket,
>>>>>>> +                                              _ODP_MEMMODEL_RLX);
>>>>>>> +       _odp_atomic_u32_store_mm(&ticketlock->cur_ticket, cur + 1,
>>>>>>> +                                _ODP_MEMMODEL_RLS);
>>>>>>>
>>>>> Isn't this code racy? Some threads can read cur at the same time, then
>>>>> delay (interrupt or whatever) then do store.
>>>>> And as result store will have wrong cut_ticket value.
>>>>>
>>>>> Maxim.
>>>>>
>>>>>>>    #if defined __OCTEON__
>>>>>>>           odp_sync_stores(); /* SYNCW to flush write buffer */
>>>>>>> --
>>>>>>> 1.9.1
>>>>>>>
>>>>>> _______________________________________________
>>>>>> lng-odp mailing list
>>>>>> lng-odp@lists.linaro.org
>>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>
>>>>> _______________________________________________
>>>>> lng-odp mailing list
>>>>> lng-odp@lists.linaro.org
>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> lng-odp mailing list
>>>> lng-odp@lists.linaro.org
>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp

linux-generic: odp_ticketlock.c: performance regression

Commit Message

Comments

Patch