[v2,0/2] lib/hash: perf improvements for lock-free

Message ID 20190702211634.37940-1-honnappa.nagarahalli@arm.com

Message

Honnappa Nagarahalli July 2, 2019, 9:16 p.m. UTC
While using the rte_hash library, two sets of stores take place:

1) The application writes its data to memory (whose address is
passed to the rte_hash_add_key_with_hash_data API, or is NULL).
2) The rte_hash library writes to its own internal data structures:
the key store entry and the hash table.

The data from both of these stores is available to the readers only
after the index of the key store entry (key_index) is written in the
hash buckets by the library. So, key_index can act as the guard variable
for both of these writes.
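
The guard-variable idea above can be sketched with C11 atomics. This is only an illustration under simplifying assumptions: the types and functions (key_slot, bucket_entry, writer_add, reader_lookup) are hypothetical stand-ins, not the rte_hash internals, and a single bucket entry is modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the hash internals. */
struct key_slot {
    char key[16];
    void *pdata;                 /* address of the application data */
};

struct bucket_entry {
    _Atomic uint32_t key_index;  /* guard variable; 0 means "empty" */
};

static struct key_slot key_store[8];
static struct bucket_entry bucket;

/* Writer: add a new key. Both sets of stores are plain stores; the
 * single store-release of key_index publishes all of them. */
static void writer_add(uint32_t idx, const char *key, int *app_data, int value)
{
    *app_data = value;                                   /* store set 1 */
    strncpy(key_store[idx].key, key,
            sizeof(key_store[idx].key) - 1);             /* store set 2 */
    key_store[idx].pdata = app_data;
    atomic_store_explicit(&bucket.key_index, idx, memory_order_release);
}

/* Reader: the load-acquire of key_index pairs with the store-release,
 * so a non-zero index implies the key and the data are visible. */
static int *reader_lookup(const char *key)
{
    uint32_t idx = atomic_load_explicit(&bucket.key_index,
                                        memory_order_acquire);
    if (idx == 0)
        return NULL;
    if (strcmp(key_store[idx].key, key) != 0)
        return NULL;
    return key_store[idx].pdata;
}
```

The sketch ignores slot reuse and deletion, which the real library has to handle separately.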

When rte_hash_add_key_with_hash_data is called to update an existing entry,
the key_index is not written. But, the store to the application data
must complete before the address of the application data (pData) is
updated in the key store entry. So, pData alone acts as the guard variable
for this case.
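
For the update case, a minimal sketch of pData acting as the guard on its own, again with hypothetical names (key_slot_u, writer_update, reader_get) rather than the library's code:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical key store entry; only the data pointer matters here. */
struct key_slot_u {
    _Atomic(void *) pdata;
};

/* Writer: update an existing entry. key_index is not rewritten, so the
 * store-release of pdata is what orders the application data store. */
static void writer_update(struct key_slot_u *slot, int *new_data, int value)
{
    *new_data = value;                       /* application data store */
    atomic_store_explicit(&slot->pdata, new_data, memory_order_release);
}

/* Reader: the load-acquire of pdata guarantees that the data it points
 * to is fully written before it is dereferenced. */
static int reader_get(struct key_slot_u *slot, int fallback)
{
    int *p = atomic_load_explicit(&slot->pdata, memory_order_acquire);
    return p ? *p : fallback;
}
```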

However, it should be noted that there are no ordering requirements
between 1) and 2), except for one requirement - the store to the
application data must complete before the store to key_index.
In other words, there are no ordering requirements between the stores to
the key store entry/signature and store to application data. So, the
synchronization point for application data can be any point between
the 'store to application data' and 'store to the key_index'.

The first patch in this series moves the signature comparison before the
load-acquire of the key_index. This does not result in any issues because
of the full key comparison which is done after the load-acquire of
the key_index.
Performance improvements:
Lookup Hit: 6.16%
Lookup Miss: 8.54%
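
A rough sketch of the reordering described for patch 1, assuming a simplified bucket layout. The names (sig_bucket, key_tbl, search_bucket_sketch) and the ordered_loads counter are hypothetical and exist only to make the effect observable; this is not the code of search_one_bucket_lf.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES 4

/* Hypothetical bucket: short signatures plus key store indexes. */
struct sig_bucket {
    _Atomic uint16_t sig[ENTRIES];
    _Atomic uint32_t key_idx[ENTRIES];   /* 0 means "empty" */
};

/* Hypothetical key store: index -> full key. */
static const char *key_tbl[ENTRIES + 1];

/* Compare the signature with a relaxed load first and pay for the
 * load-acquire of key_idx only when the signature matches. The full
 * key comparison after the load-acquire rejects false candidates. */
static int32_t search_bucket_sketch(struct sig_bucket *b, uint16_t sig,
                                    const char *key, unsigned *ordered_loads)
{
    for (int i = 0; i < ENTRIES; i++) {
        if (atomic_load_explicit(&b->sig[i], memory_order_relaxed) != sig)
            continue;                    /* no ordered load on mismatch */

        uint32_t idx = atomic_load_explicit(&b->key_idx[i],
                                            memory_order_acquire);
        (*ordered_loads)++;
        if (idx != 0 && strcmp(key_tbl[idx], key) == 0)
            return (int32_t)idx;
    }
    return -1;
}
```

In this sketch a lookup miss whose signature matches no entry performs no load-acquire at all, which is consistent with the larger gain reported for misses.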

The second patch in this series moves the store-release of pData
before the store to any hash internal data structures. This is not
necessary, but it helps to show that the application data and the
hash table data do not depend on each other. On the reader side, pData
is loaded only if the keys match, which provides the performance benefit.
Performance improvements (with patch 1):
Lookup Hit: 6.25%
Lookup Miss: 13.97%
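
The reader-side change in patch 2 might look roughly like the following. The struct and function names (kv_slot, publish_entry, lookup_data) and the pdata_loads counter are assumptions made for illustration, not the library's API.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key store entry. */
struct kv_slot {
    char key[16];
    _Atomic(void *) pdata;
};

/* Writer: pdata is store-released first, then the hash internals are
 * written, and key_index is store-released last. */
static void publish_entry(struct kv_slot *slot, _Atomic uint32_t *key_index,
                          uint32_t idx, const char *key, void *app_data)
{
    atomic_store_explicit(&slot->pdata, app_data, memory_order_release);
    strncpy(slot->key, key, sizeof(slot->key) - 1);
    slot->key[sizeof(slot->key) - 1] = '\0';
    atomic_store_explicit(key_index, idx, memory_order_release);
}

/* Reader: compare the full key first; pdata is loaded (with acquire
 * ordering) only on a match, so a miss never touches it. */
static int lookup_data(struct kv_slot *slot, const char *key,
                       void **data, unsigned *pdata_loads)
{
    if (strcmp(slot->key, key) != 0)
        return -1;
    *data = atomic_load_explicit(&slot->pdata, memory_order_acquire);
    (*pdata_loads)++;
    return 0;
}
```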

v2
 - Dropped moving the tbl_chng_cnt to the beginning of the cache line
   commit
 - Changed the commit log for patch 1 to indicate that it improves
   performance (Yipeng)
 - Changed the comment in the search_one_bucket_lf function (Yipeng)
 - Changed the commit log for patch2 to indicate that changes to
   store-release of 'pdata' is cosmetic (Yipeng)

Honnappa Nagarahalli (2):
  lib/hash: use ordered loads only if signature matches
  lib/hash: load pData after full key compare

 lib/librte_hash/rte_cuckoo_hash.c | 98 ++++++++++++++++---------------
 1 file changed, 52 insertions(+), 46 deletions(-)

-- 
2.17.1

Comments

Jerin Jacob Kollanukkaran July 4, 2019, 11:13 a.m. UTC | #1
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> Sent: Wednesday, July 3, 2019 2:47 AM
> To: yipeng1.wang@intel.com; sameh.gobriel@intel.com;
> bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com;
> honnappa.nagarahalli@arm.com
> Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org;
> nd@arm.com
> Subject: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for
> lock-free

[Snip]

> The first patch in this series moves the signature comparison before the
> load-acquire of the key_index. This does not result in any issues because of
> the full key comparison which is done after the load-acquire of the key_index.
> Performance improvements:
> Lookup Hit: 6.16%
> Lookup Miss: 8.54%
>
> The second patch in this series, moves the store-release of pData before the
> store to any hash internal data structures. This is not necessary, but just
> helps to show the non-dependency between application data and hash table
> data. On the reader side, the pData is loaded only if the keys match, this
> provides performance benefits.
> Performance improvements (with patch 1):
> Lookup Hit: 6.25%
> Lookup Miss: 13.97%

Could you share the commands/data to test this specific performance measurement?
Thomas Monjalon July 4, 2019, 4:09 p.m. UTC | #2
02/07/2019 23:16, Honnappa Nagarahalli:
> v2
>  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
>    commit
>  - Changed the commit log for patch 1 to indicate that it improves
>    performance (Yipeng)
>  - Changed the comment in the search_one_bucket_lf function (Yipeng)
>  - Changed the commit log for patch2 to indicate that changes to
>    store-release of 'pdata' is cosmetic (Yipeng)
>
> Honnappa Nagarahalli (2):
>   lib/hash: use ordered loads only if signature matches
>   lib/hash: load pData after full key compare

This series is missing 19.08 because of a lack of review.
It was expected because it was sent late in 19.08 cycle,
so this is just to make the status clear: it will be considered for 19.11.

PS: please think about --in-reply-to when sending new versions.
Honnappa Nagarahalli July 5, 2019, 6:08 a.m. UTC | #3
> [Snip]
>
> > The first patch in this series moves the signature comparison before
> > the load-acquire of the key_index. This does not result in any issues
> > because of the full key comparison which is done after the load-acquire
> > of the key_index.
> > Performance improvements:
> > Lookup Hit: 6.16%
> > Lookup Miss: 8.54%
> >
> > The second patch in this series, moves the store-release of pData
> > before the store to any hash internal data structures. This is not
> > necessary, but just helps to show the non-dependency between
> > application data and hash table data. On the reader side, the pData is
> > loaded only if the keys match, this provides performance benefits.
> > Performance improvements (with patch 1):
> > Lookup Hit: 6.25%
> > Lookup Miss: 13.97%
>
> Could you share the commands/data to test this specific performance
> measurement?

The data given here is from the hash_readwrite_lf test with 5M entries. The number of keys used for lookup hits is also 5M.

We also tested this using L3-fwd (note that L3-fwd app needs to be changed to enable lock-free hash code path).
Command: ./examples/l3fwd/build/l3fwd -c 0xc0 -n 1 -- -E -P -p 0x3 --config="(0,0,7),(1,0,6)"
On A72:
Upstream lock-free:
Hit: 11.117/11.168/11.101
Miss: 6.760/6.763/6.764

Upstream lock-free + patches (1,2):
Hit: 11.310/11.276/11.385
Miss: 11.245/11.293/11.281
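
To turn the three runs per case into a single relative figure, one option is to average each set and compare the means. The helper below is a sketch under two assumptions that the message does not state: that each slash-separated value is one run of the same measurement, and that a higher number is better. The array and function names are made up for this example.

```c
#include <stddef.h>

/* Values copied from the L3-fwd results above (units as reported). */
static const double hit_upstream[]  = { 11.117, 11.168, 11.101 };
static const double hit_patched[]   = { 11.310, 11.276, 11.385 };
static const double miss_upstream[] = { 6.760, 6.763, 6.764 };
static const double miss_patched[]  = { 11.245, 11.293, 11.281 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative change of the patched mean over the baseline mean, in percent. */
static double pct_gain(const double *base, const double *patched, size_t n)
{
    double b = mean(base, n);
    double p = mean(patched, n);
    return (p - b) / b * 100.0;
}

static double hit_gain_pct(void)
{
    return pct_gain(hit_upstream, hit_patched, 3);
}

static double miss_gain_pct(void)
{
    return pct_gain(miss_upstream, miss_patched, 3);
}
```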
Honnappa Nagarahalli July 5, 2019, 6:14 a.m. UTC | #4
> 02/07/2019 23:16, Honnappa Nagarahalli:
> > v2
> >  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
> >    commit
> >  - Changed the commit log for patch 1 to indicate that it improves
> >    performance (Yipeng)
> >  - Changed the comment in the search_one_bucket_lf function (Yipeng)
> >  - Changed the commit log for patch2 to indicate that changes to
> >    store-release of 'pdata' is cosmetic (Yipeng)
> >
> > Honnappa Nagarahalli (2):
> >   lib/hash: use ordered loads only if signature matches
> >   lib/hash: load pData after full key compare
>
> This series is missing 19.08 because of a lack of review.
> It was expected because it was sent late in 19.08 cycle, so this is just to make
> the status clear: it will be considered for 19.11.
>
> PS: please think about --in-reply-to when sending new versions.

I used the following command:

git send-email --to yipeng1.wang@intel.com --to sameh.gobriel@intel.com --to bruce.richardson@intel.com --to pablo.de.lara.guarch@intel.com --to honnappa.nagarahalli@arm.com --cc gavin.hu@arm.com --cc ruifeng.wang@arm.com --cc dev@dpdk.org --cc nd@arm.com --in-reply-to=20190625211520.43181-1-honnappa.nagarahalli@arm.com patches/hash_regression/v2/*.patch

I am not sure what I am missing.
Thomas Monjalon July 5, 2019, 6:29 a.m. UTC | #5
05/07/2019 08:14, Honnappa Nagarahalli:
> > 02/07/2019 23:16, Honnappa Nagarahalli:
> > > v2
> > >  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
> > >    commit
> > >  - Changed the commit log for patch 1 to indicate that it improves
> > >    performance (Yipeng)
> > >  - Changed the comment in the search_one_bucket_lf function (Yipeng)
> > >  - Changed the commit log for patch2 to indicate that changes to
> > >    store-release of 'pdata' is cosmetic (Yipeng)
> > >
> > > Honnappa Nagarahalli (2):
> > >   lib/hash: use ordered loads only if signature matches
> > >   lib/hash: load pData after full key compare
> >
> > This series is missing 19.08 because of a lack of review.
> > It was expected because it was sent late in 19.08 cycle, so this is just to make
> > the status clear: it will be considered for 19.11.
> >
> > PS: please think about --in-reply-to when sending new versions.
> I used the following command:
>
> git send-email --to yipeng1.wang@intel.com --to sameh.gobriel@intel.com --to bruce.richardson@intel.com --to pablo.de.lara.guarch@intel.com --to honnappa.nagarahalli@arm.com --cc gavin.hu@arm.com --cc ruifeng.wang@arm.com --cc dev@dpdk.org --cc nd@arm.com --in-reply-to=20190625211520.43181-1-honnappa.nagarahalli@arm.com patches/hash_regression/v2/*.patch
>
> I am not sure what I am missing.


You are missing nothing, sorry.
My email client displayed the thread incorrectly, but it looks fine after a refresh today.
Wang, Yipeng1 July 8, 2019, 4:51 p.m. UTC | #6
>-----Original Message-----
>From: Thomas Monjalon [mailto:thomas@monjalon.net]
>Sent: Thursday, July 4, 2019 9:10 AM
>To: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Cc: dev@dpdk.org; Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; gavin.hu@arm.com;
>ruifeng.wang@arm.com; nd@arm.com
>Subject: Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
>
>02/07/2019 23:16, Honnappa Nagarahalli:
>> v2
>>  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
>>    commit
>>  - Changed the commit log for patch 1 to indicate that it improves
>>    performance (Yipeng)
>>  - Changed the comment in the search_one_bucket_lf function (Yipeng)
>>  - Changed the commit log for patch2 to indicate that changes to
>>    store-release of 'pdata' is cosmetic (Yipeng)
>>
>> Honnappa Nagarahalli (2):
>>   lib/hash: use ordered loads only if signature matches
>>   lib/hash: load pData after full key compare
>
>This series is missing 19.08 because of a lack of review.
>It was expected because it was sent late in 19.08 cycle,
>so this is just to make the status clear: it will be considered for 19.11.
>
>PS: please think about --in-reply-to when sending new versions.
>

[Wang, Yipeng] Hi Thomas, thanks for your work.
I have finished the review of this patch and I think it is good to go. Please include it if it is convenient for you.
Honnappa confirmed that my understanding of the code is correct, and it should help performance.
Thomas Monjalon July 8, 2019, 6:10 p.m. UTC | #7
08/07/2019 18:51, Wang, Yipeng1:
>From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >02/07/2019 23:16, Honnappa Nagarahalli:
> >> v2
> >>  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
> >>    commit
> >>  - Changed the commit log for patch 1 to indicate that it improves
> >>    performance (Yipeng)
> >>  - Changed the comment in the search_one_bucket_lf function (Yipeng)
> >>  - Changed the commit log for patch2 to indicate that changes to
> >>    store-release of 'pdata' is cosmetic (Yipeng)
> >>
> >> Honnappa Nagarahalli (2):
> >>   lib/hash: use ordered loads only if signature matches
> >>   lib/hash: load pData after full key compare
> >
> >This series is missing 19.08 because of a lack of review.
> >It was expected because it was sent late in 19.08 cycle,
> >so this is just to make the status clear: it will be considered for 19.11.
>
> [Wang, Yipeng] Hi, Thomas, Thanks for your work.
> I finished the review of this patch, I think it is good to go.
> Please include it if it is convenient for you.
> Honnappa confirmed that my understanding of the code is correct, and it should help performance.


Applied, thanks