[3/3] lib/hash: adjust tbl_chng_cnt position

Message ID 20190625211520.43181-4-honnappa.nagarahalli@arm.com
State: New
Series: lib/hash: perf improvements for lock-free

Commit Message

Honnappa Nagarahalli June 25, 2019, 9:15 p.m.
tbl_chng_cnt is one of the first elements of the structure used in
the lookup. Move it to the beginning of the cache line to gain
performance.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Reviewed-by: Gavin Hu <gavin.hu@arm.com>

Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>

---
 lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.17.1
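The commit message argues that placing tbl_chng_cnt at the beginning of its cache line speeds up lookup. A minimal sketch of the mechanism (the struct below is a toy stand-in, not the real struct rte_hash, and `cache_aligned` stands in for DPDK's `__rte_cache_aligned`): a per-field alignment attribute on the first lookup field starts a fresh cache line, which compile-time offset assertions can verify.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64
/* Stand-in for DPDK's __rte_cache_aligned field attribute. */
#define cache_aligned __attribute__((__aligned__(CACHE_LINE)))

struct toy_hash {
	char cold[40];			/* fields not used in lookup */

	/* Fields used in lookup: the alignment attribute on the first
	 * field starts a fresh cache line, so the value the lock-free
	 * lookup reads first sits at the line's beginning. */
	uint32_t *tbl_chng_cnt cache_aligned;
	uint32_t key_len;
	uint8_t hw_trans_mem_support;
};

/* tbl_chng_cnt now begins a cache line... */
_Static_assert(offsetof(struct toy_hash, tbl_chng_cnt) % CACHE_LINE == 0,
	       "tbl_chng_cnt must start a cache line");
/* ...and key_len still shares that same line. */
_Static_assert(offsetof(struct toy_hash, key_len) / CACHE_LINE ==
	       offsetof(struct toy_hash, tbl_chng_cnt) / CACHE_LINE,
	       "key_len shares tbl_chng_cnt's cache line");
```

The same assertions could be added next to the real struct to catch accidental layout regressions at compile time.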

Comments

Wang, Yipeng1 June 28, 2019, 6:50 p.m. | #1
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
>tbl_chng_cnt is one of the first elements of the structure used in
>the lookup. Move it to the beginning of the cache line to gain
>performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>index fb19bb27d..af6451b5c 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.h
>+++ b/lib/librte_hash/rte_cuckoo_hash.h
>@@ -170,7 +170,9 @@ struct rte_hash {
>
> 	/* Fields used in lookup */
>
>-	uint32_t key_len __rte_cache_aligned;
>+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>+	/**< Indicates if the hash table changed from last read. */
>+	uint32_t key_len;
> 	/**< Length of hash key. */
> 	uint8_t hw_trans_mem_support;
> 	/**< If hardware transactional memory is used. */
>@@ -218,8 +220,6 @@ struct rte_hash {
> 	 * is piggy-backed to freeing of the key index.
> 	 */
> 	uint32_t *ext_bkt_to_free;
>-	uint32_t *tbl_chng_cnt;
>-	/**< Indicates if the hash table changed from last read. */
> } __rte_cache_aligned;
>
> struct queue_node {
>--
>2.17.1


[Wang, Yipeng]
I am not sure about this change. By moving the counter to the front, I think you
push key_store out of the cache line, and key_store is also used in lookup (and
more commonly).
My tests also show a perf drop in many cases.
Honnappa Nagarahalli July 2, 2019, 5:23 p.m. | #2
<snip>


> [Wang, Yipeng]
> I am not sure about this change. By moving counter to front, I think you
> seems push key_store out of the cache line. And key_store Is also used in
> lookup (and more commonly).
> My tests also show perf drop in many cases.

I ran the hash_readwrite_lf tests and the L3 fwd application. Both showed improvements on Arm platforms, for both the lock-free and lock-based code paths (L3 fwd was not run on x86). Which tests are showing performance drops for you?

But I do agree that this work is not complete. We can drop this patch and take it up separately.
Wang, Yipeng1 July 4, 2019, 12:52 a.m. | #3
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Tuesday, July 2, 2019 10:23 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
><snip>
>


>>
>> [Wang, Yipeng]
>> I am not sure about this change. By moving counter to front, I think you
>> seems push key_store out of the cache line. And key_store Is also used in
>> lookup (and more commonly).
>> My tests also show perf drop in many cases.

>I ran hash_readwrite_lf tests and L3 fwd application. Both of them showed improvements for both lock-free and using locks for Arm
>platforms (L3 fwd was not run on x86). Which tests are resulting in performance drops for you?
>
>But, I do agree that this work is not complete. We can drop this patch and take this up separately.

[Wang, Yipeng]
I ran the LF test on x86. For the 1-reader case it shows a clear improvement for lookup without key shifts. The other results are a mixed
bag, even for lock-free. For most HTM cases it is a drop.

My opinion is to delay fine-tuning efforts like this that are tied to architecture specifics.

Patch

diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..af6451b5c 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -170,7 +170,9 @@  struct rte_hash {
 
 	/* Fields used in lookup */
 
-	uint32_t key_len __rte_cache_aligned;
+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
+	/**< Indicates if the hash table changed from last read. */
+	uint32_t key_len;
 	/**< Length of hash key. */
 	uint8_t hw_trans_mem_support;
 	/**< If hardware transactional memory is used. */
@@ -218,8 +220,6 @@  struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
-	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;
 
 struct queue_node {