From patchwork Fri Feb 22 07:04:27 2019
X-Patchwork-Submitter: Honnappa Nagarahalli
X-Patchwork-Id: 158988
Delivered-To: patch@linaro.org
From: Honnappa Nagarahalli
To: konstantin.ananyev@intel.com, stephen@networkplumber.org, paulmck@linux.ibm.com, dev@dpdk.org, honnappa.nagarahalli@arm.com
Cc: gavin.hu@arm.com, dharmik.thakkar@arm.com, malvika.gupta@arm.com, nd@arm.com
Date: Fri, 22 Feb 2019 01:04:27 -0600
Message-Id: <20190222070427.22866-6-honnappa.nagarahalli@arm.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190222070427.22866-1-honnappa.nagarahalli@arm.com>
References: <20181222021420.5114-1-honnappa.nagarahalli@arm.com> <20190222070427.22866-1-honnappa.nagarahalli@arm.com>
Subject: [dpdk-dev] [RFC v3 5/5] lib/rcu: fix the size of the registered thread ID array
List-Id: DPDK patches and discussions

Keeping the registered thread ID
array size dependent on the maximum number of threads results in
performance drops, due to the address calculations required at run
time. Fixing the size of the thread ID registration array reduces the
complexity of the address calculation. This change caps the maximum
number of threads supported at 512 (one 64B cache line). However, the
memory required for the quiescent state (QS) counters still depends on
the max threads parameter, so the change retains flexibility while
addressing the performance issue.

Signed-off-by: Honnappa Nagarahalli
---
 lib/librte_rcu/rte_rcu_qsbr.c | 13 ++-----------
 lib/librte_rcu/rte_rcu_qsbr.h | 29 ++++++++++-------------------
 2 files changed, 12 insertions(+), 30 deletions(-)

-- 
2.17.1

diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
index 02464fdba..3cff82121 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.c
+++ b/lib/librte_rcu/rte_rcu_qsbr.c
@@ -25,17 +25,12 @@
 unsigned int __rte_experimental
 rte_rcu_qsbr_get_memsize(uint32_t max_threads)
 {
-	int n;
 	ssize_t sz;
 
 	RTE_ASSERT(max_threads == 0);
 
 	sz = sizeof(struct rte_rcu_qsbr);
 
-	/* Add the size of the registered thread ID bitmap array */
-	n = RTE_ALIGN(max_threads, RTE_QSBR_THRID_ARRAY_ELM_SIZE);
-	sz += RTE_QSBR_THRID_ARRAY_SIZE(n);
-
 	/* Add the size of quiescent state counter array */
 	sz += sizeof(struct rte_rcu_qsbr_cnt) * max_threads;
 
@@ -51,9 +46,7 @@ rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads)
 	memset(v, 0, rte_rcu_qsbr_get_memsize(max_threads));
 	v->m_threads = max_threads;
 	v->ma_threads = RTE_ALIGN(max_threads, RTE_QSBR_THRID_ARRAY_ELM_SIZE);
-
 	v->num_elems = v->ma_threads/RTE_QSBR_THRID_ARRAY_ELM_SIZE;
-	v->thrid_array_size = RTE_QSBR_THRID_ARRAY_SIZE(v->ma_threads);
 }
 
 /* Dump the details of a single quiescent state variable to a file.
 */
@@ -74,8 +67,7 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 
 	fprintf(f, "  Registered thread ID mask = 0x");
 	for (i = 0; i < v->num_elems; i++)
-		fprintf(f, "%lx", __atomic_load_n(
-					RTE_QSBR_THRID_ARRAY_ELM(v, i),
+		fprintf(f, "%lx", __atomic_load_n(&v->reg_thread_id[i],
 					__ATOMIC_ACQUIRE));
 	fprintf(f, "\n");
 
@@ -84,8 +76,7 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 
 	fprintf(f, "Quiescent State Counts for readers:\n");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = __atomic_load_n(&v->reg_thread_id[i], __ATOMIC_ACQUIRE);
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
 			fprintf(f, "thread ID = %d, count = %lu\n", t,

diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index 21fa2c198..1147f11f2 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -33,14 +33,9 @@ extern "C" {
  * Given thread id needs to be converted to index into the array and
  * the id within the array element.
  */
-/* Thread ID array size
- * @param ma_threads
- *   num of threads aligned to 64
- */
-#define RTE_QSBR_THRID_ARRAY_SIZE(ma_threads) \
-	RTE_ALIGN((ma_threads) >> 3, RTE_CACHE_LINE_SIZE)
+#define RTE_RCU_MAX_THREADS 512
+#define RTE_QSBR_THRID_ARRAY_ELEMS (RTE_RCU_MAX_THREADS/(sizeof(uint64_t) * 8))
 #define RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
-#define RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *)(v + 1) + i)
 #define RTE_QSBR_THRID_INDEX_SHIFT 6
 #define RTE_QSBR_THRID_MASK 0x3f
 
@@ -49,8 +44,7 @@
 struct rte_rcu_qsbr_cnt {
 	uint64_t cnt; /**< Quiescent state counter. */
 } __rte_cache_aligned;
 
-#define RTE_QSBR_CNT_ARRAY_ELM(v, i) ((struct rte_rcu_qsbr_cnt *) \
-	((uint8_t *)(v + 1) + v->thrid_array_size) + i)
+#define RTE_QSBR_CNT_ARRAY_ELM(v, i) (((struct rte_rcu_qsbr_cnt *)(v + 1)) + i)
 
 /**
  * RTE thread Quiescent State structure.
@@ -69,15 +63,14 @@ struct rte_rcu_qsbr {
 	uint64_t token __rte_cache_aligned;
 	/**< Counter to allow for multiple simultaneous QS queries */
 
-	uint32_t thrid_array_size __rte_cache_aligned;
-	/**< Registered thread ID bitmap array size in bytes */
-	uint32_t num_elems;
+	uint32_t num_elems __rte_cache_aligned;
 	/**< Number of elements in the thread ID array */
-
 	uint32_t m_threads;
 	/**< Maximum number of threads this RCU variable will use */
 	uint32_t ma_threads;
 	/**< Maximum number of threads aligned to 32 */
+
+	uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] __rte_cache_aligned;
 } __rte_cache_aligned;
 
 /**
@@ -152,8 +145,7 @@ rte_rcu_qsbr_register_thread(struct rte_rcu_qsbr *v, unsigned int thread_id)
 	/* Release the store to initial TQS count so that readers
 	 * can use it immediately after this function returns.
 	 */
-	__atomic_fetch_or(RTE_QSBR_THRID_ARRAY_ELM(v, i),
-			1UL << id, __ATOMIC_RELEASE);
+	__atomic_fetch_or(&v->reg_thread_id[i], 1UL << id, __ATOMIC_RELEASE);
 }
 
 /**
@@ -188,7 +180,7 @@ rte_rcu_qsbr_unregister_thread(struct rte_rcu_qsbr *v, unsigned int thread_id)
 	 * reporting threads is visible before the thread
 	 * does anything else.
 	 */
-	__atomic_fetch_and(RTE_QSBR_THRID_ARRAY_ELM(v, i),
+	__atomic_fetch_and(&v->reg_thread_id[i],
 			~(1UL << id), __ATOMIC_RELEASE);
 }
 
@@ -298,8 +290,7 @@ rte_rcu_qsbr_check(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 		/* Load the current registered thread bit map before
 		 * loading the reader thread quiescent state counters.
 		 */
-		bmap = __atomic_load_n(RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = __atomic_load_n(&v->reg_thread_id[i], __ATOMIC_ACQUIRE);
 		id = i << RTE_QSBR_THRID_INDEX_SHIFT;
 
 		while (bmap) {
@@ -324,7 +315,7 @@ rte_rcu_qsbr_check(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 			 * Re-read the bitmap.
 			 */
 			bmap = __atomic_load_n(
-					RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					&v->reg_thread_id[i],
 					__ATOMIC_ACQUIRE);
 			continue;