From patchwork Fri Feb 22 07:04:27 2019
X-Patchwork-Submitter: Honnappa Nagarahalli
X-Patchwork-Id: 158988
Delivered-To: patch@linaro.org
From: Honnappa Nagarahalli
To: konstantin.ananyev@intel.com, stephen@networkplumber.org, paulmck@linux.ibm.com, dev@dpdk.org, honnappa.nagarahalli@arm.com
Cc: gavin.hu@arm.com, dharmik.thakkar@arm.com, malvika.gupta@arm.com, nd@arm.com
Date: Fri, 22 Feb 2019 01:04:27 -0600
Message-Id: <20190222070427.22866-6-honnappa.nagarahalli@arm.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190222070427.22866-1-honnappa.nagarahalli@arm.com>
References: <20181222021420.5114-1-honnappa.nagarahalli@arm.com> <20190222070427.22866-1-honnappa.nagarahalli@arm.com>
Subject: [dpdk-dev] [RFC v3 5/5] lib/rcu: fix the size of the registered thread ID array
List-Id: DPDK patches and discussions

Keeping the registered thread ID
array size dependent on the maximum number of threads results in
performance drops, due to the address calculations required at run
time. Fixing the size of the thread ID registration array reduces the
complexity of the address calculation. This change caps the maximum
number of threads supported at 512 (one 64B cache line). However, the
memory required for the quiescent state (QS) counters still depends on
the max threads parameter, so the change retains flexibility while
addressing the performance issue.

Signed-off-by: Honnappa Nagarahalli
---
 lib/librte_rcu/rte_rcu_qsbr.c | 13 ++-----------
 lib/librte_rcu/rte_rcu_qsbr.h | 29 ++++++++++-------------------
 2 files changed, 12 insertions(+), 30 deletions(-)

-- 
2.17.1

diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
index 02464fdba..3cff82121 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.c
+++ b/lib/librte_rcu/rte_rcu_qsbr.c
@@ -25,17 +25,12 @@
 unsigned int __rte_experimental
 rte_rcu_qsbr_get_memsize(uint32_t max_threads)
 {
-	int n;
 	ssize_t sz;
 
 	RTE_ASSERT(max_threads == 0);
 
 	sz = sizeof(struct rte_rcu_qsbr);
 
-	/* Add the size of the registered thread ID bitmap array */
-	n = RTE_ALIGN(max_threads, RTE_QSBR_THRID_ARRAY_ELM_SIZE);
-	sz += RTE_QSBR_THRID_ARRAY_SIZE(n);
-
 	/* Add the size of quiescent state counter array */
 	sz += sizeof(struct rte_rcu_qsbr_cnt) * max_threads;
 
@@ -51,9 +46,7 @@ rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads)
 	memset(v, 0, rte_rcu_qsbr_get_memsize(max_threads));
 	v->m_threads = max_threads;
 	v->ma_threads = RTE_ALIGN(max_threads, RTE_QSBR_THRID_ARRAY_ELM_SIZE);
-
 	v->num_elems = v->ma_threads/RTE_QSBR_THRID_ARRAY_ELM_SIZE;
-	v->thrid_array_size = RTE_QSBR_THRID_ARRAY_SIZE(v->ma_threads);
 }
 
 /* Dump the details of a single quiescent state variable to a file.
 */
@@ -74,8 +67,7 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 
 	fprintf(f, "  Registered thread ID mask = 0x");
 	for (i = 0; i < v->num_elems; i++)
-		fprintf(f, "%lx", __atomic_load_n(
-					RTE_QSBR_THRID_ARRAY_ELM(v, i),
+		fprintf(f, "%lx", __atomic_load_n(&v->reg_thread_id[i],
 					__ATOMIC_ACQUIRE));
 	fprintf(f, "\n");
 
@@ -84,8 +76,7 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 
 	fprintf(f, "Quiescent State Counts for readers:\n");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = __atomic_load_n(&v->reg_thread_id[i], __ATOMIC_ACQUIRE);
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
 			fprintf(f, "thread ID = %d, count = %lu\n", t,

diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index 21fa2c198..1147f11f2 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -33,14 +33,9 @@ extern "C" {
  * Given thread id needs to be converted to index into the array and
  * the id within the array element.
  */
-/* Thread ID array size
- * @param ma_threads
- *   num of threads aligned to 64
- */
-#define RTE_QSBR_THRID_ARRAY_SIZE(ma_threads) \
-	RTE_ALIGN((ma_threads) >> 3, RTE_CACHE_LINE_SIZE)
+#define RTE_RCU_MAX_THREADS 512
+#define RTE_QSBR_THRID_ARRAY_ELEMS (RTE_RCU_MAX_THREADS/(sizeof(uint64_t) * 8))
 #define RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
-#define RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *)(v + 1) + i)
 #define RTE_QSBR_THRID_INDEX_SHIFT 6
 #define RTE_QSBR_THRID_MASK 0x3f
 
@@ -49,8 +44,7 @@
 struct rte_rcu_qsbr_cnt {
 	uint64_t cnt; /**< Quiescent state counter. */
 } __rte_cache_aligned;
 
-#define RTE_QSBR_CNT_ARRAY_ELM(v, i) ((struct rte_rcu_qsbr_cnt *) \
-	((uint8_t *)(v + 1) + v->thrid_array_size) + i)
+#define RTE_QSBR_CNT_ARRAY_ELM(v, i) (((struct rte_rcu_qsbr_cnt *)(v + 1)) + i)
 
 /**
  * RTE thread Quiescent State structure.
@@ -69,15 +63,14 @@ struct rte_rcu_qsbr {
 	uint64_t token __rte_cache_aligned;
 	/**< Counter to allow for multiple simultaneous QS queries */
 
-	uint32_t thrid_array_size __rte_cache_aligned;
-	/**< Registered thread ID bitmap array size in bytes */
-	uint32_t num_elems;
+	uint32_t num_elems __rte_cache_aligned;
 	/**< Number of elements in the thread ID array */
-
 	uint32_t m_threads;
 	/**< Maximum number of threads this RCU variable will use */
 	uint32_t ma_threads;
 	/**< Maximum number of threads aligned to 32 */
+
+	uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] __rte_cache_aligned;
 } __rte_cache_aligned;
 
 /**
@@ -152,8 +145,7 @@ rte_rcu_qsbr_register_thread(struct rte_rcu_qsbr *v, unsigned int thread_id)
 	/* Release the store to initial TQS count so that readers
 	 * can use it immediately after this function returns.
 	 */
-	__atomic_fetch_or(RTE_QSBR_THRID_ARRAY_ELM(v, i),
-			1UL << id, __ATOMIC_RELEASE);
+	__atomic_fetch_or(&v->reg_thread_id[i], 1UL << id, __ATOMIC_RELEASE);
 }
 
 /**
@@ -188,7 +180,7 @@ rte_rcu_qsbr_unregister_thread(struct rte_rcu_qsbr *v, unsigned int thread_id)
 	 * reporting threads is visible before the thread
 	 * does anything else.
 	 */
-	__atomic_fetch_and(RTE_QSBR_THRID_ARRAY_ELM(v, i),
+	__atomic_fetch_and(&v->reg_thread_id[i],
 			~(1UL << id), __ATOMIC_RELEASE);
 }
 
@@ -298,8 +290,7 @@ rte_rcu_qsbr_check(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 		/* Load the current registered thread bit map before
 		 * loading the reader thread quiescent state counters.
 		 */
-		bmap = __atomic_load_n(RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = __atomic_load_n(&v->reg_thread_id[i], __ATOMIC_ACQUIRE);
 		id = i << RTE_QSBR_THRID_INDEX_SHIFT;
 
 		while (bmap) {
@@ -324,7 +315,7 @@ rte_rcu_qsbr_check(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 			 * Re-read the bitmap.
 			 */
 			bmap = __atomic_load_n(
-					RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					&v->reg_thread_id[i],
 					__ATOMIC_ACQUIRE);
 			continue;