From patchwork Tue Apr 23 04:31:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Honnappa Nagarahalli X-Patchwork-Id: 162652 Delivered-To: patch@linaro.org Received: by 2002:a02:c6d8:0:0:0:0:0 with SMTP id r24csp3347834jan; Mon, 22 Apr 2019 21:32:22 -0700 (PDT) X-Google-Smtp-Source: APXvYqwEVO6xMWXKi72G1/EL6UGXMeUvzp4y+RWF+BCvw+n1BSG9za5RAzlkwFsVAA6H6ysHPg0i X-Received: by 2002:a17:906:54d1:: with SMTP id c17mr11556760ejp.223.1555993942618; Mon, 22 Apr 2019 21:32:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1555993942; cv=none; d=google.com; s=arc-20160816; b=tM9GnI6BgZMTQbfJbewVPG3m3qO7SyYR28lJT489/XJc1JlIftCQWW+OgtPP8ehywq XygZDsbwTyhgqKEom3LNJNF3NLASM44zbtyeiI3j2oLUX5D5ADHOa/KpJWv9T+4kiHDt YzTkgzsCJQPGooSvWzqJnP9WpfHuil2MVpqDtZ3Z9zs/NKvemi3//M16wc+kGWJ7kQ02 Ram8QyjiFs1NgPwgY7iWWsq+WDHJteCsxjN41fvXQnLKzAiIfdCJuGMzpAeJf3oY9UMX L2ZhQrMA8umGXihyuy0Z3pvvfX1mTCF/sWdXzEh7ThHq5L7UpGbs8BJgrcvBu85AK/s/ kkCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:cc:to:from; bh=7o5bnBqN5T6Du4CNXixCZ/8T6yWIGsyGjHrMwlpIKs4=; b=YdE+5k8PfLd9RoyqGtt/kAqjtscxOsffhKluMoA2nWvI8ZgNJXx0IoInMVPZbGRAj/ jkHsKUB2SxQrfGogCIJ/soT/vMjZimD+6vKqHY1bFqlAw7QKoZH6a5tiEAMdOzev5sMc zgzvf+UK2psME5LSxIKNNLeXUtdeZDKbl5d3YJ4mqsBzqEJ5aLGU8LeD+TvJ8BPFnTek S9L/nA+B8plZuerSqbiVtuSoRIXU4xyXFqE10QXLWOJ09h+RVdWZDAlokXVx/maa54ig 8yCE0h/g49/r94danBzNfQUOxiecZ7dxXwaNzrPfePLE6FiYYOQCnIKmyfXngyB+b/i7 MF4A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dev-bounces@dpdk.org designates 92.243.14.124 as permitted sender) smtp.mailfrom=dev-bounces@dpdk.org Return-Path: Received: from dpdk.org (dpdk.org. [92.243.14.124]) by mx.google.com with ESMTP id x13si2957858edh.155.2019.04.22.21.32.22; Mon, 22 Apr 2019 21:32:22 -0700 (PDT) Received-SPF: pass (google.com: domain of dev-bounces@dpdk.org designates 92.243.14.124 as permitted sender) client-ip=92.243.14.124; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dev-bounces@dpdk.org designates 92.243.14.124 as permitted sender) smtp.mailfrom=dev-bounces@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 56B9D1B4EA; Tue, 23 Apr 2019 06:31:55 +0200 (CEST) Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by dpdk.org (Postfix) with ESMTP id 762821B4D7 for ; Tue, 23 Apr 2019 06:31:49 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DBB1715AD; Mon, 22 Apr 2019 21:31:48 -0700 (PDT) Received: from qc2400f-1.austin.arm.com (qc2400f-1.austin.arm.com [10.118.13.209]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6109F3F557; Mon, 22 Apr 2019 21:31:48 -0700 (PDT) From: Honnappa Nagarahalli To: konstantin.ananyev@intel.com, stephen@networkplumber.org, paulmck@linux.ibm.com, marko.kovacevic@intel.com, dev@dpdk.org Cc: honnappa.nagarahalli@arm.com, gavin.hu@arm.com, dharmik.thakkar@arm.com, malvika.gupta@arm.com Date: Mon, 22 Apr 2019 23:31:30 -0500 Message-Id: <20190423043130.18153-4-honnappa.nagarahalli@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190423043130.18153-1-honnappa.nagarahalli@arm.com> References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190423043130.18153-1-honnappa.nagarahalli@arm.com> Subject: [dpdk-dev] [PATCH v7 3/3] doc/rcu: add lib_rcu documentation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add lib_rcu QSBR API and programmer guide documentation. Signed-off-by: Honnappa Nagarahalli Reviewed-by: Marko Kovacevic --- doc/api/doxy-api-index.md | 3 +- doc/api/doxy-api.conf.in | 1 + .../prog_guide/img/rcu_general_info.svg | 509 ++++++++++++++++++ doc/guides/prog_guide/index.rst | 1 + doc/guides/prog_guide/rcu_lib.rst | 185 +++++++ 5 files changed, 698 insertions(+), 1 deletion(-) create mode 100644 doc/guides/prog_guide/img/rcu_general_info.svg create mode 100644 doc/guides/prog_guide/rcu_lib.rst -- 2.17.1 diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index de1e215dd..8f0e84de6 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -54,7 +54,8 @@ The public API headers are grouped by topics: [memzone] (@ref rte_memzone.h), [mempool] (@ref rte_mempool.h), [malloc] (@ref rte_malloc.h), - [memcpy] (@ref rte_memcpy.h) + [memcpy] (@ref rte_memcpy.h), + [rcu] (@ref rte_rcu_qsbr.h) - **timers**: [cycles] (@ref rte_cycles.h), diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in index 7722fc3e9..b9896cb63 100644 --- a/doc/api/doxy-api.conf.in +++ b/doc/api/doxy-api.conf.in @@ -51,6 +51,7 @@ INPUT = @TOPDIR@/doc/api/doxy-api-index.md \ @TOPDIR@/lib/librte_port \ @TOPDIR@/lib/librte_power \ @TOPDIR@/lib/librte_rawdev \ + @TOPDIR@/lib/librte_rcu \ @TOPDIR@/lib/librte_reorder \ @TOPDIR@/lib/librte_ring \ @TOPDIR@/lib/librte_sched \ diff --git a/doc/guides/prog_guide/img/rcu_general_info.svg b/doc/guides/prog_guide/img/rcu_general_info.svg new file mode 100644 index 000000000..e7ca1dacb --- /dev/null +++ b/doc/guides/prog_guide/img/rcu_general_info.svg @@ -0,0 +1,509 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Page-1 + + + + Sheet.3 + + + + Sheet.4 + + + + Sheet.5 + D1 + + + + D1 + + Sheet.6 + + + + Sheet.7 + + + + Sheet.8 + D2 + + + + D2 + + Sheet.9 + + + + Sheet.10 + Reader Thread 1 + + + + Reader Thread 1 + + Sheet.11 + + + + Sheet.12 + + + + Sheet.13 + D1 + + + + D1 + + Sheet.14 + + + + Sheet.15 + + + + Sheet.16 + D2 + + + + D2 + + Sheet.17 + + + + Sheet.18 + T 2 + + + + T 2 + + Sheet.19 + + + + Sheet.20 + + + + Sheet.21 + D1 + + + + D1 + + Sheet.22 + + + + Sheet.23 + + + + Sheet.24 + D2 + + + + D2 + + Sheet.25 + + + + Sheet.26 + T 3 + + + + T 3 + + Sheet.28 + Time + + + + Time + + Sheet.29 + + + + Sheet.30 + + + + Sheet.31 + Remove reference to entry1 + + + + Remove reference to entry1 + + Sheet.33 + + + + Sheet.34 + Delete + + + + Delete + + Sheet.35 + + + + Sheet.36 + + + + Sheet.37 + Delete entry1 from D1 + + + + Delete entry1 from D1 + + Sheet.38 + + + + Sheet.39 + + + + Sheet.40 + + + + Sheet.41 + Free memory for entries1 and 2 after every reader has gone th... + + + + Free memory for entries1 and 2 after every reader has gone through at least 1 quiescent state + + Sheet.46 + Free + + + + Free + + Sheet.48 + + + + Sheet.49 + + + + Sheet.50 + Grace Period + + + + Grace Period + + Sheet.51 + + + + Sheet.52 + + + + Sheet.53 + + + + Sheet.54 + Delete entry2 from D1 + + + + Delete entry2 from D1 + + Sheet.56 + + + + Sheet.57 + + + + Sheet.58 + + + + Sheet.59 + Critical sections + + + + Critical sections + + Sheet.60 + + + + Sheet.61 + + + + Sheet.62 + + + + Sheet.63 + Quiescent states + + + + Quiescent states + + Sheet.64 + + + + Sheet.65 + + + + Sheet.66 + + + + Sheet.67 + + + + Sheet.68 + + + + Sheet.69 + + + + Sheet.70 + while(1) loop + + + + while(1) loop + + Sheet.71 + + + + Sheet.72 + Reader thread is not accessing any shared data structure. i.e... + + + + Reader thread is not accessing any shared data structure.i.e. non critical section or quiescent state. + + Sheet.73 + Dx + + + + Dx + + Sheet.74 + Reader thread is accessing the shared data structure Dx. i.e.... + + + + Reader thread is accessing the shared data structure Dx.i.e. critical section. + + Sheet.75 + Point in time when the reference to the entry is removed usin... + + + + Point in time when the reference to the entry is removed using an atomic operation. + + Sheet.76 + Delete + + + + Delete + + Sheet.77 + Point in time when the writer can free the deleted entry. + + + + Point in time when the writer can free the deleted entry. + + Sheet.78 + Free + + + + Free + + Sheet.79 + Time duration between Delete and Free, during which memory ca... + + + + Time duration between Delete and Free, during which memory cannot be freed. + + Sheet.80 + Grace Period + + + + Grace Period + + Sheet.83 + + + + Sheet.84 + + + + Sheet.85 + + + + Sheet.86 + + + + Sheet.87 + + + + Sheet.88 + Remove reference to entry2 + + + + Remove reference to entry2 + + diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index 95f5e7964..17df2c563 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -56,6 +56,7 @@ Programmer's Guide metrics_lib bpf_lib ipsec_lib + rcu_lib source_org dev_kit_build_system dev_kit_root_make_help diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst new file mode 100644 index 000000000..55d44e15d --- /dev/null +++ b/doc/guides/prog_guide/rcu_lib.rst @@ -0,0 +1,185 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2019 Arm Limited. + +.. _RCU_Library: + +RCU Library +============ + +Lock-less data structures provide scalability and determinism. +They enable use cases where locking may not be allowed +(for ex: real-time applications). + +In the following paras, the term 'memory' refers to memory allocated +by typical APIs like malloc or anything that is representative of +memory, for ex: an index of a free element array. + +Since these data structures are lock less, the writers and readers +are accessing the data structures concurrently. Hence, while removing +an element from a data structure, the writers cannot return the memory +to the allocator, without knowing that the readers are not +referencing that element/memory anymore. Hence, it is required to +separate the operation of removing an element into 2 steps: + +Delete: in this step, the writer removes the reference to the element from +the data structure but does not return the associated memory to the +allocator. This will ensure that new readers will not get a reference to +the removed element. Removing the reference is an atomic operation. + +Free(Reclaim): in this step, the writer returns the memory to the +memory allocator, only after knowing that all the readers have stopped +referencing the deleted element. + +This library helps the writer determine when it is safe to free the +memory. + +This library makes use of thread Quiescent State (QS). + +What is Quiescent State +----------------------- +Quiescent State can be defined as 'any point in the thread execution where the +thread does not hold a reference to shared memory'. It is up to the application +to determine its quiescent state. + +Let us consider the following diagram: + +.. figure:: img/rcu_general_info.* + + +As shown, reader thread 1 accesses data structures D1 and D2. When it is +accessing D1, if the writer has to remove an element from D1, the +writer cannot free the memory associated with that element immediately. +The writer can return the memory to the allocator only after the reader +stops referencing D1. In other words, reader thread RT1 has to enter +a quiescent state. + +Similarly, since reader thread 2 is also accessing D1, writer has to +wait till thread 2 enters quiescent state as well. + +However, the writer does not need to wait for reader thread 3 to enter +quiescent state. Reader thread 3 was not accessing D1 when the delete +operation happened. So, reader thread 1 will not have a reference to the +deleted entry. + +It can be noted that, the critical sections for D2 is a quiescent state +for D1. i.e. for a given data structure Dx, any point in the thread execution +that does not reference Dx is a quiescent state. + +Since memory is not freed immediately, there might be a need for +provisioning of additional memory, depending on the application requirements. + +Factors affecting RCU mechanism +--------------------------------- + +It is important to make sure that this library keeps the overhead of +identifying the end of grace period and subsequent freeing of memory, +to a minimum. The following paras explain how grace period and critical +section affect this overhead. + +The writer has to poll the readers to identify the end of grace period. +Polling introduces memory accesses and wastes CPU cycles. The memory +is not available for reuse during grace period. Longer grace periods +exasperate these conditions. + +The length of the critical section and the number of reader threads +is proportional to the duration of the grace period. Keeping the critical +sections smaller will keep the grace period smaller. However, keeping the +critical sections smaller requires additional CPU cycles (due to additional +reporting) in the readers. + +Hence, we need the characteristics of small grace period and large critical +section. This library addresses this by allowing the writer to do +other work without having to block till the readers report their quiescent +state. + +RCU in DPDK +----------- + +For DPDK applications, the start and end of while(1) loop (where no +references to shared data structures are kept) act as perfect quiescent +states. This will combine all the shared data structure accesses into a +single, large critical section which helps keep the overhead on the +reader side to a minimum. + +DPDK supports pipeline model of packet processing and service cores. +In these use cases, a given data structure may not be used by all the +workers in the application. The writer does not have to wait for all +the workers to report their quiescent state. To provide the required +flexibility, this library has a concept of QS variable. The application +can create one QS variable per data structure to help it track the +end of grace period for each data structure. This helps keep the grace +period to a minimum. + +How to use this library +----------------------- + +The application must allocate memory and initialize a QS variable. + +Application can call ``rte_rcu_qsbr_get_memsize`` to calculate the size +of memory to allocate. This API takes maximum number of reader threads, +using this variable, as a parameter. Currently, a maximum of 1024 threads +are supported. + +Further, the application can initialize a QS variable using the API +``rte_rcu_qsbr_init``. + +Each reader thread is assumed to have a unique thread ID. Currently, the +management of the thread ID (for ex: allocation/free) is left to the +application. The thread ID should be in the range of 0 to +maximum number of threads provided while creating the QS variable. +The application could also use lcore_id as the thread ID where applicable. + +``rte_rcu_qsbr_thread_register`` API will register a reader thread +to report its quiescent state. This can be called from a reader thread. +A control plane thread can also call this on behalf of a reader thread. +The reader thread must call ``rte_rcu_qsbr_thread_online`` API to start +reporting its quiescent state. + +Some of the use cases might require the reader threads to make +blocking API calls (for ex: while using eventdev APIs). The writer thread +should not wait for such reader threads to enter quiescent state. +The reader thread must call ``rte_rcu_qsbr_thread_offline`` API, before calling +blocking APIs. It can call ``rte_rcu_qsbr_thread_online`` API once the blocking +API call returns. + +The writer thread can trigger the reader threads to report their quiescent +state by calling the API ``rte_rcu_qsbr_start``. It is possible for multiple +writer threads to query the quiescent state status simultaneously. Hence, +``rte_rcu_qsbr_start`` returns a token to each caller. + +The writer thread must call ``rte_rcu_qsbr_check`` API with the token to +get the current quiescent state status. Option to block till all the reader +threads enter the quiescent state is provided. If this API indicates that +all the reader threads have entered the quiescent state, the application +can free the deleted entry. + +The APIs ``rte_rcu_qsbr_start`` and ``rte_rcu_qsbr_check`` are lock free. +Hence, they can be called concurrently from multiple writers even while +running as worker threads. + +The separation of triggering the reporting from querying the status provides +the writer threads flexibility to do useful work instead of blocking for the +reader threads to enter the quiescent state or go offline. This reduces the +memory accesses due to continuous polling for the status. + +``rte_rcu_qsbr_synchronize`` API combines the functionality of +``rte_rcu_qsbr_start`` and blocking ``rte_rcu_qsbr_check`` into a single API. +This API triggers the reader threads to report their quiescent state and +polls till all the readers enter the quiescent state or go offline. This +API does not allow the writer to do useful work while waiting and +introduces additional memory accesses due to continuous polling. + +The reader thread must call ``rte_rcu_qsbr_thread_offline`` and +``rte_rcu_qsbr_thread_unregister`` APIs to remove itself from reporting its +quiescent state. The ``rte_rcu_qsbr_check`` API will not wait for this reader +thread to report the quiescent state status anymore. + +The reader threads should call ``rte_rcu_qsbr_quiescent`` API to indicate that +they entered a quiescent state. This API checks if a writer has triggered a +quiescent state query and update the state accordingly. + +The ``rte_rcu_qsbr_lock`` and ``rte_rcu_qsbr_unlock`` are empty functions. +However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid +in debugging issues. One can mark the access to shared data structures on the +reader side using these APIs. The ``rte_rcu_qsbr_quiescent`` will check if +all the locks are unlocked.