From patchwork Mon Dec 7 18:05:10 2015
X-Patchwork-Submitter: Bill Fischofer
X-Patchwork-Id: 57812
From: Bill Fischofer
To: lng-odp@lists.linaro.org
Date: Mon, 7 Dec 2015 12:05:10 -0600
Message-Id: <1449511510-2316-1-git-send-email-bill.fischofer@linaro.org>
Subject: [lng-odp] [PATCHv2] doc: userguide: add application programming section

Continue the refinement of the user guide, completing the reformatting to standard asciidoc style and expanding the ODP Applicaition Programming section, including a reorganized and expanded discussion of ODP queues.

Signed-off-by: Bill Fischofer
---
doc/users-guide/users-guide.adoc | 451 +++++++++++++++++++++++++++++++-------- 1 file changed, 359 insertions(+), 92 deletions(-)

diff --git a/doc/users-guide/users-guide.adoc b/doc/users-guide/users-guide.adoc index cf77fa0..d660fb8 100644 --- a/doc/users-guide/users-guide.adoc +++ b/doc/users-guide/users-guide.adoc @@ -8,16 +8,19 @@ OpenDataPlane (ODP) Users-Guide Abstract -------- This document is intended to guide a new ODP application developer. -Further details about ODP may be found at the http://opendataplane.org[ODP] home page. +Further details about ODP may be found at the http://opendataplane.org[ODP] +home page. .Overview of a system running ODP applications image::../images/overview.png[align="center"] -ODP is an API specification that allows many implementations to provide platform independence, automatic hardware acceleration and CPU scaling to high performance networking applications. 
-This document describes how to write an application that can successfully take advantage of the API. +ODP is an API specification that allows many implementations to provide +platform independence, automatic hardware acceleration and CPU scaling to +high performance networking applications. This document describes how to +write an application that can successfully take advantage of the API. :numbered: -== Introduction == +== Introduction .OpenDataPlane Components image::../images/odp_components.png[align="center"] @@ -42,7 +45,7 @@ ODP API specification--that is the responsibility of each ODP implementation. * Application-centric. Covers functional needs of data plane applications. * Ensures portability by specifying the functional behavior of ODP. * Defined jointly and openly by application writers and platform implementers. -* Archiected to be implementable on a wide range of platforms efficiently +* Architected to be implementable on a wide range of platforms efficiently * Sponsored, governed, and maintained by the Linaro Networking Group (LNG) .ODP Implementations @@ -68,7 +71,7 @@ where the application will run on a target platform chosen by someone else. * One size does not fit all--supporting multiple implementations allows ODP to adapt to widely differing internals among platforms. * Anyone can create an ODP implementation tailored to their platform -* Distribution and mainteinance of each implementation is as owner wishes +* Distribution and maintenance of each implementation is as owner wishes - Open source or closed source as business needs determine - Have independent release cycles and service streams * Allows HW and SW innovation in how ODP APIs are implemented on each platform. @@ -100,7 +103,7 @@ drivers supported by DPDK. they are derived from a reference implementation. 
.ODP Validation Test Suite -Third, to enure consistency between different ODP implementations, ODP +Third, to ensure consistency between different ODP implementations, ODP consists of a validation suite that verifies that any given implementation of ODP faithfully provides the specified functional behavior of each ODP API. As a separate open source component, the validation suite may be used by @@ -115,16 +118,16 @@ ODP API specification. * Key to ensuring application portability across all ODP implementations * Tests that ODP implementations conform to the specified functional behavior of ODP APIs. -* Can be run at any time by users and vendors to validat implementations -od ODP. +* Can be run at any time by users and vendors to validate implementations +of ODP. -=== ODP API Specification Versioning === +=== ODP API Specification Versioning As an evolving standard, the ODP API specification is released under an incrementing version number, and corresponding implementations of ODP, as well as the validation suite that verifies API conformance, are linked to this -version number. ODP versions are specified using a stanard three-level +version number. ODP versions are specified using a standard three-level number (major.minor.fixlevel) that are incremented according to the degree of -change the level represents. Increments to the fixlevel represent clarification +change the level represents. Increments to the fix level represent clarification of the specification or other minor changes that do not affect either the syntax or semantics of the specification. Such changes in the API specification are expected to be rare. Increments to the minor level @@ -136,26 +139,26 @@ the major level represent significant structural changes that most likely require some level of application source code change, again as documented in the release notes for that version. 
-=== ODP Implementation Versioning === +=== ODP Implementation Versioning ODP implementations are free to use whatever release naming/numbering conventions they wish, as long as it is clear what level of the ODP API a given release implements. A recommended convention is to use the same three level numbering scheme where the major and minor numbers correspond to the ODP API -level and the fixlevel represents an implementation-defined service level +level and the fix level represents an implementation-defined service level associated with that API level implementation. The LNG-supplied ODP reference implementations follow this convention. -=== ODP Validation Test Suite Versioning === +=== ODP Validation Test Suite Versioning The ODP validation test suite follows these same naming conventions. The major and minor release numbers correspond to the ODP API level that the suite -validates and the fixlevel represents the service level of the validation +validates and the fix level represents the service level of the validation suite itself for that API level. -=== ODP Design Goals === +=== ODP Design Goals ODP has three primary goals that follow from its component structure. The first is application portability across a wide range of platforms. These platforms differ in terms of processor instruction set architecture, number and types of -application processing cores, memory oranization, as well as the number and +application processing cores, memory organization, as well as the number and type of platform specific hardware acceleration and offload features that are available. ODP applications can move from one conforming implementation to another with at most a recompile. @@ -175,7 +178,7 @@ of processing cores that are available to realize application function. The result is that an application written to this model does not require redesign as it scales from 4, to 40, to 400 cores. 
-== Organization of this Document == +== Organization of this Document This document is organized into several sections. The first presents a high level overview of the ODP API component areas and their associated abstract data types. This section introduces ODP APIs at a conceptual level. @@ -190,14 +193,14 @@ full reference specification for each API. The latter is intended to be used by ODP application programmers, as well as implementers, to understand the precise syntax and semantics of each API. -== ODP API Concepts == +== ODP API Concepts ODP programs are built around several conceptual structures that every -appliation programmer needs to be familiar with to use ODP effectively. The +application programmer needs to be familiar with to use ODP effectively. The main ODP concepts are: Thread, Event, Queue, Pool, Shared Memory, Buffer, Packet, PktIO, Timer, and Synchronizer. -=== Thread === +=== Thread The thread is the fundamental programming unit in ODP. ODP applications are organized into a collection of threads that perform the work that the application is designed to do. ODP threads may or may not share memory with @@ -209,7 +212,7 @@ A control thread is a supervisory thread that organizes the operation of worker threads. Worker threads, by contrast, exist to perform the main processing logic of the application and employ a run to completion model. Worker threads, in particular, are intended to operate on -dedicated processing cores, especially in many core proessing environments, +dedicated processing cores, especially in many core processing environments, however a given implementation may multitask multiple threads on a single core if desired (typically on smaller and lower performance target environments). @@ -219,7 +222,7 @@ _thread mask_ and _scheduler group_ that determine where they can run and the type of work that they can handle. These will be discussed in greater detail later. 
-=== Event === +=== Event Events are what threads process to perform their work. Events can represent new work, such as the arrival of a packet that needs to be processed, or they can represent the completion of requests that have executed asynchronously. @@ -232,7 +235,7 @@ References to events are via handles of abstract type +odp_event_t+. Cast functions are provided to convert these into specific handles of the appropriate type represented by the event. -=== Queue === +=== Queue A queue is a message passing channel that holds events. Events can be added to a queue via enqueue operations or removed from a queue via dequeue operations. The endpoints of a queue will vary depending on how it is used. @@ -244,7 +247,7 @@ stateful processing on events as well as stateless processing. Queues are represented by handles of abstract type +odp_queue_t+. -=== Pool === +=== Pool A pool is a shared memory area from which elements may be drawn. Pools represent the backing store for events, among other things. Pools are typically created and destroyed by the application during initialization and @@ -256,32 +259,32 @@ are Buffer and Packet. Pools are represented by handles of abstract type +odp_pool_t+. -=== Shared Memory === +=== Shared Memory Shared memory represents raw blocks of storage that are sharable between threads. They are the building blocks of pools but can be used directly by ODP applications if desired. Shared memory is represented by handles of abstract type +odp_shm_t+. -=== Buffer === +=== Buffer A buffer is a fixed sized block of shared storage that is used by ODP components and/or applications to realize their function. Buffers contain zero or more bytes of application data as well as system maintained metadata that provide information about the buffer, such as its size or the pool it was allocated from. Metadata is an important ODP concept because it allows for arbitrary amounts of side information to be associated with an -ODP object. 
Most ODP objects have assocaited metadata and this metadata is +ODP object. Most ODP objects have associated metadata and this metadata is manipulated via accessor functions that act as getters and setters for -this information. Getter acces functions permit an application to read +this information. Getter access functions permit an application to read a metadata item, while setter access functions permit an application to write a metadata item. Note that some metadata is inherently read only and thus no setter is provided to manipulate it. When object have multiple metadata items, each has its own associated getter and/or setter access function to inspect or manipulate it. -Buffers are represened by handles of abstract type +odp_buffer_t+. +Buffers are represented by handles of abstract type +odp_buffer_t+. -=== Packet === +=== Packet Packets are received and transmitted via I/O interfaces and represent the basic data that data plane applications manipulate. Packets are drawn from pools of type +ODP_POOL_PACKET+. @@ -294,7 +297,7 @@ with each packet for its own use. Packets are represented by handles of abstract type +odp_packet_t+. -=== PktIO === +=== PktIO PktIO is how ODP represents I/O interfaces. A pktio object is a logical port capable of receiving and/or transmitting packets. This may be directly supported by the underlying platform as an integrated feature, @@ -302,7 +305,7 @@ or may represent a device attached via a PCIE or other bus. PktIOs are represented by handles of abstract type +odp_pktio_t+. -=== Timer === +=== Timer Timers are how ODP applications measure and respond to the passage of time. Timers are drawn from specialized pools called timer pools that have their own abstract type (+odp_timer_pool_t+). Applications may have many timers @@ -310,7 +313,7 @@ active at the same time and can set them to use either relative or absolute time. 
When timers expire they create events of type +odp_timeout_t+, which serve as notifications of timer expiration. -=== Synchronizer === +=== Synchronizer Multiple threads operating in parallel typically require various synchronization services to permit them to operate in a reliable and coordinated manner. ODP provides a rich set of locks, barriers, and similar @@ -325,7 +328,7 @@ flow of work through an ODP application. These include the Classifier, Scheduler, and Traffic Manager. These components relate to the three main stages of packet processing: Receive, Process, and Transmit. -=== Classifier === +=== Classifier The *Classifier* provides a suite of APIs that control packet receive (RX) processing. @@ -362,8 +365,8 @@ Note that the use of the classifier is optional. Applications may directly receive packets from a corresponding PktIO input queue via direct polling if they choose. -=== Scheduler === -The *Scheduler* provides a suite of APIs that control scalabable event +=== Scheduler +The *Scheduler* provides a suite of APIs that control scalable event processing. .ODP Scheduler and Event Processing @@ -391,10 +394,10 @@ scheduled back to a thread to continue processing with the results of the requested asynchronous operation. Threads themselves can enqueue events to queues for downstream processing -by other threads, permitting flexibility in how applicaitions structure +by other threads, permitting flexibility in how applications structure themselves to maximize concurrency. -=== Traffic Manager === +=== Traffic Manager The *Traffic Manager* provides a suite of APIs that control traffic shaping and Quality of Service (QoS) processing for packet output. @@ -413,23 +416,33 @@ goals. Again, the advantage here is that on many platforms traffic management functions are implemented in hardware, permitting transparent offload of this work. -Glossary --------- -[glossary] -odp_worker:: - An opd_worker is a type of odp_thread. 
It will usually be isolated from the scheduling of any host operating system and is intended for fast-path processing with a low and predictable latency. Odp_workers will not generally receive interrupts and will run to completion. -odp_control:: - An odp_control is a type of odp_thread. It will be isolated from the host operating system house keeping tasks but will be scheduled by it and may receive interrupts. -odp_thread:: - An odp_thread is a flow of execution that in a Linux environment could be a Linux process or thread. -event:: - An event is a notification that can be placed in a queue. - -The include structure ---------------------- -Applications only include the 'include/odp.h file which includes the 'platform//include/plat' files to provide a complete definition of the API on that platform. -The doxygen documentation defining the behavior of the ODP API is all contained in the public API files, and the actual definitions for an implementation will be found in the per platform directories. -Per-platform data that might normally be a #define can be recovered via the appropriate access function if the #define is not directly visible to the application. +== ODP Application Programming +At the highest level, an *ODP Application* is a program that uses one or more +ODP APIs. Because ODP is a framework rather than a programming environment, +applications are free to also use other APIs that may or may not provide the +same portability characteristics as ODP APIs. + +ODP applications vary in terms of what they do and how they operate, but in +general all share the following characteristics: + +. They are organized into one or more _threads_ that execute in parallel. +. These threads communicate and coordinate their activities using various +_synchronization_ mechanisms. +. They receive packets from one or more _packet I/O interfaces_. +. They examine, transform, or otherwise process packets. +. They transmit packets to one or more _packet I/O interfaces_. 
+ +ODP provides APIs to assist in each of these areas. + +=== The include structure +Applications only include the 'include/odp.h' file, which includes the +'platform//include/odp' files to provide a complete +definition of the API on that platform. The doxygen documentation defining +the behavior of the ODP API is all contained in the public API files, and the +actual definitions for an implementation will be found in the per platform +directories. Per-platform data that might normally be a +#define+ can be +recovered via the appropriate access function if the #define is not directly +visible to the application. .Users include structure ---- @@ -442,51 +455,305 @@ Per-platform data that might normally be a #define can be recovered via the appr │   └── odp.h This file should be the only file included by the application. ---- -Initialization --------------- -IMPORTANT: ODP depends on the application to perform a graceful shutdown, calling the terminate functions should only be done when the application is sure it has closed the ingress and subsequently drained all queues etc. +=== Initialization +IMPORTANT: ODP depends on the application to perform a graceful shutdown, +calling the terminate functions should only be done when the application is +sure it has closed the ingress and subsequently drained all queues, etc. + +=== Startup +The first API that must be called by an ODP application is 'odp_init_global()'. +This takes two pointers. The first, +odp_init_t+, contains ODP initialization +data that is platform independent and portable, while the second, ++odp_platform_init_t+, is passed unparsed to the implementation +to be used for platform specific data that is not yet, or may never be +suitable for the ODP API. + +Calling odp_init_global() establishes the ODP API framework and MUST be +called before any other ODP API may be called. Note that it is only called +once per application. 
Following global initialization, each thread in turn +calls 'odp_init_local()'. This establishes the local ODP thread +context for that thread and MUST be called before other ODP APIs may be +called by that thread. + +=== Shutdown +Shutdown is the logical reverse of the initialization procedure, with +'odp_term_local()' called for each thread before 'odp_term_global()' is +called to terminate ODP. + +.ODP Application Structure Flow Diagram +image::../images/resource_management.png[align="center"] -Startup -~~~~~~~~ -The first API that must be called is 'odp_init_global()'. -This takes two pointers, odp_init_t contains ODP initialization data that is platform independent and portable. -The second odp_platform_init_t is passed un parsed to the implementation and can be used for platform specific data that is not yet, or may never be suitable for the ODP API. +== Common Conventions +Many ODP APIs share common conventions regarding their arguments and return +types. This section highlights some of the more common and frequently used +conventions. + +=== Handles and Special Designators +ODP resources are represented via _handles_ that have abstract type +_odp_resource_t_. So pools are represented by handles of type +odp_pool_t+, +queues by handles of type +odp_queue_t+, etc. Each such type +has a distinguished value _ODP_RESOURCE_INVALID_ that is used to indicate a +handle that does not refer to a valid resource of that type. Resources are +typically created via an API named _odp_resource_create()_ that returns a +handle of type _odp_resource_t_ that represents the created object. This +returned handle is set to _ODP_RESOURCE_INVALID_ if, for example, the +resource could not be created due to resource exhaustion. Invalid handles +do not necessarily represent error conditions. For example, +ODP_EVENT_INVALID+ +in response to an +odp_queue_deq()+ call to get an event from a queue simply +indicates that the queue is empty. 
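The handle convention described above can be illustrated without ODP itself. The following is a minimal sketch in plain C, not ODP code: every `demo_` name is a hypothetical stand-in for the corresponding `odp_` type or call, and the fixed-size ring is only an assumption made to keep the example self-contained. It shows the two cases the text distinguishes: an invalid handle returned from create signals resource exhaustion, while an invalid event returned from dequeue only means the queue is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for odp_event_t and odp_queue_t: opaque handles. */
typedef uintptr_t demo_event_t;
typedef struct demo_queue *demo_queue_t;

/* Distinguished "invalid" values, modelled on the ODP_*_INVALID convention. */
#define DEMO_EVENT_INVALID ((demo_event_t)0)
#define DEMO_QUEUE_INVALID ((demo_queue_t)NULL)

#define DEMO_MAX_QUEUES  2
#define DEMO_QUEUE_DEPTH 4

struct demo_queue {
    int in_use;
    unsigned head;   /* index of the oldest event */
    unsigned count;  /* number of events currently held */
    demo_event_t slots[DEMO_QUEUE_DEPTH];
};

static struct demo_queue demo_queues[DEMO_MAX_QUEUES];

/* Returns a handle, or DEMO_QUEUE_INVALID when no queue is left
 * (resource exhaustion, which the caller must treat as a failure). */
static demo_queue_t demo_queue_create(void)
{
    for (size_t i = 0; i < DEMO_MAX_QUEUES; i++) {
        if (!demo_queues[i].in_use) {
            demo_queues[i].in_use = 1;
            demo_queues[i].head = 0;
            demo_queues[i].count = 0;
            return &demo_queues[i];
        }
    }
    return DEMO_QUEUE_INVALID;
}

/* Returns 0 on success, -1 if the queue is full or an argument is invalid. */
static int demo_queue_enq(demo_queue_t q, demo_event_t ev)
{
    if (q == DEMO_QUEUE_INVALID || ev == DEMO_EVENT_INVALID ||
        q->count == DEMO_QUEUE_DEPTH)
        return -1;
    q->slots[(q->head + q->count) % DEMO_QUEUE_DEPTH] = ev;
    q->count++;
    return 0;
}

/* Returns the oldest event, or DEMO_EVENT_INVALID when the queue is
 * empty, which is a normal condition rather than an error. */
static demo_event_t demo_queue_deq(demo_queue_t q)
{
    if (q == DEMO_QUEUE_INVALID || q->count == 0)
        return DEMO_EVENT_INVALID;
    assert(q->count <= DEMO_QUEUE_DEPTH);
    demo_event_t ev = q->slots[q->head];
    q->head = (q->head + 1) % DEMO_QUEUE_DEPTH;
    q->count--;
    return ev;
}
```

A caller written against this model checks the create result once at setup time and treats an invalid event from the dequeue call as "nothing to do yet", which mirrors how the text describes +ODP_EVENT_INVALID+.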
+ +=== Addressing Scope +Unless specifically noted in the API, all ODP resources are global to the ODP +application, whether it runs as a single process or multiple processes. ODP +handles therefore have common meaning within an ODP application but have no +meaning outside the scope of the application. + +=== Resources and Names +Many ODP resource objects, such as pools and queues, support an +application-specified character string _name_ that is associated with an ODP +object at create time. This name serves two purposes: documentation, and +lookup. The lookup function is particularly useful to allow an ODP application +that is divided into multiple processes to obtain the handle for the common +resource. + +== Queues +Queues are the fundamental event sequencing mechanism provided by ODP and all +ODP applications make use of them either explicitly or implicitly. Queues are +created via the 'odp_queue_create()' API that returns a handle of type ++odp_queue_t+ that is used to refer to this queue in all subsequent APIs that +reference it. Queues have one of two ODP-defined _types_, POLL and SCHED, that +determine how they are used. POLL queues are directly managed by the ODP +application while SCHED queues make use of the *ODP scheduler* to provide +automatic scalable dispatching and synchronization services. + +.Operations on POLL queues +[source,c] +---- +odp_queue_t poll_q1 = odp_queue_create("poll queue 1", ODP_QUEUE_TYPE_POLL, NULL); +odp_queue_t poll_q2 = odp_queue_create("poll queue 2", ODP_QUEUE_TYPE_POLL, NULL); +... +odp_event_t ev = odp_queue_deq(poll_q1); +...do something +int rc = odp_queue_enq(poll_q2, ev); +---- -The second API that must be called is 'odp_init_local()', this must be called once per odp_thread, regardless of odp_thread type. 
Odp_threads may be of type ODP_THREAD_WORKER or ODP_THREAD_CONTROL +The key distinction is that dequeueing events from POLL queues is an +application responsibility while dequeueing events from SCHED queues is the +responsibility of the ODP scheduler. -Shutdown -~~~~~~~~~ -Shutdown is the logical reverse of the initialization procedure, with 'odp_thread_term()' called for each worker before 'odp_term_global()' is called. +.Operations on SCHED queues +[source,c] +---- +odp_queue_param_t qp; +odp_queue_param_init(&qp); +odp_schedule_prio_t prio = ...; +odp_schedule_group_t sched_group = ...; +qp.sched.prio = prio; +qp.sched.sync = ODP_SCHED_SYNC_[NONE|ATOMIC|ORDERED]; +qp.sched.group = sched_group; +qp.lock_count = n; /* Only relevant for ordered queues */ +odp_queue_t sched_q1 = odp_queue_create("sched queue 1", ODP_QUEUE_TYPE_SCHED, &qp); + +...thread init processing + +while (1) { + odp_event_t ev; + odp_queue_t which_q; + ev = odp_schedule(&which_q, ); + ...process the event +} +---- -image::../images/resource_management.png[align="center"] +With scheduled queues, events are sent to a queue, and the sender chooses +a queue based on the service it needs. The sender does not need to know +which ODP thread (on which core) or hardware accelerator will process +the event, but all the events on a queue are eventually scheduled and processed. + +As can be seen, SCHED queues have additional attributes that are specified at +queue create that control how the scheduler is to process events contained +on them. These include group, priority, and synchronization class. + +=== Scheduler Groups +The scheduler's dispatching job is to return the next event from the highest +priority SCHED queue that the caller is eligible to receive events from. +This latter consideration is determined by the queue's _scheduler group_, which +is set at queue create time, and by the caller's _scheduler group mask_ that +indicates which scheduler group(s) it belongs to. 
Scheduler groups are +represented by handles of type +odp_schedule_group_t+ and are created by +the *odp_schedule_group_create()* API. A number of scheduler groups are +_predefined_ by ODP. These include +ODP_SCHED_GROUP_ALL+ (all threads), ++ODP_SCHED_GROUP_WORKER+ (all worker threads), and +ODP_SCHED_GROUP_CONTROL+ +(all control threads). The application is free to create additional scheduler +groups for its own purpose and threads can join or leave scheduler groups +using the *odp_schedule_group_join()* and *odp_schedule_group_leave()* APIs. + +=== Scheduler Priority +The +prio+ field of the +odp_queue_param_t+ specifies the queue's scheduling +priority, which is how queues within eligible scheduler groups are selected +for dispatch. Queues have a default scheduling priority of NORMAL but can be +set to HIGHEST or LOWEST according to application needs. + +=== Scheduler Synchronization +In addition to its dispatching function, which provides automatic scalability to +ODP applications in many core environments, the other main function of the +scheduler is to provide event synchronization services that greatly simplify +application programming in a parallel processing environment. A queue's +SYNC mode determines how the scheduler handles the synchronization processing +of multiple events originating from the same queue. + +Three types of queue scheduler synchronization are supported: Parallel, +Atomic, and Ordered. + +==== Parallel Queues +SCHED queues that specify a sync mode of ODP_SCHED_SYNC_NONE are unrestricted +in how events are processed. + +.Parallel Queue Scheduling +image::../images/parallel_queue.png[align="center"] -Queues ------- -There are three queue types, atomic, ordered and parallel. -A queue belongs to a single odp_worker and a odp_worker may have multiple queues. +All events held on parallel queues are eligible to be scheduled simultaneously +and any required synchronization between them is the responsibility of the +application. 
Events originating from parallel queues thus have the highest +throughput rate, however they also potentially involve the most work on the +part of the application. In the Figure above, four threads are calling +*odp_schedule()* to obtain events to process. The scheduler has assigned +three events from the first queue to three threads in parallel. The fourth +thread is processing a single event from the third queue. The second queue +might either be empty, of lower priority, or not in a scheduler group matching +any of the threads being serviced by the scheduler. + +=== Atomic Queues +Atomic queues simplify event synchronization because only a single event +from a given atomic queue may be processed at a time. Events scheduled from +atomic queues thus can be processed lock free because the locking is being +done implicitly by the scheduler. + +.Atomic Queue Scheduling +image::../images/atomic_queue.png[align="center"] -Events are sent to a queue, and the the sender chooses a queue based on the service it needs. -The sender does not need to know which odp_worker (on which core) or HW accelerator will process the event, but all the events on a queue are eventually scheduled and processed. +In this example, no matter how many events may be held in an atomic queue, only +one of them can be scheduled at a time. Here two threads process events from +two different atomic queues. Note that there is no synchronization between +different atomic queues, only between events originating from the same atomic +queue. The queue context associated with the atomic queue is held until the +next call to the scheduler or until the application explicitly releases it +via a call to *odp_schedule_release_atomic()*. -NOTE: Both ordered and parallel queue types improve throughput over an atomic queue (due to parallel event processing), but the user has to take care of the context data synchronization (if needed). 
+Note that while atomic queues simplify programming, the serial nature of
+atomic queues will impair scaling.

-Atomic Queue
-~~~~~~~~~~~~
-Only one event at a time may be processed from a given queue. The processing maintains order and context data synchronization but this will impair scaling.
+=== Ordered Queues
+Ordered queues provide the best of both worlds by providing the inherent
+scalability of parallel queues, with the easy synchronization of atomic
+queues.

-.Overview Atomic Queue processing
-image::../images/atomic_queue.png[align="center"]
+.Ordered Queue Scheduling
+image::../images/ordered_queue.png[align="center"]

-Ordered Queue
-~~~~~~~~~~~~~
-An ordered queue will ensure that the sequencing at the output is identical to that of the input, but multiple events may be processed simultaneously and the order is restored before the events move to the next queue
+When scheduling events from an ordered queue, the scheduler dispatches multiple
+events from the queue in parallel to different threads; however, the scheduler
+also ensures that the relative sequence of these events on output queues
+is identical to their sequence from their originating ordered queue.
+
+As with atomic queues, the ordering guarantees associated with ordered queues
+refer to events originating from the same queue, not to those originating on
+different queues. Thus in this figure three threads are processing events 5, 3,
+and 4, respectively, from the first ordered queue. Regardless of how these
+threads complete processing, these events will appear in their original
+relative order on their output queue.
+
+==== Order Preservation
+Relative order is preserved independent of whether events are being sent to
+different output queues. For example, if some events are sent to output queue
+A while others are sent to output queue B then the events on these output
+queues will still be in the same relative order as they were on their
+originating queue.
+Similarly, if the processing consumes events so that no
+output is issued for some of them (_e.g.,_ as part of IP fragment reassembly
+processing) then other events will still be correctly ordered with respect to
+these sequence gaps. Finally, if multiple events are enqueued for a given
+order (_e.g.,_ as part of packet segmentation processing for MTU
+considerations), then each of these events will occupy the originator's
+sequence in the target output queue(s). In this case the relative order of these
+events will be in the order that the thread issued *odp_queue_enq()* calls for
+them.
+
+The ordered context associated with the dispatch of an event from an ordered
+queue lasts until the next scheduler call or until explicitly released by
+the thread calling *odp_schedule_release_ordered()*. This call may be used
+as a performance advisory that the thread no longer requires ordering
+guarantees for the current context. As a result, any subsequent enqueues
+within the current scheduler context will be treated as if the thread were
+operating in a parallel queue context.
+
+==== Ordered Locking
+Another powerful feature of the scheduler's handling of ordered queues is
+*ordered locks*. Each ordered queue has associated with it a number of ordered
+locks as specified by the _lock_count_ parameter at queue create time.
+
+Ordered locks provide an efficient means to perform in-order sequential
+processing within an ordered context. For example, suppose events with relative
+order 5, 6, and 7 are being processed in parallel by three different threads. An
+ordered lock will enable these threads to synchronize such that they can
+perform some critical section in their originating queue order. The number of
+ordered locks supported for each ordered queue is implementation dependent (and
+queryable via the *odp_config_max_ordered_locks_per_queue()* API).
+If the implementation supports multiple ordered locks, these may be used to
+protect different ordered critical sections within a given ordered context.
+
+==== Summary: Ordered Queues
+To see how these considerations fit together, consider the following code:
+
+.Processing with Ordered Queues
+[source,c]
+----
+void worker_thread(void)
+{
+	odp_init_local();
+	/* ...other initialization processing */
+
+	while (1) {
+		ev = odp_schedule(&which_q, ODP_SCHED_WAIT);
+		/* ...process events in parallel */
+		odp_schedule_order_lock(0);
+		/* ...critical section processed in order */
+		odp_schedule_order_unlock(0);
+		/* ...continue processing in parallel */
+		odp_queue_enq(dest_q, ev);
+	}
+}
+----

-.Overview Ordered Queue processing
-image::../images/ordered_queue.png[align="center"]
+This represents a simplified structure for a typical worker thread operating
+on ordered queues. Multiple events are processed in parallel and the use of
+ordered queues ensures that they will be placed on +dest_q+ in the same order
+as they originated. While processing in parallel, the use of ordered locks
+enables critical sections to be processed in order within the overall parallel
+flow. When a thread arrives at the *odp_schedule_order_lock()* call, it waits
+until the locking order for this lock for all prior events has been resolved
+and then enters the critical section. The *odp_schedule_order_unlock()* call
+releases the critical section and allows the next order to enter it.

-Parallel Queue
-~~~~~~~~~~~~~~
-There are no restrictions on the number of events being processed.
+=== Queue Scheduling Summary

-.Overview parallel Queue processing
-image::../images/parallel_queue.png[align="center"]
+NOTE: Both ordered and parallel queues improve throughput over atomic queues
+due to parallel event processing, but require that the application take
+steps to ensure context data synchronization if needed.
+
+== Glossary
+[glossary]
+worker thread::
+    A worker is a type of ODP thread.
+    It will usually be isolated from
+    the scheduling of any host operating system and is intended for fast-path
+    processing with a low and predictable latency. Worker threads will not
+    generally receive interrupts and will run to completion.
+control thread::
+    A control thread is a type of ODP thread. It will be isolated from the host
+    operating system housekeeping tasks but will be scheduled by it and may
+    receive interrupts.
+thread::
+    An ODP thread is a flow of execution that in a Linux environment could be
+    a Linux process or thread.
+event::
+    An event is a notification that can be placed in a queue.
+queue::
+    A communication channel that holds events