
[PATCHv4] doc: userguide: add application programming section

Message ID 1449854593-19370-1-git-send-email-bill.fischofer@linaro.org
State Accepted
Commit a8ae5e08d23be6464fb3352d79d13e1995e40f40

Commit Message

Bill Fischofer Dec. 11, 2015, 5:23 p.m. UTC
Complete the reformatting to standard asciidoc style, expand the
ODP Application Programming section, and include a reorganized and
expanded discussion of ODP queues.

Signed-off-by: Bill Fischofer <bill.fischofer@linaro.org>
---
 doc/users-guide/users-guide.adoc | 450 +++++++++++++++++++++++++++++++--------
 1 file changed, 358 insertions(+), 92 deletions(-)

Comments

Mike Holmes Dec. 11, 2015, 6:51 p.m. UTC | #1
On 11 December 2015 at 12:23, Bill Fischofer <bill.fischofer@linaro.org>
wrote:

> Complete the reformatting to standard asciidoc style, expand the

> ODP Application Programming section, and include a reorganized and

> expanded discussion of ODP queues.

>

> Signed-off-by: Bill Fischofer <bill.fischofer@linaro.org>

>


Reviewed-by: Mike Holmes <mike.holmes@linaro.org>


Mike



> ---

>  doc/users-guide/users-guide.adoc | 450 +++++++++++++++++++++++++++++++--------

>  1 file changed, 358 insertions(+), 92 deletions(-)

>

> diff --git a/doc/users-guide/users-guide.adoc b/doc/users-guide/users-guide.adoc

> index cf77fa0..2e30f3a 100644

> --- a/doc/users-guide/users-guide.adoc

> +++ b/doc/users-guide/users-guide.adoc

> @@ -8,16 +8,19 @@ OpenDataPlane (ODP)  Users-Guide

>  Abstract

>  --------

>  This document is intended to guide a new ODP application developer.

> -Further details about ODP may be found at the http://opendataplane.org[ODP]

> home page.

> +Further details about ODP may be found at the http://opendataplane.org

> [ODP]

> +home page.

>

>  .Overview of a system running ODP applications

>  image::../images/overview.png[align="center"]

>

> -ODP is an API specification that allows many implementations to provide

> platform independence, automatic hardware acceleration and CPU scaling to

> high performance networking  applications.

> -This document describes how to write an application that can successfully

> take advantage of the API.

> +ODP is an API specification that allows many implementations to provide

> +platform independence, automatic hardware acceleration and CPU scaling to

> +high performance networking  applications. This document describes how to

> +write an application that can successfully take advantage of the API.

>

>  :numbered:

> -== Introduction ==

> +== Introduction

>  .OpenDataPlane Components

>  image::../images/odp_components.png[align="center"]

>

> @@ -42,7 +45,7 @@ ODP API specification--that is the responsibility of

> each ODP implementation.

>  * Application-centric.  Covers functional needs of data plane

> applications.

>  * Ensures portability by specifying the functional behavior of ODP.

>  * Defined jointly and openly by application writers and platform

> implementers.

> -* Archiected to be implementable on a wide range of platforms efficiently

> +* Architected to be implementable on a wide range of platforms efficiently

>  * Sponsored, governed, and maintained by the Linaro Networking Group (LNG)

>

>  .ODP Implementations

> @@ -68,7 +71,7 @@ where the application will run on a target platform

> chosen by someone else.

>  * One size does not fit all--supporting multiple implementations allows

> ODP

>  to adapt to widely differing internals among platforms.

>  * Anyone can create an ODP implementation tailored to their platform

> -* Distribution and mainteinance of each implementation is as owner wishes

> +* Distribution and maintenance of each implementation is as owner wishes

>    - Open source or closed source as business needs determine

>    - Have independent release cycles and service streams

>  * Allows HW and SW innovation in how ODP APIs are implemented on each

> platform.

> @@ -100,7 +103,7 @@ drivers supported by DPDK.

>  they are derived from a reference implementation.

>

>  .ODP Validation Test Suite

> -Third, to enure consistency between different ODP implementations, ODP

> +Third, to ensure consistency between different ODP implementations, ODP

>  consists of a validation suite that verifies that any given

> implementation of

>  ODP faithfully provides the specified functional behavior of each ODP API.

>  As a separate open source component, the validation suite may be used by

> @@ -115,16 +118,16 @@ ODP API specification.

>  * Key to ensuring application portability across all ODP implementations

>  * Tests that ODP implementations conform to the specified functional

> behavior

>  of ODP APIs.

> -* Can be run at any time by users and vendors to validat implementations

> -od ODP.

> +* Can be run at any time by users and vendors to validate implementations

> +of ODP.

>

> -=== ODP API Specification Versioning ===

> +=== ODP API Specification Versioning

>  As an evolving standard, the ODP API specification is released under an

>  incrementing version number, and corresponding implementations of ODP, as

> well

>  as the validation suite that verifies API conformance, are linked to this

> -version number. ODP versions are specified using a stanard three-level

> +version number. ODP versions are specified using a standard three-level

>  number (major.minor.fixlevel) that are incremented according to the

> degree of

> -change the level represents. Increments to the fixlevel represent

> clarification

> +change the level represents. Increments to the fix level represent

> clarification

>  of the specification or other minor changes that do not affect either the

>  syntax or semantics of the specification. Such changes in the API

> specification

>  are expected to be rare. Increments to the minor level

> @@ -136,26 +139,26 @@ the major level represent significant structural

> changes that most likely

>  require some level of application source code change, again as documented

> in

>  the release notes for that version.

>
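Inline note on the versioning text above: a minimal sketch of how an application might compare two major.minor.fixlevel strings and report the highest level at which they differ. The type and function names (`api_version_t`, `api_version_parse`, `api_version_delta`) are hypothetical helpers made up for illustration and are not part of the ODP API.

```c
#include <stdio.h>

/* Hypothetical helper types for illustration; not ODP definitions. */
typedef struct {
	int major;
	int minor;
	int fixlevel;
} api_version_t;

typedef enum {
	VERSION_SAME,
	VERSION_FIX_CHANGE,
	VERSION_MINOR_CHANGE,
	VERSION_MAJOR_CHANGE
} version_delta_t;

/* Parse "major.minor.fixlevel". Returns 0 on success, -1 on malformed input. */
int api_version_parse(const char *s, api_version_t *v)
{
	char trailing;

	/* Exactly three fields must convert; a fourth match means trailing junk. */
	if (sscanf(s, "%d.%d.%d%c", &v->major, &v->minor, &v->fixlevel,
		   &trailing) != 3)
		return -1;
	if (v->major < 0 || v->minor < 0 || v->fixlevel < 0)
		return -1;
	return 0;
}

/* Report the most significant level at which two versions differ. */
version_delta_t api_version_delta(const api_version_t *a,
				  const api_version_t *b)
{
	if (a->major != b->major)
		return VERSION_MAJOR_CHANGE;
	if (a->minor != b->minor)
		return VERSION_MINOR_CHANGE;
	if (a->fixlevel != b->fixlevel)
		return VERSION_FIX_CHANGE;
	return VERSION_SAME;
}
```

A caller could treat `VERSION_FIX_CHANGE` as needing no action and consult the release notes for anything higher, which matches how the section describes the three levels.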

> -=== ODP Implementation Versioning ===

> +=== ODP Implementation Versioning

>  ODP implementations are free to use whatever release naming/numbering

>  conventions they wish, as long as it is clear what level of the ODP API a

> given

>  release implements. A recommended convention is to use the same three

> level

>  numbering scheme where the major and minor numbers correspond to the ODP

> API

> -level and the fixlevel represents an implementation-defined service level

> +level and the fix level represents an implementation-defined service level

>  associated with that API level implementation. The LNG-supplied ODP

> reference

>  implementations follow this convention.

>

> -=== ODP Validation Test Suite Versioning ===

> +=== ODP Validation Test Suite Versioning

>  The ODP validation test suite follows these same naming conventions. The

> major

>  and minor release numbers correspond to the ODP API level that the suite

> -validates and the fixlevel represents the service level of the validation

> +validates and the fix level represents the service level of the validation

>  suite itself for that API level.

>

> -=== ODP Design Goals ===

> +=== ODP Design Goals

>  ODP has three primary goals that follow from its component structure. The

> first

>  is application portability across a wide range of platforms. These

> platforms

>  differ in terms of processor instruction set architecture, number and

> types of

> -application processing cores, memory oranization, as well as the number

> and

> +application processing cores, memory organization, as well as the number

> and

>  type of platform specific hardware acceleration and offload features that

>  are available. ODP applications can move from one conforming

> implementation

>  to another with at most a recompile.

> @@ -175,7 +178,7 @@ of processing cores that are available to realize

> application function. The

>  result is that an application written to this model does not require

> redesign

>  as it scales from 4, to 40, to 400 cores.

>

> -== Organization of this Document ==

> +== Organization of this Document

>  This document is organized into several sections. The first presents a

> high

>  level overview of the ODP API component areas and their associated

> abstract

>  data types. This section introduces ODP APIs at a conceptual level.

> @@ -190,14 +193,14 @@ full reference specification for each API. The

> latter is intended to be used

>  by ODP application programmers, as well as implementers, to understand the

>  precise syntax and semantics of each API.

>

> -== ODP API Concepts ==

> +== ODP API Concepts

>  ODP programs are built around several conceptual structures that every

> -appliation programmer needs to be familiar with to use ODP effectively.

> The

> +application programmer needs to be familiar with to use ODP effectively.

> The

>  main ODP concepts are:

>  Thread, Event, Queue, Pool, Shared Memory, Buffer, Packet, PktIO, Timer,

>  and Synchronizer.

>

> -=== Thread ===

> +=== Thread

>  The thread is the fundamental programming unit in ODP.  ODP applications

> are

>  organized into a collection of threads that perform the work that the

>  application is designed to do. ODP threads may or may not share memory

> with

> @@ -209,7 +212,7 @@ A control thread is a supervisory thread that organizes

>  the operation of worker threads. Worker threads, by contrast, exist to

>  perform the main processing logic of the application and employ a run to

>  completion model. Worker threads, in particular, are intended to operate

> on

> -dedicated processing cores, especially in many core proessing

> environments,

> +dedicated processing cores, especially in many core processing

> environments,

>  however a given implementation may multitask multiple threads on a single

>  core if desired (typically on smaller and lower performance target

>  environments).

> @@ -219,7 +222,7 @@ _thread mask_ and _scheduler group_ that determine

> where they can run and

>  the type of work that they can handle. These will be discussed in greater

>  detail later.

>

> -=== Event ===

> +=== Event

>  Events are what threads process to perform their work. Events can

> represent

>  new work, such as the arrival of a packet that needs to be processed, or

> they

>  can represent the completion of requests that have executed

> asynchronously.

> @@ -232,7 +235,7 @@ References to events are via handles of abstract type

> +odp_event_t+. Cast

>  functions are provided to convert these into specific handles of the

>  appropriate type represented by the event.

>

> -=== Queue ===

> +=== Queue

>  A queue is a message passing channel that holds events.  Events can be

>  added to a queue via enqueue operations or removed from a queue via

> dequeue

>  operations. The endpoints of a queue will vary depending on how it is

> used.

> @@ -244,7 +247,7 @@ stateful processing on events as well as stateless

> processing.

>

>  Queues are represented by handles of abstract type +odp_queue_t+.

>

> -=== Pool ===

> +=== Pool

>  A pool is a shared memory area from which elements may be drawn. Pools

>  represent the backing store for events, among other things. Pools are

>  typically created and destroyed by the application during initialization

> and

> @@ -256,32 +259,32 @@ are Buffer and Packet.

>

>  Pools are represented by handles of abstract type +odp_pool_t+.

>

> -=== Shared Memory ===

> +=== Shared Memory

>  Shared memory represents raw blocks of storage that are sharable between

>  threads. They are the building blocks of pools but can be used directly by

>  ODP applications if desired.

>

>  Shared memory is represented by handles of abstract type +odp_shm_t+.

>

> -=== Buffer ===

> +=== Buffer

>  A buffer is a fixed sized block of shared storage that is used by ODP

>  components and/or applications to realize their function. Buffers contain

>  zero or more bytes of application data as well as system maintained

>  metadata that provide information about the buffer, such as its size or

> the

>  pool it was allocated from. Metadata is an important ODP concept because

> it

>  allows for arbitrary amounts of side information to be associated with an

> -ODP object. Most ODP objects have assocaited metadata and this metadata is

> +ODP object. Most ODP objects have associated metadata and this metadata is

>  manipulated via accessor functions that act as getters and setters for

> -this information. Getter acces functions permit an application to read

> +this information. Getter access functions permit an application to read

>  a metadata item, while setter access functions permit an application to

> write

>  a metadata item. Note that some metadata is inherently read only and thus

>  no setter is provided to manipulate it.  When object have multiple

> metadata

>  items, each has its own associated getter and/or setter access function to

>  inspect or manipulate it.

>

> -Buffers are represened by handles of abstract type +odp_buffer_t+.

> +Buffers are represented by handles of abstract type +odp_buffer_t+.

>

> -=== Packet ===

> +=== Packet

>  Packets are received and transmitted via I/O interfaces and represent

>  the basic data that data plane applications manipulate.

>  Packets are drawn from pools of type +ODP_POOL_PACKET+.

> @@ -294,7 +297,7 @@ with each packet for its own use.

>

>  Packets are represented by handles of abstract type +odp_packet_t+.

>

> -=== PktIO ===

> +=== PktIO

>  PktIO is how ODP represents I/O interfaces. A pktio object is a logical

>  port capable of receiving and/or transmitting packets. This may be

> directly

>  supported by the underlying platform as an integrated feature,

> @@ -302,7 +305,7 @@ or may represent a device attached via a PCIE or other

> bus.

>

>  PktIOs are represented by handles of abstract type +odp_pktio_t+.

>

> -=== Timer ===

> +=== Timer

>  Timers are how ODP applications measure and respond to the passage of

> time.

>  Timers are drawn from specialized pools called timer pools that have their

>  own abstract type (+odp_timer_pool_t+). Applications may have many timers

> @@ -310,7 +313,7 @@ active at the same time and can set them to use either

> relative or absolute

>  time. When timers expire they create events of type +odp_timeout_t+, which

>  serve as notifications of timer expiration.

>

> -=== Synchronizer ===

> +=== Synchronizer

>  Multiple threads operating in parallel typically require various

>  synchronization services to permit them to operate in a reliable and

>  coordinated manner. ODP provides a rich set of locks, barriers, and

> similar

> @@ -325,7 +328,7 @@ flow of work through an ODP application. These include

> the Classifier,

>  Scheduler, and Traffic Manager.  These components relate to the three

>  main stages of packet processing: Receive, Process, and Transmit.

>

> -=== Classifier ===

> +=== Classifier

>  The *Classifier* provides a suite of APIs that control packet receive (RX)

>  processing.

>

> @@ -362,8 +365,8 @@ Note that the use of the classifier is optional.

> Applications may directly

>  receive packets from a corresponding PktIO input queue via direct polling

>  if they choose.

>

> -=== Scheduler ===

> -The *Scheduler* provides a suite of APIs that control scalabable event

> +=== Scheduler

> +The *Scheduler* provides a suite of APIs that control scalable event

>  processing.

>

>  .ODP Scheduler and Event Processing

> @@ -391,10 +394,10 @@ scheduled back to a thread to continue processing

> with the results of the

>  requested asynchronous operation.

>

>  Threads themselves can enqueue events to queues for downstream processing

> -by other threads, permitting flexibility in how applicaitions structure

> +by other threads, permitting flexibility in how applications structure

>  themselves to maximize concurrency.

>

> -=== Traffic Manager ===

> +=== Traffic Manager

>  The *Traffic Manager* provides a suite of APIs that control traffic

> shaping and

>  Quality of Service (QoS) processing for packet output.

>

> @@ -413,23 +416,33 @@ goals. Again, the advantage here is that on many

> platforms traffic management

>  functions are implemented in hardware, permitting transparent offload of

>  this work.

>

> -Glossary

> ---------

> -[glossary]

> -odp_worker::

> -    An opd_worker is a type of odp_thread. It will usually be isolated

> from the scheduling of any host operating system and is intended for

> fast-path processing with a low and predictable latency. Odp_workers will

> not generally receive interrupts and will run to completion.

> -odp_control::

> -    An odp_control is a type of odp_thread. It will be isolated from the

> host operating system house keeping tasks but will be scheduled by it and

> may receive interrupts.

> -odp_thread::

> -    An odp_thread is a flow of execution that in a Linux environment

> could be a Linux process or thread.

> -event::

> -    An event is a notification that can be placed in a queue.

> -

> -The include structure

> ----------------------

> -Applications only include the 'include/odp.h file which includes the

> 'platform/<implementation name>/include/plat' files to provide a complete

> definition of the API on that platform.

> -The doxygen documentation defining the behavior of the ODP API is all

> contained in the public API files, and the actual definitions for an

> implementation will be found in the per platform directories.

> -Per-platform data that might normally be a #define can be recovered via

> the appropriate access function if the #define is not directly visible to

> the application.

> +== ODP Application Programming

> +At the highest level, an *ODP Application* is a program that uses one or

> more

> +ODP APIs. Because ODP is a framework rather than a programming

> environment,

> +applications are free to also use other APIs that may or may not provide

> the

> +same portability characteristics as ODP APIs.

> +

> +ODP applications vary in terms of what they do and how they operate, but

> in

> +general all share the following characteristics:

> +

> +. They are organized into one or more _threads_ that execute in parallel.

> +. These threads communicate and coordinate their activities using various

> +_synchronization_ mechanisms.

> +. They receive packets from one or more _packet I/O interfaces_.

> +. They examine, transform, or otherwise process packets.

> +. They transmit packets to one or more _packet I/O interfaces_.

> +

> +ODP provides APIs to assist in each of these areas.

> +

> +=== The include structure

> +Applications only include the 'include/odp.h' file, which includes the

> +'platform/<implementation name>/include/odp' files to provide a complete

> +definition of the API on that platform. The doxygen documentation defining

> +the behavior of the ODP API is all contained in the public API files, and

> the

> +actual definitions for an implementation will be found in the per platform

> +directories. Per-platform data that might normally be a +#define+ can be

> +recovered via the appropriate access function if the #define is not

> directly

> +visible to the application.

>

>  .Users include structure

>  ----

> @@ -442,51 +455,304 @@ Per-platform data that might normally be a #define

> can be recovered via the appr

>  │   └── odp.h   This file should be the only file included by the

> application.

>  ----

>

> -Initialization

> ---------------

> -IMPORTANT: ODP depends on the application to perform a graceful shutdown,

> calling the terminate functions should only be done when the application is

> sure it has closed the ingress and subsequently drained all queues etc.

> +=== Initialization

> +IMPORTANT: ODP depends on the application to perform a graceful shutdown.

> +The terminate functions should only be called when the application is

> +sure it has closed the ingress and subsequently drained all queues, etc.

> +

> +=== Startup

> +The first API that must be called by an ODP application is

> 'odp_init_global()'.

> +This takes two pointers. The first, +odp_init_t+, contains ODP

> initialization

> +data that is platform independent and portable, while the second,

> ++odp_platform_init_t+, is passed unparsed to the implementation

> +to be used for platform specific data that is not yet, or may never be

> +suitable for the ODP API.

> +

> +Calling odp_init_global() establishes the ODP API framework and MUST be

> +called before any other ODP API may be called. Note that it is only called

> +once per application. Following global initialization, each thread in turn

> +calls 'odp_init_local()'. This establishes the local ODP thread

> +context for that thread and MUST be called before other ODP APIs may be

> +called by that thread.

> +

> +=== Shutdown

> +Shutdown is the logical reverse of the initialization procedure, with

> +'odp_term_local()' called for each thread before 'odp_term_global()' is

> +called to terminate ODP.

> +

> +.ODP Application Structure Flow Diagram

> +image::../images/resource_management.png[align="center"]

>

> -Startup

> -~~~~~~~~

> -The first API that must be called is 'odp_init_global()'.

> -This takes two pointers, odp_init_t contains ODP initialization data that

> is platform independent and portable.

> -The second odp_platform_init_t is passed un parsed to the  implementation

> and can be used for platform specific data that is not yet, or may never be

> suitable for the ODP API.

> +== Common Conventions

> +Many ODP APIs share common conventions regarding their arguments and

> return

> +types. This section highlights some of the more common and frequently used

> +conventions.

> +

> +=== Handles and Special Designators

> +ODP resources are represented via _handles_ that have abstract type

> +_odp_resource_t_.  So pools are represented by handles of type

> +odp_pool_t+,

> +queues by handles of type +odp_queue_t+, etc. Each such type

> +has a distinguished value _ODP_RESOURCE_INVALID_ that is used to indicate a

> +handle that does not refer to a valid resource of that type. Resources are

> +typically created via an API named _odp_resource_create()_ that returns a

> +handle of type _odp_resource_t_ that represents the created object. This

> +returned handle is set to _ODP_RESOURCE_INVALID_ if, for example, the

> +resource could not be created due to resource exhaustion. Invalid

> resources

> +do not necessarily represent error conditions. For example,

> +ODP_EVENT_INVALID+

> +in response to an +odp_queue_deq()+ call to get an event from a queue

> simply

> +indicates that the queue is empty.

> +

> +=== Addressing Scope

> +Unless specifically noted in the API, all ODP resources are global to the

> ODP

> +application, whether it runs as a single process or multiple processes.

> ODP

> +handles therefore have common meaning within an ODP application but have

> no

> +meaning outside the scope of the application.

> +

> +=== Resources and Names

> +Many ODP resource objects, such as pools and queues, support an

> +application-specified character string _name_ that is associated with an

> ODP

> +object at create time.  This name serves two purposes: documentation, and

> +lookup. The lookup function is particularly useful to allow an ODP

> application

> +that is divided into multiple processes to obtain the handle for the

> common

> +resource.

> +
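Inline note on the handle and name conventions above: a minimal, self-contained model of the create/lookup pattern with a distinguished invalid handle. Everything here (`demo_res_t`, `DEMO_RES_INVALID`, `demo_res_create`, `demo_res_lookup`, the two-entry table) is a hypothetical stand-in for the generic _odp_resource_t_ idea and is not ODP code.

```c
#include <string.h>

/* All names below are hypothetical stand-ins, not ODP definitions. */
typedef int demo_res_t;

#define DEMO_RES_INVALID (-1) /* plays the role of ODP_RESOURCE_INVALID */
#define DEMO_RES_MAX     2    /* tiny table so exhaustion is easy to show */
#define DEMO_NAME_LEN    32

static char demo_names[DEMO_RES_MAX][DEMO_NAME_LEN];
static int demo_used[DEMO_RES_MAX];

/* Create a named resource; returns DEMO_RES_INVALID when the table is full. */
demo_res_t demo_res_create(const char *name)
{
	for (int i = 0; i < DEMO_RES_MAX; i++) {
		if (!demo_used[i]) {
			demo_used[i] = 1;
			strncpy(demo_names[i], name, DEMO_NAME_LEN - 1);
			demo_names[i][DEMO_NAME_LEN - 1] = '\0';
			return i;
		}
	}
	return DEMO_RES_INVALID;
}

/* Find a resource by name; an invalid handle here just means "not found". */
demo_res_t demo_res_lookup(const char *name)
{
	for (int i = 0; i < DEMO_RES_MAX; i++) {
		if (demo_used[i] && strcmp(demo_names[i], name) == 0)
			return i;
	}
	return DEMO_RES_INVALID;
}
```

The point of the model is that callers compare the returned handle against the invalid value rather than checking a separate error code, and that an invalid result from a lookup is an ordinary outcome rather than a failure.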

> +== Queues

> +Queues are the fundamental event sequencing mechanism provided by ODP and

> all

> +ODP applications make use of them either explicitly or implicitly. Queues

> are

> +created via the 'odp_queue_create()' API that returns a handle of type

> ++odp_queue_t+ that is used to refer to this queue in all subsequent APIs

> that

> +reference it. Queues have one of two ODP-defined _types_, POLL and SCHED, that

> +determine how they are used. POLL queues are directly managed by the ODP

> +application while SCHED queues make use of the *ODP scheduler* to provide

> +automatic scalable dispatching and synchronization services.

> +

> +.Operations on POLL queues

> +[source,c]

> +----

> +odp_queue_t poll_q1 = odp_queue_create("poll queue 1",

> ODP_QUEUE_TYPE_POLL, NULL);

> +odp_queue_t poll_q2 = odp_queue_create("poll queue 2",

> ODP_QUEUE_TYPE_POLL, NULL);

> +...

> +odp_event_t ev = odp_queue_deq(poll_q1);

> +...do something

> +int rc = odp_queue_enq(poll_q2, ev);

> +----

>

> -The second API that must be called is 'odp_init_local()', this must be

> called once per odp_thread, regardless of odp_thread type.  Odp_threads may

> be of type ODP_THREAD_WORKER or ODP_THREAD_CONTROL

> +The key distinction is that dequeueing events from POLL queues is an

> +application responsibility while dequeueing events from SCHED queues is

> the

> +responsibility of the ODP scheduler.

>

> -Shutdown

> -~~~~~~~~~

> -Shutdown is the logical reverse of the initialization procedure, with

> 'odp_thread_term()' called for each worker before 'odp_term_global()' is

> called.

> +.Operations on SCHED queues

> +[source,c]

> +----

> +odp_queue_param_t qp;

> +odp_queue_param_init(&qp);

> +odp_schedule_prio_t prio = ...;

> +odp_schedule_group_t sched_group = ...;

> +qp.sched.prio = prio;

> +qp.sched.sync = ODP_SCHED_SYNC_[NONE|ATOMIC|ORDERED];

> +qp.sched.group = sched_group;

> +qp.lock_count = n; /* Only relevant for ordered queues */

> +odp_queue_t sched_q1 = odp_queue_create("sched queue 1",

> ODP_QUEUE_TYPE_SCHED, &qp);

> +

> +...thread init processing

> +

> +while (1) {

> +        odp_event_t ev;

> +        odp_queue_t which_q;

> +        ev = odp_schedule(&which_q, <wait option>);

> +        ...process the event

> +}

> +----

>

> -image::../images/resource_management.png[align="center"]

> +With scheduled queues, events are sent to a queue, and the sender chooses

> +a queue based on the service it needs. The sender does not need to know

> +which ODP thread (on which core) or hardware accelerator will process

> +the event, but all the events on a queue are eventually scheduled and

> processed.

> +

> +As can be seen, SCHED queues have additional attributes that are

> specified at

> +queue create that control how the scheduler is to process events contained

> +on them. These include group, priority, and synchronization class.

> +

> +=== Scheduler Groups

> +The scheduler's dispatching job is to return the next event from the

> highest

> +priority SCHED queue that the caller is eligible to receive events from.

> +This latter consideration is determined by the queue's _scheduler group_,

> which

> +is set at queue create time, and by the caller's _scheduler group mask_

> that

> +indicates which scheduler group(s) it belongs to. Scheduler groups are

> +represented by handles of type +odp_schedule_group_t+ and are created by

> +the *odp_schedule_group_create()* API. A number of scheduler groups are

> +_predefined_ by ODP.  These include +ODP_SCHED_GROUP_ALL+ (all threads),

> ++ODP_SCHED_GROUP_WORKER+ (all worker threads), and

> +ODP_SCHED_GROUP_CONTROL+

> +(all control threads). The application is free to create additional

> scheduler

> +groups for its own purpose and threads can join or leave scheduler groups

> +using the *odp_schedule_group_join()* and *odp_schedule_group_leave()* APIs.

> +

> +=== Scheduler Priority

> +The +prio+ field of the +odp_queue_param_t+ specifies the queue's

> scheduling

> +priority, which is how queues within eligible scheduler groups are

> selected

> +for dispatch. Queues have a default scheduling priority of NORMAL but can

> be

> +set to HIGHEST or LOWEST according to application needs.

> +
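Inline note on the group and priority text above: a small model of the dispatch rule as described, namely picking the highest priority non-empty queue whose scheduler group the calling thread belongs to. The struct, the function name `model_pick_queue`, the bitmask representation of group membership, and the "larger number means more urgent" convention are all assumptions made for this sketch and do not reflect how any ODP implementation stores or orders these values.

```c
#include <stdint.h>

/* Hypothetical model of a SCHED queue; not an ODP type. */
typedef struct {
	unsigned group;  /* scheduler group index, must be below 32 here */
	int prio;        /* larger value = more urgent (model convention) */
	int pending;     /* number of events waiting on the queue */
} model_queue_t;

/*
 * Return the index of the queue the model scheduler would serve next for
 * a thread whose group membership is given as a bitmask, or -1 if no
 * non-empty queue is eligible.
 */
int model_pick_queue(const model_queue_t *q, int nq,
		     uint32_t thread_group_mask)
{
	int best = -1;

	for (int i = 0; i < nq; i++) {
		if (q[i].pending == 0)
			continue; /* nothing to dispatch */
		if (!(thread_group_mask & (UINT32_C(1) << q[i].group)))
			continue; /* caller is not in this queue's group */
		if (best < 0 || q[i].prio > q[best].prio)
			best = i;
	}
	return best;
}
```

In this model a queue is skipped for the same three reasons the guide later lists for an idle queue: it is empty, it has lower priority than another eligible queue, or its group does not match the caller.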

> +=== Scheduler Synchronization

> +In addition to its dispatching function, which provides automatic

> scalability to

> +ODP applications in many core environments, the other main function of the

> +scheduler is to provide event synchronization services that greatly

> simplify

> +application programming in a parallel processing environment. A queue's

> +SYNC mode determines how the scheduler handles the synchronization

> processing

> +of multiple events originating from the same queue.

> +

> +Three types of queue scheduler synchronization are supported: Parallel,

> +Atomic, and Ordered.

> +

> +==== Parallel Queues

> +SCHED queues that specify a sync mode of ODP_SCHED_SYNC_NONE are

> unrestricted

> +in how events are processed.

> +

> +.Parallel Queue Scheduling

> +image::../images/parallel_queue.png[align="center"]

>

> -Queues

> -------

> -There are three queue types, atomic, ordered and parallel.

> -A queue belongs to a single odp_worker and a odp_worker may have multiple

> queues.

> +All events held on parallel queues are eligible to be scheduled

> simultaneously

> +and any required synchronization between them is the responsibility of the

> +application. Events originating from parallel queues thus have the highest

> +throughput rate, however they also potentially involve the most work on

> the

> +part of the application. In the Figure above, four threads are calling

> +*odp_schedule()* to obtain events to process. The scheduler has assigned

> +three events from the first queue to three threads in parallel. The fourth

> +thread is processing a single event from the third queue. The second queue

> +might either be empty, of lower priority, or not in a scheduler group

> matching

> +any of the threads being serviced by the scheduler.

> +

> +==== Atomic Queues

> +Atomic queues simplify event synchronization because only a single event

> +from a given atomic queue may be processed at a time. Events scheduled

> from

> +atomic queues thus can be processed lock free because the locking is being

> +done implicitly by the scheduler.

> +

> +.Atomic Queue Scheduling

> +image::../images/atomic_queue.png[align="center"]

>

> -Events are sent to a queue, and the the sender chooses a queue based on

> the service it needs.

> -The sender does not need to know which odp_worker (on which core) or HW

> accelerator will process the event, but all the events on a queue are

> eventually scheduled and processed.

> +In this example, no matter how many events may be held in an atomic

> queue, only

> +one of them can be scheduled at a time. Here two threads process events

> from

> +two different atomic queues. Note that there is no synchronization between

> +different atomic queues, only between events originating from the same

> atomic

> +queue. The queue context associated with the atomic queue is held until

> the

> +next call to the scheduler or until the application explicitly releases it

> +via a call to *odp_schedule_release_atomic()*.

>

> -NOTE: Both ordered and parallel queue types improve throughput over an

> atomic queue (due to parallel event processing), but the user has to take

> care of the context data synchronization (if needed).

> +Note that while atomic queues simplify programming, the serial nature of

> +atomic queues will impair scaling.

>

> -Atomic Queue

> -~~~~~~~~~~~~

> -Only one event at a time may be processed from a given queue. The

> processing maintains order and context data synchronization but this will

> impair scaling.

> +==== Ordered Queues

> +Ordered queues provide the best of both worlds by providing the inherent

> +scalability of parallel queues, with the easy synchronization of atomic

> +queues.

>

> -.Overview Atomic Queue processing

> -image::../images/atomic_queue.png[align="center"]

> +.Ordered Queue Scheduling

> +image::../images/ordered_queue.png[align="center"]

>

> -Ordered Queue

> -~~~~~~~~~~~~~

> -An ordered queue will ensure that the sequencing at the output is

> identical to that of the input, but multiple events may be processed

> simultaneously and the order is restored before the events move to the next

> queue

> +When scheduling events from an ordered queue, the scheduler dispatches

> multiple

> +events from the queue in parallel to different threads; however, the

> scheduler

> +also ensures that the relative sequence of these events on output queues

> +is identical to their sequence from their originating ordered queue.

> +

> +As with atomic queues, the ordering guarantees associated with ordered

> queues

> +refer to events originating from the same queue, not to those

> originating on

> +different queues. Thus in this figure three threads are processing events

> 5, 3,

> +and 4, respectively from the first ordered queue. Regardless of how these

> +threads complete processing, these events will appear in their original

> +relative order on their output queue.

> +

> +==== Order Preservation

> +Relative order is preserved independent of whether events are being sent

> to

> +different output queues.  For example, if some events are sent to output

> queue

> +A while others are sent to output queue B then the events on these output

> +queues will still be in the same relative order as they were on their

> +originating queue.  Similarly, if the processing consumes events so that

> no

> +output is issued for some of them (_e.g.,_ as part of IP fragment

> reassembly

> +processing) then other events will still be correctly ordered with

> respect to

> +these sequence gaps. Finally, if multiple events are enqueued for a given

> +order (_e.g.,_ as part of packet segmentation processing for MTU

> +considerations), then each of these events will occupy the originator's

> +sequence in the target output queue(s). In this case the relative order

> of these

> +events will be in the order that the thread issued *odp_queue_enq()*

> calls for

> +them.

> +
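
As an aside to the quoted text, the three cases above (out-of-order completion, an order that produces no output, and an order that produces several outputs) can be illustrated with a small reorder stage in plain C. This is a sketch under stated assumptions, not how any ODP implementation works internally: all `model_*` names are hypothetical, orders are assumed to be numbered from 0, and outputs are only released once an order is marked complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of order restoration; not the ODP API. */
#define MODEL_MAX_ORDERS    16
#define MODEL_MAX_PER_ORDER 4
#define MODEL_MAX_OUT       32

typedef struct {
    int  pending[MODEL_MAX_ORDERS][MODEL_MAX_PER_ORDER]; /* buffered outputs */
    int  npending[MODEL_MAX_ORDERS]; /* number of outputs buffered per order */
    bool done[MODEL_MAX_ORDERS];     /* order resolved (with or without output) */
    int  next_order;                 /* lowest order not yet released */
    int  out[MODEL_MAX_OUT];         /* the output queue, in release order */
    int  nout;
} model_reorder_t;

static void model_reorder_init(model_reorder_t *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
}

/* Buffer one output for the given order; an order may enqueue several. */
static bool model_reorder_enq(model_reorder_t *r, int order, int value)
{
    assert(r != NULL);
    if (order < r->next_order || order >= MODEL_MAX_ORDERS)
        return false;
    if (r->npending[order] == MODEL_MAX_PER_ORDER)
        return false;
    r->pending[order][r->npending[order]++] = value;
    return true;
}

/* Mark an order resolved, then release every consecutive resolved order. */
static void model_reorder_complete(model_reorder_t *r, int order)
{
    assert(r != NULL);
    assert(order >= r->next_order && order < MODEL_MAX_ORDERS);
    r->done[order] = true;
    while (r->next_order < MODEL_MAX_ORDERS && r->done[r->next_order]) {
        int o = r->next_order++;
        for (int i = 0; i < r->npending[o]; i++) {
            assert(r->nout < MODEL_MAX_OUT);
            r->out[r->nout++] = r->pending[o][i];
        }
    }
}
```

If order 2 finishes first, its output stays buffered until orders 0 and 1 are resolved. Order 1 may resolve without enqueuing anything, which models the sequence gap, and an order that enqueues twice has both outputs released back to back in the sequence in which they were enqueued.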

> +The ordered context associated with the dispatch of an event from an

> ordered

> +queue lasts until the next scheduler call or until explicitly released by

> +the thread calling *odp_schedule_release_ordered()*. This call may be used

> +as a performance advisory that the thread no longer requires ordering

> +guarantees for the current context. As a result, any subsequent enqueues

> +within the current scheduler context will be treated as if the thread was

> +operating in a parallel queue context.

> +

> +==== Ordered Locking

> +Another powerful feature of the scheduler's handling of ordered queues is

> +*ordered locks*. Each ordered queue has associated with it a number of

> ordered

> +locks as specified by the _lock_count_ parameter at queue create time.

> +

> +Ordered locks provide an efficient means to perform in-order sequential

> +processing within an ordered context. For example, suppose events with

> relative

> +order 5, 6, and 7 are being processed in parallel by three different threads. An

> +ordered lock will enable these threads to synchronize such that they can

> +perform some critical section in their originating queue order. The

> number of

> +ordered locks supported for each ordered queue is implementation

> dependent (and

> +queryable via the *odp_config_max_ordered_locks_per_queue()* API). If the

> +implementation supports multiple ordered locks then these may be used to

> +protect different ordered critical sections within a given ordered

> context.

> +
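
As an aside to the quoted text, the behaviour of a single ordered lock can be sketched with a ticket-style counter in plain C11. This is an illustration only, not the ODP implementation: the `model_*` names are hypothetical, and the sketch assumes every order takes the lock exactly once, whereas a real scheduler must also resolve orders that never acquire it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one ordered lock; not the ODP API. */
typedef struct {
    atomic_int next_order; /* the order currently allowed into the section */
} model_order_lock_t;

static void model_order_lock_init(model_order_lock_t *l, int first_order)
{
    assert(l != NULL);
    atomic_init(&l->next_order, first_order);
}

/* True when it is this order's turn to enter the critical section. */
static bool model_order_trylock(model_order_lock_t *l, int my_order)
{
    assert(l != NULL);
    return atomic_load_explicit(&l->next_order,
                                memory_order_acquire) == my_order;
}

/* Wait until all earlier orders have passed through the critical section. */
static void model_order_lock(model_order_lock_t *l, int my_order)
{
    while (!model_order_trylock(l, my_order))
        ; /* spin; a real implementation would likely pause or yield */
}

/* Leave the critical section and admit the next order. */
static void model_order_unlock(model_order_lock_t *l, int my_order)
{
    assert(model_order_trylock(l, my_order));
    atomic_store_explicit(&l->next_order, my_order + 1,
                          memory_order_release);
}
```

With the lock initialized to order 5, the threads holding orders 6 and 7 cannot enter until order 5 has unlocked, whatever sequence they arrive in. Work outside the lock still runs in parallel.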

> +==== Summary: Ordered Queues

> +To see how these considerations fit together, consider the following code:

> +

> +.Processing with Ordered Queues

> +[source,c]

> +----

> +void worker_thread()

> +{

> +        odp_event_t ev;

> +        odp_queue_t which_q;

> +

> +        odp_init_local();

> +        /* ...other initialization processing */

> +

> +        while (1) {

> +                ev = odp_schedule(&which_q, ODP_SCHED_WAIT);

> +                /* ...process events in parallel */

> +                odp_schedule_order_lock(0);

> +                /* ...critical section processed in order */

> +                odp_schedule_order_unlock(0);

> +                /* ...continue processing in parallel */

> +                odp_queue_enq(dest_q, ev); /* dest_q is created elsewhere */

> +        }

> +}

> +----

>

> -.Overview Ordered Queue processing

> -image::../images/ordered_queue.png[align="center"]

> +This represents a simplified structure for a typical worker thread

> operating

> +on ordered queues. Multiple events are processed in parallel and the use

> of

> +ordered queues ensures that they will be placed on +dest_q+ in the same

> order

> +as they originated.  While processing in parallel, the use of ordered

> locks

> +enables critical sections to be processed in order within the overall

> parallel

> +flow. When a thread arrives at the _odp_schedule_order_lock()_ call, it

> waits

> +until the locking order for this lock for all prior events has been

> resolved

> +and then enters the critical section. The _odp_schedule_order_unlock()_

> call

> +releases the critical section and allows the next order to enter it.

>

> -Parallel Queue

> -~~~~~~~~~~~~~~

> -There are no restrictions on the number of events being processed.

> +=== Queue Scheduling Summary

>

> -.Overview parallel Queue processing

> -image::../images/parallel_queue.png[align="center"]

> +NOTE: Both ordered and parallel queues improve throughput over atomic

> queues

> +due to parallel event processing, but require that the application take

> +steps to ensure context data synchronization if needed.

> +

> +== Glossary

> +[glossary]

> +worker thread::

> +    A worker is a type of ODP thread. It will usually be isolated from

> +    the scheduling of any host operating system and is intended for

> fast-path

> +    processing with a low and predictable latency. Worker threads will not

> +    generally receive interrupts and will run to completion.

> +control thread::

> +    A control thread is a type of ODP thread. It will be isolated from the

> host

> +    operating system house keeping tasks but will be scheduled by it and

> may

> +    receive interrupts.

> +thread::

> +    An ODP thread is a flow of execution that in a Linux environment

> could be

> +    a Linux process or thread.

> +event::

> +    An event is a notification that can be placed in a queue.

> +queue::

> +    A communication channel that holds events

> --

> 2.1.4

>

> _______________________________________________

> lng-odp mailing list

> lng-odp@lists.linaro.org

> https://lists.linaro.org/mailman/listinfo/lng-odp

>




-- 
Mike Holmes
Technical Manager - Linaro Networking Group
Linaro.org <http://www.linaro.org/> │ Open source software for ARM SoCs
Maxim Uvarov Dec. 14, 2015, 9:30 a.m. UTC | #2
Merged,
Maxim.

On 12/11/2015 21:51, Mike Holmes wrote:
>
> On 11 December 2015 at 12:23, Bill Fischofer 
> <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote:
>
>     Complete the reformatting to standard asciidoc style, expand the
>     ODP Application Programming section, and include a reorganized and
>     expanded discussion of ODP queues.
>
>     Signed-off-by: Bill Fischofer <bill.fischofer@linaro.org
>     <mailto:bill.fischofer@linaro.org>>
>
>
> Reviewed-by: Mike Holmes <mike.holmes@linaro.org 
> <mailto:mike.holmes@linaro.org>>
>
> Mike
>
>     ---
>      doc/users-guide/users-guide.adoc | 450
>     +++++++++++++++++++++++++++++++--------
>      1 file changed, 358 insertions(+), 92 deletions(-)
>
>     diff --git a/doc/users-guide/users-guide.adoc
>     b/doc/users-guide/users-guide.adoc
>     index cf77fa0..2e30f3a 100644
>     --- a/doc/users-guide/users-guide.adoc
>     +++ b/doc/users-guide/users-guide.adoc
>     @@ -8,16 +8,19 @@ OpenDataPlane (ODP)  Users-Guide
>      Abstract
>      --------
>      This document is intended to guide a new ODP application developer.
>     -Further details about ODP may be found at the
>     http://opendataplane.org[ODP] home page.
>     +Further details about ODP may be found at the
>     http://opendataplane.org[ODP]
>     +home page.
>
>      .Overview of a system running ODP applications
>      image::../images/overview.png[align="center"]
>
>     -ODP is an API specification that allows many implementations to
>     provide platform independence, automatic hardware acceleration and
>     CPU scaling to high performance networking  applications.
>     -This document describes how to write an application that can
>     successfully take advantage of the API.
>     +ODP is an API specification that allows many implementations to
>     provide
>     +platform independence, automatic hardware acceleration and CPU
>     scaling to
>     +high performance networking  applications. This document
>     describes how to
>     +write an application that can successfully take advantage of the API.
>
>      :numbered:
>     -== Introduction ==
>     +== Introduction
>      .OpenDataPlane Components
>      image::../images/odp_components.png[align="center"]
>
>     @@ -42,7 +45,7 @@ ODP API specification--that is the
>     responsibility of each ODP implementation.
>      * Application-centric.  Covers functional needs of data plane
>     applications.
>      * Ensures portability by specifying the functional behavior of ODP.
>      * Defined jointly and openly by application writers and platform
>     implementers.
>     -* Archiected to be implementable on a wide range of platforms
>     efficiently
>     +* Architected to be implementable on a wide range of platforms
>     efficiently
>      * Sponsored, governed, and maintained by the Linaro Networking
>     Group (LNG)
>
>      .ODP Implementations
>     @@ -68,7 +71,7 @@ where the application will run on a target
>     platform chosen by someone else.
>      * One size does not fit all--supporting multiple implementations
>     allows ODP
>      to adapt to widely differing internals among platforms.
>      * Anyone can create an ODP implementation tailored to their platform
>     -* Distribution and mainteinance of each implementation is as
>     owner wishes
>     +* Distribution and maintenance of each implementation is as owner
>     wishes
>        - Open source or closed source as business needs determine
>        - Have independent release cycles and service streams
>      * Allows HW and SW innovation in how ODP APIs are implemented on
>     each platform.
>     @@ -100,7 +103,7 @@ drivers supported by DPDK.
>      they are derived from a reference implementation.
>
>      .ODP Validation Test Suite
>     -Third, to enure consistency between different ODP
>     implementations, ODP
>     +Third, to ensure consistency between different ODP
>     implementations, ODP
>      consists of a validation suite that verifies that any given
>     implementation of
>      ODP faithfully provides the specified functional behavior of each
>     ODP API.
>      As a separate open source component, the validation suite may be
>     used by
>     @@ -115,16 +118,16 @@ ODP API specification.
>      * Key to ensuring application portability across all ODP
>     implementations
>      * Tests that ODP implementations conform to the specified
>     functional behavior
>      of ODP APIs.
>     -* Can be run at any time by users and vendors to validat
>     implementations
>     -od ODP.
>     +* Can be run at any time by users and vendors to validate
>     implementations
>     +of ODP.
>
>     -=== ODP API Specification Versioning ===
>     +=== ODP API Specification Versioning
>      As an evolving standard, the ODP API specification is released
>     under an
>      incrementing version number, and corresponding implementations of
>     ODP, as well
>      as the validation suite that verifies API conformance, are linked
>     to this
>     -version number. ODP versions are specified using a stanard
>     three-level
>     +version number. ODP versions are specified using a standard
>     three-level
>      number (major.minor.fixlevel) that are incremented according to
>     the degree of
>     -change the level represents. Increments to the fixlevel represent
>     clarification
>     +change the level represents. Increments to the fix level
>     represent clarification
>      of the specification or other minor changes that do not affect
>     either the
>      syntax or semantics of the specification. Such changes in the API
>     specification
>      are expected to be rare. Increments to the minor level
>     @@ -136,26 +139,26 @@ the major level represent significant
>     structural changes that most likely
>      require some level of application source code change, again as
>     documented in
>      the release notes for that version.
>
>     -=== ODP Implementation Versioning ===
>     +=== ODP Implementation Versioning
>      ODP implementations are free to use whatever release naming/numbering
>      conventions they wish, as long as it is clear what level of the
>     ODP API a given
>      release implements. A recommended convention is to use the same
>     three level
>      numbering scheme where the major and minor numbers correspond to
>     the ODP API
>     -level and the fixlevel represents an implementation-defined
>     service level
>     +level and the fix level represents an implementation-defined
>     service level
>      associated with that API level implementation. The LNG-supplied
>     ODP reference
>      implementations follow this convention.
>
>     -=== ODP Validation Test Suite Versioning ===
>     +=== ODP Validation Test Suite Versioning
>      The ODP validation test suite follows these same naming
>     conventions. The major
>      and minor release numbers correspond to the ODP API level that
>     the suite
>     -validates and the fixlevel represents the service level of the
>     validation
>     +validates and the fix level represents the service level of the
>     validation
>      suite itself for that API level.
>
>     -=== ODP Design Goals ===
>     +=== ODP Design Goals
>      ODP has three primary goals that follow from its component
>     structure. The first
>      is application portability across a wide range of platforms.
>     These platforms
>      differ in terms of processor instruction set architecture, number
>     and types of
>     -application processing cores, memory oranization, as well as the
>     number and
>     +application processing cores, memory organization, as well as the
>     number and
>      type of platform specific hardware acceleration and offload
>     features that
>      are available. ODP applications can move from one conforming
>     implementation
>      to another with at most a recompile.
>     @@ -175,7 +178,7 @@ of processing cores that are available to
>     realize application function. The
>      result is that an application written to this model does not
>     require redesign
>      as it scales from 4, to 40, to 400 cores.
>
>     -== Organization of this Document ==
>     +== Organization of this Document
>      This document is organized into several sections. The first
>     presents a high
>      level overview of the ODP API component areas and their
>     associated abstract
>      data types. This section introduces ODP APIs at a conceptual level.
>     @@ -190,14 +193,14 @@ full reference specification for each API.
>     The latter is intended to be used
>      by ODP application programmers, as well as implementers, to
>     understand the
>      precise syntax and semantics of each API.
>
>     -== ODP API Concepts ==
>     +== ODP API Concepts
>      ODP programs are built around several conceptual structures that
>     every
>     -appliation programmer needs to be familiar with to use ODP
>     effectively. The
>     +application programmer needs to be familiar with to use ODP
>     effectively. The
>      main ODP concepts are:
>      Thread, Event, Queue, Pool, Shared Memory, Buffer, Packet, PktIO,
>     Timer,
>      and Synchronizer.
>
>     -=== Thread ===
>     +=== Thread
>      The thread is the fundamental programming unit in ODP. ODP
>     applications are
>      organized into a collection of threads that perform the work that the
>      application is designed to do. ODP threads may or may not share
>     memory with
>     @@ -209,7 +212,7 @@ A control thread is a supervisory thread that
>     organizes
>      the operation of worker threads. Worker threads, by contrast,
>     exist to
>      perform the main processing logic of the application and employ a
>     run to
>      completion model. Worker threads, in particular, are intended to
>     operate on
>     -dedicated processing cores, especially in many core proessing
>     environments,
>     +dedicated processing cores, especially in many core processing
>     environments,
>      however a given implementation may multitask multiple threads on
>     a single
>      core if desired (typically on smaller and lower performance target
>      environments).
>     @@ -219,7 +222,7 @@ _thread mask_ and _scheduler group_ that
>     determine where they can run and
>      the type of work that they can handle. These will be discussed in
>     greater
>      detail later.
>
>     -=== Event ===
>     +=== Event
>      Events are what threads process to perform their work. Events can
>     represent
>      new work, such as the arrival of a packet that needs to be
>     processed, or they
>      can represent the completion of requests that have executed
>     asynchronously.
>     @@ -232,7 +235,7 @@ References to events are via handles of
>     abstract type +odp_event_t+. Cast
>      functions are provided to convert these into specific handles of the
>      appropriate type represented by the event.
>
>     -=== Queue ===
>     +=== Queue
>      A queue is a message passing channel that holds events. Events can be
>      added to a queue via enqueue operations or removed from a queue
>     via dequeue
>      operations. The endpoints of a queue will vary depending on how
>     it is used.
>     @@ -244,7 +247,7 @@ stateful processing on events as well as
>     stateless processing.
>
>      Queues are represented by handles of abstract type +odp_queue_t+.
>
>     -=== Pool ===
>     +=== Pool
>      A pool is a shared memory area from which elements may be drawn.
>     Pools
>      represent the backing store for events, among other things. Pools are
>      typically created and destroyed by the application during
>     initialization and
>     @@ -256,32 +259,32 @@ are Buffer and Packet.
>
>      Pools are represented by handles of abstract type +odp_pool_t+.
>
>     -=== Shared Memory ===
>     +=== Shared Memory
>      Shared memory represents raw blocks of storage that are sharable
>     between
>      threads. They are the building blocks of pools but can be used
>     directly by
>      ODP applications if desired.
>
>      Shared memory is represented by handles of abstract type +odp_shm_t+.
>
>     -=== Buffer ===
>     +=== Buffer
>      A buffer is a fixed sized block of shared storage that is used by ODP
>      components and/or applications to realize their function. Buffers
>     contain
>      zero or more bytes of application data as well as system maintained
>      metadata that provide information about the buffer, such as its
>     size or the
>      pool it was allocated from. Metadata is an important ODP concept
>     because it
>      allows for arbitrary amounts of side information to be associated
>     with an
>     -ODP object. Most ODP objects have assocaited metadata and this
>     metadata is
>     +ODP object. Most ODP objects have associated metadata and this
>     metadata is
>      manipulated via accessor functions that act as getters and
>     setters for
>     -this information. Getter acces functions permit an application to
>     read
>     +this information. Getter access functions permit an application
>     to read
>      a metadata item, while setter access functions permit an
>     application to write
>      a metadata item. Note that some metadata is inherently read only
>     and thus
>      no setter is provided to manipulate it.  When object have
>     multiple metadata
>      items, each has its own associated getter and/or setter access
>     function to
>      inspect or manipulate it.
>
>     -Buffers are represened by handles of abstract type +odp_buffer_t+.
>     +Buffers are represented by handles of abstract type +odp_buffer_t+.
>
>     -=== Packet ===
>     +=== Packet
>      Packets are received and transmitted via I/O interfaces and represent
>      the basic data that data plane applications manipulate.
>      Packets are drawn from pools of type +ODP_POOL_PACKET+.
>     @@ -294,7 +297,7 @@ with each packet for its own use.
>
>      Packets are represented by handles of abstract type +odp_packet_t+.
>
>     -=== PktIO ===
>     +=== PktIO
>      PktIO is how ODP represents I/O interfaces. A pktio object is a
>     logical
>      port capable of receiving and/or transmitting packets. This may
>     be directly
>      supported by the underlying platform as an integrated feature,
>     @@ -302,7 +305,7 @@ or may represent a device attached via a PCIE
>     or other bus.
>
>      PktIOs are represented by handles of abstract type +odp_pktio_t+.
>
>     -=== Timer ===
>     +=== Timer
>      Timers are how ODP applications measure and respond to the
>     passage of time.
>      Timers are drawn from specialized pools called timer pools that
>     have their
>      own abstract type (+odp_timer_pool_t+). Applications may have
>     many timers
>     @@ -310,7 +313,7 @@ active at the same time and can set them to
>     use either relative or absolute
>      time. When timers expire they create events of type
>     +odp_timeout_t+, which
>      serve as notifications of timer expiration.
>
>     -=== Synchronizer ===
>     +=== Synchronizer
>      Multiple threads operating in parallel typically require various
>      synchronization services to permit them to operate in a reliable and
>      coordinated manner. ODP provides a rich set of locks, barriers,
>     and similar
>     @@ -325,7 +328,7 @@ flow of work through an ODP application. These
>     include the Classifier,
>      Scheduler, and Traffic Manager.  These components relate to the three
>      main stages of packet processing: Receive, Process, and Transmit.
>
>     -=== Classifier ===
>     +=== Classifier
>      The *Classifier* provides a suite of APIs that control packet
>     receive (RX)
>      processing.
>
>     @@ -362,8 +365,8 @@ Note that the use of the classifier is
>     optional.  Applications may directly
>      receive packets from a corresponding PktIO input queue via direct
>     polling
>      if they choose.
>
>     -=== Scheduler ===
>     -The *Scheduler* provides a suite of APIs that control scalabable
>     event
>     +=== Scheduler
>     +The *Scheduler* provides a suite of APIs that control scalable event
>      processing.
>
>      .ODP Scheduler and Event Processing
>     @@ -391,10 +394,10 @@ scheduled back to a thread to continue
>     processing with the results of the
>      requested asynchronous operation.
>
>      Threads themselves can enqueue events to queues for downstream
>     processing
>     -by other threads, permitting flexibility in how applicaitions
>     structure
>     +by other threads, permitting flexibility in how applications
>     structure
>      themselves to maximize concurrency.
>
>     -=== Traffic Manager ===
>     +=== Traffic Manager
>      The *Traffic Manager* provides a suite of APIs that control
>     traffic shaping and
>      Quality of Service (QoS) processing for packet output.
>
>     @@ -413,23 +416,33 @@ goals. Again, the advantage here is that on
>     many platforms traffic management
>      functions are implemented in hardware, permitting transparent
>     offload of
>      this work.
>
>     -Glossary
>     ---------
>     -[glossary]
>     -odp_worker::
>     -    An opd_worker is a type of odp_thread. It will usually be
>     isolated from the scheduling of any host operating system and is
>     intended for fast-path processing with a low and predictable
>     latency. Odp_workers will not generally receive interrupts and
>     will run to completion.
>     -odp_control::
>     -    An odp_control is a type of odp_thread. It will be isolated
>     from the host operating system house keeping tasks but will be
>     scheduled by it and may receive interrupts.
>     -odp_thread::
>     -    An odp_thread is a flow of execution that in a Linux
>     environment could be a Linux process or thread.
>     -event::
>     -    An event is a notification that can be placed in a queue.
>     -
>     -The include structure
>     ----------------------
>     -Applications only include the 'include/odp.h file which includes
>     the 'platform/<implementation name>/include/plat' files to provide
>     a complete definition of the API on that platform.
>     -The doxygen documentation defining the behavior of the ODP API is
>     all contained in the public API files, and the actual definitions
>     for an implementation will be found in the per platform directories.
>     -Per-platform data that might normally be a #define can be
>     recovered via the appropriate access function if the #define is
>     not directly visible to the application.
>     +== ODP Application Programming
>     +At the highest level, an *ODP Application* is a program that uses
>     one or more
>     +ODP APIs. Because ODP is a framework rather than a programming
>     environment,
>     +applications are free to also use other APIs that may or may not
>     provide the
>     +same portability characteristics as ODP APIs.
>     +
>     +ODP applications vary in terms of what they do and how they
>     operate, but in
>     +general all share the following characteristics:
>     +
>     +. They are organized into one or more _threads_ that execute in
>     parallel.
>     +. These threads communicate and coordinate their activities using
>     various
>     +_synchronization_ mechanisms.
>     +. They receive packets from one or more _packet I/O interfaces_.
>     +. They examine, transform, or otherwise process packets.
>     +. They transmit packets to one or more _packet I/O interfaces_.
>     +
>     +ODP provides APIs to assist in each of these areas.
>     +
>     +=== The include structure
>     +Applications only include the 'include/odp.h' file, which
>     includes the
>     +'platform/<implementation name>/include/odp' files to provide a
>     complete
>     +definition of the API on that platform. The doxygen documentation
>     defining
>     +the behavior of the ODP API is all contained in the public API
>     files, and the
>     +actual definitions for an implementation will be found in the per
>     platform
>     +directories. Per-platform data that might normally be a +#define+
>     can be
>     +recovered via the appropriate access function if the #define is
>     not directly
>     +visible to the application.
>
>      .Users include structure
>      ----
>     @@ -442,51 +455,304 @@ Per-platform data that might normally be a
>     #define can be recovered via the appr
>      │   └── odp.h   This file should be the only file included by the
>     application.
>      ----
>
>     -Initialization
>     ---------------
>     -IMPORTANT: ODP depends on the application to perform a graceful
>     shutdown, calling the terminate functions should only be done when
>     the application is sure it has closed the ingress and subsequently
>     drained all queues etc.
>     +=== Initialization
>     +IMPORTANT: ODP depends on the application to perform a graceful
>     shutdown;
>     +calling the terminate functions should only be done when the
>     application is
>     +sure it has closed the ingress and subsequently drained all
>     queues, etc.
>     +
>     +=== Startup
>     +The first API that must be called by an ODP application is
>     'odp_init_global()'.
>     +This takes two pointers. The first, +odp_init_t+, contains ODP
>     initialization
>     +data that is platform independent and portable, while the second,
>     ++odp_platform_init_t+, is passed unparsed to the implementation
>     +to be used for platform specific data that is not yet, or may
>     never be
>     +suitable for the ODP API.
>     +
>     +Calling odp_init_global() establishes the ODP API framework and
>     MUST be
>     +called before any other ODP API may be called. Note that it is
>     only called
>     +once per application. Following global initialization, each
>     thread in turn
>     +calls 'odp_init_local()'. This establishes the local
>     ODP thread
>     +context for that thread and MUST be called before other ODP APIs
>     may be
>     +called by that thread.
>     +
>     +=== Shutdown
>     +Shutdown is the logical reverse of the initialization procedure, with
>     +'odp_term_local()' called for each thread before
>     'odp_term_global()' is
>     +called to terminate ODP.
>     +
>     +.ODP Application Structure Flow Diagram
>     +image::../images/resource_management.png[align="center"]
>
>     -Startup
>     -~~~~~~~~
>     -The first API that must be called is 'odp_init_global()'.
>     -This takes two pointers, odp_init_t contains ODP initialization
>     data that is platform independent and portable.
>     -The second odp_platform_init_t is passed un parsed to the 
>     implementation and can be used for platform specific data that is
>     not yet, or may never be suitable for the ODP API.
>     +== Common Conventions
>     +Many ODP APIs share common conventions regarding their arguments
>     and return
>     +types. This section highlights some of the more common and
>     frequently used
>     +conventions.
>     +
>     +=== Handles and Special Designators
>     +ODP resources are represented via _handles_ that have abstract type
>     +_odp_resource_t_.  So pools are represented by handles of type
>     +odp_pool_t+,
>     +queues by handles of type +odp_queue_t+, etc. Each such type
>     +has a distinguished value _ODP_RESOURCE_INVALID_ that is used to
>     indicate a
>     +handle that does not refer to a valid resource of that type.
>     Resources are
>     +typically created via an API named _odp_resource_create()_ that
>     returns a
>     +handle of type _odp_resource_t_ that represents the created
>     object. This
>     +returned handle is set to _ODP_RESOURCE_INVALID_ if, for example, the
>     +resource could not be created due to resource exhaustion. Invalid
>     resources
>     +do not necessarily represent error conditions. For example,
>     +ODP_EVENT_INVALID+
>     +in response to an +odp_queue_deq()+ call to get an event from a
>     queue simply
>     +indicates that the queue is empty.
>     +
>     +=== Addressing Scope
>     +Unless specifically noted in the API, all ODP resources are
>     global to the ODP
>     +application, whether it runs as a single process or multiple
>     processes. ODP
>     +handles therefore have common meaning within an ODP application
>     but have no
>     +meaning outside the scope of the application.
>     +
>     +=== Resources and Names
>     +Many ODP resource objects, such as pools and queues, support an
>     +application-specified character string _name_ that is associated
>     with an ODP
>     +object at create time.  This name serves two purposes:
>     documentation, and
>     +lookup. The lookup function is particularly useful to allow an
>     ODP application
>     +that is divided into multiple processes to obtain the handle for
>     the common
>     +resource.
>     +
>     +== Queues
>     +Queues are the fundamental event sequencing mechanism provided by
>     ODP and all
>     +ODP applications make use of them either explicitly or
>     implicitly. Queues are
>     +created via the 'odp_queue_create()' API that returns a handle of
>     type
>     ++odp_queue_t+ that is used to refer to this queue in all
>     subsequent APIs that
>     +reference it. Queues have one of two ODP-defined _types_, POLL
>     and SCHED, that
>     +determine how they are used. POLL queues are directly managed by the ODP
>     +application while SCHED queues make use of the *ODP scheduler* to
>     provide
>     +automatic scalable dispatching and synchronization services.
>     +
>     +.Operations on POLL queues
>     +[source,c]
>     +----
>     +odp_queue_t poll_q1 = odp_queue_create("poll queue 1",
>     ODP_QUEUE_TYPE_POLL, NULL);
>     +odp_queue_t poll_q2 = odp_queue_create("poll queue 2",
>     ODP_QUEUE_TYPE_POLL, NULL);
>     +...
>     +odp_event_t ev = odp_queue_deq(poll_q1);
>     +...do something
>     +int rc = odp_queue_enq(poll_q2, ev);
>     +----
>
>     -The second API that must be called is 'odp_init_local()', this
>     must be called once per odp_thread, regardless of odp_thread
>     type.  Odp_threads may be of type ODP_THREAD_WORKER or
>     ODP_THREAD_CONTROL
>     +The key distinction is that dequeueing events from POLL queues is an
>     +application responsibility while dequeueing events from SCHED
>     queues is the
>     +responsibility of the ODP scheduler.
>
>     -Shutdown
>     -~~~~~~~~~
>     -Shutdown is the logical reverse of the initialization procedure,
>     with 'odp_thread_term()' called for each worker before
>     'odp_term_global()' is called.
>     +.Operations on SCHED queues
>     +[source,c]
>     +----
>     +odp_queue_param_t qp;
>     +odp_queue_param_init(&qp);
>     +odp_schedule_prio_t prio = ...;
>     +odp_schedule_group_t sched_group = ...;
>     +qp.sched.prio = prio;
>     +qp.sched.sync = ODP_SCHED_SYNC_[NONE|ATOMIC|ORDERED];
>     +qp.sched.group = sched_group;
>     +qp.lock_count = n; /* Only relevant for ordered queues */
>     +odp_queue_t sched_q1 = odp_queue_create("sched queue 1",
>     ODP_QUEUE_TYPE_SCHED, &qp);
>     +
>     +...thread init processing
>     +
>     +while (1) {
>     +        odp_event_t ev;
>     +        odp_queue_t which_q;
>     +        ev = odp_schedule(&which_q, <wait option>);
>     +        ...process the event
>     +}
>     +----
>
>     -image::../images/resource_management.png[align="center"]
>     +With scheduled queues, events are sent to a queue, and the
>     sender chooses
>     +a queue based on the service it needs. The sender does not need
>     to know
>     +which ODP thread (on which core) or hardware accelerator will process
>     +the event, but all the events on a queue are eventually scheduled
>     and processed.
>     +
>     +As can be seen, SCHED queues have additional attributes that are
>     specified at
>     +queue create that control how the scheduler is to process events
>     contained
>     +on them. These include group, priority, and synchronization class.
>     +
>     +=== Scheduler Groups
>     +The scheduler's dispatching job is to return the next event from
>     the highest
>     +priority SCHED queue that the caller is eligible to receive
>     events from.
>     +This latter consideration is determined by the queue's _scheduler
>     group_, which
>     +is set at queue create time, and by the caller's _scheduler group
>     mask_ that
>     +indicates which scheduler group(s) it belongs to. Scheduler
>     groups are
>     +represented by handles of type +odp_schedule_group_t+ and are
>     created by
>     +the *odp_schedule_group_create()* API. A number of scheduler
>     groups are
>     +_predefined_ by ODP.  These include +ODP_SCHED_GROUP_ALL+ (all
>     threads),
>     ++ODP_SCHED_GROUP_WORKER+ (all worker threads), and
>     +ODP_SCHED_GROUP_CONTROL+
>     +(all control threads). The application is free to create
>     additional scheduler
>     +groups for its own purpose and threads can join or leave
>     scheduler groups
>     +using the *odp_schedule_group_join()* and
>     *odp_schedule_group_leave()* APIs.
>     +
>     +=== Scheduler Priority
>     +The +prio+ field of the +odp_queue_param_t+ specifies the queue's
>     scheduling
>     +priority, which is how queues within eligible scheduler groups
>     are selected
>     +for dispatch. Queues have a default scheduling priority of NORMAL
>     but can be
>     +set to HIGHEST or LOWEST according to application needs.
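
The interaction of group eligibility and priority might be easier to follow with a small model. The sketch below is a toy, not ODP code: `struct sq` and `sq_pick()` are hypothetical, each group is modelled as one bit, and larger `prio` values are assumed to mean "more urgent" purely for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not ODP code. Each group is one bit; a thread's mask holds the
 * groups it has joined. Larger prio values mean more urgent (an assumption
 * of this sketch only). */
struct sq {
	uint32_t group_bit;  /* the single group this queue belongs to */
	int prio;            /* scheduling priority of the queue */
	int pending;         /* number of events waiting */
};

/* Returns the index of the queue the caller should be served from,
 * or -1 if no eligible queue has events. */
int sq_pick(const struct sq *q, int nq, uint32_t thread_mask)
{
	int best = -1;

	for (int i = 0; i < nq; i++) {
		if (q[i].pending == 0)
			continue;  /* nothing to dispatch */
		if ((q[i].group_bit & thread_mask) == 0)
			continue;  /* caller is not in this queue's group */
		if (best < 0 || q[i].prio > q[best].prio)
			best = i;
	}
	return best;
}
```

Eligibility is checked first and priority only ranks the queues that remain, so a thread outside a queue's group never sees its events however urgent they are.
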
>     +
>     +=== Scheduler Synchronization
>     +In addition to its dispatching function, which provides automatic
>     scalability to
>     +ODP applications in many core environments, the other main
>     function of the
>     +scheduler is to provide event synchronization services that
>     greatly simplify
>     +application programming in a parallel processing environment. A
>     queue's
>     +SYNC mode determines how the scheduler handles the
>     synchronization processing
>     +of multiple events originating from the same queue.
>     +
>     +Three types of queue scheduler synchronization are supported:
>     Parallel,
>     +Atomic, and Ordered.
>     +
>     +==== Parallel Queues
>     +SCHED queues that specify a sync mode of ODP_SCHED_SYNC_NONE are
>     unrestricted
>     +in how events are processed.
>     +
>     +.Parallel Queue Scheduling
>     +image::../images/parallel_queue.png[align="center"]
>
>     -Queues
>     -------
>     -There are three queue types, atomic, ordered and parallel.
>     -A queue belongs to a single odp_worker and a odp_worker may have
>     multiple queues.
>     +All events held on parallel queues are eligible to be scheduled
>     simultaneously
>     +and any required synchronization between them is the
>     responsibility of the
>     +application. Events originating from parallel queues thus have
>     the highest
>     +throughput rate, however they also potentially involve the most
>     work on the
>     +part of the application. In the Figure above, four threads are
>     calling
>     +*odp_schedule()* to obtain events to process. The scheduler has
>     assigned
>     +three events from the first queue to three threads in parallel.
>     The fourth
>     +thread is processing a single event from the third queue. The
>     second queue
>     +might either be empty, of lower priority, or not in a scheduler
>     group matching
>     +any of the threads being serviced by the scheduler.
>     +
>     +=== Atomic Queues
>     +Atomic queues simplify event synchronization because only a
>     single event
>     +from a given atomic queue may be processed at a time. Events
>     scheduled from
>     +atomic queues thus can be processed lock free because the locking
>     is being
>     +done implicitly by the scheduler.
>     +
>     +.Atomic Queue Scheduling
>     +image::../images/atomic_queue.png[align="center"]
>
>     -Events are sent to a queue, and the the sender chooses a queue
>     based on the service it needs.
>     -The sender does not need to know which odp_worker (on which core)
>     or HW accelerator will process the event, but all the events on a
>     queue are eventually scheduled and processed.
>     +In this example, no matter how many events may be held in an
>     atomic queue, only
>     +one of them can be scheduled at a time. Here two threads process
>     events from
>     +two different atomic queues. Note that there is no
>     synchronization between
>     +different atomic queues, only between events originating from the
>     same atomic
>     +queue. The queue context associated with the atomic queue is held
>     until the
>     +next call to the scheduler or until the application explicitly
>     releases it
>     +via a call to *odp_schedule_release_atomic()*.
>
>     -NOTE: Both ordered and parallel queue types improve throughput
>     over an atomic queue (due to parallel event processing), but the
>     user has to take care of the context data synchronization (if needed).
>     +Note that while atomic queues simplify programming, the serial
>     nature of
>     +atomic queues will impair scaling.
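
A single-threaded model of the atomic rule, in case it is useful alongside the figure. `struct aq`, `aq_schedule()` and `aq_release()` are hypothetical names for this sketch. A real scheduler would need proper atomic operations, which are left out here.

```c
#include <assert.h>

/* Toy model, not ODP code: an atomic queue hands out at most one event at
 * a time; the next is withheld until the holder releases the context. */
struct aq {
	int pending;  /* events waiting in the queue */
	int held;     /* 1 while some thread holds the atomic context */
};

/* Returns 1 if an event was dispatched to the caller, 0 otherwise. */
int aq_schedule(struct aq *q)
{
	if (q->held || q->pending == 0)
		return 0;
	q->pending--;
	q->held = 1;
	return 1;
}

/* Ends the caller's atomic context so the next event may be dispatched. */
void aq_release(struct aq *q)
{
	assert(q->held);
	q->held = 0;
}
```

With two events pending, a second `aq_schedule()` call gets nothing until the first holder releases, which is the serialization that limits scaling.
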
>
>     -Atomic Queue
>     -~~~~~~~~~~~~
>     -Only one event at a time may be processed from a given queue. The
>     processing maintains order and context data synchronization but
>     this will impair scaling.
>     +=== Ordered Queues
>     +Ordered queues provide the best of both worlds by providing the
>     inherent
>     +scalability of parallel queues, with the easy synchronization of
>     atomic
>     +queues.
>
>     -.Overview Atomic Queue processing
>     -image::../images/atomic_queue.png[align="center"]
>     +.Ordered Queue Scheduling
>     +image::../images/ordered_queue.png[align="center"]
>
>     -Ordered Queue
>     -~~~~~~~~~~~~~
>     -An ordered queue will ensure that the sequencing at the output is
>     identical to that of the input, but multiple events may be
>     processed simultaneously and the order is restored before the
>     events move to the next queue
>     +When scheduling events from an ordered queue, the scheduler
>     dispatches multiple
>     +events from the queue in parallel to different threads, however
>     the scheduler
>     +also ensures that the relative sequence of these events on output
>     queues
>     +is identical to their sequence from their originating ordered queue.
>     +
>     +As with atomic queues, the ordering guarantees associated with
>     ordered queues
>     +refer to events originating from the same queue, not for those
>     originating on
>     +different queues. Thus in this figure three threads are processing
>     events 5, 3,
>     +and 4, respectively from the first ordered queue. Regardless of
>     how these
>     +threads complete processing, these events will appear in their
>     original
>     +relative order on their output queue.
>     +
>     +==== Order Preservation
>     +Relative order is preserved independent of whether events are
>     being sent to
>     +different output queues.  For example, if some events are sent to
>     output queue
>     +A while others are sent to output queue B then the events on
>     these output
>     +queues will still be in the same relative order as they were on their
>     +originating queue.  Similarly, if the processing consumes events
>     so that no
>     +output is issued for some of them (_e.g.,_ as part of IP fragment
>     reassembly
>     +processing) then other events will still be correctly ordered
>     with respect to
>     +these sequence gaps. Finally, if multiple events are enqueued for
>     a given
>     +order (_e.g.,_ as part of packet segmentation processing for MTU
>     +considerations), then each of these events will occupy the
>     originator's
>     +sequence in the target output queue(s). In this case the relative
>     order of these
>     +events will be in the order that the thread issued
>     *odp_queue_enq()* calls for
>     +them.
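
One way to picture order preservation, including the "consumed event" gap case, is a reorder window. This is a toy with hypothetical names (`struct reorder`, `ro_complete()`, `RO_NONE`); it is not a description of how any ODP implementation restores order.

```c
#include <assert.h>

/* Toy model, not ODP code: restores source order on an output queue. */
#define RO_WINDOW 8
#define RO_NONE   (-1)  /* marks an event that was consumed, not forwarded */

struct reorder {
	unsigned next;         /* next sequence number allowed out */
	int done[RO_WINDOW];   /* 1 once that sequence has completed */
	int value[RO_WINDOW];  /* payload, or RO_NONE if consumed */
	int out[RO_WINDOW];    /* the "output queue" */
	unsigned out_len;
};

/* Records that sequence seq finished with payload value (RO_NONE when the
 * event was consumed), then flushes every event that is now in order. */
void ro_complete(struct reorder *r, unsigned seq, int value)
{
	assert(seq >= r->next && seq < RO_WINDOW);
	r->done[seq] = 1;
	r->value[seq] = value;
	while (r->next < RO_WINDOW && r->done[r->next]) {
		if (r->value[r->next] != RO_NONE)
			r->out[r->out_len++] = r->value[r->next];
		r->next++;
	}
}
```

If sequence 2 completes first it is held back. Once 0 completes it goes out, and when 1 is consumed without output the gap is resolved and 2 follows.
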
>     +
>     +The ordered context associated with the dispatch of an event from
>     an ordered
>     +queue lasts until the next scheduler call or until explicitly
>     released by
>     +the thread calling *odp_schedule_release_ordered()*. This call
>     may be used
>     +as a performance advisory that the thread no longer requires ordering
>     +guarantees for the current context. As a result, any subsequent
>     enqueues
>     +within the current scheduler context will be treated as if the
>     thread was
>     +operating in a parallel queue context.
>     +
>     +==== Ordered Locking
>     +Another powerful feature of the scheduler's handling of ordered
>     queues is
>     +*ordered locks*. Each ordered queue has associated with it a
>     number of ordered
>     +locks as specified by the _lock_count_ parameter at queue create
>     time.
>     +
>     +Ordered locks provide an efficient means to perform in-order
>     sequential
>     +processing within an ordered context. For example, suppose
>     events with relative
>     +order 5, 6, and 7 are executing in parallel by three different
>     threads. An
>     +ordered lock will enable these threads to synchronize such that
>     they can
>     +perform some critical section in their originating queue order.
>     The number of
>     +ordered locks supported for each ordered queue is implementation
>     dependent (and
>     +queryable via the *odp_config_max_ordered_locks_per_queue()*
>     API). If the
>     +implementation supports multiple ordered locks then these may be
>     used to
>     +protect different ordered critical sections within a given
>     ordered context.
>     +
>     +==== Summary: Ordered Queues
>     +To see how these considerations fit together, consider the
>     following code:
>     +
>     +.Processing with Ordered Queues
>     +[source,c]
>     +----
>     +void worker_thread()
>     +{
>     +        odp_init_local();
>     +        ...other initialization processing
>     +
>     +        while (1) {
>     +                ev = odp_schedule(&which_q, ODP_SCHED_WAIT);
>     +                ...process events in parallel
>     +                odp_schedule_order_lock(0);
>     +                ...critical section processed in order
>     +                odp_schedule_order_unlock(0);
>     +                ...continue processing in parallel
>     +                odp_queue_enq(dest_q, ev);
>     +        }
>     +}
>     +----
>
>     -.Overview Ordered Queue processing
>     -image::../images/ordered_queue.png[align="center"]
>     +This represents a simplified structure for a typical worker
>     thread operating
>     +on ordered queues. Multiple events are processed in parallel and
>     the use of
>     +ordered queues ensures that they will be placed on +dest_q+ in
>     the same order
>     +as they originated.  While processing in parallel, the use of
>     ordered locks
>     +enables critical sections to be processed in order within the
>     overall parallel
>     +flow. When a thread arrives at the _odp_schedule_order_lock()_
>     call, it waits
>     +until the locking order for this lock for all prior events has
>     been resolved
>     +and then enters the critical section. The
>     _odp_schedule_order_unlock()_ call
>     +releases the critical section and allows the next order to enter it.
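
The admission rule for an ordered lock can be reduced to a turn counter. The sketch below uses hypothetical names (`struct olock`, `olock_try()`, `olock_unlock()`) and is single-threaded. A real lock would wait rather than return 0, and would use atomic operations.

```c
#include <assert.h>

/* Toy model, not ODP code: an ordered lock admits contexts strictly in
 * their source-queue order. A real implementation would wait instead of
 * returning 0. */
struct olock {
	unsigned turn;  /* sequence number allowed into the critical section */
};

/* Returns 1 if the context with sequence seq may enter now, else 0. */
int olock_try(const struct olock *l, unsigned seq)
{
	return seq == l->turn;
}

/* Leaves the critical section and admits the next sequence number. */
void olock_unlock(struct olock *l, unsigned seq)
{
	assert(seq == l->turn);
	l->turn++;
}
```

For events 5, 6 and 7 running in parallel, 6 and 7 are refused until 5 has unlocked, whichever thread reaches the lock first.
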
>
>     -Parallel Queue
>     -~~~~~~~~~~~~~~
>     -There are no restrictions on the number of events being processed.
>     +=== Queue Scheduling Summary
>
>     -.Overview parallel Queue processing
>     -image::../images/parallel_queue.png[align="center"]
>     +NOTE: Both ordered and parallel queues improve throughput over
>     atomic queues
>     +due to parallel event processing, but require that the
>     application take
>     +steps to ensure context data synchronization if needed.
>     +
>     +== Glossary
>     +[glossary]
>     +worker thread::
>     +    A worker is a type of ODP thread. It will usually be isolated
>     from
>     +    the scheduling of any host operating system and is intended
>     for fast-path
>     +    processing with a low and predictable latency. Worker threads
>     will not
>     +    generally receive interrupts and will run to completion.
>     +control thread::
>     +    A control thread is a type of ODP thread. It will be isolated
>     from the host
>     +    operating system house keeping tasks but will be scheduled by
>     it and may
>     +    receive interrupts.
>     +thread::
>     +    An ODP thread is a flow of execution that in a Linux
>     environment could be
>     +    a Linux process or thread.
>     +event::
>     +    An event is a notification that can be placed in a queue.
>     +queue::
>     +    A communication channel that holds events.
>     --
>     2.1.4
>
>     _______________________________________________
>     lng-odp mailing list
>     lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>     https://lists.linaro.org/mailman/listinfo/lng-odp
>
>
>
>
> -- 
> Mike Holmes
> Technical Manager - Linaro Networking Group
> Linaro.org <http://www.linaro.org/>***│ *Open source software for ARM SoCs
>
>
>

Patch

diff --git a/doc/users-guide/users-guide.adoc b/doc/users-guide/users-guide.adoc
index cf77fa0..2e30f3a 100644
--- a/doc/users-guide/users-guide.adoc
+++ b/doc/users-guide/users-guide.adoc
@@ -8,16 +8,19 @@  OpenDataPlane (ODP)  Users-Guide
 Abstract
 --------
 This document is intended to guide a new ODP application developer.
-Further details about ODP may be found at the http://opendataplane.org[ODP] home page.
+Further details about ODP may be found at the http://opendataplane.org[ODP]
+home page.
 
 .Overview of a system running ODP applications
 image::../images/overview.png[align="center"]
 
-ODP is an API specification that allows many implementations to provide platform independence, automatic hardware acceleration and CPU scaling to high performance networking  applications.
-This document describes how to write an application that can successfully take advantage of the API.
+ODP is an API specification that allows many implementations to provide
+platform independence, automatic hardware acceleration and CPU scaling to
+high performance networking  applications. This document describes how to
+write an application that can successfully take advantage of the API.
 
 :numbered:
-== Introduction ==
+== Introduction
 .OpenDataPlane Components
 image::../images/odp_components.png[align="center"]
 
@@ -42,7 +45,7 @@  ODP API specification--that is the responsibility of each ODP implementation.
 * Application-centric.  Covers functional needs of data plane applications.
 * Ensures portability by specifying the functional behavior of ODP.
 * Defined jointly and openly by application writers and platform implementers.
-* Archiected to be implementable on a wide range of platforms efficiently
+* Architected to be implementable on a wide range of platforms efficiently
 * Sponsored, governed, and maintained by the Linaro Networking Group (LNG)
 
 .ODP Implementations
@@ -68,7 +71,7 @@  where the application will run on a target platform chosen by someone else.
 * One size does not fit all--supporting multiple implementations allows ODP
 to adapt to widely differing internals among platforms.
 * Anyone can create an ODP implementation tailored to their platform
-* Distribution and mainteinance of each implementation is as owner wishes
+* Distribution and maintenance of each implementation is as owner wishes
   - Open source or closed source as business needs determine
   - Have independent release cycles and service streams
 * Allows HW and SW innovation in how ODP APIs are implemented on each platform.
@@ -100,7 +103,7 @@  drivers supported by DPDK.
 they are derived from a reference implementation.
 
 .ODP Validation Test Suite
-Third, to enure consistency between different ODP implementations, ODP
+Third, to ensure consistency between different ODP implementations, ODP
 consists of a validation suite that verifies that any given implementation of
 ODP faithfully provides the specified functional behavior of each ODP API.
 As a separate open source component, the validation suite may be used by
@@ -115,16 +118,16 @@  ODP API specification.
 * Key to ensuring application portability across all ODP implementations
 * Tests that ODP implementations conform to the specified functional behavior
 of ODP APIs.
-* Can be run at any time by users and vendors to validat implementations
-od ODP.
+* Can be run at any time by users and vendors to validate implementations
+of ODP.
 
-=== ODP API Specification Versioning ===
+=== ODP API Specification Versioning
 As an evolving standard, the ODP API specification is released under an
 incrementing version number, and corresponding implementations of ODP, as well
 as the validation suite that verifies API conformance, are linked to this
-version number. ODP versions are specified using a stanard three-level
+version number. ODP versions are specified using a standard three-level
 number (major.minor.fixlevel) that are incremented according to the degree of
-change the level represents. Increments to the fixlevel represent clarification
+change the level represents. Increments to the fix level represent clarification
 of the specification or other minor changes that do not affect either the
 syntax or semantics of the specification. Such changes in the API specification
 are expected to be rare. Increments to the minor level
@@ -136,26 +139,26 @@  the major level represent significant structural changes that most likely
 require some level of application source code change, again as documented in
 the release notes for that version.
 
-=== ODP Implementation Versioning ===
+=== ODP Implementation Versioning
 ODP implementations are free to use whatever release naming/numbering
 conventions they wish, as long as it is clear what level of the ODP API a given
 release implements. A recommended convention is to use the same three level
 numbering scheme where the major and minor numbers correspond to the ODP API
-level and the fixlevel represents an implementation-defined service level
+level and the fix level represents an implementation-defined service level
 associated with that API level implementation. The LNG-supplied ODP reference
 implementations follow this convention.
 
-=== ODP Validation Test Suite Versioning ===
+=== ODP Validation Test Suite Versioning
 The ODP validation test suite follows these same naming conventions. The major
 and minor release numbers correspond to the ODP API level that the suite
-validates and the fixlevel represents the service level of the validation
+validates and the fix level represents the service level of the validation
 suite itself for that API level.
 
-=== ODP Design Goals ===
+=== ODP Design Goals
 ODP has three primary goals that follow from its component structure. The first
 is application portability across a wide range of platforms. These platforms
 differ in terms of processor instruction set architecture, number and types of
-application processing cores, memory oranization, as well as the number and
+application processing cores, memory organization, as well as the number and
 type of platform specific hardware acceleration and offload features that
 are available. ODP applications can move from one conforming implementation
 to another with at most a recompile.
@@ -175,7 +178,7 @@  of processing cores that are available to realize application function. The
 result is that an application written to this model does not require redesign
 as it scales from 4, to 40, to 400 cores.
 
-== Organization of this Document ==
+== Organization of this Document
 This document is organized into several sections. The first presents a high
 level overview of the ODP API component areas and their associated abstract
 data types. This section introduces ODP APIs at a conceptual level.
@@ -190,14 +193,14 @@  full reference specification for each API. The latter is intended to be used
 by ODP application programmers, as well as implementers, to understand the
 precise syntax and semantics of each API.
 
-== ODP API Concepts ==
+== ODP API Concepts
 ODP programs are built around several conceptual structures that every
-appliation programmer needs to be familiar with to use ODP effectively. The
+application programmer needs to be familiar with to use ODP effectively. The
 main ODP concepts are:
 Thread, Event, Queue, Pool, Shared Memory, Buffer, Packet, PktIO, Timer,
 and Synchronizer.
 
-=== Thread ===
+=== Thread
 The thread is the fundamental programming unit in ODP.  ODP applications are
 organized into a collection of threads that perform the work that the
 application is designed to do. ODP threads may or may not share memory with
@@ -209,7 +212,7 @@  A control thread is a supervisory thread that organizes
 the operation of worker threads. Worker threads, by contrast, exist to
 perform the main processing logic of the application and employ a run to
 completion model. Worker threads, in particular, are intended to operate on
-dedicated processing cores, especially in many core proessing environments,
+dedicated processing cores, especially in many core processing environments,
 however a given implementation may multitask multiple threads on a single
 core if desired (typically on smaller and lower performance target
 environments).
@@ -219,7 +222,7 @@  _thread mask_ and _scheduler group_ that determine where they can run and
 the type of work that they can handle. These will be discussed in greater
 detail later.
 
-=== Event ===
+=== Event
 Events are what threads process to perform their work. Events can represent
 new work, such as the arrival of a packet that needs to be processed, or they
 can represent the completion of requests that have executed asynchronously.
@@ -232,7 +235,7 @@  References to events are via handles of abstract type +odp_event_t+. Cast
 functions are provided to convert these into specific handles of the
 appropriate type represented by the event.
 
-=== Queue ===
+=== Queue
 A queue is a message passing channel that holds events.  Events can be
 added to a queue via enqueue operations or removed from a queue via dequeue
 operations. The endpoints of a queue will vary depending on how it is used.
@@ -244,7 +247,7 @@  stateful processing on events as well as stateless processing.
 
 Queues are represented by handles of abstract type +odp_queue_t+.
 
-=== Pool ===
+=== Pool
 A pool is a shared memory area from which elements may be drawn. Pools
 represent the backing store for events, among other things. Pools are
 typically created and destroyed by the application during initialization and
@@ -256,32 +259,32 @@  are Buffer and Packet.
 
 Pools are represented by handles of abstract type +odp_pool_t+.
 
-=== Shared Memory ===
+=== Shared Memory
 Shared memory represents raw blocks of storage that are sharable between
 threads. They are the building blocks of pools but can be used directly by
 ODP applications if desired.
 
 Shared memory is represented by handles of abstract type +odp_shm_t+.
 
-=== Buffer ===
+=== Buffer
 A buffer is a fixed sized block of shared storage that is used by ODP
 components and/or applications to realize their function. Buffers contain
 zero or more bytes of application data as well as system maintained
 metadata that provide information about the buffer, such as its size or the
 pool it was allocated from. Metadata is an important ODP concept because it
 allows for arbitrary amounts of side information to be associated with an
-ODP object. Most ODP objects have assocaited metadata and this metadata is
+ODP object. Most ODP objects have associated metadata and this metadata is
 manipulated via accessor functions that act as getters and setters for
-this information. Getter acces functions permit an application to read
+this information. Getter access functions permit an application to read
 a metadata item, while setter access functions permit an application to write
 a metadata item. Note that some metadata is inherently read only and thus
 no setter is provided to manipulate it.  When object have multiple metadata
 items, each has its own associated getter and/or setter access function to
 inspect or manipulate it.
 
-Buffers are represened by handles of abstract type +odp_buffer_t+.
+Buffers are represented by handles of abstract type +odp_buffer_t+.
 
-=== Packet ===
+=== Packet
 Packets are received and transmitted via I/O interfaces and represent
 the basic data that data plane applications manipulate.
 Packets are drawn from pools of type +ODP_POOL_PACKET+.
@@ -294,7 +297,7 @@  with each packet for its own use.
 
 Packets are represented by handles of abstract type +odp_packet_t+.
 
-=== PktIO ===
+=== PktIO
 PktIO is how ODP represents I/O interfaces. A pktio object is a logical
 port capable of receiving and/or transmitting packets. This may be directly
 supported by the underlying platform as an integrated feature,
@@ -302,7 +305,7 @@  or may represent a device attached via a PCIE or other bus.
 
 PktIOs are represented by handles of abstract type +odp_pktio_t+.
 
-=== Timer ===
+=== Timer
 Timers are how ODP applications measure and respond to the passage of time.
 Timers are drawn from specialized pools called timer pools that have their
 own abstract type (+odp_timer_pool_t+). Applications may have many timers
@@ -310,7 +313,7 @@  active at the same time and can set them to use either relative or absolute
 time. When timers expire they create events of type +odp_timeout_t+, which
 serve as notifications of timer expiration.
 
-=== Synchronizer ===
+=== Synchronizer
 Multiple threads operating in parallel typically require various
 synchronization services to permit them to operate in a reliable and
 coordinated manner. ODP provides a rich set of locks, barriers, and similar
@@ -325,7 +328,7 @@  flow of work through an ODP application. These include the Classifier,
 Scheduler, and Traffic Manager.  These components relate to the three
 main stages of packet processing: Receive, Process, and Transmit.
 
-=== Classifier ===
+=== Classifier
 The *Classifier* provides a suite of APIs that control packet receive (RX)
 processing.
 
@@ -362,8 +365,8 @@  Note that the use of the classifier is optional.  Applications may directly
 receive packets from a corresponding PktIO input queue via direct polling
 if they choose.
 
-=== Scheduler ===
-The *Scheduler* provides a suite of APIs that control scalabable event
+=== Scheduler
+The *Scheduler* provides a suite of APIs that control scalable event
 processing.
 
 .ODP Scheduler and Event Processing
@@ -391,10 +394,10 @@  scheduled back to a thread to continue processing with the results of the
 requested asynchronous operation.
 
 Threads themselves can enqueue events to queues for downstream processing
-by other threads, permitting flexibility in how applicaitions structure
+by other threads, permitting flexibility in how applications structure
 themselves to maximize concurrency.
 
-=== Traffic Manager ===
+=== Traffic Manager
 The *Traffic Manager* provides a suite of APIs that control traffic shaping and
 Quality of Service (QoS) processing for packet output.
 
@@ -413,23 +416,33 @@  goals. Again, the advantage here is that on many platforms traffic management
 functions are implemented in hardware, permitting transparent offload of
 this work.
 
-Glossary
---------
-[glossary]
-odp_worker::
-    An opd_worker is a type of odp_thread. It will usually be isolated from the scheduling of any host operating system and is intended for fast-path processing with a low and predictable latency. Odp_workers will not generally receive interrupts and will run to completion.
-odp_control::
-    An odp_control is a type of odp_thread. It will be isolated from the host operating system house keeping tasks but will be scheduled by it and may receive interrupts.
-odp_thread::
-    An odp_thread is a flow of execution that in a Linux environment could be a Linux process or thread.
-event::
-    An event is a notification that can be placed in a queue.
-
-The include structure
----------------------
-Applications only include the 'include/odp.h file which includes the 'platform/<implementation name>/include/plat' files to provide a complete definition of the API on that platform.
-The doxygen documentation defining the behavior of the ODP API is all contained in the public API files, and the actual definitions for an implementation will be found in the per platform directories.
-Per-platform data that might normally be a #define can be recovered via the appropriate access function if the #define is not directly visible to the application.
+== ODP Application Programming
+At the highest level, an *ODP Application* is a program that uses one or more
+ODP APIs. Because ODP is a framework rather than a programming environment,
+applications are free to also use other APIs that may or may not provide the
+same portability characteristics as ODP APIs.
+
+ODP applications vary in terms of what they do and how they operate, but in
+general all share the following characteristics:
+
+. They are organized into one or more _threads_ that execute in parallel.
+. These threads communicate and coordinate their activities using various
+_synchronization_ mechanisms.
+. They receive packets from one or more _packet I/O interfaces_.
+. They examine, transform, or otherwise process packets.
+. They transmit packets to one or more _packet I/O interfaces_.
+
+ODP provides APIs to assist in each of these areas.
+
+=== The include structure
+Applications only include the 'include/odp.h' file, which includes the
+'platform/<implementation name>/include/odp' files to provide a complete
+definition of the API on that platform. The doxygen documentation defining
+the behavior of the ODP API is all contained in the public API files, and the
+actual definitions for an implementation will be found in the per-platform
+directories. Per-platform data that might normally be a +#define+ can be
+recovered via the appropriate access function if the +#define+ is not directly
+visible to the application.
 
 .Users include structure
 ----
@@ -442,51 +455,304 @@  Per-platform data that might normally be a #define can be recovered via the appr
 │   └── odp.h   This file should be the only file included by the application.
 ----
 
-Initialization
---------------
-IMPORTANT: ODP depends on the application to perform a graceful shutdown, calling the terminate functions should only be done when the application is sure it has closed the ingress and subsequently drained all queues etc.
+=== Initialization
+IMPORTANT: ODP depends on the application to perform a graceful shutdown.
+The terminate functions should only be called when the application is
+sure it has closed the ingress and subsequently drained all queues, etc.
+
+=== Startup
+The first API that must be called by an ODP application is 'odp_init_global()'.
+This takes two pointers. The first, +odp_init_t+, contains ODP initialization
+data that is platform independent and portable, while the second,
++odp_platform_init_t+, is passed unparsed to the implementation
+to be used for platform-specific data that is not yet, or may never be,
+suitable for the ODP API.
+
+Calling odp_init_global() establishes the ODP API framework and MUST be
+called before any other ODP API may be called. Note that it is only called
+once per application. Following global initialization, each thread in turn
+calls 'odp_init_local()'. This establishes the local ODP thread
+context for that thread and MUST be called before other ODP APIs may be
+called by that thread.
+
+=== Shutdown
+Shutdown is the logical reverse of the initialization procedure, with
+'odp_term_local()' called for each thread before 'odp_term_global()' is
+called to terminate ODP.
+
+.ODP Application Structure Flow Diagram
+image::../images/resource_management.png[align="center"]
 
-Startup
-~~~~~~~~
-The first API that must be called is 'odp_init_global()'.
-This takes two pointers, odp_init_t contains ODP initialization data that is platform independent and portable.
-The second odp_platform_init_t is passed un parsed to the  implementation and can be used for platform specific data that is not yet, or may never be suitable for the ODP API.
+== Common Conventions
+Many ODP APIs share common conventions regarding their arguments and return
+types. This section highlights some of the more common and frequently used
+conventions.
+
+=== Handles and Special Designators
+ODP resources are represented via _handles_ that have abstract type
+_odp_resource_t_.  So pools are represented by handles of type +odp_pool_t+,
+queues by handles of type +odp_queue_t+, etc. Each such type
+has a distinguished value _ODP_RESOURCE_INVALID_ that is used to indicate a
+handle that does not refer to a valid resource of that type. Resources are
+typically created via an API named _odp_resource_create()_ that returns a
+handle of type _odp_resource_t_ that represents the created object. This
+returned handle is set to _ODP_RESOURCE_INVALID_ if, for example, the
+resource could not be created due to resource exhaustion. Invalid resources
+do not necessarily represent error conditions. For example, +ODP_EVENT_INVALID+
+in response to an +odp_queue_deq()+ call to get an event from a queue simply
+indicates that the queue is empty.
+
+=== Addressing Scope
+Unless specifically noted in the API, all ODP resources are global to the ODP
+application, whether it runs as a single process or multiple processes. ODP
+handles therefore have common meaning within an ODP application but have no
+meaning outside the scope of the application.
+
+=== Resources and Names
+Many ODP resource objects, such as pools and queues, support an
+application-specified character string _name_ that is associated with an ODP
+object at create time.  This name serves two purposes: documentation, and
+lookup. The lookup function is particularly useful to allow an ODP application
+that is divided into multiple processes to obtain the handle for the common
+resource.
+
+== Queues
+Queues are the fundamental event sequencing mechanism provided by ODP and all
+ODP applications make use of them either explicitly or implicitly. Queues are
+created via the 'odp_queue_create()' API that returns a handle of type
++odp_queue_t+ that is used to refer to this queue in all subsequent APIs that
+reference it. Queues have one of two ODP-defined _types_, POLL and SCHED, that
+determine how they are used. POLL queues are directly managed by the ODP
+application while SCHED queues make use of the *ODP scheduler* to provide
+automatic scalable dispatching and synchronization services.
+
+.Operations on POLL queues
+[source,c]
+----
+odp_queue_t poll_q1 = odp_queue_create("poll queue 1", ODP_QUEUE_TYPE_POLL, NULL);
+odp_queue_t poll_q2 = odp_queue_create("poll queue 2", ODP_QUEUE_TYPE_POLL, NULL);
+/* ... */
+odp_event_t ev = odp_queue_deq(poll_q1);
+/* ...do something */
+int rc = odp_queue_enq(poll_q2, ev);
+----
 
-The second API that must be called is 'odp_init_local()', this must be called once per odp_thread, regardless of odp_thread type.  Odp_threads may be of type ODP_THREAD_WORKER or ODP_THREAD_CONTROL
+The key distinction is that dequeueing events from POLL queues is an
+application responsibility while dequeueing events from SCHED queues is the
+responsibility of the ODP scheduler.
 
-Shutdown
-~~~~~~~~~
-Shutdown is the logical reverse of the initialization procedure, with 'odp_thread_term()' called for each worker before 'odp_term_global()' is called.
+.Operations on SCHED queues
+[source,c]
+----
+odp_queue_param_t qp;
+odp_queue_param_init(&qp);
+odp_schedule_prio_t prio = ...;
+odp_schedule_group_t sched_group = ...;
+qp.sched.prio = prio;
+qp.sched.sync = ODP_SCHED_SYNC_[NONE|ATOMIC|ORDERED];
+qp.sched.group = sched_group;
+qp.lock_count = n; /* Only relevant for ordered queues */
+odp_queue_t sched_q1 = odp_queue_create("sched queue 1", ODP_QUEUE_TYPE_SCHED, &qp);
+
+/* ...thread init processing */
+
+while (1) {
+        odp_event_t ev;
+        odp_queue_t which_q;
+        ev = odp_schedule(&which_q, <wait option>);
+        /* ...process the event */
+}
+----
 
-image::../images/resource_management.png[align="center"]
+With scheduled queues, events are sent to a queue, and the sender chooses
+a queue based on the service it needs. The sender does not need to know
+which ODP thread (on which core) or hardware accelerator will process
+the event, but all the events on a queue are eventually scheduled and processed.
+
+As can be seen, SCHED queues have additional attributes that are specified at
+queue create time that control how the scheduler is to process events contained
+on them. These include group, priority, and synchronization class.
+
+=== Scheduler Groups
+The scheduler's dispatching job is to return the next event from the highest
+priority SCHED queue that the caller is eligible to receive events from.
+This latter consideration is determined by the queue's _scheduler group_, which
+is set at queue create time, and by the caller's _scheduler group mask_ that
+indicates which scheduler group(s) it belongs to. Scheduler groups are
+represented by handles of type +odp_schedule_group_t+ and are created by
+the *odp_schedule_group_create()* API. A number of scheduler groups are
+_predefined_ by ODP.  These include +ODP_SCHED_GROUP_ALL+ (all threads),
++ODP_SCHED_GROUP_WORKER+ (all worker threads), and +ODP_SCHED_GROUP_CONTROL+
+(all control threads). The application is free to create additional scheduler
+groups for its own purpose and threads can join or leave scheduler groups
+using the *odp_schedule_group_join()* and *odp_schedule_group_leave()* APIs.
+
+=== Scheduler Priority
+The +sched.prio+ field of +odp_queue_param_t+ specifies the queue's scheduling
+priority, which is how queues within eligible scheduler groups are selected
+for dispatch. Queues have a default scheduling priority of NORMAL but can be
+set to HIGHEST or LOWEST according to application needs.
+
+=== Scheduler Synchronization
+In addition to its dispatching function, which provides automatic scalability to
+ODP applications in many-core environments, the other main function of the
+scheduler is to provide event synchronization services that greatly simplify
+application programming in a parallel processing environment. A queue's
+SYNC mode determines how the scheduler handles the synchronization processing
+of multiple events originating from the same queue.
+
+Three types of queue scheduler synchronization are supported: Parallel,
+Atomic, and Ordered.
+
+==== Parallel Queues
+SCHED queues that specify a sync mode of ODP_SCHED_SYNC_NONE are unrestricted
+in how events are processed.
+
+.Parallel Queue Scheduling
+image::../images/parallel_queue.png[align="center"]
 
-Queues
-------
-There are three queue types, atomic, ordered and parallel.
-A queue belongs to a single odp_worker and a odp_worker may have multiple queues.
+All events held on parallel queues are eligible to be scheduled simultaneously
+and any required synchronization between them is the responsibility of the
+application. Events originating from parallel queues thus have the highest
+throughput rate; however, they also potentially involve the most work on the
+part of the application. In the figure above, four threads are calling
+*odp_schedule()* to obtain events to process. The scheduler has assigned
+three events from the first queue to three threads in parallel. The fourth
+thread is processing a single event from the third queue. The second queue
+might either be empty, of lower priority, or not in a scheduler group matching
+any of the threads being serviced by the scheduler.
+
+=== Atomic Queues
+Atomic queues simplify event synchronization because only a single event
+from a given atomic queue may be processed at a time. Events scheduled from
+atomic queues thus can be processed lock-free because the locking is being
+done implicitly by the scheduler.
+
+.Atomic Queue Scheduling
+image::../images/atomic_queue.png[align="center"]
 
-Events are sent to a queue, and the the sender chooses a queue based on the service it needs.
-The sender does not need to know which odp_worker (on which core) or HW accelerator will process the event, but all the events on a queue are eventually scheduled and processed.
+In this example, no matter how many events may be held in an atomic queue, only
+one of them can be scheduled at a time. Here two threads process events from
+two different atomic queues. Note that there is no synchronization between
+different atomic queues, only between events originating from the same atomic
+queue. The queue context associated with the atomic queue is held until the
+next call to the scheduler or until the application explicitly releases it
+via a call to *odp_schedule_release_atomic()*.
 
-NOTE: Both ordered and parallel queue types improve throughput over an atomic queue (due to parallel event processing), but the user has to take care of the context data synchronization (if needed).
+Note that while atomic queues simplify programming, the serial nature of
+atomic queues will impair scaling.
 
-Atomic Queue
-~~~~~~~~~~~~
-Only one event at a time may be processed from a given queue. The processing maintains order and context data synchronization but this will impair scaling.
+=== Ordered Queues
+Ordered queues provide the best of both worlds by providing the inherent
+scalability of parallel queues, with the easy synchronization of atomic
+queues.
 
-.Overview Atomic Queue processing
-image::../images/atomic_queue.png[align="center"]
+.Ordered Queue Scheduling
+image::../images/ordered_queue.png[align="center"]
 
-Ordered Queue
-~~~~~~~~~~~~~
-An ordered queue will ensure that the sequencing at the output is identical to that of the input, but multiple events may be processed simultaneously and the order is restored before the events move to the next queue
+When scheduling events from an ordered queue, the scheduler dispatches multiple
+events from the queue in parallel to different threads; however, the scheduler
+also ensures that the relative sequence of these events on output queues
+is identical to their sequence from their originating ordered queue.
+
+As with atomic queues, the ordering guarantees associated with ordered queues
+refer to events originating from the same queue, not to those originating on
+different queues. Thus in this figure three threads are processing events 5, 3,
+and 4, respectively from the first ordered queue. Regardless of how these
+threads complete processing, these events will appear in their original
+relative order on their output queue.
+
+==== Order Preservation
+Relative order is preserved independent of whether events are being sent to
+different output queues.  For example, if some events are sent to output queue
+A while others are sent to output queue B then the events on these output
+queues will still be in the same relative order as they were on their
+originating queue.  Similarly, if the processing consumes events so that no
+output is issued for some of them (_e.g.,_ as part of IP fragment reassembly
+processing) then other events will still be correctly ordered with respect to
+these sequence gaps. Finally, if multiple events are enqueued for a given
+order (_e.g.,_ as part of packet segmentation processing for MTU
+considerations), then each of these events will occupy the originator's
+sequence in the target output queue(s). In this case the relative order of these
+events will be in the order that the thread issued *odp_queue_enq()* calls for
+them.
+
+The ordered context associated with the dispatch of an event from an ordered
+queue lasts until the next scheduler call or until explicitly released by
+the thread calling *odp_schedule_release_ordered()*. This call may be used
+as a performance advisory that the thread no longer requires ordering
+guarantees for the current context. As a result, any subsequent enqueues
+within the current scheduler context will be treated as if the thread was
+operating in a parallel queue context.
+
+==== Ordered Locking
+Another powerful feature of the scheduler's handling of ordered queues is
+*ordered locks*. Each ordered queue has associated with it a number of ordered
+locks as specified by the _lock_count_ parameter at queue create time.
+
+Ordered locks provide an efficient means to perform in-order sequential
+processing within an ordered context. For example, suppose events with relative
+order 5, 6, and 7 are being processed in parallel by three different threads. An
+ordered lock will enable these threads to synchronize such that they can
+perform some critical section in their originating queue order. The number of
+ordered locks supported for each ordered queue is implementation dependent (and
+queryable via the *odp_config_max_ordered_locks_per_queue()* API). If the
+implementation supports multiple ordered locks then these may be used to
+protect different ordered critical sections within a given ordered context.
+
+==== Summary: Ordered Queues
+To see how these considerations fit together, consider the following code:
+
+.Processing with Ordered Queues
+[source,c]
+----
+void worker_thread() {
+        odp_init_local();
+        /* ...other initialization processing */
+
+        while (1) {
+                ev = odp_schedule(&which_q, ODP_SCHED_WAIT);
+                /* ...process events in parallel */
+                odp_schedule_order_lock(0);
+                /* ...critical section processed in order */
+                odp_schedule_order_unlock(0);
+                /* ...continue processing in parallel */
+                odp_queue_enq(dest_q, ev);
+        }
+}
+----
 
-.Overview Ordered Queue processing
-image::../images/ordered_queue.png[align="center"]
+This represents a simplified structure for a typical worker thread operating
+on ordered queues. Multiple events are processed in parallel and the use of
+ordered queues ensures that they will be placed on +dest_q+ in the same order
+as they originated.  While processing in parallel, the use of ordered locks
+enables critical sections to be processed in order within the overall parallel
+flow. When a thread arrives at the _odp_schedule_order_lock()_ call, it waits
+until the locking order for this lock for all prior events has been resolved
+and then enters the critical section. The _odp_schedule_order_unlock()_ call
+releases the critical section and allows the next order to enter it.
 
-Parallel Queue
-~~~~~~~~~~~~~~
-There are no restrictions on the number of events being processed.
+=== Queue Scheduling Summary
 
-.Overview parallel Queue processing
-image::../images/parallel_queue.png[align="center"]
+NOTE: Both ordered and parallel queues improve throughput over atomic queues
+due to parallel event processing, but require that the application take
+steps to ensure context data synchronization if needed.
+
+== Glossary
+[glossary]
+worker thread::
+    A worker is a type of ODP thread. It will usually be isolated from
+    the scheduling of any host operating system and is intended for fast-path
+    processing with a low and predictable latency. Worker threads will not
+    generally receive interrupts and will run to completion.
+control thread::
+    A control thread is a type of ODP thread. It will be isolated from the host
+    operating system housekeeping tasks but will be scheduled by it and may
+    receive interrupts.
+thread::
+    An ODP thread is a flow of execution that in a Linux environment could be
+    a Linux process or thread.
+event::
+    An event is a notification that can be placed in a queue.
+queue::
+    A communication channel that holds events.