
Classification Queue Group

Message ID CAGr7dC3CXnsB9x_xBiy8qOm6sgLMGq0dU8pnnx1baFNJT6VUyg@mail.gmail.com

Commit Message

Balasubramanian Manoharan Nov. 7, 2016, 11:16 a.m. UTC
Hi,

This mail thread discusses the design of classification queue group
RFC. The same can be found in the google doc whose link is given
below.
Users can provide their comments either in this mail thread or in the
google doc as per their convenience.

https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing

The basic issues with queues being a single target for a CoS are twofold:

* Queues must be created and deleted individually. This imposes a
significant burden when queues are used to represent individual flows
since the application may need to process thousands (or millions) of
flows.
* A single PMR can only match a packet to a single queue associated with
a target CoS. This prohibits efficient capture of subfield
classification.
To solve these issues, Tiger Moth introduces the concept of a queue
group. A queue group is an extension to the existing queue
specification in a Class of Service.

Queue groups solve the classification issues associated with
individual queues in three ways:

* The odp_queue_group_create() API can create a large number of
related queues with a single call.
* A single PMR can spread traffic to many queues associated with the
same CoS by assigning packets matching the PMR to a queue group rather
than a queue.
* A hashed PMR subfield is used to distribute packets among the individual
queues within a queue group for scheduling purposes.
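
For illustration, a minimal sketch of the intended usage in C. The
odp_queue_group_* calls and the type/queue_group CoS fields are this
proposal, not released ODP API; ODP_COS_TYPE_GROUP is a name suggested
later in this review, and the hash field layout is assumed to mirror
odp_pktin_hash_proto_t:

#include <odp_api.h>

/* Sketch only: queue groups are proposed here and do not exist in
 * released ODP. ODP_COS_TYPE_GROUP is a name suggested in review. */
static odp_cos_t setup_flow_cos(odp_pool_t pkt_pool)
{
    odp_queue_group_param_t grp_param;
    odp_cls_cos_param_t cos_param;
    odp_queue_group_t grp;

    odp_queue_group_param_init(&grp_param);
    grp_param.num_queue = 1024;        /* may be rounded up */
    grp_param.hash.proto.ipv4_udp = 1; /* assumed field layout */

    /* One call creates all 1024 related queues. */
    grp = odp_queue_group_create("flow_group", &grp_param);

    odp_cls_cos_param_init(&cos_param);
    cos_param.type = ODP_COS_TYPE_GROUP;
    cos_param.queue_group = grp;
    cos_param.pool = pkt_pool;

    /* Packets matching a PMR that targets this CoS are hashed
     * across the queues of the group, not onto a single queue. */
    return odp_cls_cos_create("flow_cos", &cos_param);
}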

Comments

Bill Fischofer Nov. 10, 2016, 2:04 a.m. UTC | #1
On Mon, Nov 7, 2016 at 5:16 AM, Bala Manoharan <bala.manoharan@linaro.org>
wrote:

> [snip: RFC introduction quoted above]

> diff --git a/include/odp/api/spec/classification.h
> b/include/odp/api/spec/classification.h
> index 6eca9ab..cf56852 100644
> --- a/include/odp/api/spec/classification.h
> +++ b/include/odp/api/spec/classification.h
> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {
>
> /** A Boolean to denote support of PMR range */
> odp_bool_t pmr_range_supported;
> +
> + /** A Boolean to denote support of queue group */
> + odp_bool_t queue_group_supported;
>


To date we've not introduced optional APIs into ODP so I'm not sure if we'd
want to start here. If we are adding queue groups, all ODP implementations
should be expected to support queue groups, so this flag shouldn't be
needed. Limits on the support (e.g., max number of queue groups supported,
etc.) are appropriate, but there shouldn't be an option to not support them
at all.


> +
> + /** A Boolean to denote support of queue */
> + odp_bool_t queue_supported;
>


Not sure what the intent is here. Is this anticipating that some
implementations might only support sending flows to queue groups and not to
individual queues? That would be a serious functional regression and not
something we'd want to encourage.


> } odp_cls_capability_t;
>
> /**
> @@ -162,7 +168,18 @@ typedef enum {
>  * Used to communicate class of service creation options
>  */
> typedef struct odp_cls_cos_param {
> - odp_queue_t queue; /**< Queue associated with CoS */
> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,
> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with CoS.
>


Perhaps not the best choice of discriminator names; ODP_COS_TYPE_QUEUE
and ODP_COS_TYPE_GROUP might be simpler?


> + */
> + odp_queue_type_e type;
>


We already have an odp_queue_type_t defined for the queue APIs so this name
would be confusingly similar. We're really identifying what the type of the
CoS is so perhaps odp_cos_type_t might be better here? That would be
consistent with the ODP_QUEUE_TYPE_PLAIN and ODP_QUEUE_TYPE_SCHED used in
the odp_queue_type_t enum.
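
Putting these two suggestions together, the parameter struct might look
like the sketch below. Note also that the patch's "typedef union { ... };"
inside the struct (quoted just below) is not valid C; an anonymous union
member is presumably what is intended. The names here are review
suggestions, not agreed API:

/* Sketch of the CoS parameters with the suggested discriminator. */
typedef enum odp_cos_type_t {
    ODP_COS_TYPE_QUEUE, /**< CoS targets a single queue */
    ODP_COS_TYPE_GROUP  /**< CoS targets a queue group */
} odp_cos_type_t;

typedef struct odp_cls_cos_param {
    odp_cos_type_t type; /**< Selects the valid union member */
    union {
        odp_queue_t queue;             /**< ODP_COS_TYPE_QUEUE */
        odp_queue_group_t queue_group; /**< ODP_COS_TYPE_GROUP */
    };
    odp_pool_t pool;            /**< Pool associated with CoS */
    odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */
} odp_cls_cos_param_t;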


> +
> + typedef union {
> + /** Queue associated with CoS */
> + odp_queue_t queue;
> +
> + /** Queue Group associated with CoS */
> + odp_queue_group_t queue_group;
> + };
> odp_pool_t pool; /**< Pool associated with CoS */
> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */
> } odp_cls_cos_param_t;

>
> diff --git a/include/odp/api/spec/queue.h b/include/odp/api/spec/queue.h
> index 51d94a2..7dde060 100644
> --- a/include/odp/api/spec/queue.h
> +++ b/include/odp/api/spec/queue.h
> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {
> odp_queue_t odp_queue_create(const char *name, const odp_queue_param_t
> *param);
>
> +/**
> + * Queue group capability
> + * This capability structure defines system Queue Group capability
> + */
> +typedef struct odp_queue_group_capability_t {
> + /** Number of queues supported per queue group */
> + unsigned supported_queues;
>


We usually specify limits with a max_ prefix, so max_queues would be better
than supported_queues here.


> + /** Supported protocol fields for hashing*/
> + odp_pktin_hash_proto_t supported;
>


"supported" by itself is unclear. Perhaps hashable_fields might be better?


> +}
> +
> +/**
> + * ODP Queue Group parameters
> + * Queue group supports only schedule queues <TBD??>
>


I thought we decided that this would be the case since the notion of
polling the individual queues within a queue group (of which there might be
a very large number) would seem counterproductive. So I think we can drop
the TBD here.


> + */
> +typedef struct odp_queue_group_param_t {
> + /** Number of queue to be created for this queue group
> + * implementation may round up the value to nearest power of 2
> + * and value should be less than the number of queues
> + * supported per queue group
> + */
> + unsigned num_queue;
>


We also need to be able to specify the amount of per-queue context data
required for each queue in the queue group, as well as the
odp_schedule_param_t to be used for each queue to assign queue priorities,
schedule groups, etc.
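
A sketch of what that could add to the parameter struct; context_len and
sched are hypothetical field names for the items listed above, not agreed
API (odp_schedule_param_t is existing ODP):

/* Hypothetical extension of the proposed odp_queue_group_param_t. */
typedef struct odp_queue_group_param_t {
    unsigned num_queue;                /* queues in this group */
    odp_queue_group_hash_proto_t hash; /* distribution fields */
    uint32_t context_len;              /* per-queue context size,
                                          in bytes (hypothetical) */
    odp_schedule_param_t sched;        /* priority, sync mode and
                                          schedule group applied to
                                          every queue in the group */
} odp_queue_group_param_t;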


> +
> + /** Protocol field selection for queue group distribution
> + * Multiple fields can be selected in combination
> + */
> + odp_queue_group_hash_proto_t hash;
> +
> +} odp_queue_group_param_t;
> +
> +/**
> + * Initialize queue group params
> + *
> + * Initialize an odp_queue_group_param_t to its default values for all fields.
> + *
> + * @param param   Address of the odp_queue_group_param_t to be initialized
> + */
> +void odp_queue_group_param_init(odp_queue_group_param_t *param);
> +
> +/**
> + * Queue Group create
> + *
> + * Create a queue group according to the queue group parameters.
> + * The individual queues belonging to a queue group are created by the
> + * implementation and the distribution of packets into those queues are
> + * decided based on the odp_queue_group_hash_proto_t parameters.
> + * The individual queues within a queue group are both created and deleted
> + * by the implementation.
> + *
> + * @param name    Queue Group name
> + * @param param   Queue Group parameters.
> + *
> + * @return Queue group handle
> + * @retval ODP_QUEUE_GROUP_INVALID on failure
> + */
> +odp_queue_group_t odp_queue_group_create(const char *name,
> + const odp_queue_group_param_t *param);
>


We also need odp_queue_group_lookup(), odp_queue_group_capability(), and
odp_queue_group_destroy() APIs for completeness and symmetry with the queue
APIs.

In addition, an API that would allow a reference to a particular queue
within a queue group by index number would perhaps be useful, as would
an API for determining whether a queue is a member of a queue group. For
example:

odp_queue_t odp_queue_group_queue(odp_queue_group_t qgroup, uint32_t ndx);

odp_queue_group_t odp_queue_group(odp_queue_t queue);

These would return ODP_QUEUE_INVALID for out-of-range ndx values or
ODP_QUEUE_GROUP_INVALID if the queue is not a member of a queue group.
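
Assuming these two accessors existed, walking the member queues of a
group might look like the sketch below (both functions are only proposed
above; odp_queue_to_u64() is existing ODP API):

#include <inttypes.h>
#include <stdio.h>
#include <odp_api.h>

/* Sketch: enumerate the queues of a group until the index runs off
 * the end. odp_queue_group_queue() is a proposal, not released ODP. */
static void dump_group(odp_queue_group_t grp, uint32_t num_queue)
{
    for (uint32_t i = 0; i < num_queue; i++) {
        odp_queue_t q = odp_queue_group_queue(grp, i);

        if (q == ODP_QUEUE_INVALID)
            break; /* out-of-range index */

        printf("queue[%" PRIu32 "] = %" PRIu64 "\n",
               i, odp_queue_to_u64(q));
    }
}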



> Regards,
> Bala
Brian Brooks Nov. 10, 2016, 7:56 a.m. UTC | #2
On 11/07 16:46:12, Bala Manoharan wrote:
> Hi,


Hiya

> This mail thread discusses the design of classification queue group
> RFC. The same can be found in the google doc whose link is given
> below.
> Users can provide their comments either in this mail thread or in the
> google doc as per their convenience.
>
> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing
>
> The basic issues with queues as being a single target for a CoS are two fold:
>
> Queues must be created and deleted individually. This imposes a
> significant burden when queues are used to represent individual flows
> since the application may need to process thousands (or millions) of
> flows.


Wondering why there is an issue with creating and deleting queues individually
if queue objects represent millions of flows..

Could an application ever call odp_schedule() and receive an event (e.g. packet)
from a queue (of opaque type odp_queue_t) and that queue has never been created
by the application (via odp_queue_create())? Could that ever happen from the
hardware, and could the application ever handle that?

Or, is it related to memory usage? The reference implementation
struct queue_entry_s is 320 bytes on a 64-bit machine.

  2^28 ~= 268,435,456 queues -> 80 GiB
  2^26 ~=  67,108,864 queues -> 20 GiB
  2^22 ~=   4,194,304 queues ->  1.25 GiB

Forget about 320 bytes per queue: if each queue was represented by a 32-bit
integer (4 bytes!) the usage would be:

  2^28 ~= 268,435,456 queues ->   1 GiB
  2^26 ~=  67,108,864 queues -> 256 MiB
  2^22 ~=   4,194,304 queues ->  16 MiB

That still might be a lot of usage if the application must explicitly create
every queue (before it is used) and require an ODP implementation to map
between every ODP queue object (opaque type) and the internal queue.

Let's say the ODP API has two classes of handles: 1) pointers, 2) integers. An opaque
pointer is used to point to some other software object. This object should be
larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode) otherwise it
could just be represented in a 64-bit (or 32-bit) integer type value!

To support millions of queues (flows) should odp_queue_t be an integer type in
the API? A software-only implementation may still use 320 bytes per queue and
use that integer as an index into an array or as a key for lookup operation on a
data structure containing queues. An implementation with hardware assist may
use this integer value directly when interfacing with hardware!

Would it still be necessary to assign a "name" to each queue (flow)?

Would a queue (flow) also require an "op type" to explicitly specify whether
access to the queue (flow) is threadsafe? Atomic queues are threadsafe since
only 1 core at any given time can receive from it. Parallel queues are also
threadsafe. Are all ODP APIs threadsafe?

> A single PMR can only match a packet to a single queue associated with
> a target CoS. This prohibits efficient capture of subfield
> classification.


odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is possible
to create a single PMR which matches multiple fields of a packet. I can imagine
a case where a packet matches pmr1 (match Vlan) and also matches pmr2
(match Vlan AND match L3DestIP). Is that an example of subfield classification?
How does the queue relate?

> To solve these issues, Tiger Moth introduces the concept of a queue
> group. A queue group is an extension to the existing queue
> specification in a Class of Service.
>
> Queue groups solve the classification issues associated with
> individual queues in three ways:
>
> * The odp_queue_group_create() API can create a large number of
> related queues with a single call.


If the application calls this API, does that mean the ODP implementation
can create a large number of queues? What happens if the application
receives an event on a queue that was created by the implementation--how
does the application know whether this queue was created by the hardware
according to the ODP Classification or whether the queue was created by
the application?

> * A single PMR can spread traffic to many queues associated with the
> same CoS by assigning packets matching the PMR to a queue group rather
> than a queue.
> * A hashed PMR subfield is used to distribute individual queues within
> a queue group for scheduling purposes.


Is there a way to write a test case for this? Trying to think of what kind of
packets (traffic distribution) and how those packets would get classified and
get assigned to queues.

> [snip: remainder of the patch, quoted in full above]
François Ozog Nov. 10, 2016, 8:07 a.m. UTC | #3
Good points Brian.

I can add a data point from real-life observations of a DPI box
analyzing two 10 Gbps links (four 10 Gbps ports): there are approximately
50M known flows (TCP, UDP...), though many may not be active (TCP CLOSE_WAIT).

FF

On 10 November 2016 at 08:56, Brian Brooks <brian.brooks@linaro.org> wrote:

> [snip: Brian's message quoted in full]




Balasubramanian Manoharan Nov. 10, 2016, 9:14 a.m. UTC | #4
Regards,
Bala


On 10 November 2016 at 07:34, Bill Fischofer <bill.fischofer@linaro.org> wrote:
>
> [snip: RFC introduction and patch context]
>
>> + /** A Boolean to denote support of queue group */
>> + odp_bool_t queue_group_supported;
>
> To date we've not introduced optional APIs into ODP so I'm not sure if we'd
> want to start here. If we are adding queue groups, all ODP implementations
> should be expected to support queue groups, so this flag shouldn't be
> needed. Limits on the support (e.g., max number of queue groups supported,
> etc.) are appropriate, but there shouldn't be an option to not support them
> at all.
>
>> +
>> + /** A Boolean to denote support of queue */
>> + odp_bool_t queue_supported;
>
> Not sure what the intent is here. Is this anticipating that some
> implementations might only support sending flows to queue groups and not to
> individual queues? That would be a serious functional regression and not
> something we'd want to encourage.


The idea here is that some implementations might have limitations
where they could support either a queue or a queue group attached to a CoS
object but not both. This limitation is only for queues created within
a queue group and not general odp_queue_t.

This could be useful when some low-end hardware does not support
hashing after classification in the HW; in that case supporting
a queue group, which requires hashing, might be costly.

The reason being that the internal handles of a packet flow might have
to be re-aligned so that they can be exposed as either a queue or a queue
group. So this capability is useful if an implementation supports only
queue groups and not queues, or vice versa. I did not see much harm in
the above booleans since they are added only in the capability struct.
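
If these flags stay, an application would presumably probe them before
choosing the CoS target. A sketch (odp_cls_capability() and
odp_cls_cos_param_init() are existing API; the *_supported flags and
ODP_COS_TYPE_* values are from this RFC and review, not released ODP):

#include <odp_api.h>

/* Sketch: pick the CoS target based on the proposed capability
 * flags. The flags and ODP_COS_TYPE_* are proposals only. */
static void choose_cos_target(odp_cls_cos_param_t *cos_param,
                              odp_queue_group_t grp, odp_queue_t queue)
{
    odp_cls_capability_t capa;

    if (odp_cls_capability(&capa))
        return; /* capability query failed */

    odp_cls_cos_param_init(cos_param);

    if (capa.queue_group_supported) {
        cos_param->type = ODP_COS_TYPE_GROUP;
        cos_param->queue_group = grp;
    } else if (capa.queue_supported) {
        cos_param->type = ODP_COS_TYPE_QUEUE;
        cos_param->queue = queue;
    }
}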

>> } odp_cls_capability_t;
>>
>> /**
>> @@ -162,7 +168,18 @@ typedef enum {
>>  * Used to communicate class of service creation options
>>  */
>> typedef struct odp_cls_cos_param {
>> - odp_queue_t queue; /**< Queue associated with CoS */
>> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,
>> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with CoS.
>
> Perhaps not the best choice of discriminator names. Perhaps
> ODP_COS_TYPE_QUEUE and ODP_COS_TYPE_GROUP might be simpler?
>
>> + */
>> + odp_queue_type_e type;
>
> We already have an odp_queue_type_t defined for the queue APIs so this name
> would be confusingly similar. We're really identifying what the type of the
> CoS is so perhaps odp_cos_type_t might be better here? That would be
> consistent with the ODP_QUEUE_TYPE_PLAIN and ODP_QUEUE_TYPE_SCHED used in
> the odp_queue_type_t enum.


Agreed.
>
>> [snip: union and queue.h patch context]
>>
>> +typedef struct odp_queue_group_capability_t {
>> + /** Number of queues supported per queue group */
>> + unsigned supported_queues;
>
> We usually specify limits with a max_ prefix, so max_queues would be better
> than supported_queues here.


I used the term supported since NXP wanted to have the minimum number of
queues supported in a queue group as well.
Maybe we can discuss further on this point.

>
>> + /** Supported protocol fields for hashing*/
>> + odp_pktin_hash_proto_t supported;
>
> "supported" by itself is unclear. Perhaps hashable_fields might be better?
>
>> +}
>> +
>> +/**
>> + * ODP Queue Group parameters
>> + * Queue group supports only schedule queues <TBD??>
>
> I thought we decided that this would be the case since the notion of polling
> the individual queues within a queue group (of which there might be a very
> large number) would seem counterproductive. So I think we can drop the TBD
> here.


If everyone agrees to this I am most happy to drop this TBD.

>
>> + */
>> +typedef struct odp_queue_group_param_t {
>> + /** Number of queue to be created for this queue group
>> + * implementation may round up the value to nearest power of 2
>> + * and value should be less than the number of queues
>> + * supported per queue group
>> + */
>> + unsigned num_queue;
>
> We also need to be able to specify the amount of per-queue context data
> required for each queue in the queue group, as well as the
> odp_schedule_param_t to be used for each queue to assign queue priorities,
> schedule groups, etc.


IMO, context data per queue could be around 8 bytes (64 bits).
I believe the priority would be the same for all the queues within a queue group.
>
>> [snip: hash field, odp_queue_group_param_init() and
>> odp_queue_group_create() definitions]
>
> We also need odp_queue_group_lookup(), odp_queue_group_capability(), and
> odp_queue_group_destroy() APIs for completeness and symmetry with the queue
> APIs.
>
> In addition, an API that would allow a reference to a particular queue
> within a queue group by index number would perhaps be useful, as would an
> API for determining whether a queue is a member of a queue group. For
> example:
>
> odp_queue_t odp_queue_group_queue(odp_queue_group_t qgroup, uint32_t ndx);


Not sure about the use case for getting a specific queue within a queue group.
IMO Packets/Events should not be enqueued directly to a queue within a
queue group since the packets coming to a queue group are spread by
the implementation based on the hash configuration.
Enqueuing into an individual queue within a queue group might not be needed.

> odp_queue_group_t odp_queue_group(odp_queue_t queue);
>
> These would return ODP_QUEUE_INVALID for out-of-range ndx values or
> ODP_QUEUE_GROUP_INVALID if the queue is not a member of a queue group.


Could be added.
>> Regards,
>> Bala
Balasubramanian Manoharan Nov. 10, 2016, 9:47 a.m. UTC | #5
On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:
> On 11/07 16:46:12, Bala Manoharan wrote:

>> Hi,
>
> Hiya
>
>> [snip: RFC introduction]
>>
>> Queues must be created and deleted individually. This imposes a
>> significant burden when queues are used to represent individual flows
>> since the application may need to process thousands (or millions) of
>> flows.
>
> Wondering why there is an issue with creating and deleting queues individually
> if queue objects represent millions of flows..


The queue groups are mainly required for hashing the incoming packets
to multiple flows based on the hash configuration.
From the application's point of view it just needs a queue to hold
packets belonging to the same flow, and packets belonging to different
flows should be placed in different queues. It does not matter
who creates the flow/queue.

It is actually simpler if the implementation creates a flow, since in that
case the implementation need not accumulate metadata for all possible
hash values in a queue group; a queue can be created when traffic
arrives in that particular flow.

>
> Could an application ever call odp_schedule() and receive an event (e.g. packet)
> from a queue (of opaque type odp_queue_t) and that queue has never been created
> by the application (via odp_queue_create())? Could that ever happen from the
> hardware, and could the application ever handle that?


No. All the queues in the system are created by the application, either
directly or indirectly.
In the case of queue groups the queues are indirectly created by the
application by configuring a queue group.

>
> [snip: memory-usage analysis and handle-type discussion quoted above]


I believe I have answered this question based on the explanation above.
Please feel free to point out if something is not clear.

>
> Would it still be necessary to assign a "name" to each queue (flow)?


"name" per queue might not be required since it would mean a character
based lookup across millions of items.

>
> Would a queue (flow) also require an "op type" to explicitly specify whether
> access to the queue (flow) is threadsafe? Atomic queues are threadsafe since
> only 1 core at any given time can receive from it. Parallel queues are also
> threadsafe. Are all ODP APIs threadsafe?


There are two types of queue enqueue operation: ODP_QUEUE_OP_MT and
ODP_QUEUE_OP_MT_UNSAFE.
The rest of the ODP APIs are multi-thread safe, since in ODP there is no
defined way in which a single packet can be given to more than one
core at the same time; packets move across different modules
through queues.
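
For reference, this is how the existing queue API expresses those op
types today (released ODP, not part of this RFC); a sketch of a
single-producer queue:

#include <odp_api.h>

/* Sketch: only one thread enqueues, so the implementation may skip
 * enqueue-side synchronization; dequeue stays multi-thread safe. */
static odp_queue_t create_sp_queue(void)
{
    odp_queue_param_t qp;

    odp_queue_param_init(&qp);
    qp.type = ODP_QUEUE_TYPE_PLAIN;
    qp.enq_mode = ODP_QUEUE_OP_MT_UNSAFE; /* single producer */
    qp.deq_mode = ODP_QUEUE_OP_MT;        /* many consumers */

    return odp_queue_create("sp_queue", &qp);
}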

>
>> A single PMR can only match a packet to a single queue associated with
>> a target CoS. This prohibits efficient capture of subfield
>> classification.
>
> odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is possible
> to create a single PMR which matches multiple fields of a packet. I can imagine
> a case where a packet matches pmr1 (match Vlan) and also matches pmr2
> (match Vlan AND match L3DestIP). Is that an example of subfield classification?
> How does the queue relate?


This question is related to classification. If a PMR is configured
with more than one odp_pmr_param_t then the PMR is considered a hit
only if the packet matches all the configured params.

Consider the following:

pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ==== PMR2 ====> CoS2

1) Any packet arriving at pktio1 will be assigned to Default_CoS and
will first have PMR1 applied.
2) If the packet matches PMR1 it will be delivered to CoS1.
3) If the packet does not match PMR1 then it will remain in Default_CoS.
4) Any packet arriving at CoS1 will have PMR2 applied; if the
packet matches PMR2 then it will be delivered to CoS2.
5) If the packet does not match PMR2 it will remain in CoS1.


Each CoS will be configured with a queue group.
Based on the final CoS of the packet, the hash configuration (RSS) of
that queue group will be applied to the packet and the packet will be
spread across the queues within the queue group.
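
A sketch of that cascade in code. The PMR calls are existing
classification API (field names follow the ODP version that has PMR
range support, matching the pmr_range_supported flag in the quoted
patch); attaching queue groups to the CoS objects is this RFC's
proposal, and the match values are illustrative (byte order must follow
each PMR term's definition):

#include <string.h>
#include <odp_api.h>

/* Sketch: default_cos, cos1 and cos2 are assumed to exist already,
 * with cos1/cos2 configured to deliver into queue groups. */
static void setup_cascade(odp_cos_t default_cos, odp_cos_t cos1,
                          odp_cos_t cos2)
{
    /* Static so the matched values outlive this call. */
    static uint16_t vlan = 100,        vlan_mask = 0x0fff;
    static uint32_t dip  = 0x0a000001, dip_mask  = 0xffffffff;
    odp_pmr_param_t p;

    /* PMR1: Default_CoS -> CoS1 on VLAN ID */
    memset(&p, 0, sizeof(p));
    p.term        = ODP_PMR_VLAN_ID_0;
    p.match.value = &vlan;
    p.match.mask  = &vlan_mask;
    p.val_sz      = sizeof(vlan);
    odp_cls_pmr_create(&p, 1, default_cos, cos1);

    /* PMR2: CoS1 -> CoS2 on destination IP (10.0.0.1) */
    memset(&p, 0, sizeof(p));
    p.term        = ODP_PMR_DIP_ADDR;
    p.match.value = &dip;
    p.match.mask  = &dip_mask;
    p.val_sz      = sizeof(dip);
    odp_cls_pmr_create(&p, 1, cos1, cos2);
}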

Hope this clarifies.
Bala

>
> [snip: remainder of Brian's message and the quoted patch]
Mike Holmes Nov. 10, 2016, 6:44 p.m. UTC | #6
On 10 November 2016 at 04:14, Bala Manoharan <bala.manoharan@linaro.org>
wrote:

> [snip: earlier context]
>
> The idea here is that some implementations might have limitations
> where they could support either a queue or a queue group attached to a CoS
> object but not both. This limitation is only for queues created within
> a queue group and not general odp_queue_t.
>
> This could be useful when some low-end hardware does not support
> hashing after classification in the HW; in that case supporting
> a queue group, which requires hashing, might be costly.


I am wary of "could be useful"; unless we know it is a use case, let's fix
that case when we have it, especially if it makes things simpler now.



>
> [snip: remainder of Bala's message]




-- 
Mike Holmes
Program Manager - Linaro Networking Group
Linaro.org <http://www.linaro.org/> │ Open source software for ARM SoCs
"Work should be fun and collaborative, the rest follows"
Brian Brooks Nov. 11, 2016, 7:56 a.m. UTC | #7
On 11/10 15:17:15, Bala Manoharan wrote:
> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:

> > On 11/07 16:46:12, Bala Manoharan wrote:

> >> Hi,

> >

> > Hiya

> >

> >> This mail thread discusses the design of classification queue group

> >> RFC. The same can be found in the google doc whose link is given

> >> below.

> >> Users can provide their comments either in this mail thread or in the

> >> google doc as per their convenience.

> >>

> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing

> >>

> >> The basic issues with queues as being a single target for a CoS are two fold:

> >>

> >> Queues must be created and deleted individually. This imposes a

> >> significant burden when queues are used to represent individual flows

> >> since the application may need to process thousands (or millions) of

> >> flows.

> >

> > Wondering why there is an issue with creating and deleting queues individually

> > if queue objects represent millions of flows..

> 

> The queue groups are mainly required for hashing the incoming packets

> to multiple flows based on the hash configuration.

> So from application point of view it just needs a queue to have

> packets belonging to same flow and that packets belonging to different

> flows are placed in different queues respectively.It does not matter

> who creates the flow/queue.


When the application receives an event from an odp_schedule() call, how does it
know whether the odp_queue_t was previously created by the application via
odp_queue_create() or whether it was created by the implementation?

> It is actually simpler if implementation creates a flow since in that

> case implementation need not accumulate meta-data for all possible

> hash values in a queue group and it can be created when traffic

> arrives in that particular flow.

> 

> >

> > Could an application ever call odp_schedule() and receive an event (e.g. packet)

> > from a queue (of opaque type odp_queue_t) and that queue has never been created

> > by the application (via odp_queue_create())? Could that ever happen from the

> > hardware, and could the application ever handle that?

> 

> No. All the queues in the system are created by the application either

> directly or in-directly.

> In-case of queue groups the queues are in-directly created by the

> application by configuring a queue group.

> 

> > Or, is it related to memory usage? The reference implementation

> > struct queue_entry_s is 320 bytes on a 64-bit machine.

> >

> >   2^28 ~= 268,435,456 queues -> 81.920 GB

> >   2^26 ~=  67,108,864 queues -> 20.480 GB

> >   2^22 ~=   4,194,304 queues ->  1.280 GB

> >

> > Forget about 320 bytes per queue, if each queue was represented by a 32-bit

> > integer (4 bytes!) the usage would be:

> >

> >   2^28 ~= 268,435,456 queues ->  1.024 GB

> >   2^26 ~=  67,108,864 queues ->    256 MB

> >   2^22 ~=   4,194,304 queues ->     16 MB

> >

> > That still might be a lot of usage if the application must explicitly create

> > every queue (before it is used) and require an ODP implementation to map

> > between every ODP queue object (opaque type) and the internal queue.

> >

> > Lets say ODP API has two classes of handles: 1) pointers, 2) integers. An opaque

> > pointer is used to point to some other software object. This object should be

> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode) otherwise it

> > could just be represented in a 64-bit (or 32-bit) integer type value!

> >

> > To support millions of queues (flows) should odp_queue_t be an integer type in

> > the API? A software-only implementation may still use 320 bytes per queue and

> > use that integer as an index into an array or as a key for lookup operation on a

> > data structure containing queues. An implementation with hardware assist may

> > use this integer value directly when interfacing with hardware!

> 

> I believe I have answered this question based on explanation above.

> Pls feel free to point out if something is not clear.

> 

> >

> > Would it still be necessary to assign a "name" to each queue (flow)?

> 

> "name" per queue might not be required since it would mean a character

> based lookup across millions of items.

> 

> >

> > Would a queue (flow) also require an "op type" to explicitly specify whether

> > access to the queue (flow) is threadsafe? Atomic queues are threadsafe since

> > only 1 core at any given time can recieve from it. Parallel queues are also

> > threadsafe. Are all ODP APIs threadsafe?

> 

> There are two types of queue enqueue operation ODP_QUEUE_OP_MT and

> ODP_QUEUE_OP_MT_UNSAFE.

> Rest of the ODP APIs are multi thread safe since in ODP there is no

> defined way in which a single packet can be given to more than one

> core at the same time, as packets move across different modules

> through queues.

> 

> >

> >> A single PMR can only match a packet to a single queue associated with

> >> a target CoS. This prohibits efficient capture of subfield

> >> classification.

> >

> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is possible

> > to create a single PMR which matches multiple fields of a packet. I can imagine

> > a case where a packet matches pmr1 (match Vlan) and also matches pmr2

> > (match Vlan AND match L3DestIP). Is that an example of subfield classification?

> > How does the queue relate?

> 

> This question is related to classification, If a PMR is configured

> with more than one odp_pmr_param_t then the PMR is considered a hit

> only if the packet matches all the configured params.

> 

> Consider the following,

> 

> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.

> 

> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

> will be first applied with PMR1

> 2) If the packet matches PMR1 it will be delivered to CoS1

> 3) If the packet does not match PMR1 then it will remain in Default_CoS.

> 4) Any packets arriving in CoS1 will be applied with PMR2, If the

> packet matches PMR2 then it will be delivered to CoS2.

> 5). If the packet does not match PMR2 it will remain in CoS1.

> 

> 

> Each CoS will be configured with queue groups.

> Based on the final CoS of the packet the hash configuration (RSS) of

> the queue group will be applied to the packet and the packet will be

> spread across the queues within the queue group.


Got it. So Classification PMR CoS happens entirely before Queue Groups.
And with Queue Groups, a single PMR can match a packet and assign that
packet to 1 out of many queues instead of just 1 queue.
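
As an illustration, here is a minimal sketch of that configuration using
the RFC's proposed API (ODP_QUEUE_GROUP_T, the cos_param type/union and
the hash field layout follow the draft patch in this thread and are
assumptions that may change; error handling is omitted):

    odp_queue_group_param_t grp_param;
    odp_cls_cos_param_t cos_param;
    odp_queue_group_t grp;
    odp_cos_t cos;

    odp_queue_group_param_init(&grp_param);
    grp_param.num_queue = 64;           /* may be rounded up to a power of 2 */
    grp_param.hash.proto.ipv4_udp = 1;  /* assumed layout: hash on IPv4/UDP */
    grp = odp_queue_group_create("udp_flows", &grp_param);

    odp_cls_cos_param_init(&cos_param);
    cos_param.type = ODP_QUEUE_GROUP_T; /* CoS targets a group, not a queue */
    cos_param.queue_group = grp;
    cos = odp_cls_cos_create("udp_cos", &cos_param);

    /* A PMR whose destination CoS is cos now spreads matching packets
     * across all queues in grp via the configured hash. */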

> Hope this clarifies.

> Bala

> 

> >

> >> To solve these issues, Tiger Moth introduces the concept of a queue

> >> group. A queue group is an extension to the existing queue

> >> specification in a Class of Service.

> >>

> >> Queue groups solve the classification issues associated with

> >> individual queues in three ways:

> >>

> >> * The odp_queue_group_create() API can create a large number of

> >> related queues with a single call.

> >

> > If the application calls this API, does that mean the ODP implementation

> > can create a large number of queues? What happens if the application

> > receives an event on a queue that was created by the implmentation--how

> > does the application know whether this queue was created by the hardware

> > according to the ODP Classification or whether the queue was created by

> > the application?

> >

> >> * A single PMR can spread traffic to many queues associated with the

> >> same CoS by assigning packets matching the PMR to a queue group rather

> >> than a queue.

> >> * A hashed PMR subfield is used to distribute individual queues within

> >> a queue group for scheduling purposes.

> >

> > Is there a way to write a test case for this? Trying to think of what kind of

> > packets (traffic distribution) and how those packets would get classified and

> > get assigned to queues.

> >

> >> diff --git a/include/odp/api/spec/classification.h

> >> b/include/odp/api/spec/classification.h

> >> index 6eca9ab..cf56852 100644

> >> --- a/include/odp/api/spec/classification.h

> >> +++ b/include/odp/api/spec/classification.h

> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

> >>

> >> /** A Boolean to denote support of PMR range */

> >> odp_bool_t pmr_range_supported;

> >> +

> >> + /** A Boolean to denote support of queue group */

> >> + odp_bool_t queue_group_supported;

> >> +

> >> + /** A Boolean to denote support of queue */

> >> + odp_bool_t queue_supported;

> >> } odp_cls_capability_t;

> >>

> >>

> >> /**

> >> @@ -162,7 +168,18 @@ typedef enum {

> >>  * Used to communicate class of service creation options

> >>  */

> >> typedef struct odp_cls_cos_param {

> >> - odp_queue_t queue; /**< Queue associated with CoS */

> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with CoS.

> >> + */

> >> + odp_queue_type_e type;

> >> +

> >> + typedef union {

> >> + /** Queue associated with CoS */

> >> + odp_queue_t queue;

> >> +

> >> + /** Queue Group associated with CoS */

> >> + odp_queue_group_t queue_group;

> >> + };

> >> odp_pool_t pool; /**< Pool associated with CoS */

> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */

> >> } odp_cls_cos_param_t;

> >>

> >>

> >> diff --git a/include/odp/api/spec/queue.h b/include/odp/api/spec/queue.h

> >> index 51d94a2..7dde060 100644

> >> --- a/include/odp/api/spec/queue.h

> >> +++ b/include/odp/api/spec/queue.h

> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

> >> odp_queue_t odp_queue_create(const char *name, const odp_queue_param_t *param);

> >>

> >> +/**

> >> + * Queue group capability

> >> + * This capability structure defines system Queue Group capability

> >> + */

> >> +typedef struct odp_queue_group_capability_t {

> >> + /** Number of queues supported per queue group */

> >> + unsigned supported_queues;

> >> + /** Supported protocol fields for hashing*/

> >> + odp_pktin_hash_proto_t supported;

> >> +}

> >> +

> >> +/**

> >> + * ODP Queue Group parameters

> >> + * Queue group supports only schedule queues <TBD??>

> >> + */

> >> +typedef struct odp_queue_group_param_t {

> >> + /** Number of queue to be created for this queue group

> >> + * implementation may round up the value to nearest power of 2


Wondering what this means for obtaining the max number of queues
supported by the system via odp_queue_capability()..

powers of 2..

If the platform supports 2^16 (65,536) queues, odp_queue_capability()
max_queues should report 65,536 queues, right?

If an odp_queue_group_t is created requesting 2^4 (16) queues, should
odp_queue_capability() now return (65,536 - 16) 65520 queues or
(2^12) 4096 queues?

Could there be a dramatic effect on the total number of queues when
many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t
created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits
used and effective number of queues is (16+16+16+16) 64 queues.

Is it possible to flexibly utilize all 2^16 queues the platform
supports regardless of whether the queue was created by the implementation
or explicitly created by the application?

If so, is there a way to store this extra bit of information--whether
a queue was created by the implementation or the application?
One of the 16 bits might work.
But, this reduces the number of queues to (2^15) 32768.
..at least they are fully utilizable by both implementation and application.

When the application receives an odp_event_t from odp_queue_t after
a call to odp_schedule(), could the application call..
odp_queue_domain() to check whether this odp_queue_t was created by
the implementation or the application? Function returns that bit.

If the queue domain is implementation, could it be an event
(newly arrived packet) that came through Classification PMR CoS (CPC)?
The packet is assigned to an odp_queue_t (flow) (created by the implementation)
as defined by the CPC that was set up by the application.
The application might want efficient access to packet metadata which was
populated as an effect of the packet passing through the CPC stage.

If the queue domain is application, could it be an event
(crypto compl, or any synchronization point against ipblock or
device over PCI bus that indicates some assist/acceleration work
has finished) that comes from an odp_queue_t previously created by the
application via a call to odp_queue_create() (which sets that bit)?
This queue would be any queue (not necessarily a packet 'flow')
created by the data plane software (application).
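
A hypothetical sketch of how such an odp_queue_domain() could be used --
the function, its return values and the handlers below are illustrative
assumptions only, not existing ODP API:

    odp_queue_t from;
    odp_event_t ev = odp_schedule(&from, ODP_SCHED_WAIT);

    /* odp_queue_domain() and both domain values are hypothetical. */
    if (odp_queue_domain(from) == ODP_QUEUE_DOMAIN_IMPLEMENTATION) {
        /* Flow queue created by the implementation via the CPC stage. */
        handle_flow_packet(ev, from);  /* placeholder handler */
    } else {
        /* Queue created by the application via odp_queue_create(),
         * e.g. carrying crypto completions. */
        handle_app_event(ev, from);    /* placeholder handler */
    }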

> >> + * and value should be less than the number of queues

> >> + * supported per queue group

> >> + */

> >> + unsigned num_queue;

> >> +

> >> + /** Protocol field selection for queue group distribution

> >> + * Multiple fields can be selected in combination

> >> + */

> >> + odp_queue_group_hash_proto_t hash;

> >> +

> >> +} odp_queue_group_param_t;

> >> +

> >> +/**

> >> + * Initialize queue group params

> >> + *

> >> + * Initialize an odp_queue_group_param_t to its default values for all fields.

> >> + *

> >> + * @param param   Address of the odp_queue_group_param_t to be initialized

> >> + */

> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);

> >> +

> >> +/**

> >> + * Queue Group create

> >> + *

> >> + * Create a queue group according to the queue group parameters.

> >> + * The individual queues belonging to a queue group are created by the

> >> + * implementation and the distribution of packets into those queues are

> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

> >> + * The individual queues within a queue group are both created and deleted

> >> + * by the implementation.

> >> + *

> >> + * @param name    Queue Group name

> >> + * @param param   Queue Group parameters.

> >> + *

> >> + * @return Queue group handle

> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

> >> + */

> >> +odp_queue_group_t odp_queue_group_create(const char *name,

> >> + const odp_queue_group_param_t *param);

> >> Regards,

> >> Bala
Balasubramanian Manoharan Nov. 14, 2016, 8:12 a.m. UTC | #8
Regards,
Bala


On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:
> On 11/10 15:17:15, Bala Manoharan wrote:

>> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:

>> > On 11/07 16:46:12, Bala Manoharan wrote:

>> >> Hi,

>> >

>> > Hiya

>> >

>> >> This mail thread discusses the design of classification queue group

>> >> RFC. The same can be found in the google doc whose link is given

>> >> below.

>> >> Users can provide their comments either in this mail thread or in the

>> >> google doc as per their convenience.

>> >>

>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing

>> >>

>> >> The basic issues with queues as being a single target for a CoS are two fold:

>> >>

>> >> Queues must be created and deleted individually. This imposes a

>> >> significant burden when queues are used to represent individual flows

>> >> since the application may need to process thousands (or millions) of

>> >> flows.

>> >

>> > Wondering why there is an issue with creating and deleting queues individually

>> > if queue objects represent millions of flows..

>>

>> The queue groups are mainly required for hashing the incoming packets

>> to multiple flows based on the hash configuration.

>> So from application point of view it just needs a queue to have

>> packets belonging to same flow and that packets belonging to different

>> flows are placed in different queues respectively.It does not matter

>> who creates the flow/queue.

>

> When the application receives an event from odp_schedule() call, how does it

> know whether the odp_queue_t was previously created by the application from

> odp_queue_create() or whether it was created by the implementation?


The odp_schedule() call returns the queue from which the event was dequeued.
The type of the queue can be obtained from the odp_queue_type() API.
But the question is: is there a use case where the application needs to know?
The application has the information about the queues it has created, and
the queues created by the implementation are destroyed by the implementation.
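
For reference, a small sketch of that existing mechanism (standard ODP
scheduler and queue APIs; error handling omitted):

    odp_queue_t from;
    odp_event_t ev;

    /* The scheduler hands back the source queue alongside the event. */
    ev = odp_schedule(&from, ODP_SCHED_WAIT);

    /* The application can then query the type of that queue. */
    odp_queue_type_t type = odp_queue_type(from);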

>

>> It is actually simpler if implementation creates a flow since in that

>> case implementation need not accumulate meta-data for all possible

>> hash values in a queue group and it can be created when traffic

>> arrives in that particular flow.

>>

>> >

>> > Could an application ever call odp_schedule() and receive an event (e.g. packet)

>> > from a queue (of opaque type odp_queue_t) and that queue has never been created

>> > by the application (via odp_queue_create())? Could that ever happen from the

>> > hardware, and could the application ever handle that?

>>

>> No. All the queues in the system are created by the application either

>> directly or in-directly.

>> In-case of queue groups the queues are in-directly created by the

>> application by configuring a queue group.

>>

>> > Or, is it related to memory usage? The reference implementation

>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

>> >

>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

>> >

>> > Forget about 320 bytes per queue, if each queue was represented by a 32-bit

>> > integer (4 bytes!) the usage would be:

>> >

>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

>> >   2^26 ~=  67,108,864 queues ->    256 MB

>> >   2^22 ~=   4,194,304 queues ->     16 MB

>> >

>> > That still might be a lot of usage if the application must explicitly create

>> > every queue (before it is used) and require an ODP implementation to map

>> > between every ODP queue object (opaque type) and the internal queue.

>> >

>> > Lets say ODP API has two classes of handles: 1) pointers, 2) integers. An opaque

>> > pointer is used to point to some other software object. This object should be

>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode) otherwise it

>> > could just be represented in a 64-bit (or 32-bit) integer type value!

>> >

>> > To support millions of queues (flows) should odp_queue_t be an integer type in

>> > the API? A software-only implementation may still use 320 bytes per queue and

>> > use that integer as an index into an array or as a key for lookup operation on a

>> > data structure containing queues. An implementation with hardware assist may

>> > use this integer value directly when interfacing with hardware!

>>

>> I believe I have answered this question based on explanation above.

>> Pls feel free to point out if something is not clear.

>>

>> >

>> > Would it still be necessary to assign a "name" to each queue (flow)?

>>

>> "name" per queue might not be required since it would mean a character

>> based lookup across millions of items.

>>

>> >

>> > Would a queue (flow) also require an "op type" to explicitly specify whether

>> > access to the queue (flow) is threadsafe? Atomic queues are threadsafe since

>> > only 1 core at any given time can recieve from it. Parallel queues are also

>> > threadsafe. Are all ODP APIs threadsafe?

>>

>> There are two types of queue enqueue operation ODP_QUEUE_OP_MT and

>> ODP_QUEUE_OP_MT_UNSAFE.

>> Rest of the ODP APIs are multi thread safe since in ODP there is no

>> defined way in which a single packet can be given to more than one

>> core at the same time, as packets move across different modules

>> through queues.

>>

>> >

>> >> A single PMR can only match a packet to a single queue associated with

>> >> a target CoS. This prohibits efficient capture of subfield

>> >> classification.

>> >

>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is possible

>> > to create a single PMR which matches multiple fields of a packet. I can imagine

>> > a case where a packet matches pmr1 (match Vlan) and also matches pmr2

>> > (match Vlan AND match L3DestIP). Is that an example of subfield classification?

>> > How does the queue relate?

>>

>> This question is related to classification, If a PMR is configured

>> with more than one odp_pmr_param_t then the PMR is considered a hit

>> only if the packet matches all the configured params.

>>

>> Consider the following,

>>

>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.

>>

>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

>> will be first applied with PMR1

>> 2) If the packet matches PMR1 it will be delivered to CoS1

>> 3) If the packet does not match PMR1 then it will remain in Default_CoS.

>> 4) Any packets arriving in CoS1 will be applied with PMR2, If the

>> packet matches PMR2 then it will be delivered to CoS2.

>> 5). If the packet does not match PMR2 it will remain in CoS1.

>>

>>

>> Each CoS will be configured with queue groups.

>> Based on the final CoS of the packet the hash configuration (RSS) of

>> the queue group will be applied to the packet and the packet will be

>> spread across the queues within the queue group.

>

> Got it. So Classification PMR CoS happens entirely before Queue Groups.

> And with Queue Groups it allows a single PMR to match a packet and assign

> that packet to 1 out of Many queues instead of just 1 queue only.

>

>> Hope this clarifies.

>> Bala

>>

>> >

>> >> To solve these issues, Tiger Moth introduces the concept of a queue

>> >> group. A queue group is an extension to the existing queue

>> >> specification in a Class of Service.

>> >>

>> >> Queue groups solve the classification issues associated with

>> >> individual queues in three ways:

>> >>

>> >> * The odp_queue_group_create() API can create a large number of

>> >> related queues with a single call.

>> >

>> > If the application calls this API, does that mean the ODP implementation

>> > can create a large number of queues? What happens if the application

>> > receives an event on a queue that was created by the implmentation--how

>> > does the application know whether this queue was created by the hardware

>> > according to the ODP Classification or whether the queue was created by

>> > the application?

>> >

>> >> * A single PMR can spread traffic to many queues associated with the

>> >> same CoS by assigning packets matching the PMR to a queue group rather

>> >> than a queue.

>> >> * A hashed PMR subfield is used to distribute individual queues within

>> >> a queue group for scheduling purposes.

>> >

>> > Is there a way to write a test case for this? Trying to think of what kind of

>> > packets (traffic distribution) and how those packets would get classified and

>> > get assigned to queues.

>> >

>> >> diff --git a/include/odp/api/spec/classification.h

>> >> b/include/odp/api/spec/classification.h

>> >> index 6eca9ab..cf56852 100644

>> >> --- a/include/odp/api/spec/classification.h

>> >> +++ b/include/odp/api/spec/classification.h

>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

>> >>

>> >> /** A Boolean to denote support of PMR range */

>> >> odp_bool_t pmr_range_supported;

>> >> +

>> >> + /** A Boolean to denote support of queue group */

>> >> + odp_bool_t queue_group_supported;

>> >> +

>> >> + /** A Boolean to denote support of queue */

>> >> + odp_bool_t queue_supported;

>> >> } odp_cls_capability_t;

>> >>

>> >>

>> >> /**

>> >> @@ -162,7 +168,18 @@ typedef enum {

>> >>  * Used to communicate class of service creation options

>> >>  */

>> >> typedef struct odp_cls_cos_param {

>> >> - odp_queue_t queue; /**< Queue associated with CoS */

>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with CoS.

>> >> + */

>> >> + odp_queue_type_e type;

>> >> +

>> >> + typedef union {

>> >> + /** Queue associated with CoS */

>> >> + odp_queue_t queue;

>> >> +

>> >> + /** Queue Group associated with CoS */

>> >> + odp_queue_group_t queue_group;

>> >> + };

>> >> odp_pool_t pool; /**< Pool associated with CoS */

>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */

>> >> } odp_cls_cos_param_t;

>> >>

>> >>

>> >> diff --git a/include/odp/api/spec/queue.h b/include/odp/api/spec/queue.h

>> >> index 51d94a2..7dde060 100644

>> >> --- a/include/odp/api/spec/queue.h

>> >> +++ b/include/odp/api/spec/queue.h

>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

>> >> odp_queue_t odp_queue_create(const char *name, const odp_queue_param_t *param);

>> >>

>> >> +/**

>> >> + * Queue group capability

>> >> + * This capability structure defines system Queue Group capability

>> >> + */

>> >> +typedef struct odp_queue_group_capability_t {

>> >> + /** Number of queues supported per queue group */

>> >> + unsigned supported_queues;

>> >> + /** Supported protocol fields for hashing*/

>> >> + odp_pktin_hash_proto_t supported;

>> >> +}

>> >> +

>> >> +/**

>> >> + * ODP Queue Group parameters

>> >> + * Queue group supports only schedule queues <TBD??>

>> >> + */

>> >> +typedef struct odp_queue_group_param_t {

>> >> + /** Number of queue to be created for this queue group

>> >> + * implementation may round up the value to nearest power of 2

>

> Wondering what this means for obtaining the max number of queues

> supported by the system via odp_queue_capability()..

>

> powers of 2..

>

> If the platform supports 2^16 (65,536) queues, odp_queue_capability()

> max_queues should report 65,536 queues, right?

>

> If an odp_queue_group_t is created requesting 2^4 (16) queues, should

> odp_queue_capability() now return (65,536 - 16) 65520 queues or

> (2^12) 4096 queues?


odp_queue_capability() is called before creating the queue group,
so if an implementation has the limitation that it can only support
2^16 queues, then the application
has to configure at most 2^16 queues in the queue group.
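
A sketch of that ordering -- odp_queue_group_create() and
odp_queue_group_param_t are the RFC's proposed API, the rest is existing
ODP:

    odp_queue_capability_t capa;
    odp_queue_group_param_t param;
    uint32_t num = 1 << 16;      /* desired number of flow queues */

    odp_queue_capability(&capa);
    if (num > capa.max_queues)   /* respect the reported platform limit */
        num = capa.max_queues;

    odp_queue_group_param_init(&param);
    param.num_queue = num;
    odp_queue_group_create("flows", &param);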

>

> Could there be a dramatic effect on the total number of queues when

> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits

> used and effective number of queues is (16+16+16+16) 64 queues.

>

> Is it be possible to flexibly utilize all 2^16 queues the platform

> supports regardless of whether the queue was created by the implementation

> or explicitly created by the application?


This limitation is per queue group, and there can also be a limit on the
total number of queue groups in the system.
Usually the number of queue groups supported would be small.

>

> If so, is there a way to store this extra bit of information--whether

> a queue was created by the implementation or the application?

> One of the 16 bits might work.

> But, this reduces the number of queues to (2^15) 32768.

> ..at least they are fully utilizable by both implementation and application.


There are different queue types, and we can add ODP_QUEUE_GROUP_T as a
new type to differentiate
a queue created by odp_queue_create() from one created via the queue
group create function.

When destroying resources, the application destroys the
queues it created directly, and the
implementation destroys the queues within a queue group when the
application destroys the queue group.
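
A teardown sketch under that model (odp_queue_group_destroy() is the
proposed API mentioned earlier in the thread; app_queue and flow_group
are placeholder handles):

    /* Queues created via odp_queue_create() are destroyed by the
     * application itself. */
    odp_queue_destroy(app_queue);

    /* Destroying the group lets the implementation reclaim the member
     * queues it created internally. */
    odp_queue_group_destroy(flow_group);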

>

> When the application receives an odp_event_t from odp_queue_t after

> a call to odp_schedule(), could the application call..

> odp_queue_domain() to check whether this odp_queue_t was created by

> the implementation or the application? Function returns that bit.


Since we have a queue type, which can be obtained using the
odp_queue_type() function, I think the proposed
odp_queue_domain() API is not needed.

>

> If the queue domain is implementation, could it be an event

> (newly arrived packet) that came through Classification PMR CoS (CPC)?

> The packet is assigned to a odp_queue_t (flow) (created by the implementation)

> as defined by the CPC that was setup by the application.

> Might want efficient access to packet metadata which was populated

> as an effect of the packet passing through CPC stage.

>

> If the queue domain is application, could it be an event

> (crypto compl, or any synchronization point against ipblock or

> device over PCI bus that indicates some assist/acceleration work

> has finished) comes from a odp_queue_t previously created by the

> application via a call to odp_queue_create() (which sets that bit)?

> This queue would be any queue (not necessarily a packet 'flow')

> created by the data plane software (application).


We already have a queue type and an event type, which differentiate
events as BUFFER, PACKET, TIMEOUT or CRYPTO_COMPL. Also, the packet flow
queues can be
created only using HW, since the feature is mainly useful for spreading
packets across multiple flows.
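
For reference, a sketch of that existing event-type differentiation
(standard ODP event API; process_packet() is a placeholder):

    odp_event_t ev = odp_schedule(NULL, ODP_SCHED_WAIT);

    switch (odp_event_type(ev)) {
    case ODP_EVENT_PACKET:
        process_packet(odp_packet_from_event(ev)); /* placeholder */
        break;
    case ODP_EVENT_CRYPTO_COMPL:
        /* completion event from an application-created crypto queue */
        break;
    case ODP_EVENT_TIMEOUT:
    case ODP_EVENT_BUFFER:
    default:
        break;
    }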

-Bala
>

>> >> + * and value should be less than the number of queues

>> >> + * supported per queue group

>> >> + */

>> >> + unsigned num_queue;

>> >> +

>> >> + /** Protocol field selection for queue group distribution

>> >> + * Multiple fields can be selected in combination

>> >> + */

>> >> + odp_queue_group_hash_proto_t hash;

>> >> +

>> >> +} odp_queue_group_param_t;

>> >> +

>> >> +/**

>> >> + * Initialize queue group params

>> >> + *

>> >> + * Initialize an odp_queue_group_param_t to its default values for all fields.

>> >> + *

>> >> + * @param param   Address of the odp_queue_group_param_t to be initialized

>> >> + */

>> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);

>> >> +

>> >> +/**

>> >> + * Queue Group create

>> >> + *

>> >> + * Create a queue group according to the queue group parameters.

>> >> + * The individual queues belonging to a queue group are created by the

>> >> + * implementation and the distribution of packets into those queues are

>> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

>> >> + * The individual queues within a queue group are both created and deleted

>> >> + * by the implementation.

>> >> + *

>> >> + * @param name    Queue Group name

>> >> + * @param param   Queue Group parameters.

>> >> + *

>> >> + * @return Queue group handle

>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

>> >> + */

>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

>> >> + const odp_queue_group_param_t *param);

>> >> Regards,

>> >> Bala
Brian Brooks Nov. 15, 2016, 5:13 p.m. UTC | #9
On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan
<bala.manoharan@linaro.org> wrote:
> Regards,

> Bala

>

>

> On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:

>> On 11/10 15:17:15, Bala Manoharan wrote:

>>> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:

>>> > On 11/07 16:46:12, Bala Manoharan wrote:

>>> >> Hi,

>>> >

>>> > Hiya

>>> >

>>> >> This mail thread discusses the design of classification queue group

>>> >> RFC. The same can be found in the google doc whose link is given

>>> >> below.

>>> >> Users can provide their comments either in this mail thread or in the

>>> >> google doc as per their convenience.

>>> >>

>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing

>>> >>

>>> >> The basic issues with queues as being a single target for a CoS are two fold:

>>> >>

>>> >> Queues must be created and deleted individually. This imposes a

>>> >> significant burden when queues are used to represent individual flows

>>> >> since the application may need to process thousands (or millions) of

>>> >> flows.

>>> >

>>> > Wondering why there is an issue with creating and deleting queues individually

>>> > if queue objects represent millions of flows..

>>>

>>> The queue groups are mainly required for hashing the incoming packets

>>> to multiple flows based on the hash configuration.

>>> So from application point of view it just needs a queue to have

>>> packets belonging to same flow and that packets belonging to different

>>> flows are placed in different queues respectively.It does not matter

>>> who creates the flow/queue.

>>

>> When the application receives an event from odp_schedule() call, how does it

>> know whether the odp_queue_t was previously created by the application from

>> odp_queue_create() or whether it was created by the implementation?

>

> odp_schedule() call returns the queue from which the event was part of.

> The type of the queue can be got from odp_queue_type() API.

> But the question is there an use-case where the application need to know?

> The application has the information of the queue it has created and

> the queues created by implementation are destroyed by implementation.


If certain fields of the packet are hashed to a queue handle, and this queue
handle has not previously been created via odp_queue_create(), there might
be a use case where the application needs to be aware of a new "flow"..
Maybe the application ages flows.

>

>>

>>> It is actually simpler if implementation creates a flow since in that

>>> case implementation need not accumulate meta-data for all possible

>>> hash values in a queue group and it can be created when traffic

>>> arrives in that particular flow.

>>>

>>> >

>>> > Could an application ever call odp_schedule() and receive an event (e.g. packet)

>>> > from a queue (of opaque type odp_queue_t) and that queue has never been created

>>> > by the application (via odp_queue_create())? Could that ever happen from the

>>> > hardware, and could the application ever handle that?

>>>

>>> No. All the queues in the system are created by the application either

>>> directly or in-directly.

>>> In-case of queue groups the queues are in-directly created by the

>>> application by configuring a queue group.

>>>

>>> > Or, is it related to memory usage? The reference implementation

>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

>>> >

>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

>>> >

>>> > Forget about 320 bytes per queue, if each queue was represented by a 32-bit

>>> > integer (4 bytes!) the usage would be:

>>> >

>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

>>> >   2^26 ~=  67,108,864 queues ->    256 MB

>>> >   2^22 ~=   4,194,304 queues ->     16 MB

>>> >

>>> > That still might be a lot of usage if the application must explicitly create

>>> > every queue (before it is used) and require an ODP implementation to map

>>> > between every ODP queue object (opaque type) and the internal queue.

>>> >

>>> > Lets say ODP API has two classes of handles: 1) pointers, 2) integers. An opaque

>>> > pointer is used to point to some other software object. This object should be

>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode) otherwise it

>>> > could just be represented in a 64-bit (or 32-bit) integer type value!

>>> >

>>> > To support millions of queues (flows) should odp_queue_t be an integer type in

>>> > the API? A software-only implementation may still use 320 bytes per queue and

>>> > use that integer as an index into an array or as a key for lookup operation on a

>>> > data structure containing queues. An implementation with hardware assist may

>>> > use this integer value directly when interfacing with hardware!

>>>

>>> I believe I have answered this question based on explanation above.

>>> Pls feel free to point out if something is not clear.

>>>

>>> >

>>> > Would it still be necessary to assign a "name" to each queue (flow)?

>>>

>>> "name" per queue might not be required since it would mean a character

>>> based lookup across millions of items.

>>>

>>> >

>>> > Would a queue (flow) also require an "op type" to explicitly specify whether

>>> > access to the queue (flow) is threadsafe? Atomic queues are threadsafe since

>>> > only 1 core at any given time can recieve from it. Parallel queues are also

>>> > threadsafe. Are all ODP APIs threadsafe?

>>>

>>> There are two types of queue enqueue operation ODP_QUEUE_OP_MT and

>>> ODP_QUEUE_OP_MT_UNSAFE.

>>> Rest of the ODP APIs are multi thread safe since in ODP there is no

>>> defined way in which a single packet can be given to more than one

>>> core at the same time, as packets move across different modules

>>> through queues.

>>>

>>> >

>>> >> A single PMR can only match a packet to a single queue associated with

>>> >> a target CoS. This prohibits efficient capture of subfield

>>> >> classification.

>>> >

>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is possible

>>> > to create a single PMR which matches multiple fields of a packet. I can imagine

>>> > a case where a packet matches pmr1 (match Vlan) and also matches pmr2

>>> > (match Vlan AND match L3DestIP). Is that an example of subfield classification?

>>> > How does the queue relate?

>>>

>>> This question is related to classification, If a PMR is configured

>>> with more than one odp_pmr_param_t then the PMR is considered a hit

>>> only if the packet matches all the configured params.

>>>

>>> Consider the following,

>>>

>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.

>>>

>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

>>> will be first applied with PMR1

>>> 2) If the packet matches PMR1 it will be delivered to CoS1

>>> 3) If the packet does not match PMR1 then it will remain in Default_CoS.

>>> 4) Any packets arriving in CoS1 will be applied with PMR2, If the

>>> packet matches PMR2 then it will be delivered to CoS2.

>>> 5). If the packet does not match PMR2 it will remain in CoS1.

>>>

>>>

>>> Each CoS will be configured with queue groups.

>>> Based on the final CoS of the packet the hash configuration (RSS) of

>>> the queue group will be applied to the packet and the packet will be

>>> spread across the queues within the queue group.

>>

>> Got it. So Classification PMR CoS happens entirely before Queue Groups.

>> And with Queue Groups it allows a single PMR to match a packet and assign

>> that packet to 1 out of Many queues instead of just 1 queue only.

>>

>>> Hope this clarifies.

>>> Bala

>>>

>>> >

>>> >> To solve these issues, Tiger Moth introduces the concept of a queue

>>> >> group. A queue group is an extension to the existing queue

>>> >> specification in a Class of Service.

>>> >>

>>> >> Queue groups solve the classification issues associated with

>>> >> individual queues in three ways:

>>> >>

>>> >> * The odp_queue_group_create() API can create a large number of

>>> >> related queues with a single call.

>>> >

>>> > If the application calls this API, does that mean the ODP implementation

>>> > can create a large number of queues? What happens if the application

>>> > receives an event on a queue that was created by the implmentation--how

>>> > does the application know whether this queue was created by the hardware

>>> > according to the ODP Classification or whether the queue was created by

>>> > the application?

>>> >

>>> >> * A single PMR can spread traffic to many queues associated with the

>>> >> same CoS by assigning packets matching the PMR to a queue group rather

>>> >> than a queue.

>>> >> * A hashed PMR subfield is used to distribute individual queues within

>>> >> a queue group for scheduling purposes.

>>> >

>>> > Is there a way to write a test case for this? Trying to think of what kind of

>>> > packets (traffic distribution) and how those packets would get classified and

>>> > get assigned to queues.

>>> >

>>> >> diff --git a/include/odp/api/spec/classification.h

>>> >> b/include/odp/api/spec/classification.h

>>> >> index 6eca9ab..cf56852 100644

>>> >> --- a/include/odp/api/spec/classification.h

>>> >> +++ b/include/odp/api/spec/classification.h

>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

>>> >>

>>> >> /** A Boolean to denote support of PMR range */

>>> >> odp_bool_t pmr_range_supported;

>>> >> +

>>> >> + /** A Boolean to denote support of queue group */

>>> >> + odp_bool_t queue_group_supported;

>>> >> +

>>> >> + /** A Boolean to denote support of queue */

>>> >> + odp_bool_t queue_supported;

>>> >> } odp_cls_capability_t;

>>> >>

>>> >>

>>> >> /**

>>> >> @@ -162,7 +168,18 @@ typedef enum {

>>> >>  * Used to communicate class of service creation options

>>> >>  */

>>> >> typedef struct odp_cls_cos_param {

>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with CoS.

>>> >> + */

>>> >> + odp_queue_type_e type;

>>> >> +

>>> >> + typedef union {

>>> >> + /** Queue associated with CoS */

>>> >> + odp_queue_t queue;

>>> >> +

>>> >> + /** Queue Group associated with CoS */

>>> >> + odp_queue_group_t queue_group;

>>> >> + };

>>> >> odp_pool_t pool; /**< Pool associated with CoS */

>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */

>>> >> } odp_cls_cos_param_t;

>>> >>

>>> >>

>>> >> diff --git a/include/odp/api/spec/queue.h b/include/odp/api/spec/queue.h

>>> >> index 51d94a2..7dde060 100644

>>> >> --- a/include/odp/api/spec/queue.h

>>> >> +++ b/include/odp/api/spec/queue.h

>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

>>> >> odp_queue_t odp_queue_create(const char *name, const odp_queue_param_t *param);

>>> >>

>>> >> +/**

>>> >> + * Queue group capability

>>> >> + * This capability structure defines system Queue Group capability

>>> >> + */

>>> >> +typedef struct odp_queue_group_capability_t {

>>> >> + /** Number of queues supported per queue group */

>>> >> + unsigned supported_queues;

>>> >> + /** Supported protocol fields for hashing*/

>>> >> + odp_pktin_hash_proto_t supported;

>>> >> +}

>>> >> +

>>> >> +/**

>>> >> + * ODP Queue Group parameters

>>> >> + * Queue group supports only schedule queues <TBD??>

>>> >> + */

>>> >> +typedef struct odp_queue_group_param_t {

>>> >> + /** Number of queue to be created for this queue group

>>> >> + * implementation may round up the value to nearest power of 2

>>

>> Wondering what this means for obtaining the max number of queues

>> supported by the system via odp_queue_capability()..

>>

>> powers of 2..

>>

>> If the platform supports 2^16 (65,536) queues, odp_queue_capability()

>> max_queues should report 65,536 queues, right?

>>

>> If an odp_queue_group_t is created requesting 2^4 (16) queues, should

>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

>> (2^12) 4096 queues?

>

> odp_queue_capability() is called before creating the creating the queue group,

> so if an implementation has the limitation that it can only support

> 2^16 queues then application

> has to configure only 2^16 queues in the queue group.


In this use case, wouldn't all queues then be reserved for creation
by the implementation? And would odp_queue_create() then always return
the null handle?

What happens if you do:
1. odp_queue_capability() -> 2^16 queues
2. odp_queue_group_create( 2^4 queues )
3. odp_queue_capability() -> ???

>>

>> Could there be a dramatic effect on the total number of queues when

>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits

>> used and effective number of queues is (16+16+16+16) 64 queues.

>>

>> Is it be possible to flexibly utilize all 2^16 queues the platform

>> supports regardless of whether the queue was created by the implementation

>> or explicitly created by the application?

>

> This limitation is per queue group and there can be a limitation of

> total queue group in the system.

> Usually the total queue group supported would be a limited number.

>

>>

>> If so, is there a way to store this extra bit of information--whether

>> a queue was created by the implementation or the application?

>> One of the 16 bits might work.

>> But, this reduces the number of queues to (2^15) 32768.

>> ..at least they are fully utilizable by both implementation and application.

>

> There are different queue types and we can ODP_QUEUE_GROUP_T as a new

> type to differentiate

> a queue created by odp_queue_create() and using queue group create function.

>

> During destroying of the resources, the application destroys the

> queues created by application and

> implementation destroys the queues within a queue group when

> application destroys queue group.

>

>>

>> When the application receives an odp_event_t from odp_queue_t after

>> a call to odp_schedule(), could the application call..

>> odp_queue_domain() to check whether this odp_queue_t was created by

>> the implementation or the application? Function returns that bit.

>

> Since we have a queue type which can be got using the function

> odp_queue_type_t I think this

> odp_queue_domain() API is not needed.


Petri pointed out this week that a packet_io's destination queue may
be another case that could use a queue_group.
Perhaps what is needed is a way to connect blocks (ODP objects)
together like legos using something other than an odp_queue_t --
because what flows through these blocks are events from one
(and now more!) odp_queue_t.  Whether the queue was created by
the implementation or application is a separate concern.

>>

>> If the queue domain is implementation, could it be an event

>> (newly arrived packet) that came through Classification PMR CoS (CPC)?

>> The packet is assigned to a odp_queue_t (flow) (created by the implementation)

>> as defined by the CPC that was setup by the application.

>> Might want efficient access to packet metadata which was populated

>> as an effect of the packet passing through CPC stage.

>>

>> If the queue domain is application, could it be an event

>> (crypto compl, or any synchronization point against ipblock or

>> device over PCI bus that indicates some assist/acceleration work

>> has finished) comes from a odp_queue_t previously created by the

>> application via a call to odp_queue_create() (which sets that bit)?

>> This queue would be any queue (not necessarily a packet 'flow')

>> created by the data plane software (application).

>

> We already have a queue type and event type which differentiates the

> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet flow

> queues can be

> created only using HW since it is mainly useful for spreading the

> packets across multiple flows.

>

> -Bala

>>

>>> >> + * and value should be less than the number of queues

>>> >> + * supported per queue group

>>> >> + */

>>> >> + unsigned num_queue;

>>> >> +

>>> >> + /** Protocol field selection for queue group distribution

>>> >> + * Multiple fields can be selected in combination

>>> >> + */

>>> >> + odp_queue_group_hash_proto_t hash;

>>> >> +

>>> >> +} odp_queue_group_param_t;

>>> >> +

>>> >> +/**

>>> >> + * Initialize queue group params

>>> >> + *

>>> >> + * Initialize an odp_queue_group_param_t to its default values for all fields.

>>> >> + *

>>> >> + * @param param   Address of the odp_queue_group_param_t to be initialized

>>> >> + */

>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);

>>> >> +

>>> >> +/**

>>> >> + * Queue Group create

>>> >> + *

>>> >> + * Create a queue group according to the queue group parameters.

>>> >> + * The individual queues belonging to a queue group are created by the

>>> >> + * implementation and the distribution of packets into those queues are

>>> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

>>> >> + * The individual queues within a queue group are both created and deleted

>>> >> + * by the implementation.

>>> >> + *

>>> >> + * @param name    Queue Group name

>>> >> + * @param param   Queue Group parameters.

>>> >> + *

>>> >> + * @return Queue group handle

>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

>>> >> + */

>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

>>> >> + const odp_queue_group_param_t *param);

>>> >> Regards,

>>> >> Bala
Balasubramanian Manoharan Nov. 17, 2016, 9:05 a.m. UTC | #10
Regards,
Bala


On 15 November 2016 at 22:43, Brian Brooks <brian.brooks@linaro.org> wrote:
> On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan

> <bala.manoharan@linaro.org> wrote:

>> Regards,

>> Bala

>>

>>

>> On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:

>>> On 11/10 15:17:15, Bala Manoharan wrote:

>>>> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org> wrote:

>>>> > On 11/07 16:46:12, Bala Manoharan wrote:

>>>> >> Hi,

>>>> >

>>>> > Hiya

>>>> >

>>>> >> This mail thread discusses the design of classification queue group

>>>> >> RFC. The same can be found in the google doc whose link is given

>>>> >> below.

>>>> >> Users can provide their comments either in this mail thread or in the

>>>> >> google doc as per their convenience.

>>>> >>

>>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing

>>>> >>

>>>> >> The basic issues with queues as being a single target for a CoS are two fold:

>>>> >>

>>>> >> Queues must be created and deleted individually. This imposes a

>>>> >> significant burden when queues are used to represent individual flows

>>>> >> since the application may need to process thousands (or millions) of

>>>> >> flows.

>>>> >

>>>> > Wondering why there is an issue with creating and deleting queues individually

>>>> > if queue objects represent millions of flows..

>>>>

>>>> The queue groups are mainly required for hashing the incoming packets

>>>> to multiple flows based on the hash configuration.

>>>> So from application point of view it just needs a queue to have

>>>> packets belonging to same flow and that packets belonging to different

>>>> flows are placed in different queues respectively.It does not matter

>>>> who creates the flow/queue.

>>>

>>> When the application receives an event from odp_schedule() call, how does it

>>> know whether the odp_queue_t was previously created by the application from

>>> odp_queue_create() or whether it was created by the implementation?

>>

>> odp_schedule() call returns the queue from which the event was part of.

>> The type of the queue can be got from odp_queue_type() API.

>> But the question is there an use-case where the application need to know?

>> The application has the information of the queue it has created and

>> the queues created by implementation are destroyed by implementation.

>

> If certain fields of the packet are hashed to a queue handle, and this queue

> handle has not previously been created via odp_queue_create(), there might

> be a use case where the application needs to be aware of a new "flow"..

> Maybe the application ages flows.


With network traffic it is very difficult to predict the exact flows
that will arrive on an interface.
In that case the application can configure for all the possible flows.

>

>>

>>>

>>>> It is actually simpler if implementation creates a flow since in that

>>>> case implementation need not accumulate meta-data for all possible

>>>> hash values in a queue group and it can be created when traffic

>>>> arrives in that particular flow.

>>>>

>>>> >

>>>> > Could an application ever call odp_schedule() and receive an event (e.g. packet)

>>>> > from a queue (of opaque type odp_queue_t) and that queue has never been created

>>>> > by the application (via odp_queue_create())? Could that ever happen from the

>>>> > hardware, and could the application ever handle that?

>>>>

>>>> No. All the queues in the system are created by the application either

>>>> directly or in-directly.

>>>> In-case of queue groups the queues are in-directly created by the

>>>> application by configuring a queue group.

>>>>

>>>> > Or, is it related to memory usage? The reference implementation

>>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

>>>> >

>>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

>>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

>>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

>>>> >

>>>> > Forget about 320 bytes per queue, if each queue was represented by a 32-bit

>>>> > integer (4 bytes!) the usage would be:

>>>> >

>>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

>>>> >   2^26 ~=  67,108,864 queues ->    256 MB

>>>> >   2^22 ~=   4,194,304 queues ->     16 MB

>>>> >

>>>> > That still might be a lot of usage if the application must explicitly create

>>>> > every queue (before it is used) and require an ODP implementation to map

>>>> > between every ODP queue object (opaque type) and the internal queue.

>>>> >

>>>> > Lets say ODP API has two classes of handles: 1) pointers, 2) integers. An opaque

>>>> > pointer is used to point to some other software object. This object should be

>>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode) otherwise it

>>>> > could just be represented in a 64-bit (or 32-bit) integer type value!

>>>> >

>>>> > To support millions of queues (flows) should odp_queue_t be an integer type in

>>>> > the API? A software-only implementation may still use 320 bytes per queue and

>>>> > use that integer as an index into an array or as a key for lookup operation on a

>>>> > data structure containing queues. An implementation with hardware assist may

>>>> > use this integer value directly when interfacing with hardware!

>>>>

>>>> I believe I have answered this question based on explanation above.

>>>> Pls feel free to point out if something is not clear.

>>>>

>>>> >

>>>> > Would it still be necessary to assign a "name" to each queue (flow)?

>>>>

>>>> "name" per queue might not be required since it would mean a character

>>>> based lookup across millions of items.

>>>>

>>>> >

>>>> > Would a queue (flow) also require an "op type" to explicitly specify whether

>>>> > access to the queue (flow) is threadsafe? Atomic queues are threadsafe since

>>>> > only 1 core at any given time can recieve from it. Parallel queues are also

>>>> > threadsafe. Are all ODP APIs threadsafe?

>>>>

>>>> There are two types of queue enqueue operation ODP_QUEUE_OP_MT and

>>>> ODP_QUEUE_OP_MT_UNSAFE.

>>>> Rest of the ODP APIs are multi thread safe since in ODP there is no

>>>> defined way in which a single packet can be given to more than one

>>>> core at the same time, as packets move across different modules

>>>> through queues.

>>>>

>>>> >

>>>> >> A single PMR can only match a packet to a single queue associated with

>>>> >> a target CoS. This prohibits efficient capture of subfield

>>>> >> classification.

>>>> >

>>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is possible

>>>> > to create a single PMR which matches multiple fields of a packet. I can imagine

>>>> > a case where a packet matches pmr1 (match Vlan) and also matches pmr2

>>>> > (match Vlan AND match L3DestIP). Is that an example of subfield classification?

>>>> > How does the queue relate?

>>>>

>>>> This question is related to classification. If a PMR is configured

>>>> with more than one odp_pmr_param_t then the PMR is considered a hit

>>>> only if the packet matches all the configured params.

>>>>

>>>> Consider the following,

>>>>

>>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ==== PMR2 ====> CoS2.

>>>>

>>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

>>>> will be first applied with PMR1

>>>> 2) If the packet matches PMR1 it will be delivered to CoS1

>>>> 3) If the packet does not match PMR1 then it will remain in Default_CoS.

>>>> 4) Any packets arriving in CoS1 will be applied with PMR2. If the

>>>> packet matches PMR2 then it will be delivered to CoS2.

>>>> 5) If the packet does not match PMR2 it will remain in CoS1.

>>>>

>>>>

>>>> Each CoS will be configured with queue groups.

>>>> Based on the final CoS of the packet the hash configuration (RSS) of

>>>> the queue group will be applied to the packet and the packet will be

>>>> spread across the queues within the queue group.

>>>

>>> Got it. So Classification PMR CoS happens entirely before Queue Groups.

>>> And with Queue Groups it allows a single PMR to match a packet and assign

>>> that packet to 1 out of Many queues instead of just 1 queue only.

>>>

>>>> Hope this clarifies.

>>>> Bala

>>>>

>>>> >

>>>> >> To solve these issues, Tiger Moth introduces the concept of a queue

>>>> >> group. A queue group is an extension to the existing queue

>>>> >> specification in a Class of Service.

>>>> >>

>>>> >> Queue groups solve the classification issues associated with

>>>> >> individual queues in three ways:

>>>> >>

>>>> >> * The odp_queue_group_create() API can create a large number of

>>>> >> related queues with a single call.

>>>> >

>>>> > If the application calls this API, does that mean the ODP implementation

>>>> > can create a large number of queues? What happens if the application

>>>> > receives an event on a queue that was created by the implementation--how

>>>> > does the application know whether this queue was created by the hardware

>>>> > according to the ODP Classification or whether the queue was created by

>>>> > the application?

>>>> >

>>>> >> * A single PMR can spread traffic to many queues associated with the

>>>> >> same CoS by assigning packets matching the PMR to a queue group rather

>>>> >> than a queue.

>>>> >> * A hashed PMR subfield is used to distribute individual queues within

>>>> >> a queue group for scheduling purposes.

>>>> >

>>>> > Is there a way to write a test case for this? Trying to think of what kind of

>>>> > packets (traffic distribution) and how those packets would get classified and

>>>> > get assigned to queues.

>>>> >

>>>> >> diff --git a/include/odp/api/spec/classification.h

>>>> >> b/include/odp/api/spec/classification.h

>>>> >> index 6eca9ab..cf56852 100644

>>>> >> --- a/include/odp/api/spec/classification.h

>>>> >> +++ b/include/odp/api/spec/classification.h

>>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

>>>> >>

>>>> >> /** A Boolean to denote support of PMR range */

>>>> >> odp_bool_t pmr_range_supported;

>>>> >> +

>>>> >> + /** A Boolean to denote support of queue group */

>>>> >> + odp_bool_t queue_group_supported;

>>>> >> +

>>>> >> + /** A Boolean to denote support of queue */

>>>> >> + odp_bool_t queue_supported;

>>>> >> } odp_cls_capability_t;

>>>> >>

>>>> >>

>>>> >> /**

>>>> >> @@ -162,7 +168,18 @@ typedef enum {

>>>> >>  * Used to communicate class of service creation options

>>>> >>  */

>>>> >> typedef struct odp_cls_cos_param {

>>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

>>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

>>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with CoS.

>>>> >> + */

>>>> >> + odp_queue_type_e type;

>>>> >> +

>>>> >> + typedef union {

>>>> >> + /** Queue associated with CoS */

>>>> >> + odp_queue_t queue;

>>>> >> +

>>>> >> + /** Queue Group associated with CoS */

>>>> >> + odp_queue_group_t queue_group;

>>>> >> + };

>>>> >> odp_pool_t pool; /**< Pool associated with CoS */

>>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */

>>>> >> } odp_cls_cos_param_t;

>>>> >>

>>>> >>

>>>> >> diff --git a/include/odp/api/spec/queue.h b/include/odp/api/spec/queue.h

>>>> >> index 51d94a2..7dde060 100644

>>>> >> --- a/include/odp/api/spec/queue.h

>>>> >> +++ b/include/odp/api/spec/queue.h

>>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

>>>> >> odp_queue_t odp_queue_create(const char *name, const odp_queue_param_t *param);

>>>> >>

>>>> >> +/**

>>>> >> + * Queue group capability

>>>> >> + * This capability structure defines system Queue Group capability

>>>> >> + */

>>>> >> +typedef struct odp_queue_group_capability_t {

>>>> >> + /** Number of queues supported per queue group */

>>>> >> + unsigned supported_queues;

>>>> >> + /** Supported protocol fields for hashing*/

>>>> >> + odp_pktin_hash_proto_t supported;

>>>> >> +}

>>>> >> +

>>>> >> +/**

>>>> >> + * ODP Queue Group parameters

>>>> >> + * Queue group supports only schedule queues <TBD??>

>>>> >> + */

>>>> >> +typedef struct odp_queue_group_param_t {

>>>> >> + /** Number of queue to be created for this queue group

>>>> >> + * implementation may round up the value to nearest power of 2

>>>

>>> Wondering what this means for obtaining the max number of queues

>>> supported by the system via odp_queue_capability()..

>>>

>>> powers of 2..

>>>

>>> If the platform supports 2^16 (65,536) queues, odp_queue_capability()

>>> max_queues should report 65,536 queues, right?

>>>

>>> If an odp_queue_group_t is created requesting 2^4 (16) queues, should

>>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

>>> (2^12) 4096 queues?

>>

>> odp_queue_capability() is called before creating the queue group,

>> so if an implementation has the limitation that it can only support

>> 2^16 queues then the application

>> has to configure only 2^16 queues in the queue group.

>

> In this use case, wouldn't all queues then be reserved for creation

> by the implementation? And, now odp_queue_create() will always return

> the null handle?

>

> What happens if you do:

> 1. odp_queue_capability() -> 2^16 queues

> 2. odp_queue_group_create( 2^4 queues )

> 3. odp_queue_capability() -> ???


This is a limit on the number of queues supported by a queue group.
This does not reflect the number of queues created using the
odp_queue_create() function.
The implementation updates the maximum number of queues it can support
within a queue group; the application is free to configure any number
less than the maximum supported.
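
A minimal sketch of that flow, assuming a query call
odp_queue_group_capability() in the style of the other ODP capability
APIs (the RFC defines the capability structure, but the query function
itself is an assumption here):

odp_queue_group_capability_t qg_cap;
odp_queue_group_param_t qg_param;

/* read the static per-group limit (the query call itself is assumed) */
odp_queue_group_capability(&qg_cap);

odp_queue_group_param_init(&qg_param);

/* any value up to the per-group maximum is valid; this does not consume
   queues created separately with odp_queue_create() */
qg_param.num_queue = qg_cap.supported_queues;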

>

>>>

>>> Could there be a dramatic effect on the total number of queues when

>>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

>>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits

>>> used and effective number of queues is (16+16+16+16) 64 queues.

>>>

>>> Is it possible to flexibly utilize all 2^16 queues the platform

>>> supports regardless of whether the queue was created by the implementation

>>> or explicitly created by the application?

>>

>> This limitation is per queue group, and there can be a limit on the

>> total number of queue groups in the system.

>> Usually the total number of queue groups supported would be limited.

>>

>>>

>>> If so, is there a way to store this extra bit of information--whether

>>> a queue was created by the implementation or the application?

>>> One of the 16 bits might work.

>>> But, this reduces the number of queues to (2^15) 32768.

>>> ..at least they are fully utilizable by both implementation and application.

>>

>> There are different queue types and we can add ODP_QUEUE_GROUP_T as a new

>> type to differentiate

>> a queue created by odp_queue_create() and using queue group create function.

>>

>> When destroying the resources, the application destroys the

>> queues created by the application, and the

>> implementation destroys the queues within a queue group when the

>> application destroys the queue group.

>>

>>>

>>> When the application receives an odp_event_t from odp_queue_t after

>>> a call to odp_schedule(), could the application call..

>>> odp_queue_domain() to check whether this odp_queue_t was created by

>>> the implementation or the application? Function returns that bit.

>>

>> Since we have a queue type which can be obtained using the function

>> odp_queue_type_t, I think this

>> odp_queue_domain() API is not needed.

>

> Petri pointed out this week that a packet_io's destination queue may also

> be another case that could use a queue_group.

> Perhaps what is needed is a way to connect blocks (ODP objects)

> together like legos using something other than an odp_queue_t --

> because what flows through these blocks are events from one

> (and now more!) odp_queue_t.  Whether the queue was created by

> the implementation or application is a separate concern.

>

>>>

>>> If the queue domain is implementation, could it be an event

>>> (newly arrived packet) that came through Classification PMR CoS (CPC)?

>>> The packet is assigned to a odp_queue_t (flow) (created by the implementation)

>>> as defined by the CPC that was setup by the application.

>>> Might want efficient access to packet metadata which was populated

>>> as an effect of the packet passing through CPC stage.

>>>

>>> If the queue domain is application, could it be an event

>>> (crypto compl, or any synchronization point against ipblock or

>>> device over PCI bus that indicates some assist/acceleration work

>>> has finished) comes from a odp_queue_t previously created by the

>>> application via a call to odp_queue_create() (which sets that bit)?

>>> This queue would be any queue (not necessarily a packet 'flow')

>>> created by the data plane software (application).

>>

>> We already have a queue type and event type which differentiate the

>> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet flow

>> queues can be

>> created only using HW since it is mainly useful for spreading the

>> packets across multiple flows.

>>

>> -Bala

>>>

>>>> >> + * and value should be less than the number of queues

>>>> >> + * supported per queue group

>>>> >> + */

>>>> >> + unsigned num_queue;

>>>> >> +

>>>> >> + /** Protocol field selection for queue group distribution

>>>> >> + * Multiple fields can be selected in combination

>>>> >> + */

>>>> >> + odp_queue_group_hash_proto_t hash;

>>>> >> +

>>>> >> +} odp_queue_group_param_t;

>>>> >> +

>>>> >> +/**

>>>> >> + * Initialize queue group params

>>>> >> + *

>>>> >> + * Initialize an odp_queue_group_param_t to its default values for all fields.

>>>> >> + *

>>>> >> + * @param param   Address of the odp_queue_group_param_t to be initialized

>>>> >> + */

>>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);

>>>> >> +

>>>> >> +/**

>>>> >> + * Queue Group create

>>>> >> + *

>>>> >> + * Create a queue group according to the queue group parameters.

>>>> >> + * The individual queues belonging to a queue group are created by the

>>>> >> + * implementation and the distribution of packets into those queues are

>>>> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

>>>> >> + * The individual queues within a queue group are both created and deleted

>>>> >> + * by the implementation.

>>>> >> + *

>>>> >> + * @param name    Queue Group name

>>>> >> + * @param param   Queue Group parameters.

>>>> >> + *

>>>> >> + * @return Queue group handle

>>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

>>>> >> + */

>>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

>>>> >> + const odp_queue_group_param_t *param);

>>>> >> Regards,

>>>> >> Bala
Bill Fischofer Nov. 17, 2016, 7:59 p.m. UTC | #11
On Thu, Nov 17, 2016 at 3:05 AM, Bala Manoharan <bala.manoharan@linaro.org>
wrote:

> Regards,

> Bala

>

>

> On 15 November 2016 at 22:43, Brian Brooks <brian.brooks@linaro.org>

> wrote:

> > On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan

> > <bala.manoharan@linaro.org> wrote:

> >> Regards,

> >> Bala

> >>

> >>

> >> On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org>

> wrote:

> >>> On 11/10 15:17:15, Bala Manoharan wrote:

> >>>> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org>

> wrote:

> >>>> > On 11/07 16:46:12, Bala Manoharan wrote:

> >>>> >> Hi,

> >>>> >

> >>>> > Hiya

> >>>> >

> >>>> >> This mail thread discusses the design of classification queue group

> >>>> >> RFC. The same can be found in the google doc whose link is given

> >>>> >> below.

> >>>> >> Users can provide their comments either in this mail thread or in

> the

> >>>> >> google doc as per their convenience.

> >>>> >>

> >>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9

> slR93LZ8VXqM2o/edit?usp=sharing

> >>>> >>

> >>>> >> The basic issues with queues as being a single target for a CoS

> are two fold:

> >>>> >>

> >>>> >> Queues must be created and deleted individually. This imposes a

> >>>> >> significant burden when queues are used to represent individual

> flows

> >>>> >> since the application may need to process thousands (or millions)

> of

> >>>> >> flows.

> >>>> >

> >>>> > Wondering why there is an issue with creating and deleting queues

> individually

> >>>> > if queue objects represent millions of flows..

> >>>>

> >>>> The queue groups are mainly required for hashing the incoming packets

> >>>> to multiple flows based on the hash configuration.

> >>>> So from the application point of view it just needs a queue to have

> >>>> packets belonging to the same flow and that packets belonging to different

> >>>> flows are placed in different queues respectively. It does not matter

> >>>> who creates the flow/queue.

> >>>

> >>> When the application receives an event from odp_schedule() call, how

> does it

> >>> know whether the odp_queue_t was previously created by the application

> from

> >>> odp_queue_create() or whether it was created by the implementation?

> >>

> >> The odp_schedule() call returns the queue that the event came from.

> >> The type of the queue can be obtained from the odp_queue_type() API.

> >> But the question is: is there a use case where the application needs to

> know?

> >> The application has the information of the queue it has created and

> >> the queues created by the implementation are destroyed by the implementation.

> >

> > If certain fields of the packet are hashed to a queue handle, and this

> queue

> > handle has not previously been created via odp_queue_create(), there

> might

> > be a use case where the application needs to be aware of a new "flow"..

> > Maybe the application ages flows.

>


I'm not sure I understand the concern being raised here. Packet fields are
matched against PMRs to get a matching CoS. That CoS, in turn, is
associated with a queue or a queue group. If the latter, then specified
subfields within the packet are hashed to generate an index into that queue
group to select the individual queue within the target queue group that is
to receive the packet. Whether these queues have been preallocated at
odp_queue_group_create() time, or allocated dynamically on first reference
is up to the implementation; however, especially in the case of "large"
queue groups it can be expected that the number of actual queues in use
will be sparse so a deferred allocation strategy will most likely be used.

Applications are aware of flows because that's what an individual queue
coming out of the classifier represents. An interesting question arises
if a higher-level protocol (e.g., a TCP FIN sequence) ends a given flow,
meaning that the context represented by an individual queue within a queue
group can be released. Especially in the case of sparse queue groups it
might be worthwhile to have an API that can communicate this flow release
back to the classifier to facilitate queue resource management.
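
A purely hypothetical shape for such a release hint (no function of this
name exists in the API today; this is only to illustrate the idea):

/* Hypothetical: tell the classifier that the flow behind one queue of a
   queue group has ended (e.g. a completed TCP FIN sequence), so that the
   implementation may reclaim that queue's resources. */
int odp_queue_group_flow_release(odp_queue_group_t group,
                                 odp_queue_t flow_queue);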


>

> It is very difficult in network traffic to predict the exact flows

> which will be coming in on an interface.

> The application can configure for all the possible flows in that case.

>


Not sure what you mean by the application configuring. If the hash is for a
UDP port, for example, then the queue group has 64K (logical) queues
associated with it.  Which of these are active (and hence require
instantiation) depends on the inbound traffic that is received, which may
be unpredictable. But the management of this is an ODP implementation
concern rather than an application concern, unless we extend the API with a
flow release hint as suggested above.
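
As a concrete sketch of that UDP example using the proposed API
(assuming odp_queue_group_hash_proto_t carries bit-fields similar to
odp_pktin_hash_proto_t; the hash field name below is an assumption):

odp_queue_group_param_t param;
odp_cls_cos_param_t cos_param;
odp_queue_group_t group;

odp_queue_group_param_init(&param);
param.num_queue = 64 * 1024;        /* one logical queue per UDP port */
param.hash.proto.ipv4_udp = 1;      /* assumed field name */

group = odp_queue_group_create("udp_flows", &param);
if (group == ODP_QUEUE_GROUP_INVALID) {
        /* fall back to a single queue for this CoS */
}

/* attach the group to a CoS via the proposed odp_cls_cos_param union */
odp_cls_cos_param_init(&cos_param);
cos_param.type = ODP_QUEUE_GROUP_T;
cos_param.queue_group = group;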


>

> >

> >>

> >>>

> >>>> It is actually simpler if the implementation creates a flow since in that

> >>>> case the implementation need not accumulate meta-data for all possible

> >>>> hash values in a queue group and it can be created when traffic

> >>>> arrives in that particular flow.

> >>>>

> >>>> >

> >>>> > Could an application ever call odp_schedule() and receive an event

> (e.g. packet)

> >>>> > from a queue (of opaque type odp_queue_t) and that queue has never

> been created

> >>>> > by the application (via odp_queue_create())? Could that ever happen

> from the

> >>>> > hardware, and could the application ever handle that?

> >>>>

> >>>> No. All the queues in the system are created by the application either

> >>>> directly or indirectly.

> >>>> In the case of queue groups, the queues are indirectly created by the

> >>>> application by configuring a queue group.

> >>>>

> >>>> > Or, is it related to memory usage? The reference implementation

> >>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

> >>>> >

> >>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

> >>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

> >>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

> >>>> >

> >>>> > Forget about 320 bytes per queue, if each queue was represented by

> a 32-bit

> >>>> > integer (4 bytes!) the usage would be:

> >>>> >

> >>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

> >>>> >   2^26 ~=  67,108,864 queues ->    256 MB

> >>>> >   2^22 ~=   4,194,304 queues ->     16 MB

> >>>> >

> >>>> > That still might be a lot of usage if the application must

> explicitly create

> >>>> > every queue (before it is used) and require an ODP implementation

> to map

> >>>> > between every ODP queue object (opaque type) and the internal queue.

> >>>> >

> >>>> > Let's say ODP API has two classes of handles: 1) pointers, 2)

> integers. An opaque

> >>>> > pointer is used to point to some other software object. This object

> should be

> >>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode)

> otherwise it

> >>>> > could just be represented in a 64-bit (or 32-bit) integer type

> value!

> >>>> >

> >>>> > To support millions of queues (flows) should odp_queue_t be an

> integer type in

> >>>> > the API? A software-only implementation may still use 320 bytes per

> queue and

> >>>> > use that integer as an index into an array or as a key for lookup

> operation on a

> >>>> > data structure containing queues. An implementation with hardware

> assist may

> >>>> > use this integer value directly when interfacing with hardware!

> >>>>

> >>>> I believe I have answered this question based on explanation above.

> >>>> Please feel free to point out if something is not clear.

> >>>>

> >>>> >

> >>>> > Would it still be necessary to assign a "name" to each queue (flow)?

> >>>>

> >>>> "name" per queue might not be required since it would mean a character

> >>>> based lookup across millions of items.

> >>>>

> >>>> >

> >>>> > Would a queue (flow) also require an "op type" to explicitly

> specify whether

> >>>> > access to the queue (flow) is threadsafe? Atomic queues are

> threadsafe since

> >>>> > only 1 core at any given time can receive from it. Parallel queues

> are also

> >>>> > threadsafe. Are all ODP APIs threadsafe?

> >>>>

> >>>> There are two types of queue enqueue operations: ODP_QUEUE_OP_MT and

> >>>> ODP_QUEUE_OP_MT_UNSAFE.

> >>>> The rest of the ODP APIs are multi-thread safe since in ODP there is no

> >>>> defined way in which a single packet can be given to more than one

> >>>> core at the same time, as packets move across different modules

> >>>> through queues.

> >>>>

> >>>> >

> >>>> >> A single PMR can only match a packet to a single queue associated

> with

> >>>> >> a target CoS. This prohibits efficient capture of subfield

> >>>> >> classification.

> >>>> >

> >>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is

> possible

> >>>> > to create a single PMR which matches multiple fields of a packet. I

> can imagine

> >>>> > a case where a packet matches pmr1 (match Vlan) and also matches

> pmr2

> >>>> > (match Vlan AND match L3DestIP). Is that an example of subfield

> classification?

> >>>> > How does the queue relate?

> >>>>

> >>>> This question is related to classification. If a PMR is configured

> >>>> with more than one odp_pmr_param_t then the PMR is considered a hit

> >>>> only if the packet matches all the configured params.

> >>>>

> >>>> Consider the following,

> >>>>

> >>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ==== PMR2 ====> CoS2.

> >>>>

> >>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

> >>>> will be first applied with PMR1

> >>>> 2) If the packet matches PMR1 it will be delivered to CoS1

> >>>> 3) If the packet does not match PMR1 then it will remain in

> Default_CoS.

> >>>> 4) Any packets arriving in CoS1 will be applied with PMR2. If the

> >>>> packet matches PMR2 then it will be delivered to CoS2.

> >>>> 5) If the packet does not match PMR2 it will remain in CoS1.

> >>>>

> >>>>

> >>>> Each CoS will be configured with queue groups.

> >>>> Based on the final CoS of the packet the hash configuration (RSS) of

> >>>> the queue group will be applied to the packet and the packet will be

> >>>> spread across the queues within the queue group.

> >>>

> >>> Got it. So Classification PMR CoS happens entirely before Queue Groups.

> >>> And with Queue Groups it allows a single PMR to match a packet and

> assign

> >>> that packet to 1 out of Many queues instead of just 1 queue only.

> >>>

> >>>> Hope this clarifies.

> >>>> Bala

> >>>>

> >>>> >

> >>>> >> To solve these issues, Tiger Moth introduces the concept of a queue

> >>>> >> group. A queue group is an extension to the existing queue

> >>>> >> specification in a Class of Service.

> >>>> >>

> >>>> >> Queue groups solve the classification issues associated with

> >>>> >> individual queues in three ways:

> >>>> >>

> >>>> >> * The odp_queue_group_create() API can create a large number of

> >>>> >> related queues with a single call.

> >>>> >

> >>>> > If the application calls this API, does that mean the ODP

> implementation

> >>>> > can create a large number of queues? What happens if the application

> >>>> > receives an event on a queue that was created by the

> implementation--how

> >>>> > does the application know whether this queue was created by the

> hardware

> >>>> > according to the ODP Classification or whether the queue was

> created by

> >>>> > the application?

> >>>> >

> >>>> >> * A single PMR can spread traffic to many queues associated with

> the

> >>>> >> same CoS by assigning packets matching the PMR to a queue group

> rather

> >>>> >> than a queue.

> >>>> >> * A hashed PMR subfield is used to distribute individual queues

> within

> >>>> >> a queue group for scheduling purposes.

> >>>> >

> >>>> > Is there a way to write a test case for this? Trying to think of

> what kind of

> >>>> > packets (traffic distribution) and how those packets would get

> classified and

> >>>> > get assigned to queues.

> >>>> >

> >>>> >> diff --git a/include/odp/api/spec/classification.h

> >>>> >> b/include/odp/api/spec/classification.h

> >>>> >> index 6eca9ab..cf56852 100644

> >>>> >> --- a/include/odp/api/spec/classification.h

> >>>> >> +++ b/include/odp/api/spec/classification.h

> >>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

> >>>> >>

> >>>> >> /** A Boolean to denote support of PMR range */

> >>>> >> odp_bool_t pmr_range_supported;

> >>>> >> +

> >>>> >> + /** A Boolean to denote support of queue group */

> >>>> >> + odp_bool_t queue_group_supported;

> >>>> >> +

> >>>> >> + /** A Boolean to denote support of queue */

> >>>> >> + odp_bool_t queue_supported;

> >>>> >> } odp_cls_capability_t;

> >>>> >>

> >>>> >>

> >>>> >> /**

> >>>> >> @@ -162,7 +168,18 @@ typedef enum {

> >>>> >>  * Used to communicate class of service creation options

> >>>> >>  */

> >>>> >> typedef struct odp_cls_cos_param {

> >>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

> >>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

> >>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with

> CoS.

> >>>> >> + */

> >>>> >> + odp_queue_type_e type;

> >>>> >> +

> >>>> >> + typedef union {

> >>>> >> + /** Queue associated with CoS */

> >>>> >> + odp_queue_t queue;

> >>>> >> +

> >>>> >> + /** Queue Group associated with CoS */

> >>>> >> + odp_queue_group_t queue_group;

> >>>> >> + };

> >>>> >> odp_pool_t pool; /**< Pool associated with CoS */

> >>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */

> >>>> >> } odp_cls_cos_param_t;

> >>>> >>

> >>>> >>

> >>>> >> diff --git a/include/odp/api/spec/queue.h

> b/include/odp/api/spec/queue.h

> >>>> >> index 51d94a2..7dde060 100644

> >>>> >> --- a/include/odp/api/spec/queue.h

> >>>> >> +++ b/include/odp/api/spec/queue.h

> >>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

> >>>> >> odp_queue_t odp_queue_create(const char *name, const

> odp_queue_param_t *param);

> >>>> >>

> >>>> >> +/**

> >>>> >> + * Queue group capability

> >>>> >> + * This capability structure defines system Queue Group capability

> >>>> >> + */

> >>>> >> +typedef struct odp_queue_group_capability_t {

> >>>> >> + /** Number of queues supported per queue group */

> >>>> >> + unsigned supported_queues;

> >>>> >> + /** Supported protocol fields for hashing*/

> >>>> >> + odp_pktin_hash_proto_t supported;

> >>>> >> +}

> >>>> >> +

> >>>> >> +/**

> >>>> >> + * ODP Queue Group parameters

> >>>> >> + * Queue group supports only schedule queues <TBD??>

> >>>> >> + */

> >>>> >> +typedef struct odp_queue_group_param_t {

> >>>> >> + /** Number of queue to be created for this queue group

> >>>> >> + * implementation may round up the value to nearest power of 2

> >>>

> >>> Wondering what this means for obtaining the max number of queues

> >>> supported by the system via odp_queue_capability()..

> >>>

> >>> powers of 2..

> >>>

> >>> If the platform supports 2^16 (65,536) queues, odp_queue_capability()

> >>> max_queues should report 65,536 queues, right?

> >>>

> >>> If an odp_queue_group_t is created requesting 2^4 (16) queues, should

> >>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

> >>> (2^12) 4096 queues?

> >>

> >> odp_queue_capability() is called before creating the queue

> group,

> >> so if an implementation has the limitation that it can only support

> >> 2^16 queues then the application

> >> has to configure only 2^16 queues in the queue group.

> >

> > In this use case, wouldn't all queues then be reserved for creation

> > by the implementation? And, now odp_queue_create() will always return

> > the null handle?

> >

> > What happens if you do:

> > 1. odp_queue_capability() -> 2^16 queues

> > 2. odp_queue_group_create( 2^4 queues )

> > 3. odp_queue_capability() -> ???

>

> This is a limit on the number of queues supported by a queue group.

> This does not reflect the number of queues created using

> the odp_queue_create() function.

> The implementation updates the maximum number of queues it can support

> within a queue group; the application is free to configure any number

> less than the maximum supported.

>


Capabilities in ODP are used to specify implementation limits, not current
allocations. For example, in odp-linux there is currently a limit of 64
pools that can be created. It doesn't matter how many are currently created
as that is simply the system limit on odp_pool_create(). The same would
apply for queues and queue groups. An implementation may be limited to N
queue groups that can contain a maximum of K queues each. Separately the
implementation might have a limit of X total queues it can support. How
these are divided among individual queues or queues that are members of
queue groups should not affect these capability limits, which are static.

When an allocation request is made and an internal limit is exceeded, the
allocation request simply fails. The capabilities are there to guide the
application in its allocation requests so that such "surprises" are rare.
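
In code, the pattern is simply to read the static limits once and then
check each allocation for failure (a sketch; only calls that appear in
this thread or in the current queue API are used):

odp_queue_capability_t queue_cap;
odp_queue_t q;

odp_queue_capability(&queue_cap);   /* static limits, not current usage */

/* a request within the advertised limits can still fail if an internal
   resource is exhausted, so the returned handle must always be checked */
q = odp_queue_create("app_queue", NULL);
if (q == ODP_QUEUE_INVALID) {
        /* scale back or report; capabilities just make such failures rare */
}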


>

> >

> >>>

> >>> Could there be a dramatic effect on the total number of queues when

> >>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

> >>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits

> >>> used and effective number of queues is (16+16+16+16) 64 queues.

> >>>

> >>> Is it possible to flexibly utilize all 2^16 queues the platform

> >>> supports regardless of whether the queue was created by the

> implementation

> >>> or explicitly created by the application?

> >>

> >> This limitation is per queue group, and there can be a limit on the

> >> total number of queue groups in the system.

> >> Usually the total number of queue groups supported would be limited.

> >>

> >>>

> >>> If so, is there a way to store this extra bit of information--whether

> >>> a queue was created by the implementation or the application?

> >>> One of the 16 bits might work.

> >>> But, this reduces the number of queues to (2^15) 32768.

> >>> ..at least they are fully utilizable by both implementation and

> application.

> >>

> >> There are different queue types and we can add ODP_QUEUE_GROUP_T as a new

> >> type to differentiate

> >> a queue created by odp_queue_create() and using queue group create

> function.

> >>

> >> When destroying the resources, the application destroys the

> >> queues created by the application, and the

> >> implementation destroys the queues within a queue group when the

> >> application destroys the queue group.

> >>

> >>>

> >>> When the application receives an odp_event_t from odp_queue_t after

> >>> a call to odp_schedule(), could the application call..

> >>> odp_queue_domain() to check whether this odp_queue_t was created by

> >>> the implementation or the application? Function returns that bit.

> >>

> >> Since we have a queue type which can be obtained using the function

> >> odp_queue_type_t, I think this

> >> odp_queue_domain() API is not needed.

> >

> > Petri pointed out this week that a packet_io's destination queue may also

> > be another case that could use a queue_group.

> > Perhaps what is needed is a way to connect blocks (ODP objects)

> > together like legos using something other than an odp_queue_t --

> > because what flows through these blocks are events from one

> > (and now more!) odp_queue_t.  Whether the queue was created by

> > the implementation or application is a separate concern.

>


One possible extension area similar to this would be link bonding where
multiple pktios are bonded together for increased throughput and/or
failover (depending on whether the bond is active/active or
active/standby). We alluded to this in a recent ARCH call where TM talks to
a single PktIO; however, that PktIO might represent multiple links in this
case. A more generalized "group" concept might be an easy way to achieve
that here.


> >

> >>>

> >>> If the queue domain is implementation, could it be an event

> >>> (newly arrived packet) that came through Classification PMR CoS (CPC)?

> >>> The packet is assigned to a odp_queue_t (flow) (created by the

> implementation)

> >>> as defined by the CPC that was setup by the application.

> >>> Might want efficient access to packet metadata which was populated

> >>> as an effect of the packet passing through CPC stage.

> >>>

> >>> If the queue domain is application, could it be an event

> >>> (crypto compl, or any synchronization point against ipblock or

> >>> device over PCI bus that indicates some assist/acceleration work

> >>> has finished) comes from a odp_queue_t previously created by the

> >>> application via a call to odp_queue_create() (which sets that bit)?

> >>> This queue would be any queue (not necessarily a packet 'flow')

> >>> created by the data plane software (application).

> >>

> >> We already have a queue type and event type which differentiate the

> >> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet flow

> >> queues can be

> >> created only using HW since it is mainly useful for spreading the

> >> packets across multiple flows.

> >>

> >> -Bala

> >>>

> >>>> >> + * and value should be less than the number of queues

> >>>> >> + * supported per queue group

> >>>> >> + */

> >>>> >> + unsigned num_queue;

> >>>> >> +

> >>>> >> + /** Protocol field selection for queue group distribution

> >>>> >> + * Multiple fields can be selected in combination

> >>>> >> + */

> >>>> >> + odp_queue_group_hash_proto_t hash;

> >>>> >> +

> >>>> >> +} odp_queue_group_param_t;

> >>>> >> +

> >>>> >> +/**

> >>>> >> + * Initialize queue group params

> >>>> >> + *

> >>>> >> + * Initialize an odp_queue_group_param_t to its default values

> for all fields.

> >>>> >> + *

> >>>> >> + * @param param   Address of the odp_queue_group_param_t to be

> initialized

> >>>> >> + */

> >>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);

> >>>> >> +

> >>>> >> +/**

> >>>> >> + * Queue Group create

> >>>> >> + *

> >>>> >> + * Create a queue group according to the queue group parameters.

> >>>> >> + * The individual queues belonging to a queue group are created

> by the

> >>>> >> + * implementation and the distribution of packets into those

> queues are

> >>>> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

> >>>> >> + * The individual queues within a queue group are both created

> and deleted

> >>>> >> + * by the implementation.

> >>>> >> + *

> >>>> >> + * @param name    Queue Group name

> >>>> >> + * @param param   Queue Group parameters.

> >>>> >> + *

> >>>> >> + * @return Queue group handle

> >>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

> >>>> >> + */

> >>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

> >>>> >> + const odp_queue_group_param_t *param);

> >>>> >> Regards,

> >>>> >> Bala

>
Honnappa Nagarahalli Nov. 18, 2016, 4:35 a.m. UTC | #12
Hi Bala,
   I am trying to catch up on this conversation. I have a few questions
here; not sure if they have already been discussed.

1) Why is there a need for such a large number of queues (millions) in a
system? If the system meets the line rate, the number of packets (and
flows) waiting to be processed is small. So, a small number of queues
should suffice if we want to have a queue per flow. I agree that the
system will receive packets belonging to a million flows over a period
of time.

2) Are there any thoughts on implementing a software scheduler for
linux-generic that supports a large number of queues?

3) What will be the arbitration method (round robin, strict priority, etc.)
among the queues in the queue group? If it is weighted round robin or
strict priority, how are the priorities assigned to these queues?

4) Is the intent to propagate the flow ID from classification down the
packet processing pipeline?

Thank you,
Honnappa


On 17 November 2016 at 13:59, Bill Fischofer <bill.fischofer@linaro.org>
wrote:

> On Thu, Nov 17, 2016 at 3:05 AM, Bala Manoharan <bala.manoharan@linaro.org

> >

> wrote:

>

> > Regards,

> > Bala

> >

> >

> > On 15 November 2016 at 22:43, Brian Brooks <brian.brooks@linaro.org>

> > wrote:

> > > On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan

> > > <bala.manoharan@linaro.org> wrote:

> > >> Regards,

> > >> Bala

> > >>

> > >>

> > >> On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org>

> > wrote:

> > >>> On 11/10 15:17:15, Bala Manoharan wrote:

> > >>>> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org

> >

> > wrote:

> > >>>> > On 11/07 16:46:12, Bala Manoharan wrote:

> > >>>> >> Hi,

> > >>>> >

> > >>>> > Hiya

> > >>>> >

> > >>>> >> This mail thread discusses the design of classification queue

> group

> > >>>> >> RFC. The same can be found in the google doc whose link is given

> > >>>> >> below.

> > >>>> >> Users can provide their comments either in this mail thread or in

> > the

> > >>>> >> google doc as per their convenience.

> > >>>> >>

> > >>>> >> https://docs.google.com/document/d/

> 1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9

> > slR93LZ8VXqM2o/edit?usp=sharing

> > >>>> >>

> > >>>> >> The basic issues with queues as being a single target for a CoS

> > are two fold:

> > >>>> >>

> > >>>> >> Queues must be created and deleted individually. This imposes a

> > >>>> >> significant burden when queues are used to represent individual

> > flows

> > >>>> >> since the application may need to process thousands (or millions)

> > of

> > >>>> >> flows.

> > >>>> >

> > >>>> > Wondering why there is an issue with creating and deleting queues

> > individually

> > >>>> > if queue objects represent millions of flows..

> > >>>>

> > >>>> The queue groups are mainly required for hashing the incoming

> packets

> > >>>> to multiple flows based on the hash configuration.

> > >>>> So from the application point of view it just needs a queue to have

> > >>>> packets belonging to the same flow and that packets belonging to

> different

> > >>>> flows are placed in different queues respectively. It does not matter

> > >>>> who creates the flow/queue.

> > >>>

> > >>> When the application receives an event from odp_schedule() call, how

> > does it

> > >>> know whether the odp_queue_t was previously created by the

> application

> > from

> > >>> odp_queue_create() or whether it was created by the implementation?

> > >>

> > >> The odp_schedule() call returns the queue that the event came from.


> > >> The type of the queue can be obtained from the odp_queue_type() API.

> > >> But the question is: is there a use case where the application needs to

> > know?

> > >> The application has the information of the queue it has created and

> > >> the queues created by the implementation are destroyed by the implementation.

> > >

> > > If certain fields of the packet are hashed to a queue handle, and this

> > queue

> > > handle has not previously been created via odp_queue_create(), there

> > might

> > > be a use case where the application needs to be aware of a new "flow"..

> > > Maybe the application ages flows.

> >

>

> I'm not sure I understand the concern being raised here. Packet fields are

> matched against PMRs to get a matching CoS. That CoS, in turn, is

> associated with a queue or a queue group. If the latter, then specified

> subfields within the packet are hashed to generate an index into that queue

> group to select the individual queue within the target queue group that is

> to receive the packet. Whether these queues have been preallocated at

> odp_queue_group_create() time, or allocated dynamically on first reference

> is up to the implementation; however, especially in the case of "large"

> queue groups it can be expected that the number of actual queues in use

> will be sparse so a deferred allocation strategy will most likely be used.

>

> Applications are aware of flows because that's what an individual queue

> coming out of the classifier represents. An interesting question arises

> if a higher-level protocol (e.g., a TCP FIN sequence) ends a given flow,

> meaning that the context represented by an individual queue within a queue

> group can be released. Especially in the case of sparse queue groups it

> might be worthwhile to have an API that can communicate this flow release

> back to the classifier to facilitate queue resource management.

>

>

> >

> > It is very difficult in network traffic to predict the exact flows

> > which will be coming in on an interface.

> > The application can configure for all the possible flows in that case.

> >

>

> Not sure what you mean by the application configuring. If the hash is for a

> UDP port, for example, then the queue group has 64K (logical) queues

> associated with it.  Which of these are active (and hence require

> instantiation) depends on the inbound traffic that is received, which may

> be unpredictable. But the management of this is an ODP implementation

> concern rather than an application concern, unless we extend the API with a

> flow release hint as suggested above.

>

>

> >

> > >

> > >>

> > >>>

> > >>>> It is actually simpler if the implementation creates a flow since in

> that

> > >>>> case the implementation need not accumulate meta-data for all possible

> > >>>> hash values in a queue group and it can be created when traffic

> > >>>> arrives in that particular flow.

> > >>>>

> > >>>> >

> > >>>> > Could an application ever call odp_schedule() and receive an event

> > (e.g. packet)

> > >>>> > from a queue (of opaque type odp_queue_t) and that queue has never

> > been created

> > >>>> > by the application (via odp_queue_create())? Could that ever

> happen

> > from the

> > >>>> > hardware, and could the application ever handle that?

> > >>>>

> > >>>> No. All the queues in the system are created by the application

> either

> > >>>> directly or indirectly.

> > >>>> In the case of queue groups, the queues are indirectly created by the

> > >>>> application by configuring a queue group.

> > >>>>

> > >>>> > Or, is it related to memory usage? The reference implementation

> > >>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

> > >>>> >

> > >>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

> > >>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

> > >>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

> > >>>> >

> > >>>> > Forget about 320 bytes per queue, if each queue was represented by

> > a 32-bit

> > >>>> > integer (4 bytes!) the usage would be:

> > >>>> >

> > >>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

> > >>>> >   2^26 ~=  67,108,864 queues ->    256 MB

> > >>>> >   2^22 ~=   4,194,304 queues ->     16 MB

> > >>>> >

> > >>>> > That still might be a lot of usage if the application must

> > explicitly create

> > >>>> > every queue (before it is used) and require an ODP implementation

> > to map

> > >>>> > between every ODP queue object (opaque type) and the internal

> queue.

> > >>>> >

> > >>>> > Let's say ODP API has two classes of handles: 1) pointers, 2)

> > integers. An opaque

> > >>>> > pointer is used to point to some other software object. This

> object

> > should be

> > >>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode)

> > otherwise it

> > >>>> > could just be represented in a 64-bit (or 32-bit) integer type

> > value!

> > >>>> >

> > >>>> > To support millions of queues (flows) should odp_queue_t be an

> > integer type in

> > >>>> > the API? A software-only implementation may still use 320 bytes

> per

> > queue and

> > >>>> > use that integer as an index into an array or as a key for lookup

> > operation on a

> > >>>> > data structure containing queues. An implementation with hardware

> > assist may

> > >>>> > use this integer value directly when interfacing with hardware!

> > >>>>

> > >>>> I believe I have answered this question based on explanation above.

> > >>>> Please feel free to point out if something is not clear.

> > >>>>

> > >>>> >

> > >>>> > Would it still be necessary to assign a "name" to each queue

> (flow)?

> > >>>>

> > >>>> "name" per queue might not be required since it would mean a

> character

> > >>>> based lookup across millions of items.

> > >>>>

> > >>>> >

> > >>>> > Would a queue (flow) also require an "op type" to explicitly

> > specify whether

> > >>>> > access to the queue (flow) is threadsafe? Atomic queues are

> > threadsafe since

> > >>>> > only 1 core at any given time can receive from it. Parallel queues

> > are also

> > >>>> > threadsafe. Are all ODP APIs threadsafe?

> > >>>>

> > >>>> There are two types of queue enqueue operations: ODP_QUEUE_OP_MT and

> > >>>> ODP_QUEUE_OP_MT_UNSAFE.

> > >>>> The rest of the ODP APIs are multi-thread safe since in ODP there is no

> > >>>> defined way in which a single packet can be given to more than one

> > >>>> core at the same time, as packets move across different modules

> > >>>> through queues.

> > >>>>

> > >>>> >

> > >>>> >> A single PMR can only match a packet to a single queue associated

> > with

> > >>>> >> a target CoS. This prohibits efficient capture of subfield

> > >>>> >> classification.

> > >>>> >

> > >>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it

> is

> > possible

> > >>>> > to create a single PMR which matches multiple fields of a packet.

> I

> > can imagine

> > >>>> > a case where a packet matches pmr1 (match Vlan) and also matches

> > pmr2

> > >>>> > (match Vlan AND match L3DestIP). Is that an example of subfield

> > classification?

> > >>>> > How does the queue relate?

> > >>>>

> > >>>> This question is related to classification. If a PMR is configured

> > >>>> with more than one odp_pmr_param_t then the PMR is considered a hit

> > >>>> only if the packet matches all the configured params.

> > >>>>

> > >>>> Consider the following,

> > >>>>

> > >>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ==== PMR2 ====> CoS2.

> > >>>>

> > >>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

> > >>>> will be first applied with PMR1

> > >>>> 2) If the packet matches PMR1 it will be delivered to CoS1

> > >>>> 3) If the packet does not match PMR1 then it will remain in

> > Default_CoS.

> > >>>> 4) Any packets arriving in CoS1 will be applied with PMR2. If the

> > >>>> packet matches PMR2 then it will be delivered to CoS2.

> > >>>> 5) If the packet does not match PMR2 it will remain in CoS1.

> > >>>>

> > >>>>

> > >>>> Each CoS will be configured with queue groups.

> > >>>> Based on the final CoS of the packet the hash configuration (RSS) of

> > >>>> the queue group will be applied to the packet and the packet will be

> > >>>> spread across the queues within the queue group.

> > >>>

> > >>> Got it. So Classification PMR CoS happens entirely before Queue

> Groups.

> > >>> And with Queue Groups it allows a single PMR to match a packet and

> > assign

> > >>> that packet to 1 out of Many queues instead of just 1 queue only.

> > >>>

> > >>>> Hope this clarifies.

> > >>>> Bala

> > >>>>

> > >>>> >

> > >>>> >> To solve these issues, Tiger Moth introduces the concept of a

> queue

> > >>>> >> group. A queue group is an extension to the existing queue

> > >>>> >> specification in a Class of Service.

> > >>>> >>

> > >>>> >> Queue groups solve the classification issues associated with

> > >>>> >> individual queues in three ways:

> > >>>> >>

> > >>>> >> * The odp_queue_group_create() API can create a large number of

> > >>>> >> related queues with a single call.

> > >>>> >

> > >>>> > If the application calls this API, does that mean the ODP

> > implementation

> > >>>> > can create a large number of queues? What happens if the

> application

> > >>>> > receives an event on a queue that was created by the

> > implementation--how

> > >>>> > does the application know whether this queue was created by the

> > hardware

> > >>>> > according to the ODP Classification or whether the queue was

> > created by

> > >>>> > the application?

> > >>>> >

> > >>>> >> * A single PMR can spread traffic to many queues associated with

> > the

> > >>>> >> same CoS by assigning packets matching the PMR to a queue group

> > rather

> > >>>> >> than a queue.

> > >>>> >> * A hashed PMR subfield is used to distribute individual queues

> > within

> > >>>> >> a queue group for scheduling purposes.

> > >>>> >

> > >>>> > Is there a way to write a test case for this? Trying to think of

> > what kind of

> > >>>> > packets (traffic distribution) and how those packets would get

> > classified and

> > >>>> > get assigned to queues.

> > >>>> >

> > >>>> >> diff --git a/include/odp/api/spec/classification.h

> > >>>> >> b/include/odp/api/spec/classification.h

> > >>>> >> index 6eca9ab..cf56852 100644

> > >>>> >> --- a/include/odp/api/spec/classification.h

> > >>>> >> +++ b/include/odp/api/spec/classification.h

> > >>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

> > >>>> >>

> > >>>> >> /** A Boolean to denote support of PMR range */

> > >>>> >> odp_bool_t pmr_range_supported;

> > >>>> >> +

> > >>>> >> + /** A Boolean to denote support of queue group */

> > >>>> >> + odp_bool_t queue_group_supported;

> > >>>> >> +

> > >>>> >> + /** A Boolean to denote support of queue */

> > >>>> >> + odp_bool_t queue_supported;

> > >>>> >> } odp_cls_capability_t;

> > >>>> >>

> > >>>> >>

> > >>>> >> /**

> > >>>> >> @@ -162,7 +168,18 @@ typedef enum {

> > >>>> >>  * Used to communicate class of service creation options

> > >>>> >>  */

> > >>>> >> typedef struct odp_cls_cos_param {

> > >>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

> > >>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

> > >>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked

> with

> > CoS.

> > >>>> >> + */

> > >>>> >> + odp_queue_type_e type;

> > >>>> >> +

> > >>>> >> + typedef union {

> > >>>> >> + /** Queue associated with CoS */

> > >>>> >> + odp_queue_t queue;

> > >>>> >> +

> > >>>> >> + /** Queue Group associated with CoS */

> > >>>> >> + odp_queue_group_t queue_group;

> > >>>> >> + };

> > >>>> >> odp_pool_t pool; /**< Pool associated with CoS */

> > >>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS

> */

> > >>>> >> } odp_cls_cos_param_t;

> > >>>> >>

> > >>>> >>

> > >>>> >> diff --git a/include/odp/api/spec/queue.h

> > b/include/odp/api/spec/queue.h

> > >>>> >> index 51d94a2..7dde060 100644

> > >>>> >> --- a/include/odp/api/spec/queue.h

> > >>>> >> +++ b/include/odp/api/spec/queue.h

> > >>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

> > >>>> >> odp_queue_t odp_queue_create(const char *name, const

> > odp_queue_param_t *param);

> > >>>> >>

> > >>>> >> +/**

> > >>>> >> + * Queue group capability

> > >>>> >> + * This capability structure defines system Queue Group

> capability

> > >>>> >> + */

> > >>>> >> +typedef struct odp_queue_group_capability_t {

> > >>>> >> + /** Number of queues supported per queue group */

> > >>>> >> + unsigned supported_queues;

> > >>>> >> + /** Supported protocol fields for hashing*/

> > >>>> >> + odp_pktin_hash_proto_t supported;

> > >>>> >> +}

> > >>>> >> +

> > >>>> >> +/**

> > >>>> >> + * ODP Queue Group parameters

> > >>>> >> + * Queue group supports only schedule queues <TBD??>

> > >>>> >> + */

> > >>>> >> +typedef struct odp_queue_group_param_t {

> > >>>> >> + /** Number of queue to be created for this queue group

> > >>>> >> + * implementation may round up the value to nearest power of 2

> > >>>

> > >>> Wondering what this means for obtaining the max number of queues

> > >>> supported by the system via odp_queue_capability()..

> > >>>

> > >>> powers of 2..

> > >>>

> > >>> If the platform supports 2^16 (65,536) queues, odp_queue_capability()

> > >>> max_queues should report 65,536 queues, right?

> > >>>

> > >>> If an odp_queue_group_t is created requesting 2^4 (16) queues, should

> > >>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

> > >>> (2^12) 4096 queues?

> > >>

> > >> odp_queue_capability() is called before creating the

> queue

> > group,

> > >> so if an implementation has the limitation that it can only support

> > >> 2^16 queues then the application

> > >> has to configure only 2^16 queues in the queue group.

> > >

> > > In this use case, wouldn't all queues then be reserved for creation

> > > by the implementation? And, now odp_queue_create() will always return

> > > the null handle?

> > >

> > > What happens if you do:

> > > 1. odp_queue_capability() -> 2^16 queues

> > > 2. odp_queue_group_create( 2^4 queues )

> > > 3. odp_queue_capability() -> ???

> >

> > This is a limit on the number of queues supported by a queue group.

> > This does not reflect the number of queues created using

> > the odp_queue_create() function.

> > The implementation updates the maximum number of queues it can support

> > within a queue group; the application is free to configure any number

> > less than the maximum supported.

> >

>

> Capabilities in ODP are used to specify implementation limits, not current

> allocations. For example, in odp-linux there is currently a limit of 64

> pools that can be created. It doesn't matter how many are currently created

> as that is simply the system limit on odp_pool_create(). The same would

> apply for queues and queue groups. An implementation may be limited to N

> queue groups that can contain a maximum of K queues each. Separately the

> implementation might have a limit of X total queues it can support. How

> these are divided among individual queues or queues that are members of

> queue groups should not affect these capability limits, which are static.

>

> When an allocation request is made and an internal limit is exceeded, the

> allocation request simply fails. The capabilities are there to guide the

> application in its allocation requests so that such "surprises" are rare.

>

>

> >

> > >

> > >>>

> > >>> Could there be a dramatic effect on the total number of queues when

> > >>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

> > >>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits

> > >>> used and effective number of queues is (16+16+16+16) 64 queues.

> > >>>

> > >>> Is it possible to flexibly utilize all 2^16 queues the platform

> > >>> supports regardless of whether the queue was created by the

> > implementation

> > >>> or explicitly created by the application?

> > >>

> > >> This limitation is per queue group, and there can be a limit on the

> > >> total number of queue groups in the system.

> > >> Usually the total number of queue groups supported would be limited.

> > >>

> > >>>

> > >>> If so, is there a way to store this extra bit of information--whether

> > >>> a queue was created by the implementation or the application?

> > >>> One of the 16 bits might work.

> > >>> But, this reduces the number of queues to (2^15) 32768.

> > >>> ..at least they are fully utilizable by both implementation and

> > application.

> > >>

> > >> There are different queue types and we can add ODP_QUEUE_GROUP_T as a new

> > >> type to differentiate

> > >> a queue created by odp_queue_create() and using queue group create

> > function.

> > >>

> > >> When destroying the resources, the application destroys the

> > >> queues created by the application, and the

> > >> implementation destroys the queues within a queue group when the

> > >> application destroys the queue group.

> > >>

> > >>>

> > >>> When the application receives an odp_event_t from odp_queue_t after

> > >>> a call to odp_schedule(), could the application call..

> > >>> odp_queue_domain() to check whether this odp_queue_t was created by

> > >>> the implementation or the application? Function returns that bit.

> > >>

> > >> Since we have a queue type which can be obtained using the function

> > >> odp_queue_type_t, I think this

> > >> odp_queue_domain() API is not needed.

> > >

> > > Petri pointed out this week that a packet_io's destination queue may

> also

> > > be another case that could use a queue_group.

> > > Perhaps what is needed is a way to connect blocks (ODP objects)

> > > together like legos using something other than an odp_queue_t --

> > > because what flows through these blocks are events from one

> > > (and now more!) odp_queue_t.  Whether the queue was created by

> > > the implementation or application is a separate concern.

> >

>

> One possible extension area similar to this would be link bonding where

> multiple pktios are bonded together for increased throughput and/or

> failover (depending on whether the bond is active/active or

> active/standby). We alluded to this in a recent ARCH call where TM talks to

> a single PktIO however that PktIO might represent multiple links in this

> case. A more generalized "group" concept might be an easy way to achieve

> that here.

>

>

> > >

> > >>>

> > >>> If the queue domain is implementation, could it be an event

> > >>> (newly arrived packet) that came through Classification PMR CoS

> (CPC)?

> > >>> The packet is assigned to a odp_queue_t (flow) (created by the

> > implementation)

> > >>> as defined by the CPC that was setup by the application.

> > >>> Might want efficient access to packet metadata which was populated

> > >>> as an effect of the packet passing through CPC stage.

> > >>>

> > >>> If the queue domain is application, could it be an event

> > >>> (crypto compl, or any synchronization point against ipblock or

> > >>> device over PCI bus that indicates some assist/acceleration work

> > >>> has finished) comes from a odp_queue_t previously created by the

> > >>> application via a call to odp_queue_create() (which sets that bit)?

> > >>> This queue would be any queue (not necessarily a packet 'flow')

> > >>> created by the data plane software (application).

> > >>

> > >> We already have a queue type and event type which differentiates the

> > >> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet flow

> > >> queues can be

> > >> created only using HW since it is mainly useful for spreading the

> > >> packets across multiple flows.

> > >>

> > >> -Bala

> > >>>

> > >>>> >> + * and value should be less than the number of queues

> > >>>> >> + * supported per queue group

> > >>>> >> + */

> > >>>> >> + unsigned num_queue;

> > >>>> >> +

> > >>>> >> + /** Protocol field selection for queue group distribution

> > >>>> >> + * Multiple fields can be selected in combination

> > >>>> >> + */

> > >>>> >> + odp_queue_group_hash_proto_t hash;

> > >>>> >> +

> > >>>> >> +} odp_queue_group_param_t;

> > >>>> >> +

> > >>>> >> +/**

> > >>>> >> + * Initialize queue group params

> > >>>> >> + *

> > >>>> >> + * Initialize an odp_queue_group_param_t to its default values

> > for all fields.

> > >>>> >> + *

> > >>>> >> + * @param param   Address of the odp_queue_group_param_t to be

> > initialized

> > >>>> >> + */

> > >>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t

> *param);

> > >>>> >> +

> > >>>> >> +/**

> > >>>> >> + * Queue Group create

> > >>>> >> + *

> > >>>> >> + * Create a queue group according to the queue group parameters.

> > >>>> >> + * The individual queues belonging to a queue group are created

> > by the

> > >>>> >> + * implementation and the distribution of packets into those

> > queues are

> > >>>> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

> > >>>> >> + * The individual queues within a queue group are both created

> > and deleted

> > >>>> >> + * by the implementation.

> > >>>> >> + *

> > >>>> >> + * @param name    Queue Group name

> > >>>> >> + * @param param   Queue Group parameters.

> > >>>> >> + *

> > >>>> >> + * @return Queue group handle

> > >>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

> > >>>> >> + */

> > >>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

> > >>>> >> + const odp_queue_group_param_t *param);

> > >>>> >> Regards,

> > >>>> >> Bala

> >

>
Balasubramanian Manoharan Nov. 18, 2016, 11:54 a.m. UTC | #13
Regards,
Bala


On 18 November 2016 at 10:05, Honnappa Nagarahalli
<honnappa.nagarahalli@linaro.org> wrote:
> Hi Bala,

>    I am trying to catch up on this conversation. I have few questions here,

> not sure if they are discussed already.

>

> 1) Why is there a need for such large number of queues (Million) in a

> system? If the system meets the line rate, the packets (and the flows)

> waiting to be processed is small. So, a small number of queues should

> suffice, if we want to have to queue per flow. I agree that the system will

> receive packets belonging to million flows over a period of time.


A queue group is a mechanism to distribute the incoming network
traffic into multiple flows. It is a refined version of RSS hashing,
since here the packets belonging to different CoS can be configured
with different hash parameters. Each individual queue created within
a queue group represents a flow.

Regarding the requirement for millions of flows: it arises when
processing atomic queues or per-flow critical sections, where the
application has to maintain synchronization of the packets belonging
to a particular flow.

This queue group proposal is flexible in the sense that the
implementation exposes the maximum number of queues it can support
within a queue group, while the actual number of queues per queue
group is configured by the application. So, if required, the
application can configure a smaller number of queues.
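
A minimal sketch of that usage follows. The capability query function
odp_queue_group_capability() is an assumption here (it follows the
naming convention of odp_queue_capability() but is not part of this
RFC), and the hash field layout is illustrative:

/* Query the per-group queue limit, then create a queue group with
 * fewer queues than the maximum supported. Only the structs,
 * odp_queue_group_param_init() and odp_queue_group_create() come
 * from the RFC text; the capability call and the hash field name
 * are assumptions. */
odp_queue_group_capability_t capa;
odp_queue_group_param_t param;
odp_queue_group_t group;

odp_queue_group_capability(&capa);           /* hypothetical call */

odp_queue_group_param_init(&param);
param.num_queue = capa.supported_queues / 4; /* below the maximum */
param.hash.proto.ipv4_udp = 1;               /* assumed field name */

group = odp_queue_group_create("cos0_flows", &param);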

>

> 2) Are there any thoughts on implementing a software scheduler for

> linux-generic which supports large number of queues?


I am not sure whether the existing SW scheduler will be able to
handle millions of queues efficiently; if not, it can be enhanced.

>

> 3) What will be the arbitration method (round robin, strict priority etc)

> among the queues in the queue group? If it is weighted round robin or strict

> priority, how are the priorities assigned to these queues?


The priorities are assigned per queue group, and all the queues
belonging to a single queue group have the same priority.
The concept of a queue group is mainly to distribute the incoming
traffic across multiple flows for efficient processing by the
scheduler.

On the other hand, the odp_queue_create() API takes the priority as
an input parameter.
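
A minimal sketch with the existing queue API (the priority and sync
values are only illustrative):

odp_queue_param_t qparam;
odp_queue_t queue;

odp_queue_param_init(&qparam);
qparam.type        = ODP_QUEUE_TYPE_SCHED;
qparam.sched.prio  = ODP_SCHED_PRIO_HIGHEST; /* per-queue priority */
qparam.sched.sync  = ODP_SCHED_SYNC_ATOMIC;
qparam.sched.group = ODP_SCHED_GROUP_ALL;

queue = odp_queue_create("high_prio_q", &qparam);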

>

> 4) Is the intent, to propagate the flow ID from classification down the

> packet processing pipeline?


Yes.

Regards,
Bala
>

> Thank you,

> Honnappa

>

>

> On 17 November 2016 at 13:59, Bill Fischofer <bill.fischofer@linaro.org>

> wrote:

>>

>> On Thu, Nov 17, 2016 at 3:05 AM, Bala Manoharan

>> <bala.manoharan@linaro.org>

>> wrote:

>>

>> > Regards,

>> > Bala

>> >

>> >

>> > On 15 November 2016 at 22:43, Brian Brooks <brian.brooks@linaro.org>

>> > wrote:

>> > > On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan

>> > > <bala.manoharan@linaro.org> wrote:

>> > >> Regards,

>> > >> Bala

>> > >>

>> > >>

>> > >> On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org>

>> > wrote:

>> > >>> On 11/10 15:17:15, Bala Manoharan wrote:

>> > >>>> On 10 November 2016 at 13:26, Brian Brooks

>> > >>>> <brian.brooks@linaro.org>

>> > wrote:

>> > >>>> > On 11/07 16:46:12, Bala Manoharan wrote:

>> > >>>> >> Hi,

>> > >>>> >

>> > >>>> > Hiya

>> > >>>> >

>> > >>>> >> This mail thread discusses the design of classification queue

>> > >>>> >> group

>> > >>>> >> RFC. The same can be found in the google doc whose link is given

>> > >>>> >> below.

>> > >>>> >> Users can provide their comments either in this mail thread or

>> > >>>> >> in

>> > the

>> > >>>> >> google doc as per their convenience.

>> > >>>> >>

>> > >>>> >>

>> > >>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9

>> > slR93LZ8VXqM2o/edit?usp=sharing

>> > >>>> >>

>> > >>>> >> The basic issues with queues as being a single target for a CoS

>> > are two fold:

>> > >>>> >>

>> > >>>> >> Queues must be created and deleted individually. This imposes a

>> > >>>> >> significant burden when queues are used to represent individual

>> > flows

>> > >>>> >> since the application may need to process thousands (or

>> > >>>> >> millions)

>> > of

>> > >>>> >> flows.

>> > >>>> >

>> > >>>> > Wondering why there is an issue with creating and deleting queues

>> > individually

>> > >>>> > if queue objects represent millions of flows..

>> > >>>>

>> > >>>> The queue groups are mainly required for hashing the incoming

>> > >>>> packets

>> > >>>> to multiple flows based on the hash configuration.

>> > >>>> So from application point of view it just needs a queue to have

>> > >>>> packets belonging to same flow and that packets belonging to

>> > >>>> different

>> > >>>> flows are placed in different queues respectively.It does not

>> > >>>> matter

>> > >>>> who creates the flow/queue.

>> > >>>

>> > >>> When the application receives an event from odp_schedule() call, how

>> > does it

>> > >>> know whether the odp_queue_t was previously created by the

>> > >>> application

>> > from

>> > >>> odp_queue_create() or whether it was created by the implementation?

>> > >>

>> > >> odp_schedule() call returns the queue from which the event was part

>> > >> of.

>> > >> The type of the queue can be got from odp_queue_type() API.

>> > >> But the question is there an use-case where the application need to

>> > know?

>> > >> The application has the information of the queue it has created and

>> > >> the queues created by implementation are destroyed by implementation.

>> > >

>> > > If certain fields of the packet are hashed to a queue handle, and this

>> > queue

>> > > handle has not previously been created via odp_queue_create(), there

>> > might

>> > > be a use case where the application needs to be aware of a new

>> > > "flow"..

>> > > Maybe the application ages flows.

>> >

>>

>> I'm not sure I understand the concern being raised here. Packet fields are

>> matched against PMRs to get a matching CoS. That CoS, in turn, is

>> associated with with a queue or a queue group. If the latter then

>> specified

>> subfields within the packet are hashed to generate an index into that

>> queue

>> group to select the individual queue within the target queue group that is

>> to receive the packet. Whether these queues have been preallocated at

>> odp_queue_group_create() time, or allocated dynamically on first reference

>> is up to the implementation, however especially in the case of "large"

>> queue groups it can be expected that the number of actual queues in use

>> will be sparse so a deferred allocation strategy will most likely be used.

>>

>> Applications are aware of flows because that's what an individual queue

>> coming out of the classifier represents. An interesting question arises is

>> if a higher-level protocol (e.g., a TCP FIN sequence) ends a given flow,

>> meaning that the context represented by an individual queue within a queue

>> group can be released. Especially in the case of sparse queue groups it

>> might be worthwhile to have an API that can communicate this flow release

>> back to the classifier to facilitate queue resource management.

>>

>>

>> >

>> > It is very difficult in network traffic to predict the exact flows

>> > which will be coming in an interface.

>> > The application can configure for all the possible flows in that case.

>> >

>>

>> Not sure what you mean by the application configuring. If the hash is for

>> a

>> UDP port, for example, then the queue group has 64K (logical) queues

>> associated with it.  Which of these are active (and hence require

>> instantiation) depends on the inbound traffic that is received, which may

>> be unpredictable. But the management of this is an ODP implementation

>> concern rather than an application concern, unless we extend the API with

>> a

>> flow release hint as suggested above.

>>

>>

>> >

>> > >

>> > >>

>> > >>>

>> > >>>> It is actually simpler if implementation creates a flow since in

>> > >>>> that

>> > >>>> case implementation need not accumulate meta-data for all possible

>> > >>>> hash values in a queue group and it can be created when traffic

>> > >>>> arrives in that particular flow.

>> > >>>>

>> > >>>> >

>> > >>>> > Could an application ever call odp_schedule() and receive an

>> > >>>> > event

>> > (e.g. packet)

>> > >>>> > from a queue (of opaque type odp_queue_t) and that queue has

>> > >>>> > never

>> > been created

>> > >>>> > by the application (via odp_queue_create())? Could that ever

>> > >>>> > happen

>> > from the

>> > >>>> > hardware, and could the application ever handle that?

>> > >>>>

>> > >>>> No. All the queues in the system are created by the application

>> > >>>> either

>> > >>>> directly or in-directly.

>> > >>>> In-case of queue groups the queues are in-directly created by the

>> > >>>> application by configuring a queue group.

>> > >>>>

>> > >>>> > Or, is it related to memory usage? The reference implementation

>> > >>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

>> > >>>> >

>> > >>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

>> > >>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

>> > >>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

>> > >>>> >

>> > >>>> > Forget about 320 bytes per queue, if each queue was represented

>> > >>>> > by

>> > a 32-bit

>> > >>>> > integer (4 bytes!) the usage would be:

>> > >>>> >

>> > >>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

>> > >>>> >   2^26 ~=  67,108,864 queues ->    256 MB

>> > >>>> >   2^22 ~=   4,194,304 queues ->     16 MB

>> > >>>> >

>> > >>>> > That still might be a lot of usage if the application must

>> > explicitly create

>> > >>>> > every queue (before it is used) and require an ODP implementation

>> > to map

>> > >>>> > between every ODP queue object (opaque type) and the internal

>> > >>>> > queue.

>> > >>>> >

>> > >>>> > Lets say ODP API has two classes of handles: 1) pointers, 2)

>> > integers. An opaque

>> > >>>> > pointer is used to point to some other software object. This

>> > >>>> > object

>> > should be

>> > >>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode)

>> > otherwise it

>> > >>>> > could just be represented in a 64-bit (or 32-bit) integer type

>> > value!

>> > >>>> >

>> > >>>> > To support millions of queues (flows) should odp_queue_t be an

>> > integer type in

>> > >>>> > the API? A software-only implementation may still use 320 bytes

>> > >>>> > per

>> > queue and

>> > >>>> > use that integer as an index into an array or as a key for lookup

>> > operation on a

>> > >>>> > data structure containing queues. An implementation with hardware

>> > assist may

>> > >>>> > use this integer value directly when interfacing with hardware!

>> > >>>>

>> > >>>> I believe I have answered this question based on explanation above.

>> > >>>> Pls feel free to point out if something is not clear.

>> > >>>>

>> > >>>> >

>> > >>>> > Would it still be necessary to assign a "name" to each queue

>> > >>>> > (flow)?

>> > >>>>

>> > >>>> "name" per queue might not be required since it would mean a

>> > >>>> character

>> > >>>> based lookup across millions of items.

>> > >>>>

>> > >>>> >

>> > >>>> > Would a queue (flow) also require an "op type" to explicitly

>> > specify whether

>> > >>>> > access to the queue (flow) is threadsafe? Atomic queues are

>> > threadsafe since

>> > >>>> > only 1 core at any given time can recieve from it. Parallel

>> > >>>> > queues

>> > are also

>> > >>>> > threadsafe. Are all ODP APIs threadsafe?

>> > >>>>

>> > >>>> There are two types of queue enqueue operation ODP_QUEUE_OP_MT and

>> > >>>> ODP_QUEUE_OP_MT_UNSAFE.

>> > >>>> Rest of the ODP APIs are multi thread safe since in ODP there is no

>> > >>>> defined way in which a single packet can be given to more than one

>> > >>>> core at the same time, as packets move across different modules

>> > >>>> through queues.

>> > >>>>

>> > >>>> >

>> > >>>> >> A single PMR can only match a packet to a single queue

>> > >>>> >> associated

>> > with

>> > >>>> >> a target CoS. This prohibits efficient capture of subfield

>> > >>>> >> classification.

>> > >>>> >

>> > >>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it

>> > >>>> > is

>> > possible

>> > >>>> > to create a single PMR which matches multiple fields of a packet.

>> > >>>> > I

>> > can imagine

>> > >>>> > a case where a packet matches pmr1 (match Vlan) and also matches

>> > pmr2

>> > >>>> > (match Vlan AND match L3DestIP). Is that an example of subfield

>> > classification?

>> > >>>> > How does the queue relate?

>> > >>>>

>> > >>>> This question is related to classification, If a PMR is configured

>> > >>>> with more than one odp_pmr_param_t then the PMR is considered a hit

>> > >>>> only if the packet matches all the configured params.

>> > >>>>

>> > >>>> Consider the following,

>> > >>>>

>> > >>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.

>> > >>>>

>> > >>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS

>> > >>>> and

>> > >>>> will be first applied with PMR1

>> > >>>> 2) If the packet matches PMR1 it will be delivered to CoS1

>> > >>>> 3) If the packet does not match PMR1 then it will remain in

>> > Default_CoS.

>> > >>>> 4) Any packets arriving in CoS1 will be applied with PMR2, If the

>> > >>>> packet matches PMR2 then it will be delivered to CoS2.

>> > >>>> 5). If the packet does not match PMR2 it will remain in CoS1.

>> > >>>>

>> > >>>>

>> > >>>> Each CoS will be configured with queue groups.

>> > >>>> Based on the final CoS of the packet the hash configuration (RSS)

>> > >>>> of

>> > >>>> the queue group will be applied to the packet and the packet will

>> > >>>> be

>> > >>>> spread across the queues within the queue group.

>> > >>>

>> > >>> Got it. So Classification PMR CoS happens entirely before Queue

>> > >>> Groups.

>> > >>> And with Queue Groups it allows a single PMR to match a packet and

>> > assign

>> > >>> that packet to 1 out of Many queues instead of just 1 queue only.

>> > >>>

>> > >>>> Hope this clarifies.

>> > >>>> Bala

>> > >>>>

>> > >>>> >

>> > >>>> >> To solve these issues, Tiger Moth introduces the concept of a

>> > >>>> >> queue

>> > >>>> >> group. A queue group is an extension to the existing queue

>> > >>>> >> specification in a Class of Service.

>> > >>>> >>

>> > >>>> >> Queue groups solve the classification issues associated with

>> > >>>> >> individual queues in three ways:

>> > >>>> >>

>> > >>>> >> * The odp_queue_group_create() API can create a large number of

>> > >>>> >> related queues with a single call.

>> > >>>> >

>> > >>>> > If the application calls this API, does that mean the ODP

>> > implementation

>> > >>>> > can create a large number of queues? What happens if the

>> > >>>> > application

>> > >>>> > receives an event on a queue that was created by the

>> > implmentation--how

>> > >>>> > does the application know whether this queue was created by the

>> > hardware

>> > >>>> > according to the ODP Classification or whether the queue was

>> > created by

>> > >>>> > the application?

>> > >>>> >

>> > >>>> >> * A single PMR can spread traffic to many queues associated with

>> > the

>> > >>>> >> same CoS by assigning packets matching the PMR to a queue group

>> > rather

>> > >>>> >> than a queue.

>> > >>>> >> * A hashed PMR subfield is used to distribute individual queues

>> > within

>> > >>>> >> a queue group for scheduling purposes.

>> > >>>> >

>> > >>>> > Is there a way to write a test case for this? Trying to think of

>> > what kind of

>> > >>>> > packets (traffic distribution) and how those packets would get

>> > classified and

>> > >>>> > get assigned to queues.

>> > >>>> >

>> > >>>> >> diff --git a/include/odp/api/spec/classification.h

>> > >>>> >> b/include/odp/api/spec/classification.h

>> > >>>> >> index 6eca9ab..cf56852 100644

>> > >>>> >> --- a/include/odp/api/spec/classification.h

>> > >>>> >> +++ b/include/odp/api/spec/classification.h

>> > >>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

>> > >>>> >>

>> > >>>> >> /** A Boolean to denote support of PMR range */

>> > >>>> >> odp_bool_t pmr_range_supported;

>> > >>>> >> +

>> > >>>> >> + /** A Boolean to denote support of queue group */

>> > >>>> >> + odp_bool_t queue_group_supported;

>> > >>>> >> +

>> > >>>> >> + /** A Boolean to denote support of queue */

>> > >>>> >> + odp_bool_t queue_supported;

>> > >>>> >> } odp_cls_capability_t;

>> > >>>> >>

>> > >>>> >>

>> > >>>> >> /**

>> > >>>> >> @@ -162,7 +168,18 @@ typedef enum {

>> > >>>> >>  * Used to communicate class of service creation options

>> > >>>> >>  */

>> > >>>> >> typedef struct odp_cls_cos_param {

>> > >>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

>> > >>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

>> > >>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked

>> > >>>> >> with

>> > CoS.

>> > >>>> >> + */

>> > >>>> >> + odp_queue_type_e type;

>> > >>>> >> +

>> > >>>> >> + typedef union {

>> > >>>> >> + /** Queue associated with CoS */

>> > >>>> >> + odp_queue_t queue;

>> > >>>> >> +

>> > >>>> >> + /** Queue Group associated with CoS */

>> > >>>> >> + odp_queue_group_t queue_group;

>> > >>>> >> + };

>> > >>>> >> odp_pool_t pool; /**< Pool associated with CoS */

>> > >>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS

>> > >>>> >> */

>> > >>>> >> } odp_cls_cos_param_t;

>> > >>>> >>

>> > >>>> >>

>> > >>>> >> diff --git a/include/odp/api/spec/queue.h

>> > b/include/odp/api/spec/queue.h

>> > >>>> >> index 51d94a2..7dde060 100644

>> > >>>> >> --- a/include/odp/api/spec/queue.h

>> > >>>> >> +++ b/include/odp/api/spec/queue.h

>> > >>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

>> > >>>> >> odp_queue_t odp_queue_create(const char *name, const

>> > odp_queue_param_t *param);

>> > >>>> >>

>> > >>>> >> +/**

>> > >>>> >> + * Queue group capability

>> > >>>> >> + * This capability structure defines system Queue Group

>> > >>>> >> capability

>> > >>>> >> + */

>> > >>>> >> +typedef struct odp_queue_group_capability_t {

>> > >>>> >> + /** Number of queues supported per queue group */

>> > >>>> >> + unsigned supported_queues;

>> > >>>> >> + /** Supported protocol fields for hashing*/

>> > >>>> >> + odp_pktin_hash_proto_t supported;

>> > >>>> >> +}

>> > >>>> >> +

>> > >>>> >> +/**

>> > >>>> >> + * ODP Queue Group parameters

>> > >>>> >> + * Queue group supports only schedule queues <TBD??>

>> > >>>> >> + */

>> > >>>> >> +typedef struct odp_queue_group_param_t {

>> > >>>> >> + /** Number of queue to be created for this queue group

>> > >>>> >> + * implementation may round up the value to nearest power of 2

>> > >>>

>> > >>> Wondering what this means for obtaining the max number of queues

>> > >>> supported by the system via odp_queue_capability()..

>> > >>>

>> > >>> powers of 2..

>> > >>>

>> > >>> If the platform supports 2^16 (65,536) queues,

>> > >>> odp_queue_capability()

>> > >>> max_queues should report 65,536 queues, right?

>> > >>>

>> > >>> If an odp_queue_group_t is created requesting 2^4 (16) queues,

>> > >>> should

>> > >>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

>> > >>> (2^12) 4096 queues?

>> > >>

>> > >> odp_queue_capability() is called before creating the creating the

>> > >> queue

>> > group,

>> > >> so if an implementation has the limitation that it can only support

>> > >> 2^16 queues then application

>> > >> has to configure only 2^16 queues in the queue group.

>> > >

>> > > In this use case, wouldn't all queues then be reserved for creation

>> > > by the implementation? And, now odp_queue_create() will always return

>> > > the null handle?

>> > >

>> > > What happens if you do:

>> > > 1. odp_queue_capability() -> 2^16 queues

>> > > 2. odp_queue_group_create( 2^4 queues )

>> > > 3. odp_queue_capability() -> ???

>> >

>> > This is a limit on the number of queue supported by a queue group.

>> > This does not reflect the number of queues created using

>> > odp_queue_create() function.

>> > The implementation updates the maximum number of queues it can support

>> > within a queue group, the application is free to configure any number

>> > less than the maximum supported.

>> >

>>

>> Capabilities in ODP are used to specify implementation limits, not current

>> allocations. For example, in odp-linux there is currently a limit of 64

>> pools that can be created. It doesn't matter how many are currently

>> created

>> as that is simply the system limit on odp_pool_create(). The same would

>> apply for queues and queue groups. An implementation may be limited to N

>> queue groups that can contain a maximum of K queues each. Separately the

>> implementation might have a limit of X total queues it can support. How

>> these are divided among individual queues or queues that are members of

>> queue groups should not affect these capability limits, which are static.

>>

>> When an allocation request is made and an internal limit is exceeded the

>> allocation request simply fails. The capabilities are there to guide the

>> application in its allocation requests so that such "surprises" are rare.

>>

>>

>> >

>> > >

>> > >>>

>> > >>> Could there be a dramatic effect on the total number of queues when

>> > >>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

>> > >>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16

>> > >>> bits

>> > >>> used and effective number of queues is (16+16+16+16) 64 queues.

>> > >>>

>> > >>> Is it be possible to flexibly utilize all 2^16 queues the platform

>> > >>> supports regardless of whether the queue was created by the

>> > implementation

>> > >>> or explicitly created by the application?

>> > >>

>> > >> This limitation is per queue group and there can be a limitation of

>> > >> total queue group in the system.

>> > >> Usually the total queue group supported would be a limited number.

>> > >>

>> > >>>

>> > >>> If so, is there a way to store this extra bit of

>> > >>> information--whether

>> > >>> a queue was created by the implementation or the application?

>> > >>> One of the 16 bits might work.

>> > >>> But, this reduces the number of queues to (2^15) 32768.

>> > >>> ..at least they are fully utilizable by both implementation and

>> > application.

>> > >>

>> > >> There are different queue types and we can ODP_QUEUE_GROUP_T as a new

>> > >> type to differentiate

>> > >> a queue created by odp_queue_create() and using queue group create

>> > function.

>> > >>

>> > >> During destroying of the resources, the application destroys the

>> > >> queues created by application and

>> > >> implementation destroys the queues within a queue group when

>> > >> application destroys queue group.

>> > >>

>> > >>>

>> > >>> When the application receives an odp_event_t from odp_queue_t after

>> > >>> a call to odp_schedule(), could the application call..

>> > >>> odp_queue_domain() to check whether this odp_queue_t was created by

>> > >>> the implementation or the application? Function returns that bit.

>> > >>

>> > >> Since we have a queue type which can be got using the function

>> > >> odp_queue_type_t I think this

>> > >> odp_queue_domain() API is not needed.

>> > >

>> > > Petri pointed out this week that a packet_io's destination queue may

>> > > also

>> > > be another case that could use a queue_group.

>> > > Perhaps what is needed is a way to connect blocks (ODP objects)

>> > > together like legos using something other than an odp_queue_t --

>> > > because what flows through these blocks are events from one

>> > > (and now more!) odp_queue_t.  Whether the queue was created by

>> > > the implementation or application is a separate concern.

>> >

>>

>> One possible extension area similar to this would be link bonding where

>> multiple pktios are bonded together for increased throughput and/or

>> failover (depending on whether the bond is active/active or

>> active/standby). We alluded to this in a recent ARCH call where TM talks

>> to

>> a single PktIO however that PktIO might represent multiple links in this

>> case. A more generalized "group" concept might be an easy way to achieve

>> that here.

>>

>>

>> > >

>> > >>>

>> > >>> If the queue domain is implementation, could it be an event

>> > >>> (newly arrived packet) that came through Classification PMR CoS

>> > >>> (CPC)?

>> > >>> The packet is assigned to a odp_queue_t (flow) (created by the

>> > implementation)

>> > >>> as defined by the CPC that was setup by the application.

>> > >>> Might want efficient access to packet metadata which was populated

>> > >>> as an effect of the packet passing through CPC stage.

>> > >>>

>> > >>> If the queue domain is application, could it be an event

>> > >>> (crypto compl, or any synchronization point against ipblock or

>> > >>> device over PCI bus that indicates some assist/acceleration work

>> > >>> has finished) comes from a odp_queue_t previously created by the

>> > >>> application via a call to odp_queue_create() (which sets that bit)?

>> > >>> This queue would be any queue (not necessarily a packet 'flow')

>> > >>> created by the data plane software (application).

>> > >>

>> > >> We already have a queue type and event type which differentiates the

>> > >> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet flow

>> > >> queues can be

>> > >> created only using HW since it is mainly useful for spreading the

>> > >> packets across multiple flows.

>> > >>

>> > >> -Bala

>> > >>>

>> > >>>> >> + * and value should be less than the number of queues

>> > >>>> >> + * supported per queue group

>> > >>>> >> + */

>> > >>>> >> + unsigned num_queue;

>> > >>>> >> +

>> > >>>> >> + /** Protocol field selection for queue group distribution

>> > >>>> >> + * Multiple fields can be selected in combination

>> > >>>> >> + */

>> > >>>> >> + odp_queue_group_hash_proto_t hash;

>> > >>>> >> +

>> > >>>> >> +} odp_queue_group_param_t;

>> > >>>> >> +

>> > >>>> >> +/**

>> > >>>> >> + * Initialize queue group params

>> > >>>> >> + *

>> > >>>> >> + * Initialize an odp_queue_group_param_t to its default values

>> > for all fields.

>> > >>>> >> + *

>> > >>>> >> + * @param param   Address of the odp_queue_group_param_t to be

>> > initialized

>> > >>>> >> + */

>> > >>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t

>> > >>>> >> *param);

>> > >>>> >> +

>> > >>>> >> +/**

>> > >>>> >> + * Queue Group create

>> > >>>> >> + *

>> > >>>> >> + * Create a queue group according to the queue group

>> > >>>> >> parameters.

>> > >>>> >> + * The individual queues belonging to a queue group are created

>> > by the

>> > >>>> >> + * implementation and the distribution of packets into those

>> > queues are

>> > >>>> >> + * decided based on the odp_queue_group_hash_proto_t

>> > >>>> >> parameters.

>> > >>>> >> + * The individual queues within a queue group are both created

>> > and deleted

>> > >>>> >> + * by the implementation.

>> > >>>> >> + *

>> > >>>> >> + * @param name    Queue Group name

>> > >>>> >> + * @param param   Queue Group parameters.

>> > >>>> >> + *

>> > >>>> >> + * @return Queue group handle

>> > >>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

>> > >>>> >> + */

>> > >>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

>> > >>>> >> + const odp_queue_group_param_t *param);

>> > >>>> >> Regards,

>> > >>>> >> Bala

>> >

>

>
Balasubramanian Manoharan Nov. 18, 2016, 12:22 p.m. UTC | #14
Regards,
Bala


On 18 November 2016 at 01:29, Bill Fischofer <bill.fischofer@linaro.org> wrote:
>

>

> On Thu, Nov 17, 2016 at 3:05 AM, Bala Manoharan <bala.manoharan@linaro.org>

> wrote:

>>

>> Regards,

>> Bala

>>

>>

>> On 15 November 2016 at 22:43, Brian Brooks <brian.brooks@linaro.org>

>> wrote:

>> > On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan

>> > <bala.manoharan@linaro.org> wrote:

>> >> Regards,

>> >> Bala

>> >>

>> >>

>> >> On 11 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org>

>> >> wrote:

>> >>> On 11/10 15:17:15, Bala Manoharan wrote:

>> >>>> On 10 November 2016 at 13:26, Brian Brooks <brian.brooks@linaro.org>

>> >>>> wrote:

>> >>>> > On 11/07 16:46:12, Bala Manoharan wrote:

>> >>>> >> Hi,

>> >>>> >

>> >>>> > Hiya

>> >>>> >

>> >>>> >> This mail thread discusses the design of classification queue

>> >>>> >> group

>> >>>> >> RFC. The same can be found in the google doc whose link is given

>> >>>> >> below.

>> >>>> >> Users can provide their comments either in this mail thread or in

>> >>>> >> the

>> >>>> >> google doc as per their convenience.

>> >>>> >>

>> >>>> >>

>> >>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing

>> >>>> >>

>> >>>> >> The basic issues with queues as being a single target for a CoS

>> >>>> >> are two fold:

>> >>>> >>

>> >>>> >> Queues must be created and deleted individually. This imposes a

>> >>>> >> significant burden when queues are used to represent individual

>> >>>> >> flows

>> >>>> >> since the application may need to process thousands (or millions)

>> >>>> >> of

>> >>>> >> flows.

>> >>>> >

>> >>>> > Wondering why there is an issue with creating and deleting queues

>> >>>> > individually

>> >>>> > if queue objects represent millions of flows..

>> >>>>

>> >>>> The queue groups are mainly required for hashing the incoming packets

>> >>>> to multiple flows based on the hash configuration.

>> >>>> So from application point of view it just needs a queue to have

>> >>>> packets belonging to same flow and that packets belonging to

>> >>>> different

>> >>>> flows are placed in different queues respectively.It does not matter

>> >>>> who creates the flow/queue.

>> >>>

>> >>> When the application receives an event from odp_schedule() call, how

>> >>> does it

>> >>> know whether the odp_queue_t was previously created by the application

>> >>> from

>> >>> odp_queue_create() or whether it was created by the implementation?

>> >>

>> >> odp_schedule() call returns the queue from which the event was part of.

>> >> The type of the queue can be got from odp_queue_type() API.

>> >> But the question is there an use-case where the application need to

>> >> know?

>> >> The application has the information of the queue it has created and

>> >> the queues created by implementation are destroyed by implementation.

>> >

>> > If certain fields of the packet are hashed to a queue handle, and this

>> > queue

>> > handle has not previously been created via odp_queue_create(), there

>> > might

>> > be a use case where the application needs to be aware of a new "flow"..

>> > Maybe the application ages flows.

>

>

> I'm not sure I understand the concern being raised here. Packet fields are

> matched against PMRs to get a matching CoS. That CoS, in turn, is associated

> with with a queue or a queue group. If the latter then specified subfields

> within the packet are hashed to generate an index into that queue group to

> select the individual queue within the target queue group that is to receive

> the packet. Whether these queues have been preallocated at

> odp_queue_group_create() time, or allocated dynamically on first reference

> is up to the implementation, however especially in the case of "large" queue

> groups it can be expected that the number of actual queues in use will be

> sparse so a deferred allocation strategy will most likely be used.

>

> Applications are aware of flows because that's what an individual queue

> coming out of the classifier represents. An interesting question arises is

> if a higher-level protocol (e.g., a TCP FIN sequence) ends a given flow,

> meaning that the context represented by an individual queue within a queue

> group can be released. Especially in the case of sparse queue groups it

> might be worthwhile to have an API that can communicate this flow release

> back to the classifier to facilitate queue resource management.

>

>>

>>

>> It is very difficult in network traffic to predict the exact flows

>> which will be coming in an interface.

>> The application can configure for all the possible flows in that case.

>

>

> Not sure what you mean by the application configuring. If the hash is for a

> UDP port, for example, then the queue group has 64K (logical) queues

> associated with it.  Which of these are active (and hence require

> instantiation) depends on the inbound traffic that is received, which may be

> unpredictable. But the management of this is an ODP implementation concern

> rather than an application concern, unless we extend the API with a flow

> release hint as suggested above.


In your above example the number of logical queues need not be 64K;
the application can configure the maximum number of queues. If it
configures, say, 2^10 queues, then there are only 1024 possible hash
values, and even in that case the application can predict the network
traffic better than the implementation can.

One way in which this could be done is by exposing an API like:

uint64_t odp_queue_group_hash_result(odp_queue_group_t group,
                                     odp_packet_t pkt);

where the application constructs the packet it expects to receive and
passes it to the odp_queue_group_hash_result() API to get the
resulting hash value.
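
A usage sketch of the proposed call (build_expected_packet() is a
hypothetical application helper; only odp_queue_group_hash_result()
itself is proposed above):

/* Sketch only: build_expected_packet() is assumed to allocate a
 * packet from 'pool' and fill in the header fields (e.g. the UDP
 * ports) of a flow the application expects to receive. */
odp_packet_t pkt = build_expected_packet(pool);
uint64_t hash = odp_queue_group_hash_result(group, pkt);

/* 'hash' identifies the queue (flow) within 'group' to which such
 * packets would be delivered */
odp_packet_free(pkt);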
>

>>

>>

>> >

>> >>

>> >>>

>> >>>> It is actually simpler if implementation creates a flow since in that

>> >>>> case implementation need not accumulate meta-data for all possible

>> >>>> hash values in a queue group and it can be created when traffic

>> >>>> arrives in that particular flow.

>> >>>>

>> >>>> >

>> >>>> > Could an application ever call odp_schedule() and receive an event

>> >>>> > (e.g. packet)

>> >>>> > from a queue (of opaque type odp_queue_t) and that queue has never

>> >>>> > been created

>> >>>> > by the application (via odp_queue_create())? Could that ever happen

>> >>>> > from the

>> >>>> > hardware, and could the application ever handle that?

>> >>>>

>> >>>> No. All the queues in the system are created by the application

>> >>>> either

>> >>>> directly or in-directly.

>> >>>> In-case of queue groups the queues are in-directly created by the

>> >>>> application by configuring a queue group.

>> >>>>

>> >>>> > Or, is it related to memory usage? The reference implementation

>> >>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

>> >>>> >

>> >>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

>> >>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

>> >>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

>> >>>> >

>> >>>> > Forget about 320 bytes per queue, if each queue was represented by

>> >>>> > a 32-bit

>> >>>> > integer (4 bytes!) the usage would be:

>> >>>> >

>> >>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

>> >>>> >   2^26 ~=  67,108,864 queues ->    256 MB

>> >>>> >   2^22 ~=   4,194,304 queues ->     16 MB

>> >>>> >

>> >>>> > That still might be a lot of usage if the application must

>> >>>> > explicitly create

>> >>>> > every queue (before it is used) and require an ODP implementation

>> >>>> > to map

>> >>>> > between every ODP queue object (opaque type) and the internal

>> >>>> > queue.

>> >>>> >

>> >>>> > Lets say ODP API has two classes of handles: 1) pointers, 2)

>> >>>> > integers. An opaque

>> >>>> > pointer is used to point to some other software object. This object

>> >>>> > should be

>> >>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer mode)

>> >>>> > otherwise it

>> >>>> > could just be represented in a 64-bit (or 32-bit) integer type

>> >>>> > value!

>> >>>> >

>> >>>> > To support millions of queues (flows) should odp_queue_t be an

>> >>>> > integer type in

>> >>>> > the API? A software-only implementation may still use 320 bytes per

>> >>>> > queue and

>> >>>> > use that integer as an index into an array or as a key for lookup

>> >>>> > operation on a

>> >>>> > data structure containing queues. An implementation with hardware

>> >>>> > assist may

>> >>>> > use this integer value directly when interfacing with hardware!

>> >>>>

>> >>>> I believe I have answered this question based on explanation above.

>> >>>> Pls feel free to point out if something is not clear.

>> >>>>

>> >>>> >

>> >>>> > Would it still be necessary to assign a "name" to each queue

>> >>>> > (flow)?

>> >>>>

>> >>>> "name" per queue might not be required since it would mean a

>> >>>> character

>> >>>> based lookup across millions of items.

>> >>>>

>> >>>> >

>> >>>> > Would a queue (flow) also require an "op type" to explicitly

>> >>>> > specify whether

>> >>>> > access to the queue (flow) is threadsafe? Atomic queues are

>> >>>> > threadsafe since

>> >>>> > only 1 core at any given time can recieve from it. Parallel queues

>> >>>> > are also

>> >>>> > threadsafe. Are all ODP APIs threadsafe?

>> >>>>

>> >>>> There are two types of queue enqueue operation ODP_QUEUE_OP_MT and

>> >>>> ODP_QUEUE_OP_MT_UNSAFE.

>> >>>> Rest of the ODP APIs are multi thread safe since in ODP there is no

>> >>>> defined way in which a single packet can be given to more than one

>> >>>> core at the same time, as packets move across different modules

>> >>>> through queues.

>> >>>>

>> >>>> >

>> >>>> >> A single PMR can only match a packet to a single queue associated

>> >>>> >> with

>> >>>> >> a target CoS. This prohibits efficient capture of subfield

>> >>>> >> classification.

>> >>>> >

>> >>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so it is

>> >>>> > possible

>> >>>> > to create a single PMR which matches multiple fields of a packet. I

>> >>>> > can imagine

>> >>>> > a case where a packet matches pmr1 (match Vlan) and also matches

>> >>>> > pmr2

>> >>>> > (match Vlan AND match L3DestIP). Is that an example of subfield

>> >>>> > classification?

>> >>>> > How does the queue relate?

>> >>>>

>> >>>> This question is related to classification, If a PMR is configured

>> >>>> with more than one odp_pmr_param_t then the PMR is considered a hit

>> >>>> only if the packet matches all the configured params.

>> >>>>

>> >>>> Consider the following,

>> >>>>

>> >>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.

>> >>>>

>> >>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS and

>> >>>> will be first applied with PMR1

>> >>>> 2) If the packet matches PMR1 it will be delivered to CoS1

>> >>>> 3) If the packet does not match PMR1 then it will remain in

>> >>>> Default_CoS.

>> >>>> 4) Any packets arriving in CoS1 will be applied with PMR2, If the

>> >>>> packet matches PMR2 then it will be delivered to CoS2.

>> >>>> 5). If the packet does not match PMR2 it will remain in CoS1.

>> >>>>

>> >>>>

>> >>>> Each CoS will be configured with queue groups.

>> >>>> Based on the final CoS of the packet the hash configuration (RSS) of

>> >>>> the queue group will be applied to the packet and the packet will be

>> >>>> spread across the queues within the queue group.

>> >>>

>> >>> Got it. So Classification PMR CoS happens entirely before Queue

>> >>> Groups.

>> >>> And with Queue Groups it allows a single PMR to match a packet and

>> >>> assign

>> >>> that packet to 1 out of Many queues instead of just 1 queue only.

>> >>>

>> >>>> Hope this clarifies.

>> >>>> Bala

>> >>>>

>> >>>> >

>> >>>> >> To solve these issues, Tiger Moth introduces the concept of a

>> >>>> >> queue

>> >>>> >> group. A queue group is an extension to the existing queue

>> >>>> >> specification in a Class of Service.

>> >>>> >>

>> >>>> >> Queue groups solve the classification issues associated with

>> >>>> >> individual queues in three ways:

>> >>>> >>

>> >>>> >> * The odp_queue_group_create() API can create a large number of

>> >>>> >> related queues with a single call.

>> >>>> >

>> >>>> > If the application calls this API, does that mean the ODP

>> >>>> > implementation

>> >>>> > can create a large number of queues? What happens if the

>> >>>> > application

>> >>>> > receives an event on a queue that was created by the

>> >>>> > implmentation--how

>> >>>> > does the application know whether this queue was created by the

>> >>>> > hardware

>> >>>> > according to the ODP Classification or whether the queue was

>> >>>> > created by

>> >>>> > the application?

>> >>>> >

>> >>>> >> * A single PMR can spread traffic to many queues associated with

>> >>>> >> the

>> >>>> >> same CoS by assigning packets matching the PMR to a queue group

>> >>>> >> rather

>> >>>> >> than a queue.

>> >>>> >> * A hashed PMR subfield is used to distribute individual queues

>> >>>> >> within

>> >>>> >> a queue group for scheduling purposes.

>> >>>> >

>> >>>> > Is there a way to write a test case for this? Trying to think of

>> >>>> > what kind of

>> >>>> > packets (traffic distribution) and how those packets would get

>> >>>> > classified and

>> >>>> > get assigned to queues.

>> >>>> >

>> >>>> >> diff --git a/include/odp/api/spec/classification.h

>> >>>> >> b/include/odp/api/spec/classification.h

>> >>>> >> index 6eca9ab..cf56852 100644

>> >>>> >> --- a/include/odp/api/spec/classification.h

>> >>>> >> +++ b/include/odp/api/spec/classification.h

>> >>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

>> >>>> >>

>> >>>> >> /** A Boolean to denote support of PMR range */

>> >>>> >> odp_bool_t pmr_range_supported;

>> >>>> >> +

>> >>>> >> + /** A Boolean to denote support of queue group */

>> >>>> >> + odp_bool_t queue_group_supported;

>> >>>> >> +

>> >>>> >> + /** A Boolean to denote support of queue */

>> >>>> >> + odp_bool_t queue_supported;

>> >>>> >> } odp_cls_capability_t;

>> >>>> >>

>> >>>> >>

>> >>>> >> /**

>> >>>> >> @@ -162,7 +168,18 @@ typedef enum {

>> >>>> >>  * Used to communicate class of service creation options

>> >>>> >>  */

>> >>>> >> typedef struct odp_cls_cos_param {

>> >>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

>> >>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

>> >>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked with

>> >>>> >> CoS.

>> >>>> >> + */

>> >>>> >> + odp_queue_type_e type;

>> >>>> >> +

>> >>>> >> + typedef union {

>> >>>> >> + /** Queue associated with CoS */

>> >>>> >> + odp_queue_t queue;

>> >>>> >> +

>> >>>> >> + /** Queue Group associated with CoS */

>> >>>> >> + odp_queue_group_t queue_group;

>> >>>> >> + };

>> >>>> >> odp_pool_t pool; /**< Pool associated with CoS */

>> >>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS

>> >>>> >> */

>> >>>> >> } odp_cls_cos_param_t;

>> >>>> >>

>> >>>> >>

>> >>>> >> diff --git a/include/odp/api/spec/queue.h

>> >>>> >> b/include/odp/api/spec/queue.h

>> >>>> >> index 51d94a2..7dde060 100644

>> >>>> >> --- a/include/odp/api/spec/queue.h

>> >>>> >> +++ b/include/odp/api/spec/queue.h

>> >>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

>> >>>> >> odp_queue_t odp_queue_create(const char *name, const

>> >>>> >> odp_queue_param_t *param);

>> >>>> >>

>> >>>> >> +/**

>> >>>> >> + * Queue group capability

>> >>>> >> + * This capability structure defines system Queue Group

>> >>>> >> capability

>> >>>> >> + */

>> >>>> >> +typedef struct odp_queue_group_capability_t {

>> >>>> >> + /** Number of queues supported per queue group */

>> >>>> >> + unsigned supported_queues;

>> >>>> >> + /** Supported protocol fields for hashing*/

>> >>>> >> + odp_pktin_hash_proto_t supported;

>> >>>> >> +}

>> >>>> >> +

>> >>>> >> +/**

>> >>>> >> + * ODP Queue Group parameters

>> >>>> >> + * Queue group supports only schedule queues <TBD??>

>> >>>> >> + */

>> >>>> >> +typedef struct odp_queue_group_param_t {

>> >>>> >> + /** Number of queue to be created for this queue group

>> >>>> >> + * implementation may round up the value to nearest power of 2

>> >>>

>> >>> Wondering what this means for obtaining the max number of queues

>> >>> supported by the system via odp_queue_capability()..

>> >>>

>> >>> powers of 2..

>> >>>

>> >>> If the platform supports 2^16 (65,536) queues, odp_queue_capability()

>> >>> max_queues should report 65,536 queues, right?

>> >>>

>> >>> If an odp_queue_group_t is created requesting 2^4 (16) queues, should

>> >>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

>> >>> (2^12) 4096 queues?

>> >>

>> >> odp_queue_capability() is called before creating the creating the queue

>> >> group,

>> >> so if an implementation has the limitation that it can only support

>> >> 2^16 queues then application

>> >> has to configure only 2^16 queues in the queue group.

>> >

>> > In this use case, wouldn't all queues then be reserved for creation

>> > by the implementation? And, now odp_queue_create() will always return

>> > the null handle?

>> >

>> > What happens if you do:

>> > 1. odp_queue_capability() -> 2^16 queues

>> > 2. odp_queue_group_create( 2^4 queues )

>> > 3. odp_queue_capability() -> ???

>>

>> This is a limit on the number of queue supported by a queue group.

>> This does not reflect the number of queues created using

>> odp_queue_create() function.

>> The implementation updates the maximum number of queues it can support

>> within a queue group, the application is free to configure any number

>> less than the maximum supported.

>

>

> Capabilities in ODP are used to specify implementation limits, not current

> allocations. For example, in odp-linux there is currently a limit of 64

> pools that can be created. It doesn't matter how many are currently created

> as that is simply the system limit on odp_pool_create(). The same would

> apply for queues and queue groups. An implementation may be limited to N

> queue groups that can contain a maximum of K queues each. Separately the

> implementation might have a limit of X total queues it can support. How

> these are divided among individual queues or queues that are members of

> queue groups should not affect these capability limits, which are static.

>

> When an allocation request is made and an internal limit is exceeded the

> allocation request simply fails. The capabilities are there to guide the

> application in its allocation requests so that such "surprises" are rare.


I do not disagree with the above. I believe my proposal is in line
with the same.
>

>>

>>

>> >

>> >>>

>> >>> Could there be a dramatic effect on the total number of queues when

>> >>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t

>> >>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits

>> >>> used and effective number of queues is (16+16+16+16) 64 queues.

>> >>>

>> >>> Is it be possible to flexibly utilize all 2^16 queues the platform

>> >>> supports regardless of whether the queue was created by the

>> >>> implementation

>> >>> or explicitly created by the application?

>> >>

>> >> This limitation is per queue group and there can be a limitation of

>> >> total queue group in the system.

>> >> Usually the total queue group supported would be a limited number.

>> >>

>> >>>

>> >>> If so, is there a way to store this extra bit of information--whether

>> >>> a queue was created by the implementation or the application?

>> >>> One of the 16 bits might work.

>> >>> But, this reduces the number of queues to (2^15) 32768.

>> >>> ..at least they are fully utilizable by both implementation and

>> >>> application.

>> >>

>> >> There are different queue types and we can ODP_QUEUE_GROUP_T as a new

>> >> type to differentiate

>> >> a queue created by odp_queue_create() and using queue group create

>> >> function.

>> >>

>> >> During destroying of the resources, the application destroys the

>> >> queues created by application and

>> >> implementation destroys the queues within a queue group when

>> >> application destroys queue group.

>> >>

>> >>>

>> >>> When the application receives an odp_event_t from odp_queue_t after

>> >>> a call to odp_schedule(), could the application call..

>> >>> odp_queue_domain() to check whether this odp_queue_t was created by

>> >>> the implementation or the application? Function returns that bit.

>> >>

>> >> Since we have a queue type which can be got using the function

>> >> odp_queue_type_t I think this

>> >> odp_queue_domain() API is not needed.

>> >

>> > Petri pointed out this week that a packet_io's destination queue may

>> > also

>> > be another case that could use a queue_group.

>> > Perhaps what is needed is a way to connect blocks (ODP objects)

>> > together like legos using something other than an odp_queue_t --

>> > because what flows through these blocks are events from one

>> > (and now more!) odp_queue_t.  Whether the queue was created by

>> > the implementation or application is a separate concern.

>

>

> One possible extension area similar to this would be link bonding where

> multiple pktios are bonded together for increased throughput and/or failover

> (depending on whether the bond is active/active or active/standby). We

> alluded to this in a recent ARCH call where TM talks to a single PktIO

> however that PktIO might represent multiple links in this case. A more

> generalized "group" concept might be an easy way to achieve that here.

>

>>

>> >

>> >>>

>> >>> If the queue domain is implementation, could it be an event

>> >>> (newly arrived packet) that came through Classification PMR CoS (CPC)?

>> >>> The packet is assigned to a odp_queue_t (flow) (created by the

>> >>> implementation)

>> >>> as defined by the CPC that was setup by the application.

>> >>> Might want efficient access to packet metadata which was populated

>> >>> as an effect of the packet passing through CPC stage.

>> >>>

>> >>> If the queue domain is application, could it be an event

>> >>> (crypto compl, or any synchronization point against ipblock or

>> >>> device over PCI bus that indicates some assist/acceleration work

>> >>> has finished) comes from a odp_queue_t previously created by the

>> >>> application via a call to odp_queue_create() (which sets that bit)?

>> >>> This queue would be any queue (not necessarily a packet 'flow')

>> >>> created by the data plane software (application).

>> >>

>> >> We already have a queue type and event type which differentiates the

>> >> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet flow

>> >> queues can be

>> >> created only using HW since it is mainly useful for spreading the

>> >> packets across multiple flows.

>> >>

>> >> -Bala

>> >>>

>> >>>> >> + * and value should be less than the number of queues

>> >>>> >> + * supported per queue group

>> >>>> >> + */

>> >>>> >> + unsigned num_queue;

>> >>>> >> +

>> >>>> >> + /** Protocol field selection for queue group distribution

>> >>>> >> + * Multiple fields can be selected in combination

>> >>>> >> + */

>> >>>> >> + odp_queue_group_hash_proto_t hash;

>> >>>> >> +

>> >>>> >> +} odp_queue_group_param_t;

>> >>>> >> +

>> >>>> >> +/**

>> >>>> >> + * Initialize queue group params

>> >>>> >> + *

>> >>>> >> + * Initialize an odp_queue_group_param_t to its default values

>> >>>> >> for all fields.

>> >>>> >> + *

>> >>>> >> + * @param param   Address of the odp_queue_group_param_t to be

>> >>>> >> initialized

>> >>>> >> + */

>> >>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);

>> >>>> >> +

>> >>>> >> +/**

>> >>>> >> + * Queue Group create

>> >>>> >> + *

>> >>>> >> + * Create a queue group according to the queue group parameters.

>> >>>> >> + * The individual queues belonging to a queue group are created

>> >>>> >> by the

>> >>>> >> + * implementation and the distribution of packets into those

>> >>>> >> queues are

>> >>>> >> + * decided based on the odp_queue_group_hash_proto_t parameters.

>> >>>> >> + * The individual queues within a queue group are both created

>> >>>> >> and deleted

>> >>>> >> + * by the implementation.

>> >>>> >> + *

>> >>>> >> + * @param name    Queue Group name

>> >>>> >> + * @param param   Queue Group parameters.

>> >>>> >> + *

>> >>>> >> + * @return Queue group handle

>> >>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

>> >>>> >> + */

>> >>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

>> >>>> >> + const odp_queue_group_param_t *param);

>> >>>> >> Regards,

>> >>>> >> Bala

>

>
Balasubramanian Manoharan Nov. 24, 2016, 6:33 a.m. UTC | #15
Regards,
Bala


On 23 November 2016 at 03:00, Honnappa Nagarahalli
<honnappa.nagarahalli@linaro.org> wrote:
>

>

> On 18 November 2016 at 05:54, Bala Manoharan <bala.manoharan@linaro.org>

> wrote:

>>

>> Regards,

>> Bala

>>

>>

>> On 18 November 2016 at 10:05, Honnappa Nagarahalli

>> <honnappa.nagarahalli@linaro.org> wrote:

>> > Hi Bala,

>> >    I am trying to catch up on this conversation. I have few questions

>> > here,

>> > not sure if they are discussed already.

>> >

>> > 1) Why is there a need for such large number of queues (Million) in a

>> > system? If the system meets the line rate, the packets (and the flows)

>> > waiting to be processed is small. So, a small number of queues should

>> > suffice, if we want to have to queue per flow. I agree that the system

>> > will

>> > receive packets belonging to million flows over a period of time.

>>

>> Queue group is a mechanism to distribute the incoming network traffic

>> into multiple flows and this queue group is a refined version of RSS

>> hashing since in this case the packets belonging to different CoS

>> could be configured with different hash parameters. The individual

>> queues created within a queue group indicate a flow.

>>

>> Regarding the requirement of millions of flows, it is required when

>> processing atomic queues or critical sections per flow where the

>> application has to maintain synchronizations of packets belonging to a

>> particular flow.

>>

> Your proposal is mapping the flows to queues 1:1.

> At any given instance, the system will contain a handful number of flows.

> So, handful of queues should be enough. So, another solution could be to map

> the flows to queues n:1 (say by hashing the flow ID). Since, the system will

> have a handful of flows, the same efficiency of 1:1 mapping can be achieved.

> This will reduce the complexity of the scheduler as well as reduce the

> amount of resources used.


In the ODP context a queue is a flow, and in this proposal the
implementation specifies the number of queues (flows) it can support
through the capability structure. Depending on the complexity of its
scheduler, an implementation exposes the number of queues it can
support; an implementation that uses a HW scheduler should be able to
support millions of queues.
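
As a rough illustration of that capability negotiation, here is a
minimal sketch in C. It assumes a query function
odp_queue_group_capability() (the RFC defines the capability structure
but has not yet named the query call) and a bit-field layout for
odp_queue_group_hash_proto_t mirroring odp_pktin_hash_proto_t; both
are assumptions on top of this proposal, not settled API.

	uint32_t wanted = 4096; /* application's desired flow count */
	odp_queue_group_capability_t cap;
	odp_queue_group_param_t param;
	odp_queue_group_t grp;

	/* Hypothetical query of the per-group limits */
	if (odp_queue_group_capability(&cap) == 0) {
		odp_queue_group_param_init(&param);

		/* Request no more than the per-group limit */
		param.num_queue = wanted < cap.supported_queues ?
				wanted : cap.supported_queues;

		/* Assumed hash bit-field layout: spread by IPv4/UDP */
		param.hash.proto.ipv4_udp = 1;

		grp = odp_queue_group_create("flow_group", &param);
	}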

>

> I am trying to understand the ODP implementation, please correct me if I am

> wrong. In ODP, the sync type (atomic/ordered) is associated with the queue.

> My understanding is that sync type should be a property of the packet flow.

> Queue should be associated with priority/differential treatment of the

> packets. For ex: consider 2 packet flows of the same priority, one requiring

> atomic processing and the other requiring ordered processing. In ODP, these

> 2 flows cannot be queued to the same queue. I think this is a problem and we

> need to separate the sync type from the queue.


The ordered/atomic context should be maintained per queue, since a
queue is a flow in this context. From the ODP point of view a queue is
an abstract concept, and it is up to the implementation to map it onto
its internal HW. In your example the two packet flows should be
enqueued into two different queues to maintain their atomic or ordered
contexts. Please refer to the odp_schedule() documentation.
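
As a minimal sketch with the existing queue API (no new calls are
assumed here), the two flows map to two scheduled queues of the same
priority but different sync types:

	odp_queue_param_t qp;
	odp_queue_t flow_a, flow_b;

	odp_queue_param_init(&qp);
	qp.type = ODP_QUEUE_TYPE_SCHED;
	qp.sched.prio = ODP_SCHED_PRIO_DEFAULT;

	/* Flow requiring atomic processing */
	qp.sched.sync = ODP_SCHED_SYNC_ATOMIC;
	flow_a = odp_queue_create("flow_a", &qp);

	/* Flow of the same priority requiring ordered processing */
	qp.sched.sync = ODP_SCHED_SYNC_ORDERED;
	flow_b = odp_queue_create("flow_b", &qp);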

Regards,
Bala
>

>>

>> This queue group proposal is flexible in the sense that the

>> implementation exposes the maximum number of queues it can support

>> within a queue group and the number of queues per queue group is

>> configured by the application. So if required the application can only

>> configure a smaller number of queues.

>>

>> >

>> > 2) Are there any thoughts on implementing a software scheduler for

>> > linux-generic which supports large number of queues?

>>

>> I am not sure if the existing SW scheduler will be able to handle

>> millions of queues efficiently if not it can be enhanced.

>>

>> >

>> > 3) What will be the arbitration method (round robin, strict priority

>> > etc)

>> > among the queues in the queue group? If it is weighted round robin or

>> > strict

>> > priority, how are the priorities assigned to these queues?

>>

>> The priorities are assigned per queue group and all the queues

>> belonging to a single queue group have the same priority.

>> The concept of queue group is mainly to distribute the incoming

>> traffic to multiple flows for efficient processing by the scheduler.

>>

>> On the other hand, odp_queue_create() API has priority as a input

>> parameter value.

>>

>> >

>> > 4) Is the intent, to propagate the flow ID from classification down the

>> > packet processing pipeline?

>>

>> Yes.

>>

>> Regards,

>> Bala

>> >

>> > Thank you,

>> > Honnappa

>> >

>> >

>> > On 17 November 2016 at 13:59, Bill Fischofer <bill.fischofer@linaro.org>

>> > wrote:

>> >>

>> >> On Thu, Nov 17, 2016 at 3:05 AM, Bala Manoharan

>> >> <bala.manoharan@linaro.org>

>> >> wrote:

>> >>

>> >> > Regards,

>> >> > Bala

>> >> >

>> >> >

>> >> > On 15 November 2016 at 22:43, Brian Brooks <brian.brooks@linaro.org>

>> >> > wrote:

>> >> > > On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan

>> >> > > <bala.manoharan@linaro.org> wrote:

>> >> > >> Regards,

>> >> > >> Bala

>> >> > >>

>> >> > >>

>> >> > >> On 11 November 2016 at 13:26, Brian Brooks

>> >> > >> <brian.brooks@linaro.org>

>> >> > wrote:

>> >> > >>> On 11/10 15:17:15, Bala Manoharan wrote:

>> >> > >>>> On 10 November 2016 at 13:26, Brian Brooks

>> >> > >>>> <brian.brooks@linaro.org>

>> >> > wrote:

>> >> > >>>> > On 11/07 16:46:12, Bala Manoharan wrote:

>> >> > >>>> >> Hi,

>> >> > >>>> >

>> >> > >>>> > Hiya

>> >> > >>>> >

>> >> > >>>> >> This mail thread discusses the design of classification queue

>> >> > >>>> >> group

>> >> > >>>> >> RFC. The same can be found in the google doc whose link is

>> >> > >>>> >> given

>> >> > >>>> >> below.

>> >> > >>>> >> Users can provide their comments either in this mail thread

>> >> > >>>> >> or

>> >> > >>>> >> in

>> >> > the

>> >> > >>>> >> google doc as per their convenience.

>> >> > >>>> >>

>> >> > >>>> >>

>> >> > >>>> >>

>> >> > >>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9

>> >> > slR93LZ8VXqM2o/edit?usp=sharing

>> >> > >>>> >>

>> >> > >>>> >> The basic issues with queues as being a single target for a

>> >> > >>>> >> CoS

>> >> > are two fold:

>> >> > >>>> >>

>> >> > >>>> >> Queues must be created and deleted individually. This imposes

>> >> > >>>> >> a

>> >> > >>>> >> significant burden when queues are used to represent

>> >> > >>>> >> individual

>> >> > flows

>> >> > >>>> >> since the application may need to process thousands (or

>> >> > >>>> >> millions)

>> >> > of

>> >> > >>>> >> flows.

>> >> > >>>> >

>> >> > >>>> > Wondering why there is an issue with creating and deleting

>> >> > >>>> > queues

>> >> > individually

>> >> > >>>> > if queue objects represent millions of flows..

>> >> > >>>>

>> >> > >>>> The queue groups are mainly required for hashing the incoming

>> >> > >>>> packets

>> >> > >>>> to multiple flows based on the hash configuration.

>> >> > >>>> So from application point of view it just needs a queue to have

>> >> > >>>> packets belonging to same flow and that packets belonging to

>> >> > >>>> different

>> >> > >>>> flows are placed in different queues respectively.It does not

>> >> > >>>> matter

>> >> > >>>> who creates the flow/queue.

>> >> > >>>

>> >> > >>> When the application receives an event from odp_schedule() call,

>> >> > >>> how

>> >> > does it

>> >> > >>> know whether the odp_queue_t was previously created by the

>> >> > >>> application

>> >> > from

>> >> > >>> odp_queue_create() or whether it was created by the

>> >> > >>> implementation?

>> >> > >>

>> >> > >> odp_schedule() call returns the queue from which the event was

>> >> > >> part

>> >> > >> of.

>> >> > >> The type of the queue can be got from odp_queue_type() API.

>> >> > >> But the question is there an use-case where the application need

>> >> > >> to

>> >> > know?

>> >> > >> The application has the information of the queue it has created

>> >> > >> and

>> >> > >> the queues created by implementation are destroyed by

>> >> > >> implementation.

>> >> > >

>> >> > > If certain fields of the packet are hashed to a queue handle, and

>> >> > > this

>> >> > queue

>> >> > > handle has not previously been created via odp_queue_create(),

>> >> > > there

>> >> > might

>> >> > > be a use case where the application needs to be aware of a new

>> >> > > "flow"..

>> >> > > Maybe the application ages flows.

>> >> >

>> >>

>> >> I'm not sure I understand the concern being raised here. Packet fields

>> >> are

>> >> matched against PMRs to get a matching CoS. That CoS, in turn, is

>> >> associated with with a queue or a queue group. If the latter then

>> >> specified

>> >> subfields within the packet are hashed to generate an index into that

>> >> queue

>> >> group to select the individual queue within the target queue group that

>> >> is

>> >> to receive the packet. Whether these queues have been preallocated at

>> >> odp_queue_group_create() time, or allocated dynamically on first

>> >> reference

>> >> is up to the implementation, however especially in the case of "large"

>> >> queue groups it can be expected that the number of actual queues in use

>> >> will be sparse so a deferred allocation strategy will most likely be

>> >> used.

>> >>

>> >> Applications are aware of flows because that's what an individual queue

>> >> coming out of the classifier represents. An interesting question arises

>> >> is

>> >> if a higher-level protocol (e.g., a TCP FIN sequence) ends a given

>> >> flow,

>> >> meaning that the context represented by an individual queue within a

>> >> queue

>> >> group can be released. Especially in the case of sparse queue groups it

>> >> might be worthwhile to have an API that can communicate this flow

>> >> release

>> >> back to the classifier to facilitate queue resource management.

>> >>

>> >>

>> >> >

>> >> > It is very difficult in network traffic to predict the exact flows

>> >> > which will be coming in an interface.

>> >> > The application can configure for all the possible flows in that

>> >> > case.

>> >> >

>> >>

>> >> Not sure what you mean by the application configuring. If the hash is

>> >> for

>> >> a

>> >> UDP port, for example, then the queue group has 64K (logical) queues

>> >> associated with it.  Which of these are active (and hence require

>> >> instantiation) depends on the inbound traffic that is received, which

>> >> may

>> >> be unpredictable. But the management of this is an ODP implementation

>> >> concern rather than an application concern, unless we extend the API

>> >> with

>> >> a

>> >> flow release hint as suggested above.

>> >>

>> >>

>> >> >

>> >> > >

>> >> > >>

>> >> > >>>

>> >> > >>>> It is actually simpler if implementation creates a flow since in

>> >> > >>>> that

>> >> > >>>> case implementation need not accumulate meta-data for all

>> >> > >>>> possible

>> >> > >>>> hash values in a queue group and it can be created when traffic

>> >> > >>>> arrives in that particular flow.

>> >> > >>>>

>> >> > >>>> >

>> >> > >>>> > Could an application ever call odp_schedule() and receive an

>> >> > >>>> > event

>> >> > (e.g. packet)

>> >> > >>>> > from a queue (of opaque type odp_queue_t) and that queue has

>> >> > >>>> > never

>> >> > been created

>> >> > >>>> > by the application (via odp_queue_create())? Could that ever

>> >> > >>>> > happen

>> >> > from the

>> >> > >>>> > hardware, and could the application ever handle that?

>> >> > >>>>

>> >> > >>>> No. All the queues in the system are created by the application

>> >> > >>>> either

>> >> > >>>> directly or in-directly.

>> >> > >>>> In-case of queue groups the queues are in-directly created by

>> >> > >>>> the

>> >> > >>>> application by configuring a queue group.

>> >> > >>>>

>> >> > >>>> > Or, is it related to memory usage? The reference

>> >> > >>>> > implementation

>> >> > >>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.

>> >> > >>>> >

>> >> > >>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB

>> >> > >>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB

>> >> > >>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB

>> >> > >>>> >

>> >> > >>>> > Forget about 320 bytes per queue, if each queue was

>> >> > >>>> > represented

>> >> > >>>> > by

>> >> > a 32-bit

>> >> > >>>> > integer (4 bytes!) the usage would be:

>> >> > >>>> >

>> >> > >>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB

>> >> > >>>> >   2^26 ~=  67,108,864 queues ->    256 MB

>> >> > >>>> >   2^22 ~=   4,194,304 queues ->     16 MB

>> >> > >>>> >

>> >> > >>>> > That still might be a lot of usage if the application must

>> >> > explicitly create

>> >> > >>>> > every queue (before it is used) and require an ODP

>> >> > >>>> > implementation

>> >> > to map

>> >> > >>>> > between every ODP queue object (opaque type) and the internal

>> >> > >>>> > queue.

>> >> > >>>> >

>> >> > >>>> > Lets say ODP API has two classes of handles: 1) pointers, 2)

>> >> > integers. An opaque

>> >> > >>>> > pointer is used to point to some other software object. This

>> >> > >>>> > object

>> >> > should be

>> >> > >>>> > larger than 64 bits (or 32 bits on a chip in 32-bit pointer

>> >> > >>>> > mode)

>> >> > otherwise it

>> >> > >>>> > could just be represented in a 64-bit (or 32-bit) integer type

>> >> > value!

>> >> > >>>> >

>> >> > >>>> > To support millions of queues (flows) should odp_queue_t be an

>> >> > integer type in

>> >> > >>>> > the API? A software-only implementation may still use 320

>> >> > >>>> > bytes

>> >> > >>>> > per

>> >> > queue and

>> >> > >>>> > use that integer as an index into an array or as a key for

>> >> > >>>> > lookup

>> >> > operation on a

>> >> > >>>> > data structure containing queues. An implementation with

>> >> > >>>> > hardware

>> >> > assist may

>> >> > >>>> > use this integer value directly when interfacing with

>> >> > >>>> > hardware!

>> >> > >>>>

>> >> > >>>> I believe I have answered this question based on explanation

>> >> > >>>> above.

>> >> > >>>> Pls feel free to point out if something is not clear.

>> >> > >>>>

>> >> > >>>> >

>> >> > >>>> > Would it still be necessary to assign a "name" to each queue

>> >> > >>>> > (flow)?

>> >> > >>>>

>> >> > >>>> "name" per queue might not be required since it would mean a

>> >> > >>>> character

>> >> > >>>> based lookup across millions of items.

>> >> > >>>>

>> >> > >>>> >

>> >> > >>>> > Would a queue (flow) also require an "op type" to explicitly

>> >> > specify whether

>> >> > >>>> > access to the queue (flow) is threadsafe? Atomic queues are

>> >> > threadsafe since

>> >> > >>>> > only 1 core at any given time can recieve from it. Parallel

>> >> > >>>> > queues

>> >> > are also

>> >> > >>>> > threadsafe. Are all ODP APIs threadsafe?

>> >> > >>>>

>> >> > >>>> There are two types of queue enqueue operation ODP_QUEUE_OP_MT

>> >> > >>>> and

>> >> > >>>> ODP_QUEUE_OP_MT_UNSAFE.

>> >> > >>>> Rest of the ODP APIs are multi thread safe since in ODP there is

>> >> > >>>> no

>> >> > >>>> defined way in which a single packet can be given to more than

>> >> > >>>> one

>> >> > >>>> core at the same time, as packets move across different modules

>> >> > >>>> through queues.

>> >> > >>>>

>> >> > >>>> >

>> >> > >>>> >> A single PMR can only match a packet to a single queue

>> >> > >>>> >> associated

>> >> > with

>> >> > >>>> >> a target CoS. This prohibits efficient capture of subfield

>> >> > >>>> >> classification.

>> >> > >>>> >

>> >> > >>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so

>> >> > >>>> > it

>> >> > >>>> > is

>> >> > possible

>> >> > >>>> > to create a single PMR which matches multiple fields of a

>> >> > >>>> > packet.

>> >> > >>>> > I

>> >> > can imagine

>> >> > >>>> > a case where a packet matches pmr1 (match Vlan) and also

>> >> > >>>> > matches

>> >> > pmr2

>> >> > >>>> > (match Vlan AND match L3DestIP). Is that an example of

>> >> > >>>> > subfield

>> >> > classification?

>> >> > >>>> > How does the queue relate?

>> >> > >>>>

>> >> > >>>> This question is related to classification, If a PMR is

>> >> > >>>> configured

>> >> > >>>> with more than one odp_pmr_param_t then the PMR is considered a

>> >> > >>>> hit

>> >> > >>>> only if the packet matches all the configured params.

>> >> > >>>>

>> >> > >>>> Consider the following,

>> >> > >>>>

>> >> > >>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.

>> >> > >>>>

>> >> > >>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS

>> >> > >>>> and

>> >> > >>>> will be first applied with PMR1

>> >> > >>>> 2) If the packet matches PMR1 it will be delivered to CoS1

>> >> > >>>> 3) If the packet does not match PMR1 then it will remain in

>> >> > Default_CoS.

>> >> > >>>> 4) Any packets arriving in CoS1 will be applied with PMR2, If

>> >> > >>>> the

>> >> > >>>> packet matches PMR2 then it will be delivered to CoS2.

>> >> > >>>> 5). If the packet does not match PMR2 it will remain in CoS1.

>> >> > >>>>

>> >> > >>>>

>> >> > >>>> Each CoS will be configured with queue groups.

>> >> > >>>> Based on the final CoS of the packet the hash configuration

>> >> > >>>> (RSS)

>> >> > >>>> of

>> >> > >>>> the queue group will be applied to the packet and the packet

>> >> > >>>> will

>> >> > >>>> be

>> >> > >>>> spread across the queues within the queue group.

>> >> > >>>

>> >> > >>> Got it. So Classification PMR CoS happens entirely before Queue

>> >> > >>> Groups.

>> >> > >>> And with Queue Groups it allows a single PMR to match a packet

>> >> > >>> and

>> >> > assign

>> >> > >>> that packet to 1 out of Many queues instead of just 1 queue only.

>> >> > >>>

>> >> > >>>> Hope this clarifies.

>> >> > >>>> Bala

>> >> > >>>>

>> >> > >>>> >

>> >> > >>>> >> To solve these issues, Tiger Moth introduces the concept of a

>> >> > >>>> >> queue

>> >> > >>>> >> group. A queue group is an extension to the existing queue

>> >> > >>>> >> specification in a Class of Service.

>> >> > >>>> >>

>> >> > >>>> >> Queue groups solve the classification issues associated with

>> >> > >>>> >> individual queues in three ways:

>> >> > >>>> >>

>> >> > >>>> >> * The odp_queue_group_create() API can create a large number

>> >> > >>>> >> of

>> >> > >>>> >> related queues with a single call.

>> >> > >>>> >

>> >> > >>>> > If the application calls this API, does that mean the ODP

>> >> > implementation

>> >> > >>>> > can create a large number of queues? What happens if the

>> >> > >>>> > application

>> >> > >>>> > receives an event on a queue that was created by the

>> >> > implmentation--how

>> >> > >>>> > does the application know whether this queue was created by

>> >> > >>>> > the

>> >> > hardware

>> >> > >>>> > according to the ODP Classification or whether the queue was

>> >> > created by

>> >> > >>>> > the application?

>> >> > >>>> >

>> >> > >>>> >> * A single PMR can spread traffic to many queues associated

>> >> > >>>> >> with

>> >> > the

>> >> > >>>> >> same CoS by assigning packets matching the PMR to a queue

>> >> > >>>> >> group

>> >> > rather

>> >> > >>>> >> than a queue.

>> >> > >>>> >> * A hashed PMR subfield is used to distribute individual

>> >> > >>>> >> queues

>> >> > within

>> >> > >>>> >> a queue group for scheduling purposes.

>> >> > >>>> >

>> >> > >>>> > Is there a way to write a test case for this? Trying to think

>> >> > >>>> > of

>> >> > what kind of

>> >> > >>>> > packets (traffic distribution) and how those packets would get

>> >> > classified and

>> >> > >>>> > get assigned to queues.

>> >> > >>>> >

>> >> > >>>> >> diff --git a/include/odp/api/spec/classification.h

>> >> > >>>> >> b/include/odp/api/spec/classification.h

>> >> > >>>> >> index 6eca9ab..cf56852 100644

>> >> > >>>> >> --- a/include/odp/api/spec/classification.h

>> >> > >>>> >> +++ b/include/odp/api/spec/classification.h

>> >> > >>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {

>> >> > >>>> >>

>> >> > >>>> >> /** A Boolean to denote support of PMR range */

>> >> > >>>> >> odp_bool_t pmr_range_supported;

>> >> > >>>> >> +

>> >> > >>>> >> + /** A Boolean to denote support of queue group */

>> >> > >>>> >> + odp_bool_t queue_group_supported;

>> >> > >>>> >> +

>> >> > >>>> >> + /** A Boolean to denote support of queue */

>> >> > >>>> >> + odp_bool_t queue_supported;

>> >> > >>>> >> } odp_cls_capability_t;

>> >> > >>>> >>

>> >> > >>>> >>

>> >> > >>>> >> /**

>> >> > >>>> >> @@ -162,7 +168,18 @@ typedef enum {

>> >> > >>>> >>  * Used to communicate class of service creation options

>> >> > >>>> >>  */

>> >> > >>>> >> typedef struct odp_cls_cos_param {

>> >> > >>>> >> - odp_queue_t queue; /**< Queue associated with CoS */

>> >> > >>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with CoS,

>> >> > >>>> >> + * if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is linked

>> >> > >>>> >> with

>> >> > CoS.

>> >> > >>>> >> + */

>> >> > >>>> >> + odp_queue_type_e type;

>> >> > >>>> >> +

>> >> > >>>> >> + typedef union {

>> >> > >>>> >> + /** Queue associated with CoS */

>> >> > >>>> >> + odp_queue_t queue;

>> >> > >>>> >> +

>> >> > >>>> >> + /** Queue Group associated with CoS */

>> >> > >>>> >> + odp_queue_group_t queue_group;

>> >> > >>>> >> + };

>> >> > >>>> >> odp_pool_t pool; /**< Pool associated with CoS */

>> >> > >>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with

>> >> > >>>> >> CoS

>> >> > >>>> >> */

>> >> > >>>> >> } odp_cls_cos_param_t;

>> >> > >>>> >>

>> >> > >>>> >>

>> >> > >>>> >> diff --git a/include/odp/api/spec/queue.h

>> >> > b/include/odp/api/spec/queue.h

>> >> > >>>> >> index 51d94a2..7dde060 100644

>> >> > >>>> >> --- a/include/odp/api/spec/queue.h

>> >> > >>>> >> +++ b/include/odp/api/spec/queue.h

>> >> > >>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {

>> >> > >>>> >> odp_queue_t odp_queue_create(const char *name, const

>> >> > odp_queue_param_t *param);

>> >> > >>>> >>

>> >> > >>>> >> +/**

>> >> > >>>> >> + * Queue group capability

>> >> > >>>> >> + * This capability structure defines system Queue Group

>> >> > >>>> >> capability

>> >> > >>>> >> + */

>> >> > >>>> >> +typedef struct odp_queue_group_capability_t {

>> >> > >>>> >> + /** Number of queues supported per queue group */

>> >> > >>>> >> + unsigned supported_queues;

>> >> > >>>> >> + /** Supported protocol fields for hashing*/

>> >> > >>>> >> + odp_pktin_hash_proto_t supported;

>> >> > >>>> >> +}

>> >> > >>>> >> +

>> >> > >>>> >> +/**

>> >> > >>>> >> + * ODP Queue Group parameters

>> >> > >>>> >> + * Queue group supports only schedule queues <TBD??>

>> >> > >>>> >> + */

>> >> > >>>> >> +typedef struct odp_queue_group_param_t {

>> >> > >>>> >> + /** Number of queue to be created for this queue group

>> >> > >>>> >> + * implementation may round up the value to nearest power of

>> >> > >>>> >> 2

>> >> > >>>

>> >> > >>> Wondering what this means for obtaining the max number of queues

>> >> > >>> supported by the system via odp_queue_capability()..

>> >> > >>>

>> >> > >>> powers of 2..

>> >> > >>>

>> >> > >>> If the platform supports 2^16 (65,536) queues,

>> >> > >>> odp_queue_capability()

>> >> > >>> max_queues should report 65,536 queues, right?

>> >> > >>>

>> >> > >>> If an odp_queue_group_t is created requesting 2^4 (16) queues,

>> >> > >>> should

>> >> > >>> odp_queue_capability() now return (65,536 - 16) 65520 queues or

>> >> > >>> (2^12) 4096 queues?

>> >> > >>

>> >> > >> odp_queue_capability() is called before creating the creating the

>> >> > >> queue

>> >> > group,

>> >> > >> so if an implementation has the limitation that it can only

>> >> > >> support

>> >> > >> 2^16 queues then application

>> >> > >> has to configure only 2^16 queues in the queue group.

>> >> > >

>> >> > > In this use case, wouldn't all queues then be reserved for creation

>> >> > > by the implementation? And, now odp_queue_create() will always

>> >> > > return

>> >> > > the null handle?

>> >> > >

>> >> > > What happens if you do:

>> >> > > 1. odp_queue_capability() -> 2^16 queues

>> >> > > 2. odp_queue_group_create( 2^4 queues )

>> >> > > 3. odp_queue_capability() -> ???

>> >> >

>> >> > This is a limit on the number of queue supported by a queue group.

>> >> > This does not reflect the number of queues created using

>> >> > odp_queue_create() function.

>> >> > The implementation updates the maximum number of queues it can

>> >> > support

>> >> > within a queue group, the application is free to configure any number

>> >> > less than the maximum supported.

>> >> >

>> >>

>> >> Capabilities in ODP are used to specify implementation limits, not

>> >> current

>> >> allocations. For example, in odp-linux there is currently a limit of 64

>> >> pools that can be created. It doesn't matter how many are currently

>> >> created

>> >> as that is simply the system limit on odp_pool_create(). The same would

>> >> apply for queues and queue groups. An implementation may be limited to

>> >> N

>> >> queue groups that can contain a maximum of K queues each. Separately

>> >> the

>> >> implementation might have a limit of X total queues it can support. How

>> >> these are divided among individual queues or queues that are members of

>> >> queue groups should not affect these capability limits, which are

>> >> static.

>> >>

>> >> When an allocation request is made and an internal limit is exceeded

>> >> the

>> >> allocation request simply fails. The capabilities are there to guide

>> >> the

>> >> application in its allocation requests so that such "surprises" are

>> >> rare.

>> >>

>> >>

>> >> >

>> >> > >

>> >> > >>>

>> >> > >>> Could there be a dramatic effect on the total number of queues

>> >> > >>> when

>> >> > >>> many odp_queue_group_t have been created? E.g. 4

>> >> > >>> odp_queue_group_t

>> >> > >>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16

>> >> > >>> bits

>> >> > >>> used and effective number of queues is (16+16+16+16) 64 queues.

>> >> > >>>

>> >> > >>> Is it be possible to flexibly utilize all 2^16 queues the

>> >> > >>> platform

>> >> > >>> supports regardless of whether the queue was created by the

>> >> > implementation

>> >> > >>> or explicitly created by the application?

>> >> > >>

>> >> > >> This limitation is per queue group and there can be a limitation

>> >> > >> of

>> >> > >> total queue group in the system.

>> >> > >> Usually the total queue group supported would be a limited number.

>> >> > >>

>> >> > >>>

>> >> > >>> If so, is there a way to store this extra bit of

>> >> > >>> information--whether

>> >> > >>> a queue was created by the implementation or the application?

>> >> > >>> One of the 16 bits might work.

>> >> > >>> But, this reduces the number of queues to (2^15) 32768.

>> >> > >>> ..at least they are fully utilizable by both implementation and

>> >> > application.

>> >> > >>

>> >> > >> There are different queue types and we can ODP_QUEUE_GROUP_T as a

>> >> > >> new

>> >> > >> type to differentiate

>> >> > >> a queue created by odp_queue_create() and using queue group create

>> >> > function.

>> >> > >>

>> >> > >> During destroying of the resources, the application destroys the

>> >> > >> queues created by application and

>> >> > >> implementation destroys the queues within a queue group when

>> >> > >> application destroys queue group.

>> >> > >>

>> >> > >>>

>> >> > >>> When the application receives an odp_event_t from odp_queue_t

>> >> > >>> after

>> >> > >>> a call to odp_schedule(), could the application call..

>> >> > >>> odp_queue_domain() to check whether this odp_queue_t was created

>> >> > >>> by

>> >> > >>> the implementation or the application? Function returns that bit.

>> >> > >>

>> >> > >> Since we have a queue type which can be got using the function

>> >> > >> odp_queue_type_t I think this

>> >> > >> odp_queue_domain() API is not needed.

>> >> > >

>> >> > > Petri pointed out this week that a packet_io's destination queue

>> >> > > may

>> >> > > also

>> >> > > be another case that could use a queue_group.

>> >> > > Perhaps what is needed is a way to connect blocks (ODP objects)

>> >> > > together like legos using something other than an odp_queue_t --

>> >> > > because what flows through these blocks are events from one

>> >> > > (and now more!) odp_queue_t.  Whether the queue was created by

>> >> > > the implementation or application is a separate concern.

>> >> >

>> >>

>> >> One possible extension area similar to this would be link bonding where

>> >> multiple pktios are bonded together for increased throughput and/or

>> >> failover (depending on whether the bond is active/active or

>> >> active/standby). We alluded to this in a recent ARCH call where TM

>> >> talks

>> >> to

>> >> a single PktIO however that PktIO might represent multiple links in

>> >> this

>> >> case. A more generalized "group" concept might be an easy way to

>> >> achieve

>> >> that here.

>> >>

>> >>

>> >> > >

>> >> > >>>

>> >> > >>> If the queue domain is implementation, could it be an event

>> >> > >>> (newly arrived packet) that came through Classification PMR CoS

>> >> > >>> (CPC)?

>> >> > >>> The packet is assigned to a odp_queue_t (flow) (created by the

>> >> > implementation)

>> >> > >>> as defined by the CPC that was setup by the application.

>> >> > >>> Might want efficient access to packet metadata which was

>> >> > >>> populated

>> >> > >>> as an effect of the packet passing through CPC stage.

>> >> > >>>

>> >> > >>> If the queue domain is application, could it be an event

>> >> > >>> (crypto compl, or any synchronization point against ipblock or

>> >> > >>> device over PCI bus that indicates some assist/acceleration work

>> >> > >>> has finished) comes from a odp_queue_t previously created by the

>> >> > >>> application via a call to odp_queue_create() (which sets that

>> >> > >>> bit)?

>> >> > >>> This queue would be any queue (not necessarily a packet 'flow')

>> >> > >>> created by the data plane software (application).

>> >> > >>

>> >> > >> We already have a queue type and event type which differentiates

>> >> > >> the

>> >> > >> events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also the packet

>> >> > >> flow

>> >> > >> queues can be

>> >> > >> created only using HW since it is mainly useful for spreading the

>> >> > >> packets across multiple flows.

>> >> > >>

>> >> > >> -Bala

>> >> > >>>

>> >> > >>>> >> + * and value should be less than the number of queues

>> >> > >>>> >> + * supported per queue group

>> >> > >>>> >> + */

>> >> > >>>> >> + unsigned num_queue;

>> >> > >>>> >> +

>> >> > >>>> >> + /** Protocol field selection for queue group distribution

>> >> > >>>> >> + * Multiple fields can be selected in combination

>> >> > >>>> >> + */

>> >> > >>>> >> + odp_queue_group_hash_proto_t hash;

>> >> > >>>> >> +

>> >> > >>>> >> +} odp_queue_group_param_t;

>> >> > >>>> >> +

>> >> > >>>> >> +/**

>> >> > >>>> >> + * Initialize queue group params

>> >> > >>>> >> + *

>> >> > >>>> >> + * Initialize an odp_queue_group_param_t to its default

>> >> > >>>> >> values

>> >> > for all fields.

>> >> > >>>> >> + *

>> >> > >>>> >> + * @param param   Address of the odp_queue_group_param_t to

>> >> > >>>> >> be

>> >> > initialized

>> >> > >>>> >> + */

>> >> > >>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t

>> >> > >>>> >> *param);

>> >> > >>>> >> +

>> >> > >>>> >> +/**

>> >> > >>>> >> + * Queue Group create

>> >> > >>>> >> + *

>> >> > >>>> >> + * Create a queue group according to the queue group

>> >> > >>>> >> parameters.

>> >> > >>>> >> + * The individual queues belonging to a queue group are

>> >> > >>>> >> created

>> >> > by the

>> >> > >>>> >> + * implementation and the distribution of packets into those

>> >> > queues are

>> >> > >>>> >> + * decided based on the odp_queue_group_hash_proto_t

>> >> > >>>> >> parameters.

>> >> > >>>> >> + * The individual queues within a queue group are both

>> >> > >>>> >> created

>> >> > and deleted

>> >> > >>>> >> + * by the implementation.

>> >> > >>>> >> + *

>> >> > >>>> >> + * @param name    Queue Group name

>> >> > >>>> >> + * @param param   Queue Group parameters.

>> >> > >>>> >> + *

>> >> > >>>> >> + * @return Queue group handle

>> >> > >>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure

>> >> > >>>> >> + */

>> >> > >>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,

>> >> > >>>> >> + const odp_queue_group_param_t *param);

>> >> > >>>> >> Regards,

>> >> > >>>> >> Bala

>> >> >

>> >

>> >

>

>
diff mbox

Patch

diff --git a/include/odp/api/spec/classification.h
b/include/odp/api/spec/classification.h
index 6eca9ab..cf56852 100644
--- a/include/odp/api/spec/classification.h
+++ b/include/odp/api/spec/classification.h
@@ -126,6 +126,12 @@  typedef struct odp_cls_capability_t {

/** A Boolean to denote support of PMR range */
odp_bool_t pmr_range_supported;
+
+ /** A Boolean to denote support of queue group */
+ odp_bool_t queue_group_supported;
+
+ /** A Boolean to denote support of queue */
+ odp_bool_t queue_supported;
} odp_cls_capability_t;


/**
@@ -162,7 +168,18 @@  typedef enum {
 * Used to communicate class of service creation options
 */
typedef struct odp_cls_cos_param {
- odp_queue_t queue; /**< Queue associated with CoS */
+ /** If type is ODP_QUEUE_T, an odp_queue_t is linked with the CoS;
+ * if type is ODP_QUEUE_GROUP_T, an odp_queue_group_t is linked
+ * with the CoS.
+ odp_queue_type_e type;
+
+ union {
+ /** Queue associated with CoS */
+ odp_queue_t queue;
+
+ /** Queue Group associated with CoS */
+ odp_queue_group_t queue_group;
+ };
odp_pool_t pool; /**< Pool associated with CoS */
odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */
} odp_cls_cos_param_t;


diff --git a/include/odp/api/spec/queue.h b/include/odp/api/spec/queue.h
index 51d94a2..7dde060 100644
--- a/include/odp/api/spec/queue.h
+++ b/include/odp/api/spec/queue.h
@@ -158,6 +158,87 @@  typedef struct odp_queue_param_t {
odp_queue_t odp_queue_create(const char *name, const odp_queue_param_t *param);

+/**
+ * Queue group capability
+ * This capability structure defines system Queue Group capability
+ */
+typedef struct odp_queue_group_capability_t {
+ /** Number of queues supported per queue group */
+ unsigned supported_queues;
+ /** Supported protocol fields for hashing */
+ odp_pktin_hash_proto_t supported;
+} odp_queue_group_capability_t;
+
+/**
+ * ODP Queue Group parameters
+ * Queue group supports only schedule queues <TBD??>
+ */
+typedef struct odp_queue_group_param_t {
+ /** Number of queues to be created for this queue group.
+ * The implementation may round up the value to the nearest power
+ * of 2; the value should be less than the number of queues
+ * supported per queue group
+ */
+ unsigned num_queue;
+
+ /** Protocol field selection for queue group distribution
+ * Multiple fields can be selected in combination
+ */
+ odp_queue_group_hash_proto_t hash;
+
+} odp_queue_group_param_t;
+
+/**
+ * Initialize queue group params
+ *
+ * Initialize an odp_queue_group_param_t to its default values for all fields.
+ *
+ * @param param   Address of the odp_queue_group_param_t to be initialized
+ */
+void odp_queue_group_param_init(odp_queue_group_param_t *param);
+
+/**
+ * Queue Group create
+ *
+ * Create a queue group according to the queue group parameters.
+ * The individual queues belonging to a queue group are created by the
+ * implementation and the distribution of packets into those queues is
+ * decided based on the odp_queue_group_hash_proto_t parameters.
+ * The individual queues within a queue group are both created and deleted
+ * by the implementation.
+ *
+ * @param name    Queue Group name
+ * @param param   Queue Group parameters.
+ *
+ * @return Queue group handle
+ * @retval ODP_QUEUE_GROUP_INVALID on failure
+ */
+odp_queue_group_t odp_queue_group_create(const char *name,
+ const odp_queue_group_param_t *param);
Regards,
Bala
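
For completeness, a minimal usage sketch tying the proposed CoS union
to a queue group. pkt_pool is assumed to be a previously created
packet pool, gp.hash is left at its initialized default, and the
ODP_QUEUE_GROUP_T selector value follows the comment text in the patch
rather than a finalized enum:

	odp_queue_group_param_t gp;
	odp_queue_group_t grp;
	odp_cls_cos_param_t cos_param;
	odp_cos_t cos;

	odp_queue_group_param_init(&gp);
	gp.num_queue = 1024; /* spread flows over up to 1024 queues */
	grp = odp_queue_group_create("udp_flows", &gp);

	odp_cls_cos_param_init(&cos_param);
	cos_param.type = ODP_QUEUE_GROUP_T; /* proposed selector value */
	cos_param.queue_group = grp;
	cos_param.pool = pkt_pool; /* assumed existing packet pool */
	cos_param.drop_policy = ODP_COS_DROP_NEVER;

	cos = odp_cls_cos_create("udp_cos", &cos_param);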