
[RFC] API: update odp_schedule_multi to return events from multiple queues.

Message ID 20170203115045.19750-1-nikhil.agarwal@linaro.org
State New

Commit Message

Nikhil Agarwal Feb. 3, 2017, 11:50 a.m. UTC
Signed-off-by: Nikhil Agarwal <nikhil.agarwal@linaro.org>

---
 include/odp/api/spec/schedule.h | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

-- 
2.9.3

Comments

Ola Liljedahl Feb. 3, 2017, 1:50 p.m. UTC | #1
Do we have any performance comparison between using this new API and
using the existing API where the SW behind odp_schedule_multi() sorts
the events (if necessary) and only returns events for one queue at a
time (keeping the others as prescheduled events)?

If we don't know that this new API actually improves performance
(significantly) compared to using the existing API enhanced with some
under-the-hood fixup, I don't think we have a good case for changing
the API.
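
For concreteness, a rough sketch of that kind of under-the-hood fixup
(hypothetical code, not an existing implementation; names prefixed my_ are
made up, and my_hw_dequeue_burst() stands in for the platform's
burst-dequeue primitive):

#include <odp_api.h>

#define MY_BURST 8

/* Prescheduled events held back from earlier HW bursts (per thread) */
static __thread odp_event_t my_stash[MY_BURST];
static __thread odp_queue_t my_stash_src[MY_BURST];
static __thread int my_stash_num;

/* Placeholder for "fetch a burst of events plus source queues from HW" */
extern int my_hw_dequeue_burst(odp_queue_t src[], odp_event_t ev[], int max);

int my_schedule_multi(odp_queue_t *from, uint64_t wait, odp_event_t ev[],
		      int num)
{
	int i, n = 0;

	(void)wait; /* wait/timeout handling omitted for brevity */

	if (my_stash_num == 0)
		my_stash_num = my_hw_dequeue_burst(my_stash_src, my_stash,
						   MY_BURST);
	if (my_stash_num == 0)
		return 0;

	/* Return only the leading run of events sharing one source queue */
	while (n < my_stash_num && n < num &&
	       my_stash_src[n] == my_stash_src[0])
		n++;
	for (i = 0; i < n; i++)
		ev[i] = my_stash[i];
	if (from)
		*from = my_stash_src[0];

	/* Keep the remainder as prescheduled events for the next call */
	for (i = n; i < my_stash_num; i++) {
		my_stash[i - n] = my_stash[i];
		my_stash_src[i - n] = my_stash_src[i];
	}
	my_stash_num -= n;

	return n;
}

Whether the sorting and stashing overhead eats the batching gain is exactly
the measurement being asked for here.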


On 3 February 2017 at 12:50, Nikhil Agarwal <nikhil.agarwal@linaro.org> wrote:
> Signed-off-by: Nikhil Agarwal <nikhil.agarwal@linaro.org>
>
> [...]

Nikhil Agarwal Feb. 7, 2017, 1:22 p.m. UTC | #2
On 3 February 2017 at 19:20, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> Do we have any performance comparison between using this new API and
> using the existing API where the SW behind odp_schedule_multi() sorts
> the events (if necessary) and only returns events for one queue at a
> time (keeping the others as prescheduled events)?

Keeping the highest-priority packets on hold while other cores are free to
process them may add significant latency, since you cannot be sure when the
particular thread will call odp_schedule() again.

> If we don't know that this new API actually improves performance
> (significantly) compared to using the existing API enhanced with some
> under-the-hood fixup, I don't think we have a good case for changing
> the API.

Comparing one packet per call vs. multiple (8 packets) gives a boost of
around 30% in performance.

> [...]

Bill Fischofer Feb. 7, 2017, 1:28 p.m. UTC | #3
On Tue, Feb 7, 2017 at 7:22 AM, Nikhil Agarwal
<nikhil.agarwal@linaro.org> wrote:
> On 3 February 2017 at 19:20, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

>> Do we have any performance comparison between using this new API and
>> using the existing API where the SW behind odp_schedule_multi() sorts
>> the events (if necessary) and only returns events for one queue at a
>> time (keeping the others as prescheduled events)?
>
> Keeping the highest-priority packets on hold while other cores are free
> to process them may add significant latency, since you cannot be sure
> when the particular thread will call odp_schedule() again.

How would these other cores be prevented from processing them? If they
call odp_schedule() or odp_schedule_multi() they are going to be given
packet(s) to process.

>> If we don't know that this new API actually improves performance
>> (significantly) compared to using the existing API enhanced with some
>> under-the-hood fixup, I don't think we have a good case for changing
>> the API.
>
> Comparing one packet per call vs. multiple (8 packets) gives a boost of
> around 30% in performance.

We already have odp_schedule_multi() that can process multiple packets
in one call. The issue here is how often these packets come from
multiple queues while other cores aren't available to process them.

> [...]

Nikhil Agarwal Feb. 7, 2017, 1:39 p.m. UTC | #4
On 7 February 2017 at 18:58, Bill Fischofer <bill.fischofer@linaro.org>
wrote:

> On Tue, Feb 7, 2017 at 7:22 AM, Nikhil Agarwal
> <nikhil.agarwal@linaro.org> wrote:
>> Keeping the highest-priority packets on hold while other cores are free
>> to process them may add significant latency, since you cannot be sure
>> when the particular thread will call odp_schedule() again.
>
> How would these other cores be prevented from processing them? If they
> call odp_schedule() or odp_schedule_multi() they are going to be given
> packet(s) to process.

Since different threads may be configured for different scheduler groups,
you cannot give a packet buffered for one thread to another (unless some
scheduling logic is also introduced in software); the implementation would
have to buffer different pre-scheduled events for different threads.
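
To make this concrete, a minimal sketch using the existing ODP
scheduler-group API (thread IDs thr_a and thr_b are assumed valid, error
handling omitted); once workers join disjoint groups, events prescheduled
for one worker are not legal for the other:

#include <odp_api.h>

static void my_make_disjoint_groups(int thr_a, int thr_b,
				    odp_schedule_group_t *grp_a,
				    odp_schedule_group_t *grp_b)
{
	odp_thrmask_t mask;

	odp_thrmask_zero(&mask);
	odp_thrmask_set(&mask, thr_a);
	*grp_a = odp_schedule_group_create("grp_a", &mask);

	odp_thrmask_zero(&mask);
	odp_thrmask_set(&mask, thr_b);
	*grp_b = odp_schedule_group_create("grp_b", &mask);

	/* A queue created with odp_queue_param_t.sched.group = *grp_a is
	 * eligible only to thr_a; events a SW layer preschedules on its
	 * behalf cannot be redistributed to thr_b. */
}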

> We already have odp_schedule_multi() that can process multiple packets
> in one call. The issue here is how often these packets come from
> multiple queues while other cores aren't available to process them.

If we don't use the classifier, hashing is generally used to distribute
packets among the multiple queues of a pktio, so packets will most often
arrive from multiple queues.
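
For reference, a minimal sketch of that setup (standard pktin queue
configuration; capability checks are omitted and the pktio is assumed to be
opened in ODP_PKTIN_MODE_SCHED):

#include <odp_api.h>

static int my_setup_hashed_rx(odp_pktio_t pktio, unsigned num_queues)
{
	odp_pktin_queue_param_t param;

	odp_pktin_queue_param_init(&param);
	param.hash_enable = 1;
	param.hash_proto.proto.ipv4_udp = 1; /* hash on the IPv4/UDP tuple */
	param.num_queues = num_queues;

	/* Each flow now lands on one of num_queues scheduled queues, so a
	 * schedule burst will typically mix events from several of them. */
	return odp_pktin_queue_config(pktio, &param);
}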

> [...]

Bill Fischofer Feb. 7, 2017, 1:58 p.m. UTC | #5
On Tue, Feb 7, 2017 at 7:39 AM, Nikhil Agarwal
<nikhil.agarwal@linaro.org> wrote:
> On 7 February 2017 at 18:58, Bill Fischofer <bill.fischofer@linaro.org>
> wrote:
>> How would these other cores be prevented from processing them? If they
>> call odp_schedule() or odp_schedule_multi() they are going to be given
>> packet(s) to process.
>
> Since different threads may be configured for different scheduler groups,
> you cannot give a packet buffered for one thread to another (unless some
> scheduling logic is also introduced in software); the implementation
> would have to buffer different pre-scheduled events for different
> threads.


This is an important point and may be the key here. Currently ODP
assumes that there are no restrictions on assigning threads running on
different cores to scheduler groups, so one can easily have threads
running on different cores belonging to the same scheduler group.
Certainly this is the case with ODP_SCHED_GROUP_ALL and
ODP_SCHED_GROUP_WORKER, which are expected to represent most
queues/threads. If your platform links the concept of scheduler groups
to specific cores then this would explain some of the motivation here.
Is this NUMA related or is something else going on here?

>> We already have odp_schedule_multi() that can process multiple packets
>> in one call. The issue here is how often these packets come from
>> multiple queues while other cores aren't available to process them.
>
> If we don't use the classifier, hashing is generally used to distribute
> packets among the multiple queues of a pktio, so packets will most often
> arrive from multiple queues.

The ODP classifier is the preferred model for scheduled packet
processing, as hashing provides a much coarser level of control that is
perhaps better suited to a polling I/O model. It's not clear why you'd
want to hash and then use scheduled I/O. Can you elaborate here?

> [...]

Ola Liljedahl Feb. 7, 2017, 3:47 p.m. UTC | #6
On 7 February 2017 at 14:22, Nikhil Agarwal <nikhil.agarwal@linaro.org> wrote:
>> Do we have any performance comparison between using this new API and
>> using the existing API where the SW behind odp_schedule_multi() sorts
>> the events (if necessary) and only returns events for one queue at a
>> time (keeping the others as prescheduled events)?
>
> Keeping the highest-priority packets on hold while other cores are free
> to process them may add significant latency, since you cannot be sure
> when the particular thread will call odp_schedule() again.
>
>> If we don't know that this new API actually improves performance
>> (significantly) compared to using the existing API enhanced with some
>> under-the-hood fixup, I don't think we have a good case for changing
>> the API.
>
> Comparing one packet per call vs. multiple (8 packets) gives a boost of
> around 30% in performance.

But will these N (e.g. 8) packets come from N different queues so that
odp_schedule_multi() can only return 1 packet at a time to the
application?

What is the expected average number of events per source queue when
getting events from the HW?

If every event comes from a different queue, is there any benefit from
batching except to avoid some SW overhead in odp_schedule_multi()? It
seems like there would be little memory access locality if every event
comes from a new queue and thus requires a different type of
processing and accesses different tables.
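
To make the locality question concrete, a sketch of a worker loop against
the proposed per-event from[] signature (my_lookup_flow_ctx() and
my_process() are placeholder application functions):

#define MY_BURST 8

void my_worker_loop(void)
{
	odp_event_t ev[MY_BURST];
	odp_queue_t src[MY_BURST];
	int i, num;

	for (;;) {
		num = odp_schedule_multi(src, ODP_SCHED_WAIT, ev, MY_BURST);

		for (i = 0; i < num; i++) {
			/* Flow context is looked up per event; if every
			 * event comes from a different queue there is
			 * little cache reuse between iterations. */
			my_process(my_lookup_flow_ctx(src[i]), ev[i]);
		}
	}
}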

> [...]


Patch

diff --git a/include/odp/api/spec/schedule.h b/include/odp/api/spec/schedule.h
index f8fed17..6e8d759 100644
--- a/include/odp/api/spec/schedule.h
+++ b/include/odp/api/spec/schedule.h
@@ -118,8 +118,8 @@  odp_event_t odp_schedule(odp_queue_t *from, uint64_t wait);
  * originate from the same source queue and share the same scheduler
  * synchronization context.
  *
- * @param from    Output parameter for the source queue (where the event was
- *                dequeued from). Ignored if NULL.
+ * @param from    Output parameter for the source queue array (where the
+ *                events were dequeued from). Ignored if NULL.
  * @param wait    Minimum time to wait for an event. Waits infinitely, if set to
  *                ODP_SCHED_WAIT. Does not wait, if set to ODP_SCHED_NO_WAIT.
  *                Use odp_schedule_wait_time() to convert time to other wait
@@ -129,7 +129,7 @@  odp_event_t odp_schedule(odp_queue_t *from, uint64_t wait);
  *
  * @return Number of events outputted (0 ... num)
  */
-int odp_schedule_multi(odp_queue_t *from, uint64_t wait, odp_event_t events[],
+int odp_schedule_multi(odp_queue_t from[], uint64_t wait, odp_event_t events[],
 		       int num);
 
 /**
@@ -170,6 +170,17 @@  void odp_schedule_resume(void);
 void odp_schedule_release_atomic(void);
 
 /**
+ * Release the atomic contexts associated with the events specified by events[].
+ *
+ * This call is similar to odp_schedule_release_atomic(), but releases the
+ * contexts associated with the events given in events[].
+ * @param events  Input event array for which atomic context is to be released
+ * @param num     Number of events
+ *
+ */
+void odp_schedule_release_atomic_contexts(odp_event_t events[], int num);
+
+/**
  * Release the current ordered context
  *
  * This call is valid only for source queues with ordered synchronization. It
@@ -187,6 +198,17 @@  void odp_schedule_release_atomic(void);
 void odp_schedule_release_ordered(void);
 
 /**
+ * Release the ordered contexts associated with the events specified by events[].
+ *
+ * This call is similar to odp_schedule_release_ordered(), but releases the
+ * contexts associated with the events given in events[].
+ * @param events  Input event array for which ordered context is to be released
+ * @param num     Number of events
+ *
+ */
+void odp_schedule_release_ordered_contexts(odp_event_t events[], int num);
+
+/**
  * Prefetch events for next schedule call
  *
  * Hint the scheduler that application is about to finish processing the current
@@ -348,11 +370,13 @@  int odp_schedule_group_info(odp_schedule_group_t group,
  * allowing order to maintained on a more granular basis. If an ordered lock
  * is used multiple times in the same ordered context results are undefined.
  *
+ * @param source_queue Queue handle from which the event was received and
+ *                     whose ordered lock is to be acquired.
  * @param lock_index Index of the ordered lock in the current context to be
  *                   acquired. Must be in the range 0..odp_queue_lock_count()
  *                   - 1
  */
-void odp_schedule_order_lock(unsigned lock_index);
+void odp_schedule_order_lock(odp_queue_t source_queue, unsigned lock_index);
 
 /**
  * Release ordered context lock
@@ -360,12 +384,14 @@  void odp_schedule_order_lock(unsigned lock_index);
  * This call is valid only when holding an ordered synchronization context.
  * Release a previously locked ordered context lock.
  *
+ * @param source_queue Queue handle from which the event was received and
+ *                     whose ordered lock is to be released.
  * @param lock_index Index of the ordered lock in the current context to be
  *                   released. Results are undefined if the caller does not
  *                   hold this lock. Must be in the range
  *                   0..odp_queue_lock_count() - 1
  */
-void odp_schedule_order_unlock(unsigned lock_index);
+void odp_schedule_order_unlock(odp_queue_t source_queue, unsigned lock_index);
 
 /**
  * @}
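
For illustration, a hedged usage sketch of the RFC's additions taken
together (the semantics are still under discussion in the thread above;
my_handle_event() is a placeholder, and the ordered source queues are
assumed to define lock index 0):

#define MY_BURST 8

void my_ordered_rx_loop(void)
{
	odp_event_t ev[MY_BURST];
	odp_queue_t src[MY_BURST];
	int i, num;

	while (1) {
		num = odp_schedule_multi(src, ODP_SCHED_WAIT, ev, MY_BURST);

		for (i = 0; i < num; i++) {
			/* The source queue must now be passed explicitly,
			 * since events in one burst may hold different
			 * ordered contexts. */
			odp_schedule_order_lock(src[i], 0);
			my_handle_event(ev[i]);
			odp_schedule_order_unlock(src[i], 0);
		}

		/* Release the ordered contexts still held for this burst. */
		odp_schedule_release_ordered_contexts(ev, num);
	}
}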