mbox series

[RFC,v6,0/6] lib/ring: APIs to support custom element size

Message ID 20191021002300.26497-1-honnappa.nagarahalli@arm.com
Headers show
Series lib/ring: APIs to support custom element size | expand

Message

Honnappa Nagarahalli Oct. 21, 2019, 12:22 a.m. UTC
The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in couple
of issues:

1) If an application requires to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing to define the ring
element to be a multiple of 32b.

The aim is to achieve same performance as the existing ring
implementation. The patch adds same performance tests that are run
for existing APIs. This allows for performance comparison.

I also tested with memcpy. x86 shows significant improvements on bulk
and burst tests. On the Arm platform, I used, there is a drop of
4% to 6% in few tests. May be this is something that we can explore
later.

Note that this version skips changes to other libraries as I would
like to get an agreement on the implementation from the community.
They will be added once there is agreement on the rte_ring changes.

v6
 - Labelled as RFC to indicate the better status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - Check Patch shows significant warnings and errors mainly due
   copying code from existing test cases. None of them are harmful.
   I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - Few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (6):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for configurable element size ring
  test/ring: add perf tests for configurable element size ring
  lib/ring: copy ring elements using memcpy partially
  lib/ring: improved copy function to copy ring elements

 app/test/Makefile                    |   2 +
 app/test/meson.build                 |   2 +
 app/test/test_ring_elem.c            | 859 +++++++++++++++++++++++++++
 app/test/test_ring_perf.c            |  22 +-
 app/test/test_ring_perf_elem.c       | 419 +++++++++++++
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   4 +
 lib/librte_ring/rte_ring.c           |  34 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 818 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 11 files changed, 2147 insertions(+), 19 deletions(-)
 create mode 100644 app/test/test_ring_elem.c
 create mode 100644 app/test/test_ring_perf_elem.c
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1

Comments

Olivier Matz Oct. 23, 2019, 9:48 a.m. UTC | #1
Hi Honnappa,

On Sun, Oct 20, 2019 at 07:22:54PM -0500, Honnappa Nagarahalli wrote:
> The current rte_ring hard-codes the type of the ring element to 'void *',

> hence the size of the element is hard-coded to 32b/64b. Since the ring

> element type is not an input to rte_ring APIs, it results in couple

> of issues:

> 

> 1) If an application requires to store an element which is not 64b, it

>    needs to write its own ring APIs similar to rte_event_ring APIs. This

>    creates additional burden on the programmers, who end up making

>    work-arounds and often waste memory.

> 2) If there are multiple libraries that store elements of the same

>    type, currently they would have to write their own rte_ring APIs. This

>    results in code duplication.

> 

> This patch adds new APIs to support configurable ring element size.

> The APIs support custom element sizes by allowing to define the ring

> element to be a multiple of 32b.

> 

> The aim is to achieve same performance as the existing ring

> implementation. The patch adds same performance tests that are run

> for existing APIs. This allows for performance comparison.

> 

> I also tested with memcpy. x86 shows significant improvements on bulk

> and burst tests. On the Arm platform, I used, there is a drop of

> 4% to 6% in few tests. May be this is something that we can explore

> later.

> 

> Note that this version skips changes to other libraries as I would

> like to get an agreement on the implementation from the community.

> They will be added once there is agreement on the rte_ring changes.

> 

> v6

>  - Labelled as RFC to indicate the better status

>  - Added unit tests to test the rte_ring_xxx_elem APIs

>  - Corrected 'macro based partial memcpy' (5/6) patch

>  - Added Konstantin's method after correction (6/6)

>  - Check Patch shows significant warnings and errors mainly due

>    copying code from existing test cases. None of them are harmful.

>    I will fix them once we have an agreement.

> 

> v5

>  - Use memcpy for chunks of 32B (Konstantin).

>  - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available

>    to compare the results easily.

>  - Copying without memcpy is also available in 1/3, if anyone wants to

>    experiment on their platform.

>  - Added other platform owners to test on their respective platforms.

> 

> v4

>  - Few fixes after more performance testing

> 

> v3

>  - Removed macro-fest and used inline functions

>    (Stephen, Bruce)

> 

> v2

>  - Change Event Ring implementation to use ring templates

>    (Jerin, Pavan)

> 

> Honnappa Nagarahalli (6):

>   test/ring: use division for cycle count calculation

>   lib/ring: apis to support configurable element size

>   test/ring: add functional tests for configurable element size ring

>   test/ring: add perf tests for configurable element size ring

>   lib/ring: copy ring elements using memcpy partially

>   lib/ring: improved copy function to copy ring elements

> 

>  app/test/Makefile                    |   2 +

>  app/test/meson.build                 |   2 +

>  app/test/test_ring_elem.c            | 859 +++++++++++++++++++++++++++

>  app/test/test_ring_perf.c            |  22 +-

>  app/test/test_ring_perf_elem.c       | 419 +++++++++++++

>  lib/librte_ring/Makefile             |   3 +-

>  lib/librte_ring/meson.build          |   4 +

>  lib/librte_ring/rte_ring.c           |  34 +-

>  lib/librte_ring/rte_ring.h           |   1 +

>  lib/librte_ring/rte_ring_elem.h      | 818 +++++++++++++++++++++++++

>  lib/librte_ring/rte_ring_version.map |   2 +

>  11 files changed, 2147 insertions(+), 19 deletions(-)

>  create mode 100644 app/test/test_ring_elem.c

>  create mode 100644 app/test/test_ring_perf_elem.c

>  create mode 100644 lib/librte_ring/rte_ring_elem.h


Sorry, I come a day after the fair.

I have only few comments on the shape (I'll reply to individual
patches). On the substance, it looks good to me. I also feel this
version is much better than the template-based versions.

Thanks
Olivier