[PATCHv3] configure.ac check for atomic operations support

Message ID	1419326012-10666-1-git-send-email-maxim.uvarov@linaro.org
State	New
Headers	show Return-Path: <patchwork-forward+bncBDX25FHYWQNBBWHE4SSAKGQEQJ2UCTA@linaro.org> Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.215.45 as permitted sender) client-ip=209.85.215.45; Received-SPF: none (google.com: lng-odp-bounces@lists.linaro.org does not designate permitted sender hosts) client-ip=54.225.227.206; From: Maxim Uvarov <maxim.uvarov@linaro.org> To: lng-odp@lists.linaro.org Date: Tue, 23 Dec 2014 12:13:32 +0300 Message-Id: <1419326012-10666-1-git-send-email-maxim.uvarov@linaro.org> Subject: [lng-odp] [PATCHv3] configure.ac check for atomic operations support Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: lng-odp-bounces@lists.linaro.org Sender: lng-odp-bounces@lists.linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org

Maxim Uvarov Dec. 23, 2014, 9:13 a.m. UTC

Odp atomic operations based on compiler build-ins. Make
sure that compiler supports such operation at configure
stage.

Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
---
 v3: make atomic checks platfrom specific

 configure.ac                           |  4 +++-
 platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 platform/linux-generic/m4/configure.m4

Ola Liljedahl Dec. 23, 2014, 1:26 p.m. UTC | #1

On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
> Odp atomic operations based on compiler build-ins. Make
> sure that compiler supports such operation at configure
> stage.
>
> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
> ---
>  v3: make atomic checks platfrom specific
>
>  configure.ac                           |  4 +++-
>  platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 platform/linux-generic/m4/configure.m4
>
> diff --git a/configure.ac b/configure.ac
> index 377e8be..f55beca 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform],
>      [AS_HELP_STRING([--with-platform=prefix],
>          [Select platform to be used, default linux-generic])],
>      [],
> -    [with_platform=linux-generic])
> +    [with_platform=linux-generic
> +     m4_include([./platform/linux-generic/m4/configure.m4])
> +    ])
>
>  AC_SUBST([with_platform])
>
> diff --git a/platform/linux-generic/m4/configure.m4 b/platform/linux-generic/m4/configure.m4
> new file mode 100644
> index 0000000..bfb1fb0
> --- /dev/null
> +++ b/platform/linux-generic/m4/configure.m4
> @@ -0,0 +1,18 @@
> +AC_MSG_CHECKING(for GCC atomic builtins)
> +AC_LINK_IFELSE(
> +    [AC_LANG_SOURCE(
> +      [[#include <stdint.h>
> +        int main() {
> +        uint32_t v = 1;
> +        __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED);
> +        __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED);
> +        __atomic_store_n(&v, 1, __ATOMIC_RELAXED);
> +        __atomic_load_n(&v, __ATOMIC_RELAXED);
> +        return 0;
> +        }
> +    ]])],
> +    AC_MSG_RESULT(yes),
> +    AC_MSG_RESULT(no)
> +    echo "Atomic operation are not supported by your compiller."
> +    echo "Use newer version. For gcc > 4.7.3"
Why >4.7.3? Minor releases do not introduce major new features.
__atomic support was introduced on GCC 4.7.0.


> +    exit -1)
> --
> 1.8.5.1.163.gd7aced9
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp

Maxim Uvarov Dec. 23, 2014, 3:16 p.m. UTC | #2

On 12/23/2014 04:26 PM, Ola Liljedahl wrote:
> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>> Odp atomic operations based on compiler build-ins. Make
>> sure that compiler supports such operation at configure
>> stage.
>>
>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
>> ---
>>   v3: make atomic checks platfrom specific
>>
>>   configure.ac                           |  4 +++-
>>   platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>   create mode 100644 platform/linux-generic/m4/configure.m4
>>
>> diff --git a/configure.ac b/configure.ac
>> index 377e8be..f55beca 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform],
>>       [AS_HELP_STRING([--with-platform=prefix],
>>           [Select platform to be used, default linux-generic])],
>>       [],
>> -    [with_platform=linux-generic])
>> +    [with_platform=linux-generic
>> +     m4_include([./platform/linux-generic/m4/configure.m4])
>> +    ])
>>
>>   AC_SUBST([with_platform])
>>
>> diff --git a/platform/linux-generic/m4/configure.m4 b/platform/linux-generic/m4/configure.m4
>> new file mode 100644
>> index 0000000..bfb1fb0
>> --- /dev/null
>> +++ b/platform/linux-generic/m4/configure.m4
>> @@ -0,0 +1,18 @@
>> +AC_MSG_CHECKING(for GCC atomic builtins)
>> +AC_LINK_IFELSE(
>> +    [AC_LANG_SOURCE(
>> +      [[#include <stdint.h>
>> +        int main() {
>> +        uint32_t v = 1;
>> +        __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED);
>> +        __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED);
>> +        __atomic_store_n(&v, 1, __ATOMIC_RELAXED);
>> +        __atomic_load_n(&v, __ATOMIC_RELAXED);
>> +        return 0;
>> +        }
>> +    ]])],
>> +    AC_MSG_RESULT(yes),
>> +    AC_MSG_RESULT(no)
>> +    echo "Atomic operation are not supported by your compiller."
>> +    echo "Use newer version. For gcc > 4.7.3"
> Why >4.7.3? Minor releases do not introduce major new features.
> __atomic support was introduced on GCC 4.7.0.
I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there. 
So I just set tested value.
I'm ok to change it to 4.7.0.

Maxim.

>
>> +    exit -1)
>> --
>> 1.8.5.1.163.gd7aced9
>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp

Ola Liljedahl Dec. 23, 2014, 3:26 p.m. UTC | #3

On 23 December 2014 at 16:16, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
> On 12/23/2014 04:26 PM, Ola Liljedahl wrote:
>>
>> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org>
>> wrote:
>>>
>>> Odp atomic operations based on compiler build-ins. Make
>>> sure that compiler supports such operation at configure
>>> stage.
>>>
>>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
>>> ---
>>>   v3: make atomic checks platfrom specific
>>>
>>>   configure.ac                           |  4 +++-
>>>   platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++
>>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>>   create mode 100644 platform/linux-generic/m4/configure.m4
>>>
>>> diff --git a/configure.ac b/configure.ac
>>> index 377e8be..f55beca 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform],
>>>       [AS_HELP_STRING([--with-platform=prefix],
>>>           [Select platform to be used, default linux-generic])],
>>>       [],
>>> -    [with_platform=linux-generic])
>>> +    [with_platform=linux-generic
>>> +     m4_include([./platform/linux-generic/m4/configure.m4])
>>> +    ])
>>>
>>>   AC_SUBST([with_platform])
>>>
>>> diff --git a/platform/linux-generic/m4/configure.m4
>>> b/platform/linux-generic/m4/configure.m4
>>> new file mode 100644
>>> index 0000000..bfb1fb0
>>> --- /dev/null
>>> +++ b/platform/linux-generic/m4/configure.m4
>>> @@ -0,0 +1,18 @@
>>> +AC_MSG_CHECKING(for GCC atomic builtins)
>>> +AC_LINK_IFELSE(
>>> +    [AC_LANG_SOURCE(
>>> +      [[#include <stdint.h>
>>> +        int main() {
>>> +        uint32_t v = 1;
You don't need to use a uint32_t, a plain "int" should suffice. Thus
no need to include <stdint.h> either which should be more robust.

>>> +        __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED);
>>> +        __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED);
>>> +        __atomic_store_n(&v, 1, __ATOMIC_RELAXED);
>>> +        __atomic_load_n(&v, __ATOMIC_RELAXED);
>>> +        return 0;
>>> +        }
>>> +    ]])],
>>> +    AC_MSG_RESULT(yes),
>>> +    AC_MSG_RESULT(no)
>>> +    echo "Atomic operation are not supported by your compiller."
"compiller" speling erorr.

I think the message should be more detailed.
echo "GCC-style __atomic builtins not supported by the compiler"


>>> +    echo "Use newer version. For gcc > 4.7.3"
>>
>> Why >4.7.3? Minor releases do not introduce major new features.
>> __atomic support was introduced on GCC 4.7.0.
>
> I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there. So I
> just set tested value.
> I'm ok to change it to 4.7.0.
Good.

>
> Maxim.
>
>>
>>> +    exit -1)
>>> --
>>> 1.8.5.1.163.gd7aced9
>>>
>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>
>

Mike Holmes Dec. 23, 2014, 3:28 p.m. UTC | #4

how will this work, Octeon does not care about   __atomic_fetch_add(&v, 1,
__ATOMIC_RELAXED); and yet you will fail configure I assume

static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
{
#if defined __OCTEON__
>-------uint32_t ret;
>-------__asm__ __volatile__ ("syncws");
>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) :
>------->------->-------      "r" (atom));
>-------return ret;
#else
>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
#endif
}


On 23 December 2014 at 10:26, Ola Liljedahl <ola.liljedahl@linaro.org>
wrote:

> On 23 December 2014 at 16:16, Maxim Uvarov <maxim.uvarov@linaro.org>
> wrote:
> > On 12/23/2014 04:26 PM, Ola Liljedahl wrote:
> >>
> >> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org>
> >> wrote:
> >>>
> >>> Odp atomic operations based on compiler build-ins. Make
> >>> sure that compiler supports such operation at configure
> >>> stage.
> >>>
> >>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
> >>> ---
> >>>   v3: make atomic checks platfrom specific
> >>>
> >>>   configure.ac                           |  4 +++-
> >>>   platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++
> >>>   2 files changed, 21 insertions(+), 1 deletion(-)
> >>>   create mode 100644 platform/linux-generic/m4/configure.m4
> >>>
> >>> diff --git a/configure.ac b/configure.ac
> >>> index 377e8be..f55beca 100644
> >>> --- a/configure.ac
> >>> +++ b/configure.ac
> >>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform],
> >>>       [AS_HELP_STRING([--with-platform=prefix],
> >>>           [Select platform to be used, default linux-generic])],
> >>>       [],
> >>> -    [with_platform=linux-generic])
> >>> +    [with_platform=linux-generic
> >>> +     m4_include([./platform/linux-generic/m4/configure.m4])
> >>> +    ])
> >>>
> >>>   AC_SUBST([with_platform])
> >>>
> >>> diff --git a/platform/linux-generic/m4/configure.m4
> >>> b/platform/linux-generic/m4/configure.m4
> >>> new file mode 100644
> >>> index 0000000..bfb1fb0
> >>> --- /dev/null
> >>> +++ b/platform/linux-generic/m4/configure.m4
> >>> @@ -0,0 +1,18 @@
> >>> +AC_MSG_CHECKING(for GCC atomic builtins)
> >>> +AC_LINK_IFELSE(
> >>> +    [AC_LANG_SOURCE(
> >>> +      [[#include <stdint.h>
> >>> +        int main() {
> >>> +        uint32_t v = 1;
> You don't need to use a uint32_t, a plain "int" should suffice. Thus
> no need to include <stdint.h> either which should be more robust.
>
> >>> +        __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED);
> >>> +        __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED);
> >>> +        __atomic_store_n(&v, 1, __ATOMIC_RELAXED);
> >>> +        __atomic_load_n(&v, __ATOMIC_RELAXED);
> >>> +        return 0;
> >>> +        }
> >>> +    ]])],
> >>> +    AC_MSG_RESULT(yes),
> >>> +    AC_MSG_RESULT(no)
> >>> +    echo "Atomic operation are not supported by your compiller."
> "compiller" speling erorr.
>
> I think the message should be more detailed.
> echo "GCC-style __atomic builtins not supported by the compiler"
>
>
> >>> +    echo "Use newer version. For gcc > 4.7.3"
> >>
> >> Why >4.7.3? Minor releases do not introduce major new features.
> >> __atomic support was introduced on GCC 4.7.0.
> >
> > I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there.
> So I
> > just set tested value.
> > I'm ok to change it to 4.7.0.
> Good.
>
> >
> > Maxim.
> >
> >>
> >>> +    exit -1)
> >>> --
> >>> 1.8.5.1.163.gd7aced9
> >>>
> >>>
> >>> _______________________________________________
> >>> lng-odp mailing list
> >>> lng-odp@lists.linaro.org
> >>> http://lists.linaro.org/mailman/listinfo/lng-odp
> >
> >
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp
>

Mike Holmes Dec. 23, 2014, 3:44 p.m. UTC | #5

We need and arch directory that ./configure can pull from for those things
that are not platform specific but are core core specific.
./configure then needs different rules to accommodate the above cases,
presumable the different rules can be pulled depending on something common
like $ARCH similar to the linux compile.

On 23 December 2014 at 10:28, Mike Holmes <mike.holmes@linaro.org> wrote:

> how will this work, Octeon does not care about   __atomic_fetch_add(&v,
> 1, __ATOMIC_RELAXED); and yet you will fail configure I assume
>
> static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
> {
> #if defined __OCTEON__
> >-------uint32_t ret;
> >-------__asm__ __volatile__ ("syncws");
> >-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) :
> >------->------->-------      "r" (atom));
> >-------return ret;
> #else
> >-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
> #endif
> }
>
>
> On 23 December 2014 at 10:26, Ola Liljedahl <ola.liljedahl@linaro.org>
> wrote:
>
>> On 23 December 2014 at 16:16, Maxim Uvarov <maxim.uvarov@linaro.org>
>> wrote:
>> > On 12/23/2014 04:26 PM, Ola Liljedahl wrote:
>> >>
>> >> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org>
>> >> wrote:
>> >>>
>> >>> Odp atomic operations based on compiler build-ins. Make
>> >>> sure that compiler supports such operation at configure
>> >>> stage.
>> >>>
>> >>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
>> >>> ---
>> >>>   v3: make atomic checks platfrom specific
>> >>>
>> >>>   configure.ac                           |  4 +++-
>> >>>   platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++
>> >>>   2 files changed, 21 insertions(+), 1 deletion(-)
>> >>>   create mode 100644 platform/linux-generic/m4/configure.m4
>> >>>
>> >>> diff --git a/configure.ac b/configure.ac
>> >>> index 377e8be..f55beca 100644
>> >>> --- a/configure.ac
>> >>> +++ b/configure.ac
>> >>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform],
>> >>>       [AS_HELP_STRING([--with-platform=prefix],
>> >>>           [Select platform to be used, default linux-generic])],
>> >>>       [],
>> >>> -    [with_platform=linux-generic])
>> >>> +    [with_platform=linux-generic
>> >>> +     m4_include([./platform/linux-generic/m4/configure.m4])
>> >>> +    ])
>> >>>
>> >>>   AC_SUBST([with_platform])
>> >>>
>> >>> diff --git a/platform/linux-generic/m4/configure.m4
>> >>> b/platform/linux-generic/m4/configure.m4
>> >>> new file mode 100644
>> >>> index 0000000..bfb1fb0
>> >>> --- /dev/null
>> >>> +++ b/platform/linux-generic/m4/configure.m4
>> >>> @@ -0,0 +1,18 @@
>> >>> +AC_MSG_CHECKING(for GCC atomic builtins)
>> >>> +AC_LINK_IFELSE(
>> >>> +    [AC_LANG_SOURCE(
>> >>> +      [[#include <stdint.h>
>> >>> +        int main() {
>> >>> +        uint32_t v = 1;
>> You don't need to use a uint32_t, a plain "int" should suffice. Thus
>> no need to include <stdint.h> either which should be more robust.
>>
>> >>> +        __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED);
>> >>> +        __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED);
>> >>> +        __atomic_store_n(&v, 1, __ATOMIC_RELAXED);
>> >>> +        __atomic_load_n(&v, __ATOMIC_RELAXED);
>> >>> +        return 0;
>> >>> +        }
>> >>> +    ]])],
>> >>> +    AC_MSG_RESULT(yes),
>> >>> +    AC_MSG_RESULT(no)
>> >>> +    echo "Atomic operation are not supported by your compiller."
>> "compiller" speling erorr.
>>
>> I think the message should be more detailed.
>> echo "GCC-style __atomic builtins not supported by the compiler"
>>
>>
>> >>> +    echo "Use newer version. For gcc > 4.7.3"
>> >>
>> >> Why >4.7.3? Minor releases do not introduce major new features.
>> >> __atomic support was introduced on GCC 4.7.0.
>> >
>> > I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there.
>> So I
>> > just set tested value.
>> > I'm ok to change it to 4.7.0.
>> Good.
>>
>> >
>> > Maxim.
>> >
>> >>
>> >>> +    exit -1)
>> >>> --
>> >>> 1.8.5.1.163.gd7aced9
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> lng-odp mailing list
>> >>> lng-odp@lists.linaro.org
>> >>> http://lists.linaro.org/mailman/listinfo/lng-odp
>> >
>> >
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>
>
> --
> *Mike Holmes*
> Linaro  Sr Technical Manager
> LNG - ODP
>

Taras Kondratiuk Dec. 23, 2014, 4:15 p.m. UTC | #6

On 12/23/2014 05:28 PM, Mike Holmes wrote:
> how will this work, Octeon does not care about   __atomic_fetch_add(&v,
> 1, __ATOMIC_RELAXED); and yet you will fail configure I assume
> 
> static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
> {
> #if defined __OCTEON__
>>-------uint32_t ret;
>>-------__asm__ __volatile__ ("syncws");
>>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) :
>>------->------->-------      "r" (atom));
>>-------return ret;
> #else
>>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
> #endif
> }

Linux-generic shouldn't have arch specific parts, because it is
*generic*. Instead it can fall back to strong __sync functions if
__atomic are not available.

Mike Holmes Dec. 23, 2014, 5:01 p.m. UTC | #7

On 23 December 2014 at 11:15, Taras Kondratiuk <taras.kondratiuk@linaro.org>
wrote:

> On 12/23/2014 05:28 PM, Mike Holmes wrote:
> > how will this work, Octeon does not care about   __atomic_fetch_add(&v,
> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume
> >
> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
> > {
> > #if defined __OCTEON__
> >>-------uint32_t ret;
> >>-------__asm__ __volatile__ ("syncws");
> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) :
> >>------->------->-------      "r" (atom));
> >>-------return ret;
> > #else
> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
> > #endif
> > }
>
> Linux-generic shouldn't have arch specific parts, because it is
> *generic*. Instead it can fall back to strong __sync functions if
> __atomic are not available.
>
>
Linux kernel is generic but has core specifics - is this not the same ?

What about these cases ?

platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE
*file, odp_system_info_t *sysinfo)
platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "octeon",
platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
cpuinfo_octeon
platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ ||
defined __i386__
platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ ||
defined __i386__
platform/linux-generic/include/odp_spin_internal.h:#if defined __x86_64__
|| defined __i386__
platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined
__i386__ || defined __OCTEON__ || \
platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined
__i386__
platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE *file,
odp_system_info_t *sysinfo)
platform/linux-generic/odp_system_info.c:       #if defined __x86_64__ ||
defined __i386__
platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "x86",
platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
cpuinfo_x86
platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined
__i386__ || defined __OCTEON__ || \
platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined __i386__
platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ ||
defined __aarch64__
platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__)
platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__
platform/linux-generic/odp_system_info.c:#elif defined __arm__ || defined
__aarch64__
platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE *file
ODP_UNUSED,
platform/linux-generic/odp_system_info.c:       #elif defined __arm__ ||
defined __aarch64__
platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "arm",
platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
cpuinfo_arm



>
> Taras Kondratiuk
>

Ola Liljedahl Dec. 23, 2014, 6:38 p.m. UTC | #8

I had some discussions with Bala and we think the GCC generates the
appropriate code for OCTEON so there should be no need for any
OCTEON-specific inline assembler in odp_atomic.h. I just didn't want
to remove this piece of code when I redesigned the atomic support
because I didn't understand why there is a SYNCW instruction *before*
the load. Load-acquire type of loads may need sync or barrier
instructions but they come after the load, not before.

I think we should trust GCC to generate better code for OCTEON than us
mere humans, at least until proven wrong. So please remove this
OCTEON-specific snippet.

-- Ola


On 23 December 2014 at 18:01, Mike Holmes <mike.holmes@linaro.org> wrote:
>
>
> On 23 December 2014 at 11:15, Taras Kondratiuk <taras.kondratiuk@linaro.org>
> wrote:
>>
>> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>> > how will this work, Octeon does not care about   __atomic_fetch_add(&v,
>> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume
>> >
>> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>> > {
>> > #if defined __OCTEON__
>> >>-------uint32_t ret;
>> >>-------__asm__ __volatile__ ("syncws");
>> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) :
>> >>------->------->-------      "r" (atom));
>> >>-------return ret;
>> > #else
>> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
>> > #endif
>> > }
>>
>> Linux-generic shouldn't have arch specific parts, because it is
>> *generic*. Instead it can fall back to strong __sync functions if
>> __atomic are not available.
>>
>
> Linux kernel is generic but has core specifics - is this not the same ?
>
> What about these cases ?
>
> platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE
> *file, odp_system_info_t *sysinfo)
> platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "octeon",
> platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
> cpuinfo_octeon
> platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ ||
> defined __i386__
> platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ ||
> defined __i386__
> platform/linux-generic/include/odp_spin_internal.h:#if defined __x86_64__ ||
> defined __i386__
> platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined
> __i386__ || defined __OCTEON__ || \
> platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined
> __i386__
> platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE *file,
> odp_system_info_t *sysinfo)
> platform/linux-generic/odp_system_info.c:       #if defined __x86_64__ ||
> defined __i386__
> platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "x86",
> platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
> cpuinfo_x86
> platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined
> __i386__ || defined __OCTEON__ || \
> platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined __i386__
> platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ ||
> defined __aarch64__
> platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__)
> platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__
> platform/linux-generic/odp_system_info.c:#elif defined __arm__ || defined
> __aarch64__
> platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE *file
> ODP_UNUSED,
> platform/linux-generic/odp_system_info.c:       #elif defined __arm__ ||
> defined __aarch64__
> platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "arm",
> platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
> cpuinfo_arm
>
>
>>
>>
>> Taras Kondratiuk
>
>
>
>
> --
> Mike Holmes
> Linaro  Sr Technical Manager
> LNG - ODP

Bill Fischofer Dec. 23, 2014, 6:49 p.m. UTC | #9

That's good to know and certainly the code is cleaner that way.  I think we
should adopt the practice that any architecture-specific changes to
linux-generic code (adds, updates, or removals) must either originate from
or have a Reviewed-by a representative of that architecture (in this case,
Bala).  That way there's no confusion as to whether the known experts agree
with the change, since these sort of things can have subtle implications
that are not obvious to the layman.

On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
wrote:
>
> I had some discussions with Bala and we think the GCC generates the
> appropriate code for OCTEON so there should be no need for any
> OCTEON-specific inline assembler in odp_atomic.h. I just didn't want
> to remove this piece of code when I redesigned the atomic support
> because I didn't understand why there is a SYNCW instruction *before*
> the load. Load-acquire type of loads may need sync or barrier
> instructions but they come after the load, not before.
>
> I think we should trust GCC to generate better code for OCTEON than us
> mere humans, at least until proven wrong. So please remove this
> OCTEON-specific snippet.
>
> -- Ola
>
>
> On 23 December 2014 at 18:01, Mike Holmes <mike.holmes@linaro.org> wrote:
> >
> >
> > On 23 December 2014 at 11:15, Taras Kondratiuk <
> taras.kondratiuk@linaro.org>
> > wrote:
> >>
> >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
> >> > how will this work, Octeon does not care about
>  __atomic_fetch_add(&v,
> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume
> >> >
> >> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t
> *atom)
> >> > {
> >> > #if defined __OCTEON__
> >> >>-------uint32_t ret;
> >> >>-------__asm__ __volatile__ ("syncws");
> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) :
> >> >>------->------->-------      "r" (atom));
> >> >>-------return ret;
> >> > #else
> >> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
> >> > #endif
> >> > }
> >>
> >> Linux-generic shouldn't have arch specific parts, because it is
> >> *generic*. Instead it can fall back to strong __sync functions if
> >> __atomic are not available.
> >>
> >
> > Linux kernel is generic but has core specifics - is this not the same ?
> >
> > What about these cases ?
> >
> > platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE
> > *file, odp_system_info_t *sysinfo)
> > platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "octeon",
> > platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
> > cpuinfo_octeon
> > platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ ||
> > defined __i386__
> > platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ ||
> > defined __i386__
> > platform/linux-generic/include/odp_spin_internal.h:#if defined
> __x86_64__ ||
> > defined __i386__
> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ ||
> defined
> > __i386__ || defined __OCTEON__ || \
> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ ||
> defined
> > __i386__
> > platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE
> *file,
> > odp_system_info_t *sysinfo)
> > platform/linux-generic/odp_system_info.c:       #if defined __x86_64__ ||
> > defined __i386__
> > platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "x86",
> > platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
> > cpuinfo_x86
> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ ||
> defined
> > __i386__ || defined __OCTEON__ || \
> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined
> __i386__
> > platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ ||
> > defined __aarch64__
> > platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__)
> > platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__
> > platform/linux-generic/odp_system_info.c:#elif defined __arm__ || defined
> > __aarch64__
> > platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE
> *file
> > ODP_UNUSED,
> > platform/linux-generic/odp_system_info.c:       #elif defined __arm__ ||
> > defined __aarch64__
> > platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "arm",
> > platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
> > cpuinfo_arm
> >
> >
> >>
> >>
> >> Taras Kondratiuk
> >
> >
> >
> >
> > --
> > Mike Holmes
> > Linaro  Sr Technical Manager
> > LNG - ODP
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp
>

Mike Holmes Dec. 23, 2014, 7:40 p.m. UTC | #10

On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org>
wrote:

> That's good to know and certainly the code is cleaner that way.  I think
> we should adopt the practice that any architecture-specific changes to
> linux-generic code (adds, updates, or removals) must either originate from
> or have a Reviewed-by a representative of that architecture (in this case,
> Bala).  That way there's no confusion as to whether the known experts agree
> with the change, since these sort of things can have subtle implications
> that are not obvious to the layman.
>

I think linux-generic should be devoid of any per core optimizations in its
default state.
There could be in an arch directory and it would be supported as you say
Bill by someone who will care for that that variant, this also avoids
filling linux-generic with a possibly large list of preprocessor flags that
we can't test.

In this case we don't have public access to the octeon to even know if this
code works so separating or deletion feels right to me, although Cavium are
adding their board in this case.



>
> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
> wrote:
>
>> I had some discussions with Bala and we think the GCC generates the
>> appropriate code for OCTEON so there should be no need for any
>> OCTEON-specific inline assembler in odp_atomic.h. I just didn't want
>> to remove this piece of code when I redesigned the atomic support
>> because I didn't understand why there is a SYNCW instruction *before*
>> the load. Load-acquire type of loads may need sync or barrier
>> instructions but they come after the load, not before.
>>
>> I think we should trust GCC to generate better code for OCTEON than us
>> mere humans, at least until proven wrong. So please remove this
>> OCTEON-specific snippet.
>>
>> -- Ola
>>
>>
>> On 23 December 2014 at 18:01, Mike Holmes <mike.holmes@linaro.org> wrote:
>> >
>> >
>> > On 23 December 2014 at 11:15, Taras Kondratiuk <
>> taras.kondratiuk@linaro.org>
>> > wrote:
>> >>
>> >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>> >> > how will this work, Octeon does not care about
>>  __atomic_fetch_add(&v,
>> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume
>> >> >
>> >> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t
>> *atom)
>> >> > {
>> >> > #if defined __OCTEON__
>> >> >>-------uint32_t ret;
>> >> >>-------__asm__ __volatile__ ("syncws");
>> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom)
>> :
>> >> >>------->------->-------      "r" (atom));
>> >> >>-------return ret;
>> >> > #else
>> >> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED);
>> >> > #endif
>> >> > }
>> >>
>> >> Linux-generic shouldn't have arch specific parts, because it is
>> >> *generic*. Instead it can fall back to strong __sync functions if
>> >> __atomic are not available.
>> >>
>> >
>> > Linux kernel is generic but has core specifics - is this not the same ?
>> >
>> > What about these cases ?
>> >
>> > platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE
>> > *file, odp_system_info_t *sysinfo)
>> > platform/linux-generic/odp_system_info.c:       .cpu_arch_str =
>> "octeon",
>> > platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
>> > cpuinfo_octeon
>> > platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ ||
>> > defined __i386__
>> > platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ ||
>> > defined __i386__
>> > platform/linux-generic/include/odp_spin_internal.h:#if defined
>> __x86_64__ ||
>> > defined __i386__
>> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ ||
>> defined
>> > __i386__ || defined __OCTEON__ || \
>> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ ||
>> defined
>> > __i386__
>> > platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE
>> *file,
>> > odp_system_info_t *sysinfo)
>> > platform/linux-generic/odp_system_info.c:       #if defined __x86_64__
>> ||
>> > defined __i386__
>> > platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "x86",
>> > platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
>> > cpuinfo_x86
>> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ ||
>> defined
>> > __i386__ || defined __OCTEON__ || \
>> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined
>> __i386__
>> > platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ ||
>> > defined __aarch64__
>> > platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__)
>> > platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__
>> > platform/linux-generic/odp_system_info.c:#elif defined __arm__ ||
>> defined
>> > __aarch64__
>> > platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE
>> *file
>> > ODP_UNUSED,
>> > platform/linux-generic/odp_system_info.c:       #elif defined __arm__ ||
>> > defined __aarch64__
>> > platform/linux-generic/odp_system_info.c:       .cpu_arch_str = "arm",
>> > platform/linux-generic/odp_system_info.c:       .cpuinfo_parser =
>> > cpuinfo_arm
>> >
>> >
>> >>
>> >>
>> >> Taras Kondratiuk
>> >
>> >
>> >
>> >
>> > --
>> > Mike Holmes
>> > Linaro  Sr Technical Manager
>> > LNG - ODP
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>

Maxim Uvarov Dec. 24, 2014, 10:21 a.m. UTC | #11

On 12/23/2014 07:15 PM, Taras Kondratiuk wrote:
> Linux-generic shouldn't have arch specific parts, because it is
> *generic*. Instead it can fall back to strong __sync functions if
> __atomic are not available.

Taras I think that even linux-generic should have it's own requirements. 
It might be compiler version.
Dependence on libssl. Kernel version and etc.

We do not say that linux-generic should work on any linux, even 20 years 
old. We just define needed requirements
to compile it. The same is with atomics. Instead of creating 20 branches 
for all gcc and not gcc version we just
stick with more fresh gcc and say that it's minimal requirement to run 
linux-generic.

Maxim.

Maxim Uvarov Dec. 24, 2014, 11:19 a.m. UTC | #12

On 12/23/2014 10:40 PM, Mike Holmes wrote:
>
>
> On 23 December 2014 at 13:49, Bill Fischofer 
> <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote:
>
>     That's good to know and certainly the code is cleaner that way.  I
>     think we should adopt the practice that any architecture-specific
>     changes to linux-generic code (adds, updates, or removals) must
>     either originate from or have a Reviewed-by a representative of
>     that architecture (in this case, Bala).  That way there's no
>     confusion as to whether the known experts agree with the change,
>     since these sort of things can have subtle implications that are
>     not obvious to the layman.
>
>
> I think linux-generic should be devoid of any per core optimizations 
> in its default state.
> There could be in an arch directory and it would be supported as you 
> say Bill by someone who will care for that that variant, this also 
> avoids filling linux-generic with a possibly large list of 
> preprocessor flags that we can't test.
>
> In this case we don't have public access to the octeon to even know if 
> this code works so separating or deletion feels right to me, although 
> Cavium are adding their board in this case.
>
To have optimizations we should understand which per arch optimization 
we need. I see that if newer gcc supports good optimizations then we 
need to use that gcc. Most of people use Open Embedded and Linaro gcc 
which is almost latest version. So if there is no optimization in 
standard environment (gcc, kernel, libc) then arch implementation should 
go to platform specific git variant. If it's covered by standard 
environment then we simple updating minimal version requirement for 
required component.

Maxim.


>
>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote:
>
>         I had some discussions with Bala and we think the GCC
>         generates the
>         appropriate code for OCTEON so there should be no need for any
>         OCTEON-specific inline assembler in odp_atomic.h. I just
>         didn't want
>         to remove this piece of code when I redesigned the atomic support
>         because I didn't understand why there is a SYNCW instruction
>         *before*
>         the load. Load-acquire type of loads may need sync or barrier
>         instructions but they come after the load, not before.
>
>         I think we should trust GCC to generate better code for OCTEON
>         than us
>         mere humans, at least until proven wrong. So please remove this
>         OCTEON-specific snippet.
>
>         -- Ola
>
>
>         On 23 December 2014 at 18:01, Mike Holmes
>         <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote:
>         >
>         >
>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>         <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@linaro.org>>
>         > wrote:
>         >>
>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>         >> > how will this work, Octeon does not care about
>          __atomic_fetch_add(&v,
>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>         assume
>         >> >
>         >> > static inline uint32_t
>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>         >> > {
>         >> > #if defined __OCTEON__
>         >> >>-------uint32_t ret;
>         >> >>-------__asm__ __volatile__ ("syncws");
>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>         "+m" (atom) :
>         >> >>------->------->------- "r" (atom));
>         >> >>-------return ret;
>         >> > #else
>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>         __ATOMIC_RELAXED);
>         >> > #endif
>         >> > }
>         >>
>         >> Linux-generic shouldn't have arch specific parts, because it is
>         >> *generic*. Instead it can fall back to strong __sync
>         functions if
>         >> __atomic are not available.
>         >>
>         >
>         > Linux kernel is generic but has core specifics - is this not
>         the same ?
>         >
>         > What about these cases ?
>         >
>         > platform/linux-generic/odp_system_info.c:static int
>         cpuinfo_octeon(FILE
>         > *file, odp_system_info_t *sysinfo)
>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>         "octeon",
>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>         > cpuinfo_octeon
>         > platform/linux-generic/include/api/odp_align.h:#if defined
>         __x86_64__ ||
>         > defined __i386__
>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>         __x86_64__ ||
>         > defined __i386__
>         > platform/linux-generic/include/odp_spin_internal.h:#if
>         defined __x86_64__ ||
>         > defined __i386__
>         > platform/linux-generic/odp_system_info.c:#if defined
>         __x86_64__ || defined
>         > __i386__ || defined __OCTEON__ || \
>         > platform/linux-generic/odp_system_info.c:#if defined
>         __x86_64__ || defined
>         > __i386__
>         > platform/linux-generic/odp_system_info.c:static int
>         cpuinfo_x86(FILE *file,
>         > odp_system_info_t *sysinfo)
>         > platform/linux-generic/odp_system_info.c:    #if defined
>         __x86_64__ ||
>         > defined __i386__
>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>         "x86",
>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>         > cpuinfo_x86
>         > platform/linux-generic/odp_system_info.c:#if defined
>         __x86_64__ || defined
>         > __i386__ || defined __OCTEON__ || \
>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>         defined __i386__
>         > platform/linux-generic/include/api/odp_align.h:#elif defined
>         __arm__ ||
>         > defined __aarch64__
>         > platform/linux-generic/include/api/odp_sync.h:#elif
>         defined(__arm__)
>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>         defined __arm__
>         > platform/linux-generic/odp_system_info.c:#elif defined
>         __arm__ || defined
>         > __aarch64__
>         > platform/linux-generic/odp_system_info.c:static int
>         cpuinfo_arm(FILE *file
>         > ODP_UNUSED,
>         > platform/linux-generic/odp_system_info.c:    #elif defined
>         __arm__ ||
>         > defined __aarch64__
>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>         "arm",
>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>         > cpuinfo_arm
>         >
>         >
>         >>
>         >>
>         >> Taras Kondratiuk
>         >
>         >
>         >
>         >
>         > --
>         > Mike Holmes
>         > Linaro  Sr Technical Manager
>         > LNG - ODP
>
>         _______________________________________________
>         lng-odp mailing list
>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>         http://lists.linaro.org/mailman/listinfo/lng-odp
>
>
>
>
> -- 
> *Mike Holmes*
> Linaro  Sr Technical Manager
> LNG - ODP
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp

Mike Holmes Dec. 24, 2014, 2:59 p.m. UTC | #13

On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:

> On 12/23/2014 10:40 PM, Mike Holmes wrote:
>
>>
>>
>> On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org
>> <mailto:bill.fischofer@linaro.org>> wrote:
>>
>>     That's good to know and certainly the code is cleaner that way.  I
>>     think we should adopt the practice that any architecture-specific
>>     changes to linux-generic code (adds, updates, or removals) must
>>     either originate from or have a Reviewed-by a representative of
>>     that architecture (in this case, Bala).  That way there's no
>>     confusion as to whether the known experts agree with the change,
>>     since these sort of things can have subtle implications that are
>>     not obvious to the layman.
>>
>>
>> I think linux-generic should be devoid of any per core optimizations in
>> its default state.
>> There could be in an arch directory and it would be supported as you say
>> Bill by someone who will care for that that variant, this also avoids
>> filling linux-generic with a possibly large list of preprocessor flags that
>> we can't test.
>>
>> In this case we don't have public access to the octeon to even know if
>> this code works so separating or deletion feels right to me, although
>> Cavium are adding their board in this case.
>>
>>  To have optimizations we should understand which per arch optimization
> we need. I see that if newer gcc supports good optimizations then we need
> to use that gcc. Most of people use Open Embedded and Linaro gcc which is
> almost latest version. So if there is no optimization in standard
> environment (gcc, kernel, libc) then arch implementation should go to
> platform specific git variant. If it's covered by standard environment then
> we simple updating minimal version requirement for required component.
>
>
Lets take an example ODP API file

static inline void odp_sync_stores(void)
{
#if defined __x86_64__ || defined __i386__

>-------__asm__  __volatile__ ("sfence\n" : : : "memory");

#elif defined(__arm__)
#if __ARM_ARCH == 6
>-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>------->------->-------: : "r" (0) : "memory");
#elif __ARM_ARCH >= 7 || defined __aarch64__

>-------__asm__ __volatile__ ("dmb st" : : : "memory");
#else
>-------__asm__ __volatile__ ("" : : : "memory");
#endif

#elif defined __OCTEON__

>-------__asm__  __volatile__ ("syncws\n" : : : "memory");

#else
>-------__sync_synchronize();
#endif


How do we accommodate this without resorting to architecture specific
processor flags / assembly?

I would prefer to see platform/linux-generic/include/api/odp_sync.h changed
to

platform/linux-generic/include/api/x86/odp_sync.h
platform/linux-generic/include/api/arm/odp_sync.h
platform/linux-generic/include/api/octeon/odp_sync.h

This assumes that there is currently no pure c code that can be used that
is acceptable on all three platforms.

Or since linux-generic is not a performance platform but rather a reference
whenever we need to make a trade off between simplicity and performance, we
alter this to be a regular function and move to:-

platform/linux-generic/include/api/odp_sync.h <- now a pure forward
declaration of the API with doxygen documentation
and implementations correctly broken down by arch

arch/x86/odp_sync.h
arch/arm/odp_sync.h
arch/octeon/odp_sync.h



> Maxim.
>
>
>
>>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote:
>>
>>         I had some discussions with Bala and we think the GCC
>>         generates the
>>         appropriate code for OCTEON so there should be no need for any
>>         OCTEON-specific inline assembler in odp_atomic.h. I just
>>         didn't want
>>         to remove this piece of code when I redesigned the atomic support
>>         because I didn't understand why there is a SYNCW instruction
>>         *before*
>>         the load. Load-acquire type of loads may need sync or barrier
>>         instructions but they come after the load, not before.
>>
>>         I think we should trust GCC to generate better code for OCTEON
>>         than us
>>         mere humans, at least until proven wrong. So please remove this
>>         OCTEON-specific snippet.
>>
>>         -- Ola
>>
>>
>>         On 23 December 2014 at 18:01, Mike Holmes
>>         <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote:
>>         >
>>         >
>>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>>         <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@linaro.org
>> >>
>>
>>         > wrote:
>>         >>
>>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>>         >> > how will this work, Octeon does not care about
>>          __atomic_fetch_add(&v,
>>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>>         assume
>>         >> >
>>         >> > static inline uint32_t
>>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>>         >> > {
>>         >> > #if defined __OCTEON__
>>         >> >>-------uint32_t ret;
>>         >> >>-------__asm__ __volatile__ ("syncws");
>>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>>         "+m" (atom) :
>>         >> >>------->------->------- "r" (atom));
>>         >> >>-------return ret;
>>         >> > #else
>>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>>         __ATOMIC_RELAXED);
>>         >> > #endif
>>         >> > }
>>         >>
>>         >> Linux-generic shouldn't have arch specific parts, because it is
>>         >> *generic*. Instead it can fall back to strong __sync
>>         functions if
>>         >> __atomic are not available.
>>         >>
>>         >
>>         > Linux kernel is generic but has core specifics - is this not
>>         the same ?
>>         >
>>         > What about these cases ?
>>         >
>>         > platform/linux-generic/odp_system_info.c:static int
>>         cpuinfo_octeon(FILE
>>         > *file, odp_system_info_t *sysinfo)
>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>         "octeon",
>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>         > cpuinfo_octeon
>>         > platform/linux-generic/include/api/odp_align.h:#if defined
>>         __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>>         __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/include/odp_spin_internal.h:#if
>>         defined __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/odp_system_info.c:#if defined
>>         __x86_64__ || defined
>>         > __i386__ || defined __OCTEON__ || \
>>         > platform/linux-generic/odp_system_info.c:#if defined
>>         __x86_64__ || defined
>>         > __i386__
>>         > platform/linux-generic/odp_system_info.c:static int
>>         cpuinfo_x86(FILE *file,
>>         > odp_system_info_t *sysinfo)
>>         > platform/linux-generic/odp_system_info.c:    #if defined
>>         __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>         "x86",
>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>         > cpuinfo_x86
>>         > platform/linux-generic/odp_system_info.c:#if defined
>>         __x86_64__ || defined
>>         > __i386__ || defined __OCTEON__ || \
>>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>>         defined __i386__
>>         > platform/linux-generic/include/api/odp_align.h:#elif defined
>>         __arm__ ||
>>         > defined __aarch64__
>>         > platform/linux-generic/include/api/odp_sync.h:#elif
>>         defined(__arm__)
>>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>>         defined __arm__
>>         > platform/linux-generic/odp_system_info.c:#elif defined
>>         __arm__ || defined
>>         > __aarch64__
>>         > platform/linux-generic/odp_system_info.c:static int
>>         cpuinfo_arm(FILE *file
>>         > ODP_UNUSED,
>>         > platform/linux-generic/odp_system_info.c:    #elif defined
>>         __arm__ ||
>>         > defined __aarch64__
>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>         "arm",
>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>         > cpuinfo_arm
>>         >
>>         >
>>         >>
>>         >>
>>         >> Taras Kondratiuk
>>         >
>>         >
>>         >
>>         >
>>         > --
>>         > Mike Holmes
>>         > Linaro  Sr Technical Manager
>>         > LNG - ODP
>>
>>         _______________________________________________
>>         lng-odp mailing list
>>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>>         http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>>
>>
>>
>> --
>> *Mike Holmes*
>> Linaro  Sr Technical Manager
>> LNG - ODP
>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp
>

Bill Fischofer Dec. 24, 2014, 3:03 p.m. UTC | #14

That makes sense.  The only problem is GCC's problems with forward
declarations and inlining.  IMO a design issue with the compiler, but it is
what it is.  But we do need a general solution to this problem in 2015
since proper inlining support will be critical for production-grade
implementations.

On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org> wrote:

>
>
> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org>
> wrote:
>
>> On 12/23/2014 10:40 PM, Mike Holmes wrote:
>>
>>>
>>>
>>> On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org
>>> <mailto:bill.fischofer@linaro.org>> wrote:
>>>
>>>     That's good to know and certainly the code is cleaner that way.  I
>>>     think we should adopt the practice that any architecture-specific
>>>     changes to linux-generic code (adds, updates, or removals) must
>>>     either originate from or have a Reviewed-by a representative of
>>>     that architecture (in this case, Bala).  That way there's no
>>>     confusion as to whether the known experts agree with the change,
>>>     since these sort of things can have subtle implications that are
>>>     not obvious to the layman.
>>>
>>>
>>> I think linux-generic should be devoid of any per core optimizations in
>>> its default state.
>>> There could be in an arch directory and it would be supported as you say
>>> Bill by someone who will care for that that variant, this also avoids
>>> filling linux-generic with a possibly large list of preprocessor flags that
>>> we can't test.
>>>
>>> In this case we don't have public access to the octeon to even know if
>>> this code works so separating or deletion feels right to me, although
>>> Cavium are adding their board in this case.
>>>
>>>  To have optimizations we should understand which per arch optimization
>> we need. I see that if newer gcc supports good optimizations then we need
>> to use that gcc. Most of people use Open Embedded and Linaro gcc which is
>> almost latest version. So if there is no optimization in standard
>> environment (gcc, kernel, libc) then arch implementation should go to
>> platform specific git variant. If it's covered by standard environment then
>> we simple updating minimal version requirement for required component.
>>
>>
> Lets take an example ODP API file
>
> static inline void odp_sync_stores(void)
> {
> #if defined __x86_64__ || defined __i386__
>
> >-------__asm__  __volatile__ ("sfence\n" : : : "memory");
>
> #elif defined(__arm__)
> #if __ARM_ARCH == 6
> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
> >------->------->-------: : "r" (0) : "memory");
> #elif __ARM_ARCH >= 7 || defined __aarch64__
>
> >-------__asm__ __volatile__ ("dmb st" : : : "memory");
> #else
> >-------__asm__ __volatile__ ("" : : : "memory");
> #endif
>
> #elif defined __OCTEON__
>
> >-------__asm__  __volatile__ ("syncws\n" : : : "memory");
>
> #else
> >-------__sync_synchronize();
> #endif
>
>
> How do we accommodate this without resorting to architecture specific
> processor flags / assembly?
>
> I would prefer to see platform/linux-generic/include/api/odp_sync.h
> changed to
>
> platform/linux-generic/include/api/x86/odp_sync.h
> platform/linux-generic/include/api/arm/odp_sync.h
> platform/linux-generic/include/api/octeon/odp_sync.h
>
> This assumes that there is currently no pure c code that can be used that
> is acceptable on all three platforms.
>
> Or since linux-generic is not a performance platform but rather a
> reference whenever we need to make a trade off between simplicity and
> performance, we alter this to be a regular function and move to:-
>
> platform/linux-generic/include/api/odp_sync.h <- now a pure forward
> declaration of the API with doxygen documentation
> and implementations correctly broken down by arch
>
> arch/x86/odp_sync.h
> arch/arm/odp_sync.h
> arch/octeon/odp_sync.h
>
>
>
>> Maxim.
>>
>>
>>
>>>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote:
>>>
>>>         I had some discussions with Bala and we think the GCC
>>>         generates the
>>>         appropriate code for OCTEON so there should be no need for any
>>>         OCTEON-specific inline assembler in odp_atomic.h. I just
>>>         didn't want
>>>         to remove this piece of code when I redesigned the atomic support
>>>         because I didn't understand why there is a SYNCW instruction
>>>         *before*
>>>         the load. Load-acquire type of loads may need sync or barrier
>>>         instructions but they come after the load, not before.
>>>
>>>         I think we should trust GCC to generate better code for OCTEON
>>>         than us
>>>         mere humans, at least until proven wrong. So please remove this
>>>         OCTEON-specific snippet.
>>>
>>>         -- Ola
>>>
>>>
>>>         On 23 December 2014 at 18:01, Mike Holmes
>>>         <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote:
>>>         >
>>>         >
>>>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>>>         <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@linaro.org
>>> >>
>>>
>>>         > wrote:
>>>         >>
>>>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>>>         >> > how will this work, Octeon does not care about
>>>          __atomic_fetch_add(&v,
>>>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>>>         assume
>>>         >> >
>>>         >> > static inline uint32_t
>>>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>>>         >> > {
>>>         >> > #if defined __OCTEON__
>>>         >> >>-------uint32_t ret;
>>>         >> >>-------__asm__ __volatile__ ("syncws");
>>>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>>>         "+m" (atom) :
>>>         >> >>------->------->------- "r" (atom));
>>>         >> >>-------return ret;
>>>         >> > #else
>>>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>>>         __ATOMIC_RELAXED);
>>>         >> > #endif
>>>         >> > }
>>>         >>
>>>         >> Linux-generic shouldn't have arch specific parts, because it
>>> is
>>>         >> *generic*. Instead it can fall back to strong __sync
>>>         functions if
>>>         >> __atomic are not available.
>>>         >>
>>>         >
>>>         > Linux kernel is generic but has core specifics - is this not
>>>         the same ?
>>>         >
>>>         > What about these cases ?
>>>         >
>>>         > platform/linux-generic/odp_system_info.c:static int
>>>         cpuinfo_octeon(FILE
>>>         > *file, odp_system_info_t *sysinfo)
>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>         "octeon",
>>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>>         > cpuinfo_octeon
>>>         > platform/linux-generic/include/api/odp_align.h:#if defined
>>>         __x86_64__ ||
>>>         > defined __i386__
>>>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>>>         __x86_64__ ||
>>>         > defined __i386__
>>>         > platform/linux-generic/include/odp_spin_internal.h:#if
>>>         defined __x86_64__ ||
>>>         > defined __i386__
>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>         __x86_64__ || defined
>>>         > __i386__ || defined __OCTEON__ || \
>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>         __x86_64__ || defined
>>>         > __i386__
>>>         > platform/linux-generic/odp_system_info.c:static int
>>>         cpuinfo_x86(FILE *file,
>>>         > odp_system_info_t *sysinfo)
>>>         > platform/linux-generic/odp_system_info.c:    #if defined
>>>         __x86_64__ ||
>>>         > defined __i386__
>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>         "x86",
>>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>>         > cpuinfo_x86
>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>         __x86_64__ || defined
>>>         > __i386__ || defined __OCTEON__ || \
>>>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>>>         defined __i386__
>>>         > platform/linux-generic/include/api/odp_align.h:#elif defined
>>>         __arm__ ||
>>>         > defined __aarch64__
>>>         > platform/linux-generic/include/api/odp_sync.h:#elif
>>>         defined(__arm__)
>>>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>>>         defined __arm__
>>>         > platform/linux-generic/odp_system_info.c:#elif defined
>>>         __arm__ || defined
>>>         > __aarch64__
>>>         > platform/linux-generic/odp_system_info.c:static int
>>>         cpuinfo_arm(FILE *file
>>>         > ODP_UNUSED,
>>>         > platform/linux-generic/odp_system_info.c:    #elif defined
>>>         __arm__ ||
>>>         > defined __aarch64__
>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>         "arm",
>>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>>         > cpuinfo_arm
>>>         >
>>>         >
>>>         >>
>>>         >>
>>>         >> Taras Kondratiuk
>>>         >
>>>         >
>>>         >
>>>         >
>>>         > --
>>>         > Mike Holmes
>>>         > Linaro  Sr Technical Manager
>>>         > LNG - ODP
>>>
>>>         _______________________________________________
>>>         lng-odp mailing list
>>>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>>>         http://lists.linaro.org/mailman/listinfo/lng-odp
>>>
>>>
>>>
>>>
>>> --
>>> *Mike Holmes*
>>> Linaro  Sr Technical Manager
>>> LNG - ODP
>>>
>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>
>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>
>
> --
> *Mike Holmes*
> Linaro  Sr Technical Manager
> LNG - ODP
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp
>
>

Bill Fischofer Dec. 24, 2014, 3:05 p.m. UTC | #15

I suspect the long-term answer will be to use the autotools to construct
the actual include hierarchy (and possibly the files themelves) needed for
a given --with-platform specification.  That would mean the canonical ODP
API files may need to be meta-files that are used as input to ./configure
to generate the actual ones used.

On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer <bill.fischofer@linaro.org>
wrote:

> That makes sense.  The only problem is GCC's problems with forward
> declarations and inlining.  IMO a design issue with the compiler, but it is
> what it is.  But we do need a general solution to this problem in 2015
> since proper inlining support will be critical for production-grade
> implementations.
>
> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org>
> wrote:
>
>>
>>
>> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org>
>> wrote:
>>
>>> On 12/23/2014 10:40 PM, Mike Holmes wrote:
>>>
>>>>
>>>>
>>>> On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org
>>>> <mailto:bill.fischofer@linaro.org>> wrote:
>>>>
>>>>     That's good to know and certainly the code is cleaner that way.  I
>>>>     think we should adopt the practice that any architecture-specific
>>>>     changes to linux-generic code (adds, updates, or removals) must
>>>>     either originate from or have a Reviewed-by a representative of
>>>>     that architecture (in this case, Bala).  That way there's no
>>>>     confusion as to whether the known experts agree with the change,
>>>>     since these sort of things can have subtle implications that are
>>>>     not obvious to the layman.
>>>>
>>>>
>>>> I think linux-generic should be devoid of any per core optimizations in
>>>> its default state.
>>>> There could be in an arch directory and it would be supported as you
>>>> say Bill by someone who will care for that that variant, this also avoids
>>>> filling linux-generic with a possibly large list of preprocessor flags that
>>>> we can't test.
>>>>
>>>> In this case we don't have public access to the octeon to even know if
>>>> this code works so separating or deletion feels right to me, although
>>>> Cavium are adding their board in this case.
>>>>
>>>>  To have optimizations we should understand which per arch optimization
>>> we need. I see that if newer gcc supports good optimizations then we need
>>> to use that gcc. Most of people use Open Embedded and Linaro gcc which is
>>> almost latest version. So if there is no optimization in standard
>>> environment (gcc, kernel, libc) then arch implementation should go to
>>> platform specific git variant. If it's covered by standard environment then
>>> we simple updating minimal version requirement for required component.
>>>
>>>
>> Lets take an example ODP API file
>>
>> static inline void odp_sync_stores(void)
>> {
>> #if defined __x86_64__ || defined __i386__
>>
>> >-------__asm__  __volatile__ ("sfence\n" : : : "memory");
>>
>> #elif defined(__arm__)
>> #if __ARM_ARCH == 6
>> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>> >------->------->-------: : "r" (0) : "memory");
>> #elif __ARM_ARCH >= 7 || defined __aarch64__
>>
>> >-------__asm__ __volatile__ ("dmb st" : : : "memory");
>> #else
>> >-------__asm__ __volatile__ ("" : : : "memory");
>> #endif
>>
>> #elif defined __OCTEON__
>>
>> >-------__asm__  __volatile__ ("syncws\n" : : : "memory");
>>
>> #else
>> >-------__sync_synchronize();
>> #endif
>>
>>
>> How do we accommodate this without resorting to architecture specific
>> processor flags / assembly?
>>
>> I would prefer to see platform/linux-generic/include/api/odp_sync.h
>> changed to
>>
>> platform/linux-generic/include/api/x86/odp_sync.h
>> platform/linux-generic/include/api/arm/odp_sync.h
>> platform/linux-generic/include/api/octeon/odp_sync.h
>>
>> This assumes that there is currently no pure c code that can be used that
>> is acceptable on all three platforms.
>>
>> Or since linux-generic is not a performance platform but rather a
>> reference whenever we need to make a trade off between simplicity and
>> performance, we alter this to be a regular function and move to:-
>>
>> platform/linux-generic/include/api/odp_sync.h <- now a pure forward
>> declaration of the API with doxygen documentation
>> and implementations correctly broken down by arch
>>
>> arch/x86/odp_sync.h
>> arch/arm/odp_sync.h
>> arch/octeon/odp_sync.h
>>
>>
>>
>>> Maxim.
>>>
>>>
>>>
>>>>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>>>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote:
>>>>
>>>>         I had some discussions with Bala and we think the GCC
>>>>         generates the
>>>>         appropriate code for OCTEON so there should be no need for any
>>>>         OCTEON-specific inline assembler in odp_atomic.h. I just
>>>>         didn't want
>>>>         to remove this piece of code when I redesigned the atomic
>>>> support
>>>>         because I didn't understand why there is a SYNCW instruction
>>>>         *before*
>>>>         the load. Load-acquire type of loads may need sync or barrier
>>>>         instructions but they come after the load, not before.
>>>>
>>>>         I think we should trust GCC to generate better code for OCTEON
>>>>         than us
>>>>         mere humans, at least until proven wrong. So please remove this
>>>>         OCTEON-specific snippet.
>>>>
>>>>         -- Ola
>>>>
>>>>
>>>>         On 23 December 2014 at 18:01, Mike Holmes
>>>>         <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote:
>>>>         >
>>>>         >
>>>>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>>>>         <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@
>>>> linaro.org>>
>>>>
>>>>         > wrote:
>>>>         >>
>>>>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>>>>         >> > how will this work, Octeon does not care about
>>>>          __atomic_fetch_add(&v,
>>>>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>>>>         assume
>>>>         >> >
>>>>         >> > static inline uint32_t
>>>>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>>>>         >> > {
>>>>         >> > #if defined __OCTEON__
>>>>         >> >>-------uint32_t ret;
>>>>         >> >>-------__asm__ __volatile__ ("syncws");
>>>>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>>>>         "+m" (atom) :
>>>>         >> >>------->------->------- "r" (atom));
>>>>         >> >>-------return ret;
>>>>         >> > #else
>>>>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>>>>         __ATOMIC_RELAXED);
>>>>         >> > #endif
>>>>         >> > }
>>>>         >>
>>>>         >> Linux-generic shouldn't have arch specific parts, because it
>>>> is
>>>>         >> *generic*. Instead it can fall back to strong __sync
>>>>         functions if
>>>>         >> __atomic are not available.
>>>>         >>
>>>>         >
>>>>         > Linux kernel is generic but has core specifics - is this not
>>>>         the same ?
>>>>         >
>>>>         > What about these cases ?
>>>>         >
>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>         cpuinfo_octeon(FILE
>>>>         > *file, odp_system_info_t *sysinfo)
>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>>         "octeon",
>>>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser
>>>> =
>>>>         > cpuinfo_octeon
>>>>         > platform/linux-generic/include/api/odp_align.h:#if defined
>>>>         __x86_64__ ||
>>>>         > defined __i386__
>>>>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>>>>         __x86_64__ ||
>>>>         > defined __i386__
>>>>         > platform/linux-generic/include/odp_spin_internal.h:#if
>>>>         defined __x86_64__ ||
>>>>         > defined __i386__
>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>         __x86_64__ || defined
>>>>         > __i386__ || defined __OCTEON__ || \
>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>         __x86_64__ || defined
>>>>         > __i386__
>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>         cpuinfo_x86(FILE *file,
>>>>         > odp_system_info_t *sysinfo)
>>>>         > platform/linux-generic/odp_system_info.c:    #if defined
>>>>         __x86_64__ ||
>>>>         > defined __i386__
>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>>         "x86",
>>>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser
>>>> =
>>>>         > cpuinfo_x86
>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>         __x86_64__ || defined
>>>>         > __i386__ || defined __OCTEON__ || \
>>>>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>>>>         defined __i386__
>>>>         > platform/linux-generic/include/api/odp_align.h:#elif defined
>>>>         __arm__ ||
>>>>         > defined __aarch64__
>>>>         > platform/linux-generic/include/api/odp_sync.h:#elif
>>>>         defined(__arm__)
>>>>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>>>>         defined __arm__
>>>>         > platform/linux-generic/odp_system_info.c:#elif defined
>>>>         __arm__ || defined
>>>>         > __aarch64__
>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>         cpuinfo_arm(FILE *file
>>>>         > ODP_UNUSED,
>>>>         > platform/linux-generic/odp_system_info.c:    #elif defined
>>>>         __arm__ ||
>>>>         > defined __aarch64__
>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>>         "arm",
>>>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser
>>>> =
>>>>         > cpuinfo_arm
>>>>         >
>>>>         >
>>>>         >>
>>>>         >>
>>>>         >> Taras Kondratiuk
>>>>         >
>>>>         >
>>>>         >
>>>>         >
>>>>         > --
>>>>         > Mike Holmes
>>>>         > Linaro  Sr Technical Manager
>>>>         > LNG - ODP
>>>>
>>>>         _______________________________________________
>>>>         lng-odp mailing list
>>>>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>>>>         http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Mike Holmes*
>>>> Linaro  Sr Technical Manager
>>>> LNG - ODP
>>>>
>>>>
>>>> _______________________________________________
>>>> lng-odp mailing list
>>>> lng-odp@lists.linaro.org
>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>
>>>
>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>
>>
>>
>>
>> --
>> *Mike Holmes*
>> Linaro  Sr Technical Manager
>> LNG - ODP
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>>
>

Mike Holmes Dec. 24, 2014, 3:28 p.m. UTC | #16

On 24 December 2014 at 10:05, Bill Fischofer <bill.fischofer@linaro.org>
wrote:

> I suspect the long-term answer will be to use the autotools to construct
> the actual include hierarchy (and possibly the files themelves) needed for
> a given --with-platform specification.
>

Let me check I have the same terminology in my head - we should have this
defined in the arch doc but I did not look.
platform = a physical board with interfaces etc and a core of some
architecture as a complete solution
arch = a cpu core for example octeon, arm, x86

In this case I then see a need to pick up arch independently
from --with-platform so that the optimizations can be reused by all
platforms with that architecture core. If we dont I would see a need for
platform/linux-x86-generic
platform/linux-arm-generic
platform/linux-octeon-generic


> That would mean the canonical ODP API files may need to be meta-files that
> are used as input to ./configure to generate the actual ones used.
>

If the work Taras and Anders are doing to clean out the API and move it
back to the root succeeds, and it looks good to me, then it opens the door
to some more wriggle room in solving this issue I think.
But I could see the configure step helping take a single API definition and
attaching it to the required optimizations as a solution if it cannot be
done within C / GCC




>
> On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer <bill.fischofer@linaro.org
> > wrote:
>
>> That makes sense.  The only problem is GCC's problems with forward
>> declarations and inlining.  IMO a design issue with the compiler, but it is
>> what it is.  But we do need a general solution to this problem in 2015
>> since proper inlining support will be critical for production-grade
>> implementations.
>>
>> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org>
>> wrote:
>>
>>>
>>>
>>> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org>
>>> wrote:
>>>
>>>> On 12/23/2014 10:40 PM, Mike Holmes wrote:
>>>>
>>>>>
>>>>>
>>>>> On 23 December 2014 at 13:49, Bill Fischofer <
>>>>> bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote:
>>>>>
>>>>>     That's good to know and certainly the code is cleaner that way.  I
>>>>>     think we should adopt the practice that any architecture-specific
>>>>>     changes to linux-generic code (adds, updates, or removals) must
>>>>>     either originate from or have a Reviewed-by a representative of
>>>>>     that architecture (in this case, Bala).  That way there's no
>>>>>     confusion as to whether the known experts agree with the change,
>>>>>     since these sort of things can have subtle implications that are
>>>>>     not obvious to the layman.
>>>>>
>>>>>
>>>>> I think linux-generic should be devoid of any per core optimizations
>>>>> in its default state.
>>>>> There could be in an arch directory and it would be supported as you
>>>>> say Bill by someone who will care for that that variant, this also avoids
>>>>> filling linux-generic with a possibly large list of preprocessor flags that
>>>>> we can't test.
>>>>>
>>>>> In this case we don't have public access to the octeon to even know if
>>>>> this code works so separating or deletion feels right to me, although
>>>>> Cavium are adding their board in this case.
>>>>>
>>>>>  To have optimizations we should understand which per arch
>>>> optimization we need. I see that if newer gcc supports good optimizations
>>>> then we need to use that gcc. Most of people use Open Embedded and Linaro
>>>> gcc which is almost latest version. So if there is no optimization in
>>>> standard environment (gcc, kernel, libc) then arch implementation should go
>>>> to platform specific git variant. If it's covered by standard environment
>>>> then we simple updating minimal version requirement for required component.
>>>>
>>>>
>>> Lets take an example ODP API file
>>>
>>> static inline void odp_sync_stores(void)
>>> {
>>> #if defined __x86_64__ || defined __i386__
>>>
>>> >-------__asm__  __volatile__ ("sfence\n" : : : "memory");
>>>
>>> #elif defined(__arm__)
>>> #if __ARM_ARCH == 6
>>> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>>> >------->------->-------: : "r" (0) : "memory");
>>> #elif __ARM_ARCH >= 7 || defined __aarch64__
>>>
>>> >-------__asm__ __volatile__ ("dmb st" : : : "memory");
>>> #else
>>> >-------__asm__ __volatile__ ("" : : : "memory");
>>> #endif
>>>
>>> #elif defined __OCTEON__
>>>
>>> >-------__asm__  __volatile__ ("syncws\n" : : : "memory");
>>>
>>> #else
>>> >-------__sync_synchronize();
>>> #endif
>>>
>>>
>>> How do we accommodate this without resorting to architecture specific
>>> processor flags / assembly?
>>>
>>> I would prefer to see platform/linux-generic/include/api/odp_sync.h
>>> changed to
>>>
>>> platform/linux-generic/include/api/x86/odp_sync.h
>>> platform/linux-generic/include/api/arm/odp_sync.h
>>> platform/linux-generic/include/api/octeon/odp_sync.h
>>>
>>> This assumes that there is currently no pure c code that can be used
>>> that is acceptable on all three platforms.
>>>
>>> Or since linux-generic is not a performance platform but rather a
>>> reference whenever we need to make a trade off between simplicity and
>>> performance, we alter this to be a regular function and move to:-
>>>
>>> platform/linux-generic/include/api/odp_sync.h <- now a pure forward
>>> declaration of the API with doxygen documentation
>>> and implementations correctly broken down by arch
>>>
>>> arch/x86/odp_sync.h
>>> arch/arm/odp_sync.h
>>> arch/octeon/odp_sync.h
>>>
>>>
>>>
>>>> Maxim.
>>>>
>>>>
>>>>
>>>>>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>>>>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>>
>>>>> wrote:
>>>>>
>>>>>         I had some discussions with Bala and we think the GCC
>>>>>         generates the
>>>>>         appropriate code for OCTEON so there should be no need for any
>>>>>         OCTEON-specific inline assembler in odp_atomic.h. I just
>>>>>         didn't want
>>>>>         to remove this piece of code when I redesigned the atomic
>>>>> support
>>>>>         because I didn't understand why there is a SYNCW instruction
>>>>>         *before*
>>>>>         the load. Load-acquire type of loads may need sync or barrier
>>>>>         instructions but they come after the load, not before.
>>>>>
>>>>>         I think we should trust GCC to generate better code for OCTEON
>>>>>         than us
>>>>>         mere humans, at least until proven wrong. So please remove this
>>>>>         OCTEON-specific snippet.
>>>>>
>>>>>         -- Ola
>>>>>
>>>>>
>>>>>         On 23 December 2014 at 18:01, Mike Holmes
>>>>>         <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>>
>>>>> wrote:
>>>>>         >
>>>>>         >
>>>>>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>>>>>         <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@
>>>>> linaro.org>>
>>>>>
>>>>>         > wrote:
>>>>>         >>
>>>>>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>>>>>         >> > how will this work, Octeon does not care about
>>>>>          __atomic_fetch_add(&v,
>>>>>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>>>>>         assume
>>>>>         >> >
>>>>>         >> > static inline uint32_t
>>>>>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>>>>>         >> > {
>>>>>         >> > #if defined __OCTEON__
>>>>>         >> >>-------uint32_t ret;
>>>>>         >> >>-------__asm__ __volatile__ ("syncws");
>>>>>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>>>>>         "+m" (atom) :
>>>>>         >> >>------->------->------- "r" (atom));
>>>>>         >> >>-------return ret;
>>>>>         >> > #else
>>>>>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>>>>>         __ATOMIC_RELAXED);
>>>>>         >> > #endif
>>>>>         >> > }
>>>>>         >>
>>>>>         >> Linux-generic shouldn't have arch specific parts, because
>>>>> it is
>>>>>         >> *generic*. Instead it can fall back to strong __sync
>>>>>         functions if
>>>>>         >> __atomic are not available.
>>>>>         >>
>>>>>         >
>>>>>         > Linux kernel is generic but has core specifics - is this not
>>>>>         the same ?
>>>>>         >
>>>>>         > What about these cases ?
>>>>>         >
>>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>>         cpuinfo_octeon(FILE
>>>>>         > *file, odp_system_info_t *sysinfo)
>>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>>>         "octeon",
>>>>>         > platform/linux-generic/odp_system_info.c:
>>>>> .cpuinfo_parser =
>>>>>         > cpuinfo_octeon
>>>>>         > platform/linux-generic/include/api/odp_align.h:#if defined
>>>>>         __x86_64__ ||
>>>>>         > defined __i386__
>>>>>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>>>>>         __x86_64__ ||
>>>>>         > defined __i386__
>>>>>         > platform/linux-generic/include/odp_spin_internal.h:#if
>>>>>         defined __x86_64__ ||
>>>>>         > defined __i386__
>>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>>         __x86_64__ || defined
>>>>>         > __i386__ || defined __OCTEON__ || \
>>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>>         __x86_64__ || defined
>>>>>         > __i386__
>>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>>         cpuinfo_x86(FILE *file,
>>>>>         > odp_system_info_t *sysinfo)
>>>>>         > platform/linux-generic/odp_system_info.c:    #if defined
>>>>>         __x86_64__ ||
>>>>>         > defined __i386__
>>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>>>         "x86",
>>>>>         > platform/linux-generic/odp_system_info.c:
>>>>> .cpuinfo_parser =
>>>>>         > cpuinfo_x86
>>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>>         __x86_64__ || defined
>>>>>         > __i386__ || defined __OCTEON__ || \
>>>>>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>>>>>         defined __i386__
>>>>>         > platform/linux-generic/include/api/odp_align.h:#elif defined
>>>>>         __arm__ ||
>>>>>         > defined __aarch64__
>>>>>         > platform/linux-generic/include/api/odp_sync.h:#elif
>>>>>         defined(__arm__)
>>>>>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>>>>>         defined __arm__
>>>>>         > platform/linux-generic/odp_system_info.c:#elif defined
>>>>>         __arm__ || defined
>>>>>         > __aarch64__
>>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>>         cpuinfo_arm(FILE *file
>>>>>         > ODP_UNUSED,
>>>>>         > platform/linux-generic/odp_system_info.c:    #elif defined
>>>>>         __arm__ ||
>>>>>         > defined __aarch64__
>>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>>>>         "arm",
>>>>>         > platform/linux-generic/odp_system_info.c:
>>>>> .cpuinfo_parser =
>>>>>         > cpuinfo_arm
>>>>>         >
>>>>>         >
>>>>>         >>
>>>>>         >>
>>>>>         >> Taras Kondratiuk
>>>>>         >
>>>>>         >
>>>>>         >
>>>>>         >
>>>>>         > --
>>>>>         > Mike Holmes
>>>>>         > Linaro  Sr Technical Manager
>>>>>         > LNG - ODP
>>>>>
>>>>>         _______________________________________________
>>>>>         lng-odp mailing list
>>>>>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>>>>>         http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Mike Holmes*
>>>>> Linaro  Sr Technical Manager
>>>>> LNG - ODP
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> lng-odp mailing list
>>>>> lng-odp@lists.linaro.org
>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> lng-odp mailing list
>>>> lng-odp@lists.linaro.org
>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>
>>>
>>>
>>>
>>> --
>>> *Mike Holmes*
>>> Linaro  Sr Technical Manager
>>> LNG - ODP
>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>
>>>
>>
>

Bill Fischofer Dec. 24, 2014, 3:56 p.m. UTC | #17

Getting the ODP API files out of platform/include/api is the first step.
But based on earlier attempts to deal with inlining Taras kept running into
issues because GCC insists that functions to be inlined must have inline in
their forward declarations.  There's no simple way to do that without a lot
of macro ugliness, but I suspect autotools could help with that.

Each ODP API specified in a common header may need to have a
platform-specific inline specification attached to it, depending on what
the implementation needs. If this were an all-or-nothing thing we could do
that with a simple macro that gets set in odp_platform_types.h (or a
similar file).  Call it ODP_API and it's either set to null or to 'static
inline'.

Then the common ODP API files would say:

ODP_API <rest of function prototype>

for example, instead of today saying

void *odp_buffer_addr(odp_buffer_t buf);

the canonical form would be written as

ODP_API void *odp_buffer_addr(odp_buffer_t buf);

and we'd get the needed static inline (or not) depending on how the
implementation is written.  But it's not that simple because each API is
independently inlineable, so a single macro won't do.  We could have a
separate macro for each API, but that gets unwieldy very quickly.

Instead, the API file could contain some common symbols that get
macro-expanded based on ./configure options so that what gets generated
into platform/include/api is what's needed for that platform/arch/etc.

This might also be a way to easily capture platform-specific doxygen since
if the API files are being custom generated then they can have specific
documentation attached to them as well.  That might be a cleaner way of
addressing Petri's earlier concern about odp_config.h containing
linux-generic stuff mixed in with general ODP documentation.

On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <mike.holmes@linaro.org> wrote:

>
>
> On 24 December 2014 at 10:05, Bill Fischofer <bill.fischofer@linaro.org>
> wrote:
>
>> I suspect the long-term answer will be to use the autotools to construct
>> the actual include hierarchy (and possibly the files themelves) needed for
>> a given --with-platform specification.
>>
>
> Let me check I have the same terminology in my head - we should have this
> defined in the arch doc but I did not look.
> platform = a physical board with interfaces etc and a core of some
> architecture as a complete solution
> arch = a cpu core for example octeon, arm, x86
>
> In this case I then see a need to pick up arch independently
> from --with-platform so that the optimizations can be reused by all
> platforms with that architecture core. If we dont I would see a need for
> platform/linux-x86-generic
> platform/linux-arm-generic
> platform/linux-octeon-generic
>
>
>> That would mean the canonical ODP API files may need to be meta-files
>> that are used as input to ./configure to generate the actual ones used.
>>
>
> If the work Taras and Anders are doing to clean out the API and move it
> back to the root succeeds, and it looks good to me, then it opens the door
> to some more wriggle room in solving this issue I think.
> But I could see the configure step helping take a single API definition
> and attaching it to the required optimizations as a solution if it cannot
> be done within C / GCC
>
>
>
>
>>
>> On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer <
>> bill.fischofer@linaro.org> wrote:
>>
>>> That makes sense.  The only problem is GCC's problems with forward
>>> declarations and inlining.  IMO a design issue with the compiler, but it is
>>> what it is.  But we do need a general solution to this problem in 2015
>>> since proper inlining support will be critical for production-grade
>>> implementations.
>>>
>>> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org>
>>> wrote:
>>>
>>>>
>>>>
>>>> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org>
>>>> wrote:
>>>>
>>>>> On 12/23/2014 10:40 PM, Mike Holmes wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 23 December 2014 at 13:49, Bill Fischofer <
>>>>>> bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote:
>>>>>>
>>>>>>     That's good to know and certainly the code is cleaner that way.  I
>>>>>>     think we should adopt the practice that any architecture-specific
>>>>>>     changes to linux-generic code (adds, updates, or removals) must
>>>>>>     either originate from or have a Reviewed-by a representative of
>>>>>>     that architecture (in this case, Bala).  That way there's no
>>>>>>     confusion as to whether the known experts agree with the change,
>>>>>>     since these sort of things can have subtle implications that are
>>>>>>     not obvious to the layman.
>>>>>>
>>>>>>
>>>>>> I think linux-generic should be devoid of any per core optimizations
>>>>>> in its default state.
>>>>>> There could be in an arch directory and it would be supported as you
>>>>>> say Bill by someone who will care for that that variant, this also avoids
>>>>>> filling linux-generic with a possibly large list of preprocessor flags that
>>>>>> we can't test.
>>>>>>
>>>>>> In this case we don't have public access to the octeon to even know
>>>>>> if this code works so separating or deletion feels right to me, although
>>>>>> Cavium are adding their board in this case.
>>>>>>
>>>>>>  To have optimizations we should understand which per arch
>>>>> optimization we need. I see that if newer gcc supports good optimizations
>>>>> then we need to use that gcc. Most of people use Open Embedded and Linaro
>>>>> gcc which is almost latest version. So if there is no optimization in
>>>>> standard environment (gcc, kernel, libc) then arch implementation should go
>>>>> to platform specific git variant. If it's covered by standard environment
>>>>> then we simple updating minimal version requirement for required component.
>>>>>
>>>>>
>>>> Lets take an example ODP API file
>>>>
>>>> static inline void odp_sync_stores(void)
>>>> {
>>>> #if defined __x86_64__ || defined __i386__
>>>>
>>>> >-------__asm__  __volatile__ ("sfence\n" : : : "memory");
>>>>
>>>> #elif defined(__arm__)
>>>> #if __ARM_ARCH == 6
>>>> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>>>> >------->------->-------: : "r" (0) : "memory");
>>>> #elif __ARM_ARCH >= 7 || defined __aarch64__
>>>>
>>>> >-------__asm__ __volatile__ ("dmb st" : : : "memory");
>>>> #else
>>>> >-------__asm__ __volatile__ ("" : : : "memory");
>>>> #endif
>>>>
>>>> #elif defined __OCTEON__
>>>>
>>>> >-------__asm__  __volatile__ ("syncws\n" : : : "memory");
>>>>
>>>> #else
>>>> >-------__sync_synchronize();
>>>> #endif
>>>>
>>>>
>>>> How do we accommodate this without resorting to architecture specific
>>>> processor flags / assembly?
>>>>
>>>> I would prefer to see platform/linux-generic/include/api/odp_sync.h
>>>> changed to
>>>>
>>>> platform/linux-generic/include/api/x86/odp_sync.h
>>>> platform/linux-generic/include/api/arm/odp_sync.h
>>>> platform/linux-generic/include/api/octeon/odp_sync.h
>>>>
>>>> This assumes that there is currently no pure c code that can be used
>>>> that is acceptable on all three platforms.
>>>>
>>>> Or since linux-generic is not a performance platform but rather a
>>>> reference whenever we need to make a trade off between simplicity and
>>>> performance, we alter this to be a regular function and move to:-
>>>>
>>>> platform/linux-generic/include/api/odp_sync.h <- now a pure forward
>>>> declaration of the API with doxygen documentation
>>>> and implementations correctly broken down by arch
>>>>
>>>> arch/x86/odp_sync.h
>>>> arch/arm/odp_sync.h
>>>> arch/octeon/odp_sync.h
>>>>
>>>>
>>>>
>>>>> Maxim.
>>>>>
>>>>>
>>>>>
>>>>>>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>>>>>     <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>>
>>>>>> wrote:
>>>>>>
>>>>>>         I had some discussions with Bala and we think the GCC
>>>>>>         generates the
>>>>>>         appropriate code for OCTEON so there should be no need for any
>>>>>>         OCTEON-specific inline assembler in odp_atomic.h. I just
>>>>>>         didn't want
>>>>>>         to remove this piece of code when I redesigned the atomic
>>>>>> support
>>>>>>         because I didn't understand why there is a SYNCW instruction
>>>>>>         *before*
>>>>>>         the load. Load-acquire type of loads may need sync or barrier
>>>>>>         instructions but they come after the load, not before.
>>>>>>
>>>>>>         I think we should trust GCC to generate better code for OCTEON
>>>>>>         than us
>>>>>>         mere humans, at least until proven wrong. So please remove
>>>>>> this
>>>>>>         OCTEON-specific snippet.
>>>>>>
>>>>>>         -- Ola
>>>>>>
>>>>>>
>>>>>>         On 23 December 2014 at 18:01, Mike Holmes
>>>>>>         <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>>
>>>>>> wrote:
>>>>>>         >
>>>>>>         >
>>>>>>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>>>>>>         <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@
>>>>>> linaro.org>>
>>>>>>
>>>>>>         > wrote:
>>>>>>         >>
>>>>>>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>>>>>>         >> > how will this work, Octeon does not care about
>>>>>>          __atomic_fetch_add(&v,
>>>>>>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>>>>>>         assume
>>>>>>         >> >
>>>>>>         >> > static inline uint32_t
>>>>>>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>>>>>>         >> > {
>>>>>>         >> > #if defined __OCTEON__
>>>>>>         >> >>-------uint32_t ret;
>>>>>>         >> >>-------__asm__ __volatile__ ("syncws");
>>>>>>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>>>>>>         "+m" (atom) :
>>>>>>         >> >>------->------->------- "r" (atom));
>>>>>>         >> >>-------return ret;
>>>>>>         >> > #else
>>>>>>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>>>>>>         __ATOMIC_RELAXED);
>>>>>>         >> > #endif
>>>>>>         >> > }
>>>>>>         >>
>>>>>>         >> Linux-generic shouldn't have arch specific parts, because
>>>>>> it is
>>>>>>         >> *generic*. Instead it can fall back to strong __sync
>>>>>>         functions if
>>>>>>         >> __atomic are not available.
>>>>>>         >>
>>>>>>         >
>>>>>>         > Linux kernel is generic but has core specifics - is this not
>>>>>>         the same ?
>>>>>>         >
>>>>>>         > What about these cases ?
>>>>>>         >
>>>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>>>         cpuinfo_octeon(FILE
>>>>>>         > *file, odp_system_info_t *sysinfo)
>>>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str
>>>>>> =
>>>>>>         "octeon",
>>>>>>         > platform/linux-generic/odp_system_info.c:
>>>>>> .cpuinfo_parser =
>>>>>>         > cpuinfo_octeon
>>>>>>         > platform/linux-generic/include/api/odp_align.h:#if defined
>>>>>>         __x86_64__ ||
>>>>>>         > defined __i386__
>>>>>>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>>>>>>         __x86_64__ ||
>>>>>>         > defined __i386__
>>>>>>         > platform/linux-generic/include/odp_spin_internal.h:#if
>>>>>>         defined __x86_64__ ||
>>>>>>         > defined __i386__
>>>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>>>         __x86_64__ || defined
>>>>>>         > __i386__ || defined __OCTEON__ || \
>>>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>>>         __x86_64__ || defined
>>>>>>         > __i386__
>>>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>>>         cpuinfo_x86(FILE *file,
>>>>>>         > odp_system_info_t *sysinfo)
>>>>>>         > platform/linux-generic/odp_system_info.c:    #if defined
>>>>>>         __x86_64__ ||
>>>>>>         > defined __i386__
>>>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str
>>>>>> =
>>>>>>         "x86",
>>>>>>         > platform/linux-generic/odp_system_info.c:
>>>>>> .cpuinfo_parser =
>>>>>>         > cpuinfo_x86
>>>>>>         > platform/linux-generic/odp_system_info.c:#if defined
>>>>>>         __x86_64__ || defined
>>>>>>         > __i386__ || defined __OCTEON__ || \
>>>>>>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>>>>>>         defined __i386__
>>>>>>         > platform/linux-generic/include/api/odp_align.h:#elif
>>>>>> defined
>>>>>>         __arm__ ||
>>>>>>         > defined __aarch64__
>>>>>>         > platform/linux-generic/include/api/odp_sync.h:#elif
>>>>>>         defined(__arm__)
>>>>>>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>>>>>>         defined __arm__
>>>>>>         > platform/linux-generic/odp_system_info.c:#elif defined
>>>>>>         __arm__ || defined
>>>>>>         > __aarch64__
>>>>>>         > platform/linux-generic/odp_system_info.c:static int
>>>>>>         cpuinfo_arm(FILE *file
>>>>>>         > ODP_UNUSED,
>>>>>>         > platform/linux-generic/odp_system_info.c:    #elif defined
>>>>>>         __arm__ ||
>>>>>>         > defined __aarch64__
>>>>>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str
>>>>>> =
>>>>>>         "arm",
>>>>>>         > platform/linux-generic/odp_system_info.c:
>>>>>> .cpuinfo_parser =
>>>>>>         > cpuinfo_arm
>>>>>>         >
>>>>>>         >
>>>>>>         >>
>>>>>>         >>
>>>>>>         >> Taras Kondratiuk
>>>>>>         >
>>>>>>         >
>>>>>>         >
>>>>>>         >
>>>>>>         > --
>>>>>>         > Mike Holmes
>>>>>>         > Linaro  Sr Technical Manager
>>>>>>         > LNG - ODP
>>>>>>
>>>>>>         _______________________________________________
>>>>>>         lng-odp mailing list
>>>>>>         lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>>>>>>         http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Mike Holmes*
>>>>>> Linaro  Sr Technical Manager
>>>>>> LNG - ODP
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> lng-odp mailing list
>>>>>> lng-odp@lists.linaro.org
>>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> lng-odp mailing list
>>>>> lng-odp@lists.linaro.org
>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Mike Holmes*
>>>> Linaro  Sr Technical Manager
>>>> LNG - ODP
>>>>
>>>> _______________________________________________
>>>> lng-odp mailing list
>>>> lng-odp@lists.linaro.org
>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>
>>>>
>>>
>>
>
>
> --
> *Mike Holmes*
> Linaro  Sr Technical Manager
> LNG - ODP
>

Taras Kondratiuk Dec. 25, 2014, 9:15 a.m. UTC | #18

There is no need to make it so complex. At least until 'static'
before 'not static' functions is forbidden by some post-C11 standard).

Me and Anders are making another try to separate clean API. In a
previous try common (clean) API headers were public. They had
<plat/...> include hooks to pull platform specific definitions. That
appeared to be not flexible enough. Current approach is to make common
API headers private in a sense that application won't include them
directly, but instead implementation will provide public headers and
embedded clean API headers there.

Please check [1] for an example of this approach tried for a few
headers (atomic, byteorder). It seems to be working fine for now. I'm
reworking Keystone implementation based on this approach.

[1] https://git.linaro.org/people/taras.kondratiuk/odp.git/shortlog/refs/heads/api/change_namespace

On 12/24/2014 05:56 PM, Bill Fischofer wrote:
> Getting the ODP API files out of platform/include/api is the first
> step.  But based on earlier attempts to deal with inlining Taras kept
> running into issues because GCC insists that functions to be inlined
> must have inline in their forward declarations.  There's no simple way
> to do that without a lot of macro ugliness, but I suspect autotools
> could help with that.  
> 
> Each ODP API specified in a common header may need to have a
> platform-specific inline specification attached to it, depending on what
> the implementation needs. If this were an all-or-nothing thing we could
> do that with a simple macro that gets set in odp_platform_types.h (or a
> similar file).  Call it ODP_API and it's either set to null or to
> 'static inline'.
> 
> Then the common ODP API files would say:
> 
> ODP_API <rest of function prototype>
> 
> for example, instead of today saying
> 
> void *odp_buffer_addr(odp_buffer_t buf);
> 
> the canonical form would be written as
> 
> ODP_API void *odp_buffer_addr(odp_buffer_t buf);
> 
> and we'd get the needed static inline (or not) depending on how the
> implementation is written.  But it's not that simple because each API is
> independently inlineable, so a single macro won't do.  We could have a
> separate macro for each API, but that gets unwieldy very quickly.
> 
> Instead, the API file could contain some common symbols that get
> macro-expanded based on ./configure options so that what gets generated
> into platform/include/api is what's needed for that platform/arch/etc.  
> 
> This might also be a way to easily capture platform-specific doxygen
> since if the API files are being custom generated then they can have
> specific documentation attached to them as well.  That might be a
> cleaner way of addressing Petri's earlier concern about odp_config.h
> containing linux-generic stuff mixed in with general ODP documentation.
> 
> On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <mike.holmes@linaro.org
> <mailto:mike.holmes@linaro.org>> wrote:
> 
> 
> 
>     On 24 December 2014 at 10:05, Bill Fischofer
>     <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote:
> 
>         I suspect the long-term answer will be to use the autotools to
>         construct the actual include hierarchy (and possibly the files
>         themelves) needed for a given --with-platform specification.  
> 
> 
>     Let me check I have the same terminology in my head - we should have
>     this defined in the arch doc but I did not look.
>     platform = a physical board with interfaces etc and a core of some
>     architecture as a complete solution
>     arch = a cpu core for example octeon, arm, x86
> 
>     In this case I then see a need to pick up arch independently
>     from --with-platform so that the optimizations can be reused by all
>     platforms with that architecture core. If we dont I would see a need
>     for 
>     platform/linux-x86-generic
>     platform/linux-arm-generic
>     platform/linux-octeon-generic
>      
> 
>         That would mean the canonical ODP API files may need to be
>         meta-files that are used as input to ./configure to generate the
>         actual ones used.
> 
> 
>     If the work Taras and Anders are doing to clean out the API and move
>     it back to the root succeeds, and it looks good to me, then it opens
>     the door to some more wriggle room in solving this issue I think.
>     But I could see the configure step helping take a single API
>     definition and attaching it to the required optimizations as a
>     solution if it cannot be done within C / GCC
> 
> 
>      
> 
> 
>         On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer
>         <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>>
>         wrote:
> 
>             That makes sense.  The only problem is GCC's problems with
>             forward declarations and inlining.  IMO a design issue with
>             the compiler, but it is what it is.  But we do need a
>             general solution to this problem in 2015 since proper
>             inlining support will be critical for production-grade
>             implementations.
> 
>             On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes
>             <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote:
> 
> 
> 
>                 On 24 December 2014 at 06:19, Maxim Uvarov
>                 <maxim.uvarov@linaro.org
>                 <mailto:maxim.uvarov@linaro.org>> wrote:
> 
>                     On 12/23/2014 10:40 PM, Mike Holmes wrote:
> 
> 
> 
>                         On 23 December 2014 at 13:49, Bill Fischofer
>                         <bill.fischofer@linaro.org
>                         <mailto:bill.fischofer@linaro.org>
>                         <mailto:bill.fischofer@linaro.__org
>                         <mailto:bill.fischofer@linaro.org>>> wrote:
> 
>                             That's good to know and certainly the code
>                         is cleaner that way.  I
>                             think we should adopt the practice that any
>                         architecture-specific
>                             changes to linux-generic code (adds,
>                         updates, or removals) must
>                             either originate from or have a Reviewed-by
>                         a representative of
>                             that architecture (in this case, Bala). 
>                         That way there's no
>                             confusion as to whether the known experts
>                         agree with the change,
>                             since these sort of things can have subtle
>                         implications that are
>                             not obvious to the layman.
> 
> 
>                         I think linux-generic should be devoid of any
>                         per core optimizations in its default state.
>                         There could be in an arch directory and it would
>                         be supported as you say Bill by someone who will
>                         care for that that variant, this also avoids
>                         filling linux-generic with a possibly large list
>                         of preprocessor flags that we can't test.
> 
>                         In this case we don't have public access to the
>                         octeon to even know if this code works so
>                         separating or deletion feels right to me,
>                         although Cavium are adding their board in this case.
> 
>                     To have optimizations we should understand which per
>                     arch optimization we need. I see that if newer gcc
>                     supports good optimizations then we need to use that
>                     gcc. Most of people use Open Embedded and Linaro gcc
>                     which is almost latest version. So if there is no
>                     optimization in standard environment (gcc, kernel,
>                     libc) then arch implementation should go to platform
>                     specific git variant. If it's covered by standard
>                     environment then we simple updating minimal version
>                     requirement for required component.
> 
> 
>                 Lets take an example ODP API file
> 
>                 static inline void odp_sync_stores(void)
>                 {
>                 #if defined __x86_64__ || defined __i386__
> 
>                 >-------__asm__  __volatile__ ("sfence\n" : : : "memory");
> 
>                 #elif defined(__arm__)
>                 #if __ARM_ARCH == 6
>                 >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>                 >------->------->-------: : "r" (0) : "memory");
>                 #elif __ARM_ARCH >= 7 || defined __aarch64__
> 
>                 >-------__asm__ __volatile__ ("dmb st" : : : "memory");
>                 #else
>                 >-------__asm__ __volatile__ ("" : : : "memory");
>                 #endif
> 
>                 #elif defined __OCTEON__
> 
>                 >-------__asm__  __volatile__ ("syncws\n" : : : "memory");
> 
>                 #else
>                 >-------__sync_synchronize();
>                 #endif
> 
> 
>                 How do we accommodate this without resorting to
>                 architecture specific processor flags / assembly?
> 
>                 I would prefer to
>                 see platform/linux-generic/include/api/odp_sync.h changed to
> 
>                 platform/linux-generic/include/api/x86/odp_sync.h
>                 platform/linux-generic/include/api/arm/odp_sync.h
>                 platform/linux-generic/include/api/octeon/odp_sync.h
> 
>                 This assumes that there is currently no pure c code that
>                 can be used that is acceptable on all three platforms.
> 
>                 Or since linux-generic is not a performance platform but
>                 rather a reference whenever we need to make a trade off
>                 between simplicity and performance, we alter this to be
>                 a regular function and move to:-
> 
>                 platform/linux-generic/include/api/odp_sync.h <- now a
>                 pure forward declaration of the API with doxygen
>                 documentation
>                 and implementations correctly broken down by arch
> 
>                 arch/x86/odp_sync.h
>                 arch/arm/odp_sync.h
>                 arch/octeon/odp_sync.h
> 
>                  
> 
>                     Maxim.
> 
> 
> 
>                             On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>                             <ola.liljedahl@linaro.org
>                         <mailto:ola.liljedahl@linaro.org>
>                         <mailto:ola.liljedahl@linaro.__org
>                         <mailto:ola.liljedahl@linaro.org>>> wrote:
> 
>                                 I had some discussions with Bala and we
>                         think the GCC
>                                 generates the
>                                 appropriate code for OCTEON so there
>                         should be no need for any
>                                 OCTEON-specific inline assembler in
>                         odp_atomic.h. I just
>                                 didn't want
>                                 to remove this piece of code when I
>                         redesigned the atomic support
>                                 because I didn't understand why there is
>                         a SYNCW instruction
>                                 *before*
>                                 the load. Load-acquire type of loads may
>                         need sync or barrier
>                                 instructions but they come after the
>                         load, not before.
> 
>                                 I think we should trust GCC to generate
>                         better code for OCTEON
>                                 than us
>                                 mere humans, at least until proven
>                         wrong. So please remove this
>                                 OCTEON-specific snippet.
> 
>                                 -- Ola
> 
> 
>                                 On 23 December 2014 at 18:01, Mike Holmes
>                                 <mike.holmes@linaro.org
>                         <mailto:mike.holmes@linaro.org>
>                         <mailto:mike.holmes@linaro.org
>                         <mailto:mike.holmes@linaro.org>__>> wrote:
>                                 >
>                                 >
>                                 > On 23 December 2014 at 11:15, Taras
>                         Kondratiuk
>                                 <taras.kondratiuk@linaro.org
>                         <mailto:taras.kondratiuk@linaro.org>
>                         <mailto:taras.kondratiuk@__linaro.org
>                         <mailto:taras.kondratiuk@linaro.org>>>
> 
>                                 > wrote:
>                                 >>
>                                 >> On 12/23/2014 05:28 PM, Mike Holmes
>                         wrote:
>                                 >> > how will this work, Octeon does not
>                         care about
>                                  __atomic_fetch_add(&v,
>                                 >> > 1, __ATOMIC_RELAXED); and yet you
>                         will fail configure I
>                                 assume
>                                 >> >
>                                 >> > static inline uint32_t
>                                
>                         odp_atomic_fetch_inc_u32(odp___atomic_u32_t *atom)
>                                 >> > {
>                                 >> > #if defined __OCTEON__
>                                 >> >>-------uint32_t ret;
>                                 >> >>-------__asm__ __volatile__ ("syncws");
>                                 >> >>-------__asm__ __volatile__ ("lai
>                         %0,(%2)" : "=r" (ret),
>                                 "+m" (atom) :
>                                 >> >>------->------->------- "r" (atom));
>                                 >> >>-------return ret;
>                                 >> > #else
>                                 >> >>-------return
>                         __atomic_fetch_add(&atom->v, 1,
>                                 __ATOMIC_RELAXED);
>                                 >> > #endif
>                                 >> > }
>                                 >>
>                                 >> Linux-generic shouldn't have arch
>                         specific parts, because it is
>                                 >> *generic*. Instead it can fall back
>                         to strong __sync
>                                 functions if
>                                 >> __atomic are not available.
>                                 >>
>                                 >
>                                 > Linux kernel is generic but has core
>                         specifics - is this not
>                                 the same ?
>                                 >
>                                 > What about these cases ?
>                                 >
>                                 >
>                         platform/linux-generic/odp___system_info.c:static int
>                                 cpuinfo_octeon(FILE
>                                 > *file, odp_system_info_t *sysinfo)
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         .cpu_arch_str =
>                                 "octeon",
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         .cpuinfo_parser =
>                                 > cpuinfo_octeon
>                                 >
>                         platform/linux-generic/__include/api/odp_align.h:#if
>                         defined
>                                 __x86_64__ ||
>                                 > defined __i386__
>                                 >
>                         platform/linux-generic/__include/api/odp_sync.h:#if
>                         defined
>                                 __x86_64__ ||
>                                 > defined __i386__
>                                 >
>                         platform/linux-generic/__include/odp_spin_internal.h:#__if
>                                 defined __x86_64__ ||
>                                 > defined __i386__
>                                 >
>                         platform/linux-generic/odp___system_info.c:#if
>                         defined
>                                 __x86_64__ || defined
>                                 > __i386__ || defined __OCTEON__ || \
>                                 >
>                         platform/linux-generic/odp___system_info.c:#if
>                         defined
>                                 __x86_64__ || defined
>                                 > __i386__
>                                 >
>                         platform/linux-generic/odp___system_info.c:static int
>                                 cpuinfo_x86(FILE *file,
>                                 > odp_system_info_t *sysinfo)
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         #if defined
>                                 __x86_64__ ||
>                                 > defined __i386__
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         .cpu_arch_str =
>                                 "x86",
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         .cpuinfo_parser =
>                                 > cpuinfo_x86
>                                 >
>                         platform/linux-generic/odp___system_info.c:#if
>                         defined
>                                 __x86_64__ || defined
>                                 > __i386__ || defined __OCTEON__ || \
>                                 >
>                         platform/linux-generic/odp___time.c:#if defined
>                         __x86_64__ ||
>                                 defined __i386__
>                                 >
>                         platform/linux-generic/__include/api/odp_align.h:#elif
>                         defined
>                                 __arm__ ||
>                                 > defined __aarch64__
>                                 >
>                         platform/linux-generic/__include/api/odp_sync.h:#elif
>                                 defined(__arm__)
>                                 >
>                         platform/linux-generic/__include/odp_spin_internal.h:#__elif
>                                 defined __arm__
>                                 >
>                         platform/linux-generic/odp___system_info.c:#elif
>                         defined
>                                 __arm__ || defined
>                                 > __aarch64__
>                                 >
>                         platform/linux-generic/odp___system_info.c:static int
>                                 cpuinfo_arm(FILE *file
>                                 > ODP_UNUSED,
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         #elif defined
>                                 __arm__ ||
>                                 > defined __aarch64__
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         .cpu_arch_str =
>                                 "arm",
>                                 >
>                         platform/linux-generic/odp___system_info.c:   
>                         .cpuinfo_parser =
>                                 > cpuinfo_arm
>                                 >
>                                 >
>                                 >>
>                                 >>
>                                 >> Taras Kondratiuk
>                                 >
>                                 >
>                                 >
>                                 >
>                                 > --
>                                 > Mike Holmes
>                                 > Linaro  Sr Technical Manager
>                                 > LNG - ODP
> 
>                                
>                         _________________________________________________
>                                 lng-odp mailing list
>                                 lng-odp@lists.linaro.org
>                         <mailto:lng-odp@lists.linaro.org>
>                         <mailto:lng-odp@lists.linaro.__org
>                         <mailto:lng-odp@lists.linaro.org>>
>                                
>                         http://lists.linaro.org/__mailman/listinfo/lng-odp
>                         <http://lists.linaro.org/mailman/listinfo/lng-odp>
> 
> 
> 
> 
>                         -- 
>                         *Mike Holmes*
>                         Linaro  Sr Technical Manager
>                         LNG - ODP
> 
> 
>                         _________________________________________________
>                         lng-odp mailing list
>                         lng-odp@lists.linaro.org
>                         <mailto:lng-odp@lists.linaro.org>
>                         http://lists.linaro.org/__mailman/listinfo/lng-odp
>                         <http://lists.linaro.org/mailman/listinfo/lng-odp>
> 
> 
> 
>                     _________________________________________________
>                     lng-odp mailing list
>                     lng-odp@lists.linaro.org
>                     <mailto:lng-odp@lists.linaro.org>
>                     http://lists.linaro.org/__mailman/listinfo/lng-odp
>                     <http://lists.linaro.org/mailman/listinfo/lng-odp>
> 
> 
> 
> 
>                 -- 
>                 *Mike Holmes* 
>                 Linaro  Sr Technical Manager
>                 LNG - ODP
> 
>                 _______________________________________________
>                 lng-odp mailing list
>                 lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>                 http://lists.linaro.org/mailman/listinfo/lng-odp
> 
> 
> 
> 
> 
> 
>     -- 
>     *Mike Holmes* 
>     Linaro  Sr Technical Manager
>     LNG - ODP
> 
> 
> 
> 
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/lng-odp
>

Taras Kondratiuk Dec. 25, 2014, 9:34 a.m. UTC | #19

On 12/24/2014 12:21 PM, Maxim Uvarov wrote:
> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote:
>> Linux-generic shouldn't have arch specific parts, because it is
>> *generic*. Instead it can fall back to strong __sync functions if
>> __atomic are not available.
> 
> Taras I think that even linux-generic should have it's own requirements. 
> It might be compiler version.
> Dependence on libssl. Kernel version and etc.
> 
> We do not say that linux-generic should work on any linux, even 20 years 
> old. We just define needed requirements
> to compile it. The same is with atomics. Instead of creating 20 branches 
> for all gcc and not gcc version we just
> stick with more fresh gcc and say that it's minimal requirement to run 
> linux-generic.

I don't think it worth to make such a tight restriction to use only gcc
4.7+. Having fallback to __sync_* function will relax it to something
like gcc 4.1+. Sure performance will be worse, but it should be ok for
linux-generic.

Maxim Uvarov Dec. 25, 2014, 9:53 a.m. UTC | #20

On 12/25/2014 12:34 PM, Taras Kondratiuk wrote:
> On 12/24/2014 12:21 PM, Maxim Uvarov wrote:
>> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote:
>>> Linux-generic shouldn't have arch specific parts, because it is
>>> *generic*. Instead it can fall back to strong __sync functions if
>>> __atomic are not available.
>> Taras I think that even linux-generic should have it's own requirements.
>> It might be compiler version.
>> Dependence on libssl. Kernel version and etc.
>>
>> We do not say that linux-generic should work on any linux, even 20 years
>> old. We just define needed requirements
>> to compile it. The same is with atomics. Instead of creating 20 branches
>> for all gcc and not gcc version we just
>> stick with more fresh gcc and say that it's minimal requirement to run
>> linux-generic.
> I don't think it worth to make such a tight restriction to use only gcc
> 4.7+. Having fallback to __sync_* function will relax it to something
> like gcc 4.1+. Sure performance will be worse, but it should be ok for
> linux-generic.
>
Why it is ok? If possible we need to make linux-generic version fast.
Support of old gcc is not required. If it's needed for somebody he can
do this in his own platform.

Maxim.

Maxim Uvarov Dec. 25, 2014, 10:23 a.m. UTC | #21

On 12/25/2014 12:15 PM, Taras Kondratiuk wrote:
> There is no need to make it so complex. At least until 'static'
> before 'not static' functions is forbidden by some post-C11 standard).
>
> Me and Anders are making another try to separate clean API. In a
> previous try common (clean) API headers were public. They had
> <plat/...> include hooks to pull platform specific definitions. That
> appeared to be not flexible enough. Current approach is to make common
> API headers private in a sense that application won't include them
> directly, but instead implementation will provide public headers and
> embedded clean API headers there.
>
> Please check [1] for an example of this approach tried for a few
> headers (atomic, byteorder). It seems to be working fine for now. I'm
> reworking Keystone implementation based on this approach.
>
> [1] https://git.linaro.org/people/taras.kondratiuk/odp.git/shortlog/refs/heads/api/change_namespace

I see from your patch you are doing:

rename from platform/linux-generic/include/api/odp/buffer.h
rename to include/api/odp/buffer.h

"api" looks like is not needed and it can be simple include/buffer.h


btw,  what should I do with thread patch? Should I send updated version?

Maxim.


> On 12/24/2014 05:56 PM, Bill Fischofer wrote:
>> Getting the ODP API files out of platform/include/api is the first
>> step.  But based on earlier attempts to deal with inlining Taras kept
>> running into issues because GCC insists that functions to be inlined
>> must have inline in their forward declarations.  There's no simple way
>> to do that without a lot of macro ugliness, but I suspect autotools
>> could help with that.
>>
>> Each ODP API specified in a common header may need to have a
>> platform-specific inline specification attached to it, depending on what
>> the implementation needs. If this were an all-or-nothing thing we could
>> do that with a simple macro that gets set in odp_platform_types.h (or a
>> similar file).  Call it ODP_API and it's either set to null or to
>> 'static inline'.
>>
>> Then the common ODP API files would say:
>>
>> ODP_API <rest of function prototype>
>>
>> for example, instead of today saying
>>
>> void *odp_buffer_addr(odp_buffer_t buf);
>>
>> the canonical form would be written as
>>
>> ODP_API void *odp_buffer_addr(odp_buffer_t buf);
>>
>> and we'd get the needed static inline (or not) depending on how the
>> implementation is written.  But it's not that simple because each API is
>> independently inlineable, so a single macro won't do.  We could have a
>> separate macro for each API, but that gets unwieldy very quickly.
>>
>> Instead, the API file could contain some common symbols that get
>> macro-expanded based on ./configure options so that what gets generated
>> into platform/include/api is what's needed for that platform/arch/etc.
>>
>> This might also be a way to easily capture platform-specific doxygen
>> since if the API files are being custom generated then they can have
>> specific documentation attached to them as well.  That might be a
>> cleaner way of addressing Petri's earlier concern about odp_config.h
>> containing linux-generic stuff mixed in with general ODP documentation.
>>
>> On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <mike.holmes@linaro.org
>> <mailto:mike.holmes@linaro.org>> wrote:
>>
>>
>>
>>      On 24 December 2014 at 10:05, Bill Fischofer
>>      <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote:
>>
>>          I suspect the long-term answer will be to use the autotools to
>>          construct the actual include hierarchy (and possibly the files
>>          themelves) needed for a given --with-platform specification.
>>
>>
>>      Let me check I have the same terminology in my head - we should have
>>      this defined in the arch doc but I did not look.
>>      platform = a physical board with interfaces etc and a core of some
>>      architecture as a complete solution
>>      arch = a cpu core for example octeon, arm, x86
>>
>>      In this case I then see a need to pick up arch independently
>>      from --with-platform so that the optimizations can be reused by all
>>      platforms with that architecture core. If we dont I would see a need
>>      for
>>      platform/linux-x86-generic
>>      platform/linux-arm-generic
>>      platform/linux-octeon-generic
>>       
>>
>>          That would mean the canonical ODP API files may need to be
>>          meta-files that are used as input to ./configure to generate the
>>          actual ones used.
>>
>>
>>      If the work Taras and Anders are doing to clean out the API and move
>>      it back to the root succeeds, and it looks good to me, then it opens
>>      the door to some more wriggle room in solving this issue I think.
>>      But I could see the configure step helping take a single API
>>      definition and attaching it to the required optimizations as a
>>      solution if it cannot be done within C / GCC
>>
>>
>>       
>>
>>
>>          On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer
>>          <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>>
>>          wrote:
>>
>>              That makes sense.  The only problem is GCC's problems with
>>              forward declarations and inlining.  IMO a design issue with
>>              the compiler, but it is what it is.  But we do need a
>>              general solution to this problem in 2015 since proper
>>              inlining support will be critical for production-grade
>>              implementations.
>>
>>              On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes
>>              <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote:
>>
>>
>>
>>                  On 24 December 2014 at 06:19, Maxim Uvarov
>>                  <maxim.uvarov@linaro.org
>>                  <mailto:maxim.uvarov@linaro.org>> wrote:
>>
>>                      On 12/23/2014 10:40 PM, Mike Holmes wrote:
>>
>>
>>
>>                          On 23 December 2014 at 13:49, Bill Fischofer
>>                          <bill.fischofer@linaro.org
>>                          <mailto:bill.fischofer@linaro.org>
>>                          <mailto:bill.fischofer@linaro.__org
>>                          <mailto:bill.fischofer@linaro.org>>> wrote:
>>
>>                              That's good to know and certainly the code
>>                          is cleaner that way.  I
>>                              think we should adopt the practice that any
>>                          architecture-specific
>>                              changes to linux-generic code (adds,
>>                          updates, or removals) must
>>                              either originate from or have a Reviewed-by
>>                          a representative of
>>                              that architecture (in this case, Bala).
>>                          That way there's no
>>                              confusion as to whether the known experts
>>                          agree with the change,
>>                              since these sort of things can have subtle
>>                          implications that are
>>                              not obvious to the layman.
>>
>>
>>                          I think linux-generic should be devoid of any
>>                          per core optimizations in its default state.
>>                          There could be in an arch directory and it would
>>                          be supported as you say Bill by someone who will
>>                          care for that that variant, this also avoids
>>                          filling linux-generic with a possibly large list
>>                          of preprocessor flags that we can't test.
>>
>>                          In this case we don't have public access to the
>>                          octeon to even know if this code works so
>>                          separating or deletion feels right to me,
>>                          although Cavium are adding their board in this case.
>>
>>                      To have optimizations we should understand which per
>>                      arch optimization we need. I see that if newer gcc
>>                      supports good optimizations then we need to use that
>>                      gcc. Most of people use Open Embedded and Linaro gcc
>>                      which is almost latest version. So if there is no
>>                      optimization in standard environment (gcc, kernel,
>>                      libc) then arch implementation should go to platform
>>                      specific git variant. If it's covered by standard
>>                      environment then we simple updating minimal version
>>                      requirement for required component.
>>
>>
>>                  Lets take an example ODP API file
>>
>>                  static inline void odp_sync_stores(void)
>>                  {
>>                  #if defined __x86_64__ || defined __i386__
>>
>>                  >-------__asm__  __volatile__ ("sfence\n" : : : "memory");
>>
>>                  #elif defined(__arm__)
>>                  #if __ARM_ARCH == 6
>>                  >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>>                  >------->------->-------: : "r" (0) : "memory");
>>                  #elif __ARM_ARCH >= 7 || defined __aarch64__
>>
>>                  >-------__asm__ __volatile__ ("dmb st" : : : "memory");
>>                  #else
>>                  >-------__asm__ __volatile__ ("" : : : "memory");
>>                  #endif
>>
>>                  #elif defined __OCTEON__
>>
>>                  >-------__asm__  __volatile__ ("syncws\n" : : : "memory");
>>
>>                  #else
>>                  >-------__sync_synchronize();
>>                  #endif
>>
>>
>>                  How do we accommodate this without resorting to
>>                  architecture specific processor flags / assembly?
>>
>>                  I would prefer to
>>                  see platform/linux-generic/include/api/odp_sync.h changed to
>>
>>                  platform/linux-generic/include/api/x86/odp_sync.h
>>                  platform/linux-generic/include/api/arm/odp_sync.h
>>                  platform/linux-generic/include/api/octeon/odp_sync.h
>>
>>                  This assumes that there is currently no pure c code that
>>                  can be used that is acceptable on all three platforms.
>>
>>                  Or since linux-generic is not a performance platform but
>>                  rather a reference whenever we need to make a trade off
>>                  between simplicity and performance, we alter this to be
>>                  a regular function and move to:-
>>
>>                  platform/linux-generic/include/api/odp_sync.h <- now a
>>                  pure forward declaration of the API with doxygen
>>                  documentation
>>                  and implementations correctly broken down by arch
>>
>>                  arch/x86/odp_sync.h
>>                  arch/arm/odp_sync.h
>>                  arch/octeon/odp_sync.h
>>
>>                   
>>
>>                      Maxim.
>>
>>
>>
>>                              On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>                              <ola.liljedahl@linaro.org
>>                          <mailto:ola.liljedahl@linaro.org>
>>                          <mailto:ola.liljedahl@linaro.__org
>>                          <mailto:ola.liljedahl@linaro.org>>> wrote:
>>
>>                                  I had some discussions with Bala and we
>>                          think the GCC
>>                                  generates the
>>                                  appropriate code for OCTEON so there
>>                          should be no need for any
>>                                  OCTEON-specific inline assembler in
>>                          odp_atomic.h. I just
>>                                  didn't want
>>                                  to remove this piece of code when I
>>                          redesigned the atomic support
>>                                  because I didn't understand why there is
>>                          a SYNCW instruction
>>                                  *before*
>>                                  the load. Load-acquire type of loads may
>>                          need sync or barrier
>>                                  instructions but they come after the
>>                          load, not before.
>>
>>                                  I think we should trust GCC to generate
>>                          better code for OCTEON
>>                                  than us
>>                                  mere humans, at least until proven
>>                          wrong. So please remove this
>>                                  OCTEON-specific snippet.
>>
>>                                  -- Ola
>>
>>
>>                                  On 23 December 2014 at 18:01, Mike Holmes
>>                                  <mike.holmes@linaro.org
>>                          <mailto:mike.holmes@linaro.org>
>>                          <mailto:mike.holmes@linaro.org
>>                          <mailto:mike.holmes@linaro.org>__>> wrote:
>>                                  >
>>                                  >
>>                                  > On 23 December 2014 at 11:15, Taras
>>                          Kondratiuk
>>                                  <taras.kondratiuk@linaro.org
>>                          <mailto:taras.kondratiuk@linaro.org>
>>                          <mailto:taras.kondratiuk@__linaro.org
>>                          <mailto:taras.kondratiuk@linaro.org>>>
>>
>>                                  > wrote:
>>                                  >>
>>                                  >> On 12/23/2014 05:28 PM, Mike Holmes
>>                          wrote:
>>                                  >> > how will this work, Octeon does not
>>                          care about
>>                                   __atomic_fetch_add(&v,
>>                                  >> > 1, __ATOMIC_RELAXED); and yet you
>>                          will fail configure I
>>                                  assume
>>                                  >> >
>>                                  >> > static inline uint32_t
>>                                 
>>                          odp_atomic_fetch_inc_u32(odp___atomic_u32_t *atom)
>>                                  >> > {
>>                                  >> > #if defined __OCTEON__
>>                                  >> >>-------uint32_t ret;
>>                                  >> >>-------__asm__ __volatile__ ("syncws");
>>                                  >> >>-------__asm__ __volatile__ ("lai
>>                          %0,(%2)" : "=r" (ret),
>>                                  "+m" (atom) :
>>                                  >> >>------->------->------- "r" (atom));
>>                                  >> >>-------return ret;
>>                                  >> > #else
>>                                  >> >>-------return
>>                          __atomic_fetch_add(&atom->v, 1,
>>                                  __ATOMIC_RELAXED);
>>                                  >> > #endif
>>                                  >> > }
>>                                  >>
>>                                  >> Linux-generic shouldn't have arch
>>                          specific parts, because it is
>>                                  >> *generic*. Instead it can fall back
>>                          to strong __sync
>>                                  functions if
>>                                  >> __atomic are not available.
>>                                  >>
>>                                  >
>>                                  > Linux kernel is generic but has core
>>                          specifics - is this not
>>                                  the same ?
>>                                  >
>>                                  > What about these cases ?
>>                                  >
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:static int
>>                                  cpuinfo_octeon(FILE
>>                                  > *file, odp_system_info_t *sysinfo)
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          .cpu_arch_str =
>>                                  "octeon",
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          .cpuinfo_parser =
>>                                  > cpuinfo_octeon
>>                                  >
>>                          platform/linux-generic/__include/api/odp_align.h:#if
>>                          defined
>>                                  __x86_64__ ||
>>                                  > defined __i386__
>>                                  >
>>                          platform/linux-generic/__include/api/odp_sync.h:#if
>>                          defined
>>                                  __x86_64__ ||
>>                                  > defined __i386__
>>                                  >
>>                          platform/linux-generic/__include/odp_spin_internal.h:#__if
>>                                  defined __x86_64__ ||
>>                                  > defined __i386__
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:#if
>>                          defined
>>                                  __x86_64__ || defined
>>                                  > __i386__ || defined __OCTEON__ || \
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:#if
>>                          defined
>>                                  __x86_64__ || defined
>>                                  > __i386__
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:static int
>>                                  cpuinfo_x86(FILE *file,
>>                                  > odp_system_info_t *sysinfo)
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          #if defined
>>                                  __x86_64__ ||
>>                                  > defined __i386__
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          .cpu_arch_str =
>>                                  "x86",
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          .cpuinfo_parser =
>>                                  > cpuinfo_x86
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:#if
>>                          defined
>>                                  __x86_64__ || defined
>>                                  > __i386__ || defined __OCTEON__ || \
>>                                  >
>>                          platform/linux-generic/odp___time.c:#if defined
>>                          __x86_64__ ||
>>                                  defined __i386__
>>                                  >
>>                          platform/linux-generic/__include/api/odp_align.h:#elif
>>                          defined
>>                                  __arm__ ||
>>                                  > defined __aarch64__
>>                                  >
>>                          platform/linux-generic/__include/api/odp_sync.h:#elif
>>                                  defined(__arm__)
>>                                  >
>>                          platform/linux-generic/__include/odp_spin_internal.h:#__elif
>>                                  defined __arm__
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:#elif
>>                          defined
>>                                  __arm__ || defined
>>                                  > __aarch64__
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:static int
>>                                  cpuinfo_arm(FILE *file
>>                                  > ODP_UNUSED,
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          #elif defined
>>                                  __arm__ ||
>>                                  > defined __aarch64__
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          .cpu_arch_str =
>>                                  "arm",
>>                                  >
>>                          platform/linux-generic/odp___system_info.c:
>>                          .cpuinfo_parser =
>>                                  > cpuinfo_arm
>>                                  >
>>                                  >
>>                                  >>
>>                                  >>
>>                                  >> Taras Kondratiuk
>>                                  >
>>                                  >
>>                                  >
>>                                  >
>>                                  > --
>>                                  > Mike Holmes
>>                                  > Linaro  Sr Technical Manager
>>                                  > LNG - ODP
>>
>>                                 
>>                          _________________________________________________
>>                                  lng-odp mailing list
>>                                  lng-odp@lists.linaro.org
>>                          <mailto:lng-odp@lists.linaro.org>
>>                          <mailto:lng-odp@lists.linaro.__org
>>                          <mailto:lng-odp@lists.linaro.org>>
>>                                 
>>                          http://lists.linaro.org/__mailman/listinfo/lng-odp
>>                          <http://lists.linaro.org/mailman/listinfo/lng-odp>
>>
>>
>>
>>
>>                          --
>>                          *Mike Holmes*
>>                          Linaro  Sr Technical Manager
>>                          LNG - ODP
>>
>>
>>                          _________________________________________________
>>                          lng-odp mailing list
>>                          lng-odp@lists.linaro.org
>>                          <mailto:lng-odp@lists.linaro.org>
>>                          http://lists.linaro.org/__mailman/listinfo/lng-odp
>>                          <http://lists.linaro.org/mailman/listinfo/lng-odp>
>>
>>
>>
>>                      _________________________________________________
>>                      lng-odp mailing list
>>                      lng-odp@lists.linaro.org
>>                      <mailto:lng-odp@lists.linaro.org>
>>                      http://lists.linaro.org/__mailman/listinfo/lng-odp
>>                      <http://lists.linaro.org/mailman/listinfo/lng-odp>
>>
>>
>>
>>
>>                  --
>>                  *Mike Holmes*
>>                  Linaro  Sr Technical Manager
>>                  LNG - ODP
>>
>>                  _______________________________________________
>>                  lng-odp mailing list
>>                  lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>>                  http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>>
>>
>>
>>
>>
>>      --
>>      *Mike Holmes*
>>      Linaro  Sr Technical Manager
>>      LNG - ODP
>>
>>
>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>

Taras Kondratiuk Dec. 25, 2014, 10:55 a.m. UTC | #22

On 12/25/2014 11:53 AM, Maxim Uvarov wrote:
> On 12/25/2014 12:34 PM, Taras Kondratiuk wrote:
>> On 12/24/2014 12:21 PM, Maxim Uvarov wrote:
>>> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote:
>>>> Linux-generic shouldn't have arch specific parts, because it is
>>>> *generic*. Instead it can fall back to strong __sync functions if
>>>> __atomic are not available.
>>> Taras I think that even linux-generic should have it's own requirements.
>>> It might be compiler version.
>>> Dependence on libssl. Kernel version and etc.
>>>
>>> We do not say that linux-generic should work on any linux, even 20 years
>>> old. We just define needed requirements
>>> to compile it. The same is with atomics. Instead of creating 20 branches
>>> for all gcc and not gcc version we just
>>> stick with more fresh gcc and say that it's minimal requirement to run
>>> linux-generic.
>> I don't think it worth to make such a tight restriction to use only gcc
>> 4.7+. Having fallback to __sync_* function will relax it to something
>> like gcc 4.1+. Sure performance will be worse, but it should be ok for
>> linux-generic.
>>
> Why it is ok? If possible we need to make linux-generic version fast.
> Support of old gcc is not required. If it's needed for somebody he can
> do this in his own platform.

Because performance is not the main target for linux-generic.

I'm a bit puzzled with your arguments so want to clarify your point.
I see three options here:
1. __atomic_* only, so we don't support older compilers.
2. __atomic_* with fallback to __sync_*.
3. __atomic_* and inline assembler for some architectures.

Currently we have #3 in the code. As far as I understand you are for #1.
Right? I'd support #2.

Taras Kondratiuk Dec. 25, 2014, 10:59 a.m. UTC | #23

On 12/25/2014 12:23 PM, Maxim Uvarov wrote:
> On 12/25/2014 12:15 PM, Taras Kondratiuk wrote:
>> There is no need to make it so complex. At least until 'static'
>> before 'not static' functions is forbidden by some post-C11 standard).
>>
>> Me and Anders are making another try to separate clean API. In a
>> previous try common (clean) API headers were public. They had
>> <plat/...> include hooks to pull platform specific definitions. That
>> appeared to be not flexible enough. Current approach is to make common
>> API headers private in a sense that application won't include them
>> directly, but instead implementation will provide public headers and
>> embedded clean API headers there.
>>
>> Please check [1] for an example of this approach tried for a few
>> headers (atomic, byteorder). It seems to be working fine for now. I'm
>> reworking Keystone implementation based on this approach.
>>
>> [1] https://git.linaro.org/people/taras.kondratiuk/odp.git/shortlog/refs/heads/api/change_namespace
> 
> I see from your patch you are doing:
> 
> rename from platform/linux-generic/include/api/odp/buffer.h
> rename to include/api/odp/buffer.h
> 
> "api" looks like is not needed and it can be simple include/buffer.h

The exact path can be changed. It is just a demo of the approach.

> 
> btw,  what should I do with thread patch? Should I send updated version?

I think atomic code should be fixed first not to use inline assembler,
before you can add checker in configure.

Maxim Uvarov Dec. 25, 2014, 11:06 a.m. UTC | #24

On 12/25/2014 01:55 PM, Taras Kondratiuk wrote:
> On 12/25/2014 11:53 AM, Maxim Uvarov wrote:
>> On 12/25/2014 12:34 PM, Taras Kondratiuk wrote:
>>> On 12/24/2014 12:21 PM, Maxim Uvarov wrote:
>>>> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote:
>>>>> Linux-generic shouldn't have arch specific parts, because it is
>>>>> *generic*. Instead it can fall back to strong __sync functions if
>>>>> __atomic are not available.
>>>> Taras I think that even linux-generic should have it's own requirements.
>>>> It might be compiler version.
>>>> Dependence on libssl. Kernel version and etc.
>>>>
>>>> We do not say that linux-generic should work on any linux, even 20 years
>>>> old. We just define needed requirements
>>>> to compile it. The same is with atomics. Instead of creating 20 branches
>>>> for all gcc and not gcc version we just
>>>> stick with more fresh gcc and say that it's minimal requirement to run
>>>> linux-generic.
>>> I don't think it worth to make such a tight restriction to use only gcc
>>> 4.7+. Having fallback to __sync_* function will relax it to something
>>> like gcc 4.1+. Sure performance will be worse, but it should be ok for
>>> linux-generic.
>>>
>> Why it is ok? If possible we need to make linux-generic version fast.
>> Support of old gcc is not required. If it's needed for somebody he can
>> do this in his own platform.
> Because performance is not the main target for linux-generic.
>
> I'm a bit puzzled with your arguments so want to clarify your point.
> I see three options here:
> 1. __atomic_* only, so we don't support older compilers.
> 2. __atomic_* with fallback to __sync_*.
> 3. __atomic_* and inline assembler for some architectures.
>
> Currently we have #3 in the code. As far as I understand you are for #1.
> Right? I'd support #2.
>
Yes, currently we have #3. And what I understood from all discussions is 
to go with #1.
At lease fall-back was never discussed.

Ola's patch bellow replaced __sync with __atomic. If we need #2 
fall-back then we need restore original __sync code.
But before doing this we need to figure out who actually requires this.

commit 8363d4608e67ecdd000fb77b440ec1707d1cc6f6
Author: Ola Liljedahl <ola.liljedahl@linaro.org>
Date:   Mon Nov 24 23:38:49 2014 +0100

     api: odp_atomic.h: struct type, relaxed mm, missing funcs, use __atomic


Maxim.

[PATCHv3] configure.ac check for atomic operations support

Commit Message

Comments

Patch