[RFC] Add ipc.h

Message ID 1431986627-16629-1-git-send-email-ola.liljedahl@linaro.org
State New

Commit Message

Ola Liljedahl May 18, 2015, 10:03 p.m.
As promised, here is my first attempt at a standalone API for IPC - inter
process communication in a shared nothing architecture (message passing
between processes which do not share memory).

Currently all definitions are in the file ipc.h but it is possible to
break out some message/event related definitions (everything from
odp_ipc_sender) in a separate file message.h. This would mimic the
packet_io.h/packet.h separation.

The semantics of message passing are that sending a message to an endpoint
will always appear to succeed. The appearance of endpoints is explicitly
notified through user-defined messages specified in the odp_ipc_resolve()
call. Similarly, the disappearance (e.g. death or otherwise lost connection)
is also explicitly notified through user-defined messages specified in the
odp_ipc_monitor() call. The send call does not fail just because the
addressed endpoint has disappeared.

Messages (from endpoint A to endpoint B) are delivered in order. If message
N sent to an endpoint is delivered, then all messages <N have also been
delivered. Message delivery does not guarantee actual processing by the
recipient. End-to-end acknowledgements (using messages) should be used if
this guarantee is important to the user.

IPC endpoints can be seen as interfaces (taps) to an internal reliable
multidrop network where each endpoint has a unique address which is only
valid for the lifetime of the endpoint. That is, if an endpoint is destroyed
and then recreated (with the same name), the new endpoint will have a
new address (eventually endpoint addresses will have to be recycled, but
not for a very long time). Endpoint names do not necessarily have to be
unique.

Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
---
(This document/code contribution attached is provided under the terms of
agreement LES-LTM-21309)

 include/odp/api/ipc.h | 261 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 include/odp/api/ipc.h

Comments

Bill Fischofer May 19, 2015, 1:53 a.m. | #1
See comments inline.  In general I like this, as it does seem clean and
minimal.

On Mon, May 18, 2015 at 5:03 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
wrote:

> As promised, here is my first attempt at a standalone API for IPC - inter
> process communication in a shared nothing architecture (message passing
> between processes which do not share memory).
>
> Currently all definitions are in the file ipc.h but it is possible to
> break out some message/event related definitions (everything from
> odp_ipc_sender) in a separate file message.h. This would mimic the
> packet_io.h/packet.h separation.
>
> The semantics of message passing is that sending a message to an endpoint
> will always look like it succeeds. The appearance of endpoints is
> explicitly
> notified through user-defined messages specified in the odp_ipc_resolve()
> call. Similarly, the disappearance (e.g. death or otherwise lost
> connection)
> is also explicitly notified through user-defined messages specified in the
> odp_ipc_monitor() call. The send call does not fail because the addressed
> endpoints has disappeared.
>
> Messages (from endpoint A to endpoint B) are delivered in order. If message
> N sent to an endpoint is delivered, then all messages <N have also been
> delivered. Message delivery does not guarantee actual processing by the
> recipient. End-to-end acknowledgements (using messages) should be used if
> this guarantee is important to the user.
>
> IPC endpoints can be seen as interfaces (taps) to an internal reliable
> multidrop network where each endpoint has a unique address which is only
> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
> and then recreated (with the same name), the new endpoint will have a
> new address (eventually endpoints addresses will have to be recycled but
> not for a very long time). Endpoints names do not necessarily have to be
> unique.
>
> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> ---
> (This document/code contribution attached is provided under the terms of
> agreement LES-LTM-21309)
>
>  include/odp/api/ipc.h | 261
> ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 261 insertions(+)
>  create mode 100644 include/odp/api/ipc.h
>
> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
> new file mode 100644
> index 0000000..3395a34
> --- /dev/null
> +++ b/include/odp/api/ipc.h
> @@ -0,0 +1,261 @@
> +/* Copyright (c) 2015, Linaro Limited
> + * All rights reserved.
> + *
> + * SPDX-License-Identifier:     BSD-3-Clause
> + */
> +
> +
> +/**
> + * @file
> + *
> + * ODP IPC API
> + */
> +
> +#ifndef ODP_API_IPC_H_
> +#define ODP_API_IPC_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** @defgroup odp_ipc ODP IPC
> + *  @{
> + */
> +
> +/**
> + * @typedef odp_ipc_t
> + * ODP IPC handle
> + */
> +
> +/**
> + * @typedef odp_ipc_msg_t
> + * ODP IPC message handle
> + */
> +
> +
> +/**
> + * @def ODP_IPC_ADDR_SIZE
> + * Size of the address of an IPC endpoint
> + */
> +
> +/**
> + * Create IPC endpoint
> + *
> + * @param name Name of local IPC endpoint
> + * @param pool Pool for incoming messages
>

Should document the type of the pool being used. Since the object type for
IPC channels is odp_ipc_msg_t, that would imply that this should be a new
pool type (ODP_POOL_IPC or ODP_POOL_IPC_MSG) to buffer these objects.


> + *
> + * @return IPC handle on success
> + * @retval ODP_IPC_INVALID on failure and errno set
> + */
> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
> +
> +/**
> + * Destroy IPC endpoint
> + *
> + * @param ipc IPC handle
> + *
> + * @retval 0 on success
> + * @retval <0 on failure
> + */
> +int odp_ipc_destroy(odp_ipc_t ipc);
> +
> +/**
> + * Set the default input queue for an IPC endpoint
> + *
> + * @param ipc   IPC handle
> + * @param queue Queue handle
> + *
> + * @retval  0 on success
> + * @retval <0 on failure
> + */
> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
> +
> +/**
> + * Remove the default input queue
> + *
> + * Remove (disassociate) the default input queue from an IPC endpoint.
> + * The queue itself is not touched.
> + *
> + * @param ipc  IPC handle
> + *
> + * @retval 0 on success
> + * @retval <0 on failure
> + */
> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
> +
> +/**
> + * Resolve endpoint by name
> + *
> + * Look up an existing or future endpoint by name.
> + * When the endpoint exists, return the specified message with the
> endpoint
> + * as the sender.
> + *
> + * @param ipc IPC handle
> + * @param name Name to resolve
> + * @param msg Message to return
> + */
> +void odp_ipc_resolve(odp_ipc_t ipc,
> +                    const char *name,
> +                    odp_ipc_msg_t msg);
>

Shouldn't this be odp_ipc_lookup() for consistency with the other named
lookup APIs?


> +
> +/**
> + * Monitor endpoint
> + *
> + * Monitor an existing (potentially already dead) endpoint.
> + * When the endpoint is dead, return the specified message with the
> endpoint
> + * as the sender.
> + *
> + * Unrecognized or invalid endpoint addresses are treated as dead
> endpoints.
> + *
> + * @param ipc IPC handle
> + * @param addr Address of monitored endpoint
> + * @param msg Message to return
> + */
> +void odp_ipc_monitor(odp_ipc_t ipc,
> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
> +                    odp_ipc_msg_t msg);
> +
> +/**
> + * Send message
> + *
> + * Send a message to an endpoint (which may already be dead).
> + * Message delivery is ordered and reliable. All (accepted) messages will
> be
> + * delivered up to the point of endpoint death or lost connection.
> + * Actual reception and processing is not guaranteed (use end-to-end
> + * acknowledgements for that).
> + * Monitor the remote endpoint to detect death or lost connection.
> + *
> + * @param ipc IPC handle
> + * @param msg Message to send
> + * @param addr Address of remote endpoint
> + *
> + * @retval 0 on success
> + * @retval <0 on error
> + */
> +int odp_ipc_send(odp_ipc_t ipc,
> +                odp_ipc_msg_t msg,
> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
> +
> +/**
> + * Get address of sender (source) of message
> + *
> + * @param msg Message handle
> + * @param addr Address of sender endpoint
> + */
> +void odp_ipc_sender(odp_ipc_msg_t msg,
> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
> +
> +/**
> + * Message data pointer
> + *
> + * Return a pointer to the message data
> + *
> + * @param msg Message handle
> + *
> + * @return Pointer to the message data
> + */
> +void *odp_ipc_data(odp_ipc_msg_t msg);
> +
> +/**
> + * Message data length
> + *
> + * Return length of the message data.
> + *
> + * @param msg Message handle
> + *
> + * @return Message length
> + */
> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>

Should these two be combined to eliminate the need to call both?  Wouldn't
one expect to need both address and length in most instances?

Also, does this imply that all odp_ipc_msg_t objects are contiguously
addressable? That requirement might be problematic for some
implementations. We can certainly allow the application to define a minimum
msg segment size (as part of the pool creation), but if "large" objects can
be passed via the IPC mechanism (e.g., packets) then it would seem that the
API should be defined to support segmented addressability.  This is
especially true if the endpoints are defined in different pools (as one
would expect) that have independently-configurable segmentation.


> +
> +/**
> + * Set message length
> + *
> + * Set length of the message data.
> + *
> + * @param msg Message handle
> + * @param len New length
> + *
> + * @retval 0 on success
> + * @retval <0 on error
> + */
> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
> +
> +/**
> + * Allocate message
> + *
> + * Allocate a message of a specific size.
> + *
> + * @param pool Message pool to allocate message from
> + * @param len Length of the allocated message
> + *
> + * @return IPC message handle on success
> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
> + */
> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
> +
> +/**
> + * Free message
> + *
> + * Free message back to the message pool it was allocated from.
> + *
> + * @param msg Handle of message to free
> + */
> +void odp_ipc_free(odp_ipc_msg_t msg);
> +
> +/**
> + * Get message handle from event
> + *
> + * Converts an ODP_EVENT_MESSAGE type event to a message.
> + *
> + * @param ev   Event handle
> + *
> + * @return Message handle
> + *
> + * @see odp_event_type()
> + */
> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
> +
> +/**
> + * Convert message handle to event
> + *
> + * @param msg  Message handle
> + *
> + * @return Event handle
> + */
> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
> +
> +/**
> + * Get printable value for an odp_ipc_t
> + *
> + * @param ipc  IPC handle to be printed
> + * @return     uint64_t value that can be used to print/display this
> + *             handle
> + *
> + * @note This routine is intended to be used for diagnostic purposes
> + * to enable applications to generate a printable value that represents
> + * an odp_ipc_t handle.
> + */
> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
> +
> +/**
> + * Get printable value for an odp_ipc_msg_t
> + *
> + * @param msg  Message handle to be printed
> + * @return     uint64_t value that can be used to print/display this
> + *             handle
> + *
> + * @note This routine is intended to be used for diagnostic purposes
> + * to enable applications to generate a printable value that represents
> + * an odp_ipc_msg_t handle.
> + */
> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
> +
> +/**
> + * @}
> + */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> --
> 1.9.1
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
>
Ola Liljedahl May 19, 2015, 8:49 a.m. | #2
On 19 May 2015 at 03:53, Bill Fischofer <bill.fischofer@linaro.org> wrote:

> See comments inline.  In general I like this, as it does seem clean and
> minimal.
>
> On Mon, May 18, 2015 at 5:03 PM, Ola Liljedahl <ola.liljedahl@linaro.org>
> wrote:
>
>> As promised, here is my first attempt at a standalone API for IPC - inter
>> process communication in a shared nothing architecture (message passing
>> between processes which do not share memory).
>>
>> Currently all definitions are in the file ipc.h but it is possible to
>> break out some message/event related definitions (everything from
>> odp_ipc_sender) in a separate file message.h. This would mimic the
>> packet_io.h/packet.h separation.
>>
>> The semantics of message passing is that sending a message to an endpoint
>> will always look like it succeeds. The appearance of endpoints is
>> explicitly
>> notified through user-defined messages specified in the odp_ipc_resolve()
>> call. Similarly, the disappearance (e.g. death or otherwise lost
>> connection)
>> is also explicitly notified through user-defined messages specified in the
>> odp_ipc_monitor() call. The send call does not fail because the addressed
>> endpoints has disappeared.
>>
>> Messages (from endpoint A to endpoint B) are delivered in order. If
>> message
>> N sent to an endpoint is delivered, then all messages <N have also been
>> delivered. Message delivery does not guarantee actual processing by the
>> recipient. End-to-end acknowledgements (using messages) should be used if
>> this guarantee is important to the user.
>>
>> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>> multidrop network where each endpoint has a unique address which is only
>> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
>> and then recreated (with the same name), the new endpoint will have a
>> new address (eventually endpoints addresses will have to be recycled but
>> not for a very long time). Endpoints names do not necessarily have to be
>> unique.
>>
>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> ---
>> (This document/code contribution attached is provided under the terms of
>> agreement LES-LTM-21309)
>>
>>  include/odp/api/ipc.h | 261
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 261 insertions(+)
>>  create mode 100644 include/odp/api/ipc.h
>>
>> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>> new file mode 100644
>> index 0000000..3395a34
>> --- /dev/null
>> +++ b/include/odp/api/ipc.h
>> @@ -0,0 +1,261 @@
>> +/* Copyright (c) 2015, Linaro Limited
>> + * All rights reserved.
>> + *
>> + * SPDX-License-Identifier:     BSD-3-Clause
>> + */
>> +
>> +
>> +/**
>> + * @file
>> + *
>> + * ODP IPC API
>> + */
>> +
>> +#ifndef ODP_API_IPC_H_
>> +#define ODP_API_IPC_H_
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/** @defgroup odp_ipc ODP IPC
>> + *  @{
>> + */
>> +
>> +/**
>> + * @typedef odp_ipc_t
>> + * ODP IPC handle
>> + */
>> +
>> +/**
>> + * @typedef odp_ipc_msg_t
>> + * ODP IPC message handle
>> + */
>> +
>> +
>> +/**
>> + * @def ODP_IPC_ADDR_SIZE
>> + * Size of the address of an IPC endpoint
>> + */
>> +
>> +/**
>> + * Create IPC endpoint
>> + *
>> + * @param name Name of local IPC endpoint
>> + * @param pool Pool for incoming messages
>>
>
> Should document the type of the pool being used. Since the object type for
> IPC channels is odp_ipc_msg_t, that would imply that this should be a new
> pool type (ODP_POOL_IPC or ODP_POOL_IPC_MSG) to buffer these objects.
>
OK. A message is basically a buffer with some extra metadata (source and
possibly destination addresses and size of message data).


>
>
>> + *
>> + * @return IPC handle on success
>> + * @retval ODP_IPC_INVALID on failure and errno set
>> + */
>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>> +
>> +/**
>> + * Destroy IPC endpoint
>> + *
>> + * @param ipc IPC handle
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_destroy(odp_ipc_t ipc);
>> +
>> +/**
>> + * Set the default input queue for an IPC endpoint
>> + *
>> + * @param ipc   IPC handle
>> + * @param queue Queue handle
>> + *
>> + * @retval  0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>> +
>> +/**
>> + * Remove the default input queue
>> + *
>> + * Remove (disassociate) the default input queue from an IPC endpoint.
>> + * The queue itself is not touched.
>> + *
>> + * @param ipc  IPC handle
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>>
I don't know why this call should be able to return an error. I copied it
from packet_io.h. I think it is reasonable to require that the ipc handle is
valid, otherwise behaviour is undefined (e.g. crash or abort or terminate,
but not return an error). Then there is nothing that can go wrong.



> +
>> +/**
>> + * Resolve endpoint by name
>> + *
>> + * Look up an existing or future endpoint by name.
>> + * When the endpoint exists, return the specified message with the
>> endpoint
>> + * as the sender.
>> + *
>> + * @param ipc IPC handle
>> + * @param name Name to resolve
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_resolve(odp_ipc_t ipc,
>> +                    const char *name,
>> +                    odp_ipc_msg_t msg);
>>
>
> Shouldn't this be odp_ipc_lookup() for consistency with the other named
> lookup APIs?
>
OK. I tried to mimic existing calls but missed this one.


>
>
>> +
>> +/**
>> + * Monitor endpoint
>> + *
>> + * Monitor an existing (potentially already dead) endpoint.
>> + * When the endpoint is dead, return the specified message with the
>> endpoint
>> + * as the sender.
>> + *
>> + * Unrecognized or invalid endpoint addresses are treated as dead
>> endpoints.
>> + *
>> + * @param ipc IPC handle
>> + * @param addr Address of monitored endpoint
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_monitor(odp_ipc_t ipc,
>> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>> +                    odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Send message
>> + *
>> + * Send a message to an endpoint (which may already be dead).
>> + * Message delivery is ordered and reliable. All (accepted) messages
>> will be
>> + * delivered up to the point of endpoint death or lost connection.
>> + * Actual reception and processing is not guaranteed (use end-to-end
>> + * acknowledgements for that).
>> + * Monitor the remote endpoint to detect death or lost connection.
>> + *
>> + * @param ipc IPC handle
>> + * @param msg Message to send
>> + * @param addr Address of remote endpoint
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_send(odp_ipc_t ipc,
>> +                odp_ipc_msg_t msg,
>> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Get address of sender (source) of message
>> + *
>> + * @param msg Message handle
>> + * @param addr Address of sender endpoint
>> + */
>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Message data pointer
>> + *
>> + * Return a pointer to the message data
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Pointer to the message data
>> + */
>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Message data length
>> + *
>> + * Return length of the message data.
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Message length
>> + */
>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>
>
> Should these two be combined to eliminate the need to call both?  Wouldn't
> one expect to need both address and length in most instances?
>
Yes. But that goes for a lot of existing calls in e.g. packet.h as well.
And we still define them as separate calls.


>
> Also, does this imply that all odp_ipc_msg_t objects are contiguously
> addressable? That requirement might be problematic for some
> implementations. We can certainly allow the application to define a minimum
> msg segment size (as part of the pool creation), but if "large" objects can
> be passed via the IPC mechanism (e.g., packets) then it would seem that the
> API should be defined to support segmented addressability.  This is
> especially true if the endpoints are defined in different pools (as one
> would expect) that have independently-configurable segmentation.
>
The idea is not to pass packets between processes using the IPC mechanism,
but rather control-type messages for configuration changes in the data
plane. I don't expect these messages to be very large (a few hundred bytes
at the very most). The content is likely to be expressed as a struct, and
the ideal usage is to cast the message data pointer to the struct type in
question, as identified by the message type, which is customarily the first
(observable) element in the message. This is very different from how packet
payload is accessed (and more similar to how packet headers are accessed,
where we allow for non-segmented access within certain limits).

With message passing between processes that do not share memory, messages
will have to be copied. As long as a message fits into the message (buffer)
pool of the receiver (and obviously it has to fit into the message pool of
the sender as well), I don't see why we have to expose message size
limitations in any other way.



>
>
>> +
>> +/**
>> + * Set message length
>> + *
>> + * Set length of the message data.
>> + *
>> + * @param msg Message handle
>> + * @param len New length
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>> +
>> +/**
>> + * Allocate message
>> + *
>> + * Allocate a message of a specific size.
>> + *
>> + * @param pool Message pool to allocate message from
>> + * @param len Length of the allocated message
>> + *
>> + * @return IPC message handle on success
>> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>> + */
>> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>> +
>> +/**
>> + * Free message
>> + *
>> + * Free message back to the message pool it was allocated from.
>> + *
>> + * @param msg Handle of message to free
>> + */
>> +void odp_ipc_free(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Get message handle from event
>> + *
>> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>> + *
>> + * @param ev   Event handle
>> + *
>> + * @return Message handle
>> + *
>> + * @see odp_event_type()
>> + */
>> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>> +
>> +/**
>> + * Convert message handle to event
>> + *
>> + * @param msg  Message handle
>> + *
>> + * @return Event handle
>> + */
>> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Get printable value for an odp_ipc_t
>> + *
>> + * @param ipc  IPC handle to be printed
>> + * @return     uint64_t value that can be used to print/display this
>> + *             handle
>> + *
>> + * @note This routine is intended to be used for diagnostic purposes
>> + * to enable applications to generate a printable value that represents
>> + * an odp_ipc_t handle.
>> + */
>> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>> +
>> +/**
>> + * Get printable value for an odp_ipc_msg_t
>> + *
>> + * @param msg  Message handle to be printed
>> + * @return     uint64_t value that can be used to print/display this
>> + *             handle
>> + *
>> + * @note This routine is intended to be used for diagnostic purposes
>> + * to enable applications to generate a printable value that represents
>> + * an odp_ipc_msg_t handle.
>> + */
>> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * @}
>> + */
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> --
>> 1.9.1
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> https://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>
Ciprian Barbu May 19, 2015, 2:19 p.m. | #3
On Tue, May 19, 2015 at 1:03 AM, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
> As promised, here is my first attempt at a standalone API for IPC - inter
> process communication in a shared nothing architecture (message passing
> between processes which do not share memory).
>
> Currently all definitions are in the file ipc.h but it is possible to
> break out some message/event related definitions (everything from
> odp_ipc_sender) in a separate file message.h. This would mimic the
> packet_io.h/packet.h separation.
>
> The semantics of message passing is that sending a message to an endpoint
> will always look like it succeeds. The appearance of endpoints is explicitly
> notified through user-defined messages specified in the odp_ipc_resolve()
> call. Similarly, the disappearance (e.g. death or otherwise lost connection)
> is also explicitly notified through user-defined messages specified in the
> odp_ipc_monitor() call. The send call does not fail because the addressed
> endpoints has disappeared.
>
> Messages (from endpoint A to endpoint B) are delivered in order. If message
> N sent to an endpoint is delivered, then all messages <N have also been
> delivered. Message delivery does not guarantee actual processing by the
> recipient. End-to-end acknowledgements (using messages) should be used if
> this guarantee is important to the user.
>
> IPC endpoints can be seen as interfaces (taps) to an internal reliable
> multidrop network where each endpoint has a unique address which is only
> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
> and then recreated (with the same name), the new endpoint will have a
> new address (eventually endpoints addresses will have to be recycled but
> not for a very long time). Endpoints names do not necessarily have to be
> unique.
>
> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> ---
> (This document/code contribution attached is provided under the terms of
> agreement LES-LTM-21309)
>
>  include/odp/api/ipc.h | 261 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 261 insertions(+)
>  create mode 100644 include/odp/api/ipc.h
>
> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
> new file mode 100644
> index 0000000..3395a34
> --- /dev/null
> +++ b/include/odp/api/ipc.h
> @@ -0,0 +1,261 @@
> +/* Copyright (c) 2015, Linaro Limited
> + * All rights reserved.
> + *
> + * SPDX-License-Identifier:     BSD-3-Clause
> + */
> +
> +
> +/**
> + * @file
> + *
> + * ODP IPC API
> + */
> +
> +#ifndef ODP_API_IPC_H_
> +#define ODP_API_IPC_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** @defgroup odp_ipc ODP IPC
> + *  @{
> + */
> +
> +/**
> + * @typedef odp_ipc_t
> + * ODP IPC handle
> + */
> +
> +/**
> + * @typedef odp_ipc_msg_t
> + * ODP IPC message handle
> + */
> +
> +
> +/**
> + * @def ODP_IPC_ADDR_SIZE
> + * Size of the address of an IPC endpoint
> + */
> +
> +/**
> + * Create IPC endpoint
> + *
> + * @param name Name of local IPC endpoint
> + * @param pool Pool for incoming messages
> + *
> + * @return IPC handle on success
> + * @retval ODP_IPC_INVALID on failure and errno set
> + */
> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
> +
> +/**
> + * Destroy IPC endpoint
> + *
> + * @param ipc IPC handle
> + *
> + * @retval 0 on success
> + * @retval <0 on failure
> + */
> +int odp_ipc_destroy(odp_ipc_t ipc);
> +
> +/**
> + * Set the default input queue for an IPC endpoint
> + *
> + * @param ipc   IPC handle
> + * @param queue Queue handle
> + *
> + * @retval  0 on success
> + * @retval <0 on failure
> + */
> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
> +
> +/**
> + * Remove the default input queue
> + *
> + * Remove (disassociate) the default input queue from an IPC endpoint.
> + * The queue itself is not touched.
> + *
> + * @param ipc  IPC handle
> + *
> + * @retval 0 on success
> + * @retval <0 on failure
> + */
> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
> +
> +/**
> + * Resolve endpoint by name
> + *
> + * Look up an existing or future endpoint by name.
> + * When the endpoint exists, return the specified message with the endpoint
> + * as the sender.
> + *
> + * @param ipc IPC handle
> + * @param name Name to resolve
> + * @param msg Message to return
> + */
> +void odp_ipc_resolve(odp_ipc_t ipc,
> +                    const char *name,
> +                    odp_ipc_msg_t msg);

How does this take into account the 'address' of the endpoint? Will
the name include the search path, something like
"domain_name/endpoint_name"? If so, should there be an API for
creating a communication channel with domain_X?

> +
> +/**
> + * Monitor endpoint
> + *
> + * Monitor an existing (potentially already dead) endpoint.
> + * When the endpoint is dead, return the specified message with the endpoint
> + * as the sender.
> + *
> + * Unrecognized or invalid endpoint addresses are treated as dead endpoints.
> + *
> + * @param ipc IPC handle
> + * @param addr Address of monitored endpoint
> + * @param msg Message to return
> + */
> +void odp_ipc_monitor(odp_ipc_t ipc,
> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
> +                    odp_ipc_msg_t msg);
> +
> +/**
> + * Send message
> + *
> + * Send a message to an endpoint (which may already be dead).
> + * Message delivery is ordered and reliable. All (accepted) messages will be
> + * delivered up to the point of endpoint death or lost connection.
> + * Actual reception and processing is not guaranteed (use end-to-end
> + * acknowledgements for that).
> + * Monitor the remote endpoint to detect death or lost connection.
> + *
> + * @param ipc IPC handle
> + * @param msg Message to send
> + * @param addr Address of remote endpoint
> + *
> + * @retval 0 on success
> + * @retval <0 on error
> + */
> +int odp_ipc_send(odp_ipc_t ipc,
> +                odp_ipc_msg_t msg,
> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
> +
> +/**
> + * Get address of sender (source) of message
> + *
> + * @param msg Message handle
> + * @param addr Address of sender endpoint
> + */
> +void odp_ipc_sender(odp_ipc_msg_t msg,
> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
> +
> +/**
> + * Message data pointer
> + *
> + * Return a pointer to the message data
> + *
> + * @param msg Message handle
> + *
> + * @return Pointer to the message data
> + */
> +void *odp_ipc_data(odp_ipc_msg_t msg);
> +
> +/**
> + * Message data length
> + *
> + * Return length of the message data.
> + *
> + * @param msg Message handle
> + *
> + * @return Message length
> + */
> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
> +
> +/**
> + * Set message length
> + *
> + * Set length of the message data.
> + *
> + * @param msg Message handle
> + * @param len New length
> + *
> + * @retval 0 on success
> + * @retval <0 on error
> + */
> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
> +
> +/**
> + * Allocate message
> + *
> + * Allocate a message of a specific size.
> + *
> + * @param pool Message pool to allocate message from
> + * @param len Length of the allocated message
> + *
> + * @return IPC message handle on success
> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
> + */
> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
> +
> +/**
> + * Free message
> + *
> + * Free message back to the message pool it was allocated from.
> + *
> + * @param msg Handle of message to free
> + */
> +void odp_ipc_free(odp_ipc_msg_t msg);
> +
> +/**
> + * Get message handle from event
> + *
> + * Converts an ODP_EVENT_MESSAGE type event to a message.
> + *
> + * @param ev   Event handle
> + *
> + * @return Message handle
> + *
> + * @see odp_event_type()
> + */
> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
> +
> +/**
> + * Convert message handle to event
> + *
> + * @param msg  Message handle
> + *
> + * @return Event handle
> + */
> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
> +
> +/**
> + * Get printable value for an odp_ipc_t
> + *
> + * @param ipc  IPC handle to be printed
> + * @return     uint64_t value that can be used to print/display this
> + *             handle
> + *
> + * @note This routine is intended to be used for diagnostic purposes
> + * to enable applications to generate a printable value that represents
> + * an odp_ipc_t handle.
> + */
> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
> +
> +/**
> + * Get printable value for an odp_ipc_msg_t
> + *
> + * @param msg  Message handle to be printed
> + * @return     uint64_t value that can be used to print/display this
> + *             handle
> + *
> + * @note This routine is intended to be used for diagnostic purposes
> + * to enable applications to generate a printable value that represents
> + * an odp_ipc_msg_t handle.
> + */
> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
> +
> +/**
> + * @}
> + */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> --
> 1.9.1
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
Ola Liljedahl May 19, 2015, 3:27 p.m. | #4
On 19 May 2015 at 16:19, Ciprian Barbu <ciprian.barbu@linaro.org> wrote:
> On Tue, May 19, 2015 at 1:03 AM, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>> new file mode 100644
>> index 0000000..3395a34
>> --- /dev/null
>> +++ b/include/odp/api/ipc.h
>> @@ -0,0 +1,261 @@
>> +/* Copyright (c) 2015, Linaro Limited
>> + * All rights reserved.
>> + *
>> + * SPDX-License-Identifier:     BSD-3-Clause
>> + */
>> +
>> +
>> +/**
>> + * @file
>> + *
>> + * ODP IPC API
>> + */
>> +
>> +#ifndef ODP_API_IPC_H_
>> +#define ODP_API_IPC_H_
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/** @defgroup odp_ipc ODP IPC
>> + *  @{
>> + */
>> +
>> +/**
>> + * @typedef odp_ipc_t
>> + * ODP IPC handle
>> + */
>> +
>> +/**
>> + * @typedef odp_ipc_msg_t
>> + * ODP IPC message handle
>> + */
>> +
>> +
>> +/**
>> + * @def ODP_IPC_ADDR_SIZE
>> + * Size of the address of an IPC endpoint
>> + */
>> +
>> +/**
>> + * Create IPC endpoint
>> + *
>> + * @param name Name of local IPC endpoint
>> + * @param pool Pool for incoming messages
>> + *
>> + * @return IPC handle on success
>> + * @retval ODP_IPC_INVALID on failure and errno set
>> + */
>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>> +
>> +/**
>> + * Destroy IPC endpoint
>> + *
>> + * @param ipc IPC handle
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_destroy(odp_ipc_t ipc);
>> +
>> +/**
>> + * Set the default input queue for an IPC endpoint
>> + *
>> + * @param ipc   IPC handle
>> + * @param queue Queue handle
>> + *
>> + * @retval  0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>> +
>> +/**
>> + * Remove the default input queue
>> + *
>> + * Remove (disassociate) the default input queue from an IPC endpoint.
>> + * The queue itself is not touched.
>> + *
>> + * @param ipc  IPC handle
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>> +
>> +/**
>> + * Resolve endpoint by name
>> + *
>> + * Look up an existing or future endpoint by name.
>> + * When the endpoint exists, return the specified message with the endpoint
>> + * as the sender.
>> + *
>> + * @param ipc IPC handle
>> + * @param name Name to resolve
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_resolve(odp_ipc_t ipc,
>> +                    const char *name,
>> +                    odp_ipc_msg_t msg);
>
> How does this take into account the 'address' of the endpoint? Will
> the name include the search path, something like
> "domain_name/endpoint_name"? If so, should there be an API for
> creating a communication channel with domain_X?
The semantics of names and any network topology are not defined and not
really important to the endpoints themselves (the applications).

A specific IPC implementation can support some form of structured or
partitioned network topology without changes to the IPC client-side
API (I think). To compare, in OSE, adding support for link handlers
and distributed systems did not change the original API designed for
local communication.

>
>> +
>> +/**
>> + * Monitor endpoint
>> + *
>> + * Monitor an existing (potentially already dead) endpoint.
>> + * When the endpoint is dead, return the specified message with the endpoint
>> + * as the sender.
>> + *
>> + * Unrecognized or invalid endpoint addresses are treated as dead endpoints.
>> + *
>> + * @param ipc IPC handle
>> + * @param addr Address of monitored endpoint
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_monitor(odp_ipc_t ipc,
>> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>> +                    odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Send message
>> + *
>> + * Send a message to an endpoint (which may already be dead).
>> + * Message delivery is ordered and reliable. All (accepted) messages will be
>> + * delivered up to the point of endpoint death or lost connection.
>> + * Actual reception and processing is not guaranteed (use end-to-end
>> + * acknowledgements for that).
>> + * Monitor the remote endpoint to detect death or lost connection.
>> + *
>> + * @param ipc IPC handle
>> + * @param msg Message to send
>> + * @param addr Address of remote endpoint
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_send(odp_ipc_t ipc,
>> +                odp_ipc_msg_t msg,
>> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Get address of sender (source) of message
>> + *
>> + * @param msg Message handle
>> + * @param addr Address of sender endpoint
>> + */
>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Message data pointer
>> + *
>> + * Return a pointer to the message data
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Pointer to the message data
>> + */
>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Message data length
>> + *
>> + * Return length of the message data.
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Message length
>> + */
>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Set message length
>> + *
>> + * Set length of the message data.
>> + *
>> + * @param msg Message handle
>> + * @param len New length
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>> +
>> +/**
>> + * Allocate message
>> + *
>> + * Allocate a message of a specific size.
>> + *
>> + * @param pool Message pool to allocate message from
>> + * @param len Length of the allocated message
>> + *
>> + * @return IPC message handle on success
>> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>> + */
>> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>> +
>> +/**
>> + * Free message
>> + *
>> + * Free message back to the message pool it was allocated from.
>> + *
>> + * @param msg Handle of message to free
>> + */
>> +void odp_ipc_free(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Get message handle from event
>> + *
>> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>> + *
>> + * @param ev   Event handle
>> + *
>> + * @return Message handle
>> + *
>> + * @see odp_event_type()
>> + */
>> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>> +
>> +/**
>> + * Convert message handle to event
>> + *
>> + * @param msg  Message handle
>> + *
>> + * @return Event handle
>> + */
>> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Get printable value for an odp_ipc_t
>> + *
>> + * @param ipc  IPC handle to be printed
>> + * @return     uint64_t value that can be used to print/display this
>> + *             handle
>> + *
>> + * @note This routine is intended to be used for diagnostic purposes
>> + * to enable applications to generate a printable value that represents
>> + * an odp_ipc_t handle.
>> + */
>> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>> +
>> +/**
>> + * Get printable value for an odp_ipc_msg_t
>> + *
>> + * @param msg  Message handle to be printed
>> + * @return     uint64_t value that can be used to print/display this
>> + *             handle
>> + *
>> + * @note This routine is intended to be used for diagnostic purposes
>> + * to enable applications to generate a printable value that represents
>> + * an odp_ipc_msg_t handle.
>> + */
>> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * @}
>> + */
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> --
>> 1.9.1
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> https://lists.linaro.org/mailman/listinfo/lng-odp
Ola Liljedahl May 19, 2015, 10:08 p.m. | #5
On 19 May 2015 at 19:04, Benoît Ganne <bganne@kalray.eu> wrote:
> Hi Ola,
>
> Thanks for sharing this. We are also looking at IPC at Kalray, where our ODP
> model is a multi-process, shared-nothing architecture.
> From our point of view, the main requirements for IPC would be:
>  - use it to communicate between different address spaces (AS), and as such
> our messages should be bigger than just a pointer to a shared mem area
I agree. The data must be contained in the message.

>  - use it for control/data planes signaling but also to exchange packets to
> be allowed to build packets processing pipelines
Control/data plane signaling must be reliable (per the definition in
my RFC). But do you have the same requirements for packet transfer
between pipeline stages (on different cores)? Or it just happens to be
reliable on your hardware? Is reliability (guaranteed delivery) an
inherent requirement for pipelined packet processing? (this would be
contrary to my experience).

Also couldn't you use ODP queues as the abstraction for transferring
packets between cores/pipeline stages? Per your pktio proposal below,
if you connect input and output queues to your "ipc" pktio instances,
applications can just enqueue and dequeue packets from these queues
and that will cause packets to be transferred between pipeline stages
(on different cores/AS's). This use case does not want or need the
ability to specify the destination address for each individual send
operation and this would likely just add overhead to a performance
critical operation.

Queues also support enqueueing of all types of events while my
proposed IPC mechanism only supports a new message event type.

>  - IPC ops should be mapped to our NoC HW as much as possible, especially
> for send/recv operations
Is there anything in the proposed API that would limit your
implementation freedom?

>
> From your proposal, it seems to me that what you proposed is more like an
> application messaging bus such as d-bus (especially the deferred lookup()
> and monitor()),
Perhaps d-bus can be used as the implementation, that could save me
some work. But the IPC mechanism is extremely inspired by the OSE
(application) message passing mechanism (there was a borked attempt to
adapt OSE message passing to passing packets between pipeline stages
which showed why message passing and packet passing should not be
mixed).

> with a centralized rendez-vous point for resource
> management. It is certainly useful, especially for control/data plane
> signaling, but I would like to have a more low-level IPC mechanism, on top
> of which we could build this sort of messaging bus.
I proposed an API for the functionality I need. How each platform
implements it is up to them. Why would you want to expose a lower
level API? What would such an API look like and how would you implement
the functionality in my proposal? Would the low level API be
independent of the underlying platform?

>
> For this kind of lower-level mechanism I think pktio might be a good match.
My prototyping started out by tweaking the linux-generic packet_io
implementation. By specifying a special pktio interface name, the IPC
endpoint would be created. An Ethernet-like packet format (48-bit
destination and source address, 32-bit message type) was used for
messages. The packet.h API would be used for accessing and
manipulating messages (which are of the packet event type). This is
still a potential implementation for linux-generic but a little bit
too close to the packet semantics (unreliable transfer, segmented buffers
etc) for my intended use case. But messages are not packets and
reliable transfer is important.

> But in that case you need to be able to identify the endpoints. What I
> proposed in a previous thread was to use hierarchical device naming:
>  - "/dev/<device>" for real devices
>  - "/ipc/<identifier>" for IPC
> What I had in mind so far for our platform was to use {AS identifier + pool
> name in the AS} as IPC identifier, which would avoid the need of a
> centralized rendez-vous point.
The semantics of the IPC endpoint names are not defined in my
proposal. I think it is a mistake to impose meaning on the names. I
expect names to identify applications but an IPC name can identify
anything. Example and real applications will likely impose some kind
of structure on and meaning to names. But it is probably not a good
idea that the platform imposes meaning on IPC names (e.g. names
represent SoC topology), I can imagine how this will complicate
portability. pktio names however are already platform specific.

Also I don't envision any (active) rendez-vous point; it's just a
global shared table in my implementation for linux-generic. This is
needed since names are user-defined and arbitrary.

>
> Another slightly different subject is that we would like to extend pktio
> (but it is a common requirement for IPC) to be able to open pktio for
> read-only, write-only or read-write operations. This is because opening
> communications on our HW allocates HW resources for each configured RX (you
> need DMA endpoints to be configured). Being able to open a write-only pktio
> helps to save those resources, and would make sense even in the standard
> pktio.
>
> How could it look like? Here are some examples:
>  - to open a read-only pktio in AS0:
> odp_pool_t local_pool = odp_pool_create("AS0_pool", ...);
> odp_pktio_open("/ipc/local", local_pool);
>    Notes: "/ipc/local" identifies a local, read-only ipc endpoint.
>  - to open a write-only pktio in AS1 to send data to AS0:
> odp_pktio_open("/ipc/AS0/AS0_pool", NULL);
Why are these pktio instances of type IPC? They could just as well be
network interfaces on which you send and/or receive packets with the
normal semantics. No need for the IPC API I proposed. What stops you
from implementing this today?

>    Notes: "/ipc/AS0/AS0_pool" identifies a remote endpoints for writing:
> "AS0" identifies the address space, whereas "AS0_pool" identifies the packet
> pool in AS0 address space. The fact that we do not associate any default
> pool with pktio means it is write-only.
So we need to amend the API spec that it is OK to specify
ODP_POOL_INVALID for the pool parameter to the odp_pktio_open() call
and this will indicate that the pktio interface is write-only. Is
there anything similar we need to do for read-only interfaces? You
would like to have the indication of read-only/write-only/read-write
at the time of pktio_open. Perhaps we need a new parameter for
odp_pktio_open(), the pool parameter is not enough for this purpose.

Your needs are real but I think reusing the ODP pktio concept is a
better solution for you, not trying to hijack the application
messaging API which is intended for a very different use case. IPC
seems to mean different things to different people, so the term perhaps should
be avoided in order not to give people the wrong ideas.

>  - to open a read-write pktio between AS0 and AS1 in AS1:
> odp_pool_t local_pool = odp_pool_create("AS1_pool", ...);
> odp_pktio_open("/AS0/AS0_pool", local_pool);
>
> Thanks,
> ben
>
> --
> Benoît GANNE
> Field Application Engineer, Kalray
> +33 (0)648 125 843
Ola Liljedahl May 20, 2015, 10:21 a.m. | #6
On 20 May 2015 at 10:25, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
> Hello Ola,
>
> please find some replies below.
>
> On 19 May 2015 at 01:03, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>
>> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>> new file mode 100644
>> index 0000000..3395a34
>> --- /dev/null
>> +++ b/include/odp/api/ipc.h
>> @@ -0,0 +1,261 @@
>> +/* Copyright (c) 2015, Linaro Limited
>> + * All rights reserved.
>> + *
>> + * SPDX-License-Identifier:     BSD-3-Clause
>> + */
>> +
>> +
>> +/**
>> + * @file
>> + *
>> + * ODP IPC API
>> + */
>> +
>> +#ifndef ODP_API_IPC_H_
>> +#define ODP_API_IPC_H_
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/** @defgroup odp_ipc ODP IPC
>> + *  @{
>> + */
>> +
>> +/**
>> + * @typedef odp_ipc_t
>> + * ODP IPC handle
>> + */
>> +
>> +/**
>> + * @typedef odp_ipc_msg_t
>> + * ODP IPC message handle
>> + */
>> +
>> +
>> +/**
>> + * @def ODP_IPC_ADDR_SIZE
>> + * Size of the address of an IPC endpoint
>> + */
>> +
>> +/**
>> + * Create IPC endpoint
>> + *
>> + * @param name Name of local IPC endpoint
>> + * @param pool Pool for incoming messages
>> + *
>> + * @return IPC handle on success
>> + * @retval ODP_IPC_INVALID on failure and errno set
>> + */
>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>> +
>
>
> Will it be a separate pool or do you want to attach an existing pool to your ipc?
It should be a pool with events of type messages (ODP_EVENT_MESSAGE)
so in practice it will likely be a separate pool.

>
>
>>
>> +/**
>> + * Destroy IPC endpoint
>> + *
>> + * @param ipc IPC handle
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_destroy(odp_ipc_t ipc);
>> +
>> +/**
>> + * Set the default input queue for an IPC endpoint
>> + *
>> + * @param ipc   IPC handle
>> + * @param queue Queue handle
>> + *
>> + * @retval  0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>> +
>> +/**
>> + * Remove the default input queue
>> + *
>> + * Remove (disassociate) the default input queue from an IPC endpoint.
>> + * The queue itself is not touched.
>> + *
>> + * @param ipc  IPC handle
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>> +
>> +/**
>> + * Resolve endpoint by name
>> + *
>> + * Look up an existing or future endpoint by name.
>> + * When the endpoint exists, return the specified message with the
>> + * endpoint as the sender.
>> + *
>> + * @param ipc IPC handle
>> + * @param name Name to resolve
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_resolve(odp_ipc_t ipc,
>> +                    const char *name,
>> +                    odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Monitor endpoint
>> + *
>> + * Monitor an existing (potentially already dead) endpoint.
>> + * When the endpoint is dead, return the specified message with the
>> + * endpoint as the sender.
>> + *
>> + * Unrecognized or invalid endpoint addresses are treated as dead
>> + * endpoints.
>> + *
>> + * @param ipc IPC handle
>> + * @param addr Address of monitored endpoint
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_monitor(odp_ipc_t ipc,
>> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>> +                    odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Send message
>> + *
>> + * Send a message to an endpoint (which may already be dead).
>> + * Message delivery is ordered and reliable. All (accepted) messages
>> + * will be delivered up to the point of endpoint death or lost connection.
>> + * Actual reception and processing is not guaranteed (use end-to-end
>> + * acknowledgements for that).
>> + * Monitor the remote endpoint to detect death or lost connection.
>> + *
>> + * @param ipc IPC handle
>> + * @param msg Message to send
>> + * @param addr Address of remote endpoint
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_send(odp_ipc_t ipc,
>> +                odp_ipc_msg_t msg,
>> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Get address of sender (source) of message
>> + *
>> + * @param msg Message handle
>> + * @param addr Address of sender endpoint
>> + */
>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Message data pointer
>> + *
>> + * Return a pointer to the message data
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Pointer to the message data
>> + */
>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Message data length
>> + *
>> + * Return length of the message data.
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Message length
>> + */
>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Set message length
>> + *
>> + * Set length of the message data.
>> + *
>> + * @param msg Message handle
>> + * @param len New length
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>
>
> "reset" usually stands for resetting to some default value. In that case it
> would be better named odp_ipc_len_set(msg, len).
Well I copied this from packet.h where odp_packet_reset(pkt, len)
seemed to be equivalent to what I want to achieve but I agree that
odp_ipc_len_set() is more direct. I can change.

> And reset could just be a void inline function that resets to the default
> value: odp_ipc_reset(msg).
> And len might not be needed here because you transfer events. So just get
The IPC mechanism transfers messages. And different messages have
different lengths.
> size from event.
The message size specified when creating a message pool that is
associated with an IPC endpoint is basically an MTU (the largest
message size that can be sent or received).

>
> Might be something like:
>
> msg.event = ev; // len will be taken from the event
> msg.remote = addr;
I don't understand what you are trying to describe here.

>
>
>
>>
>> +
>> +/**
>> + * Allocate message
>> + *
>> + * Allocate a message of a specific size.
>> + *
>> + * @param pool Message pool to allocate message from
>> + * @param len Length of the allocated message
>> + *
>> + * @return IPC message handle on success
>> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>> + */
>> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>> +
>
>
> Is len equal to sizeof(odp_ipc_msg_t) here?
No. odp_ipc_msg_t is a handle to a message, just like odp_packet_t is
a handle to a packet.
The implementation of odp_ipc_msg_t is platform specific and thus
sizeof(odp_ipc_msg_t) as well (but 32 or 64 bits is a good guess).

The len parameter in odp_ipc_alloc() specifies the current/actual
length of the message which depends on what the application wants to
use that specific message for.

>
>
>>
>> +/**
>> + * Free message
>> + *
>> + * Free message back to the message pool it was allocated from.
>> + *
>> + * @param msg Handle of message to free
>> + */
>> +void odp_ipc_free(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Get message handle from event
>> + *
>> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>> + *
>> + * @param ev   Event handle
>> + *
>> + * @return Message handle
>> + *
>> + * @see odp_event_type()
>> + */
>> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>> +
>> +/**
>> + * Convert message handle to event
>> + *
>> + * @param msg  Message handle
>> + *
>> + * @return Event handle
>> + */
>> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>> +
>
>
> Most of the functions look like pktio. But this version of IPC looks like a
> fully software implementation.
I think some SoC's could use HW mechanism to transfer messages (which
are basically buffers with individual lengths) between endpoints (e.g.
different address spaces).

> Idea is good - send and receive odp events.
Send and receive ODP *messages*, not any type of ODP events.

> It looks like the odp_ipc_t ipc object should also be connected to
> some odp_pktio_t where actual packets will be sent. And it would be
> interesting to make the queue interface work with that ipc.
>
> It might be:
> odp_pktio_t pktio can have an ipc attribute like
>
> odp_ipc_pktio_set(odp_ipc_t ipc, odp_pktio_t pktio);
Now you are down to implementation details. Why should this be part of
the public API?


>
> then a queue created for that pktio can send/recv packets from outside.
I think you can already do this with the current packet_io API. No
need to conflate this with IPC/message passing.

>
> If I understand right, these functions use the queue only for ipc messages:
> int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
> int odp_ipc_inq_remdef(odp_ipc_t ipc);
You want to be able to receive and dispatch IPC messages just like any
other type of events so incoming messages should be put on queues. The
queue could be specified when the IPC endpoint is created but I copied
the packet_io API where it was possible.

> and they are not linked with an outside pktio. Earlier you said that you
> propose to send some packets to some internal tap device.
No not packets, messages. And the reference to "tap" was in a general
sense, not related to Linux tap interfaces.

> And in that case the tap
> device can be a pktio. And then this pktio-tap is connected to the ipc
> object to dispatch events from it.

>
> And probably we need better names for our ipc implementations. Mine can be
> odp_pktio_ipc and yours can be odp_ipcmsg_ because the purpose is different.
Actually I don't understand why your packet pipes need to change the
ODP API. They are just pktio interfaces with different semantics. By
using different pktio device names (which anyway are platform
specific), you create packet pipes instead of opening physical network
interfaces. But they behave the same from the user perspective. Or?

>
>
>
>
>>
>> +/**
>> + * Get printable value for an odp_ipc_t
>> + *
>> + * @param ipc  IPC handle to be printed
>> + * @return     uint64_t value that can be used to print/display this
>> + *             handle
>> + *
>> + * @note This routine is intended to be used for diagnostic purposes
>> + * to enable applications to generate a printable value that represents
>> + * an odp_ipc_t handle.
>> + */
>> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>> +
>> +/**
>> + * Get printable value for an odp_ipc_msg_t
>> + *
>> + * @param msg  Message handle to be printed
>> + * @return     uint64_t value that can be used to print/display this
>> + *             handle
>> + *
>> + * @note This routine is intended to be used for diagnostic purposes
>> + * to enable applications to generate a printable value that represents
>> + * an odp_ipc_msg_t handle.
>> + */
>> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * @}
>> + */
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> --
>> 1.9.1
>
>
>
> After writing this reply I came up with an idea: why not build one pktio
> based on another pktio?
> In that case eth0_pktio = odp_create_pktio("eth0") can be used for external
> packet exchange and
> ipc_eth0_pktio = odp_create_pktio(eth0_pktio, ..).
> So a read from eth0_pktio will get only outside packets and a read from
> ipc_eth0_pktio will deliver only ipc packets.
There are no IPC packets, just IPC messages.

> In that case you do not need to implement a transport layer and can
> just build the functionality above on the current pktio.
IPC/message passing has different semantics (e.g. reliable/in-order)
from network interfaces.
IPC/message passing is for communication between local entities (e.g.
applications on the same host), not for communicating with external
entities over some physical network.
So I see several serious incompatibilities and I see no reason to
build IPC/message passing on top of some physical network interface
and its representation in ODP.

>
> Thanks,
> Maxim.
>
>
>
>
>
>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> https://lists.linaro.org/mailman/listinfo/lng-odp
>
>
Maxim Uvarov May 20, 2015, 10:47 a.m. | #7
On 20 May 2015 at 13:21, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> [full patch and earlier review quoted; snipped]
>
> Actually I don't understand why your packet pipes need to change the
> ODP API. They are just pktio interfaces with different semantics. By
> using different pktio device names (which anyway are platform
> specific), you create packet pipes instead of opening physical network
> interfaces. But they behave the same from the user perspective. Or?
>
Yes, in my case it is only pipes and it's just a pktio with a different name.
There is no difference to the application in how it uses this pktio.


> > [remainder of the patch quoted; snipped]
>
> IPC/message passing has different semantics (e.g. reliable/in-order)
> from network interfaces.
> IPC/message passing is for communication between local entities (e.g.
> applications on the same host), not for communicating with external
> entities over some physical network.
> So I see several serious incompatibilities and I see no reason to
> build IPC/message passing on top of some physical network interface
> and its representation in ODP.
>
Ok, I understand that it's an exchange of messages on one machine. I am just
not sure why the control plane and the data plane cannot be on different
hosts, like ODL where OVS is on one machine and the controller on another.

As I understand it, you do not want to transfer arbitrary odp events, just
ipc messages. By the way, what is inside these messages? What does a message
look like?

Maxim.




> >
> > Thanks,
> > Maxim.
> >
>
Ola Liljedahl May 20, 2015, 11:11 a.m. | #8
On 20 May 2015 at 12:47, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>
>
> On 20 May 2015 at 13:21, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>
>> On 20 May 2015 at 10:25, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>> > Hello Ola,
>> >
>> > please find some replays bellow.
>> >
>> > On 19 May 2015 at 01:03, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>> >>
>> >> As promised, here is my first attempt at a standalone API for IPC -
>> >> inter
>> >> process communication in a shared nothing architecture (message passing
>> >> between processes which do not share memory).
>> >>
>> >> Currently all definitions are in the file ipc.h but it is possible to
>> >> break out some message/event related definitions (everything from
>> >> odp_ipc_sender) in a separate file message.h. This would mimic the
>> >> packet_io.h/packet.h separation.
>> >>
>> >> The semantics of message passing is that sending a message to an
>> >> endpoint
>> >> will always look like it succeeds. The appearance of endpoints is
>> >> explicitly
>> >> notified through user-defined messages specified in the
>> >> odp_ipc_resolve()
>> >> call. Similarly, the disappearance (e.g. death or otherwise lost
>> >> connection)
>> >> is also explicitly notified through user-defined messages specified in
>> >> the
>> >> odp_ipc_monitor() call. The send call does not fail because the
>> >> addressed
>> >> endpoints has disappeared.
>> >>
>> >> Messages (from endpoint A to endpoint B) are delivered in order. If
>> >> message
>> >> N sent to an endpoint is delivered, then all messages <N have also been
>> >> delivered. Message delivery does not guarantee actual processing by the
>> >> recipient. End-to-end acknowledgements (using messages) should be used
>> >> if
>> >> this guarantee is important to the user.
>> >>
>> >> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>> >> multidrop network where each endpoint has a unique address which is
>> >> only
>> >> valid for the lifetime of the endpoint. I.e. if an endpoint is
>> >> destroyed
>> >> and then recreated (with the same name), the new endpoint will have a
>> >> new address (eventually endpoints addresses will have to be recycled
>> >> but
>> >> not for a very long time). Endpoints names do not necessarily have to
>> >> be
>> >> unique.
>> >>
>> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> >> ---
>> >> (This document/code contribution attached is provided under the terms
>> >> of
>> >> agreement LES-LTM-21309)
>> >>
>> >>  include/odp/api/ipc.h | 261
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>  1 file changed, 261 insertions(+)
>> >>  create mode 100644 include/odp/api/ipc.h
>> >>
>> >> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>> >> new file mode 100644
>> >> index 0000000..3395a34
>> >> --- /dev/null
>> >> +++ b/include/odp/api/ipc.h
>> >> @@ -0,0 +1,261 @@
>> >> +/* Copyright (c) 2015, Linaro Limited
>> >> + * All rights reserved.
>> >> + *
>> >> + * SPDX-License-Identifier:     BSD-3-Clause
>> >> + */
>> >> +
>> >> +
>> >> +/**
>> >> + * @file
>> >> + *
>> >> + * ODP IPC API
>> >> + */
>> >> +
>> >> +#ifndef ODP_API_IPC_H_
>> >> +#define ODP_API_IPC_H_
>> >> +
>> >> +#ifdef __cplusplus
>> >> +extern "C" {
>> >> +#endif
>> >> +
>> >> +/** @defgroup odp_ipc ODP IPC
>> >> + *  @{
>> >> + */
>> >> +
>> >> +/**
>> >> + * @typedef odp_ipc_t
>> >> + * ODP IPC handle
>> >> + */
>> >> +
>> >> +/**
>> >> + * @typedef odp_ipc_msg_t
>> >> + * ODP IPC message handle
>> >> + */
>> >> +
>> >> +
>> >> +/**
>> >> + * @def ODP_IPC_ADDR_SIZE
>> >> + * Size of the address of an IPC endpoint
>> >> + */
>> >> +
>> >> +/**
>> >> + * Create IPC endpoint
>> >> + *
>> >> + * @param name Name of local IPC endpoint
>> >> + * @param pool Pool for incoming messages
>> >> + *
>> >> + * @return IPC handle on success
>> >> + * @retval ODP_IPC_INVALID on failure and errno set
>> >> + */
>> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>> >> +
>> >
>> >
>> > Will it be separate pool or you want to attach existing pool to your
>> > ipc?
>> It should be a pool with events of type messages (ODP_EVENT_MESSAGE)
>> so in practice it will likely be a separate pool.
>>
>> >
>> >
>> >>
>> >> +/**
>> >> + * Destroy IPC endpoint
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + *
>> >> + * @retval 0 on success
>> >> + * @retval <0 on failure
>> >> + */
>> >> +int odp_ipc_destroy(odp_ipc_t ipc);
>> >> +
>> >> +/**
>> >> + * Set the default input queue for an IPC endpoint
>> >> + *
>> >> + * @param ipc   IPC handle
>> >> + * @param queue Queue handle
>> >> + *
>> >> + * @retval  0 on success
>> >> + * @retval <0 on failure
>> >> + */
>> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>> >> +
>> >> +/**
>> >> + * Remove the default input queue
>> >> + *
>> >> + * Remove (disassociate) the default input queue from an IPC endpoint.
>> >> + * The queue itself is not touched.
>> >> + *
>> >> + * @param ipc  IPC handle
>> >> + *
>> >> + * @retval 0 on success
>> >> + * @retval <0 on failure
>> >> + */
>> >> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>> >> +
>> >> +/**
>> >> + * Resolve endpoint by name
>> >> + *
>> >> + * Look up an existing or future endpoint by name.
>> >> + * When the endpoint exists, return the specified message with the
>> >> endpoint
>> >> + * as the sender.
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + * @param name Name to resolve
>> >> + * @param msg Message to return
>> >> + */
>> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>> >> +                    const char *name,
>> >> +                    odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Monitor endpoint
>> >> + *
>> >> + * Monitor an existing (potentially already dead) endpoint.
>> >> + * When the endpoint is dead, return the specified message with the
>> >> endpoint
>> >> + * as the sender.
>> >> + *
>> >> + * Unrecognized or invalid endpoint addresses are treated as dead
>> >> endpoints.
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + * @param addr Address of monitored endpoint
>> >> + * @param msg Message to return
>> >> + */
>> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>> >> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>> >> +                    odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Send message
>> >> + *
>> >> + * Send a message to an endpoint (which may already be dead).
>> >> + * Message delivery is ordered and reliable. All (accepted) messages
>> >> will
>> >> be
>> >> + * delivered up to the point of endpoint death or lost connection.
>> >> + * Actual reception and processing is not guaranteed (use end-to-end
>> >> + * acknowledgements for that).
>> >> + * Monitor the remote endpoint to detect death or lost connection.
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + * @param msg Message to send
>> >> + * @param addr Address of remote endpoint
>> >> + *
>> >> + * @retval 0 on success
>> >> + * @retval <0 on error
>> >> + */
>> >> +int odp_ipc_send(odp_ipc_t ipc,
>> >> +                odp_ipc_msg_t msg,
>> >> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> >> +
>> >> +/**
>> >> + * Get address of sender (source) of message
>> >> + *
>> >> + * @param msg Message handle
>> >> + * @param addr Address of sender endpoint
>> >> + */
>> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>> >> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> >> +
>> >> +/**
>> >> + * Message data pointer
>> >> + *
>> >> + * Return a pointer to the message data
>> >> + *
>> >> + * @param msg Message handle
>> >> + *
>> >> + * @return Pointer to the message data
>> >> + */
>> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Message data length
>> >> + *
>> >> + * Return length of the message data.
>> >> + *
>> >> + * @param msg Message handle
>> >> + *
>> >> + * @return Message length
>> >> + */
>> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Set message length
>> >> + *
>> >> + * Set length of the message data.
>> >> + *
>> >> + * @param msg Message handle
>> >> + * @param len New length
>> >> + *
>> >> + * @retval 0 on success
>> >> + * @retval <0 on error
>> >> + */
>> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>> >
>> >
>> > reset usually stands for resetting to some default value. In that case
>> > it would be better named odp_ipc_len_set(msg, len).
>> Well I copied this from packet.h where odp_packet_reset(pkt, len)
>> seemed to be equivalent to what I want to achieve but I agree that
>> odp_ipc_len_set() is more direct. I can change.
>>
>> > And reset is then just a void inline function that resets to the
>> > default value: odp_ipc_reset(msg).
>> > And len might not be needed here because you transfer events. So just
>> > get
>> The IPC mechanism transfers messages. And different messages have
>> different lengths.
>> > size from event.
>> The message size specified when creating a message pool that is
>> associated with an IPC endpoint is basically an MTU (the largest
>> message size that can be sent or received).
>>
>> >
>> > Might be something like:
>> >
>> > msg.event = ev; // len will be get from event
>> > msg.remote = addr;
>> I don't understand what you are trying to describe here.
>>
>> >
>> >
>> >
>> >>
>> >> +
>> >> +/**
>> >> + * Allocate message
>> >> + *
>> >> + * Allocate a message of a specific size.
>> >> + *
>> >> + * @param pool Message pool to allocate message from
>> >> + * @param len Length of the allocated message
>> >> + *
>> >> + * @return IPC message handle on success
>> >> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>> >> + */
>> >> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>> >> +
>> >
>> >
>> > is len sizeof(odp_ipc_msg_t) here?
>> No. odp_ipc_msg_t is a handle to a message, just like odp_packet_t is
>> a handle to a packet.
>> The implementation of odp_ipc_msg_t is platform specific and thus
>> sizeof(odp_ipc_msg_t) as well (but 32 or 64 bits is a good guess).
>>
>> The len parameter in odp_ipc_alloc() specifies the current/actual
>> length of the message which depends on what the application wants to
>> use that specific message for.
>>
>> >
>> >
>> >>
>> >> +/**
>> >> + * Free message
>> >> + *
>> >> + * Free message back to the message pool it was allocated from.
>> >> + *
>> >> + * @param msg Handle of message to free
>> >> + */
>> >> +void odp_ipc_free(odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Get message handle from event
>> >> + *
>> >> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>> >> + *
>> >> + * @param ev   Event handle
>> >> + *
>> >> + * @return Message handle
>> >> + *
>> >> + * @see odp_event_type()
>> >> + */
>> >> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>> >> +
>> >> +/**
>> >> + * Convert message handle to event
>> >> + *
>> >> + * @param msg  Message handle
>> >> + *
>> >> + * @return Event handle
>> >> + */
>> >> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>> >> +
>> >
>> >
>> > Most of the functions look like pktio. But this version of IPC looks
>> > like a fully
>> > software implementation.
>> I think some SoCs could use a HW mechanism to transfer messages (which
>> are basically buffers with individual lengths) between endpoints (e.g.
>> different address spaces).
>>
>> > The idea is good - send and receive odp events.
>> Send and receive ODP *messages*, not any type of ODP events.
>>
>> > It looks like the odp_ipc_t ipc
>> > object is also connected to
>> > some odp_pktio_t where actual packets will be sent. And it would be
>> > interesting to make the queues interface
>> > work with that ipc.
>> >
>> > Might be:
>> > an odp_pktio_t pktio can have an ipc attribute like
>> >
>> > odp_ipc_pktio_set(odp_ipc_t ipc, odp_pktio_t pktio);
>> Now you are down to implementation details. Why should this be part of
>> the public API?
>>
>>
>> >
>> > then a queue created for that pktio can send/recv packets from outside.
>> I think you can already do this with the current packet_io API. No
>> need to conflate this with IPC/message passing.
>>
>> >
>> > If I understand right, those functions use the queue only for ipc messages:
>> > int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>> > int odp_ipc_inq_remdef(odp_ipc_t ipc);
>> You want to be able to receive and dispatch IPC messages just like any
>> other type of events so incoming messages should be put on queues. The
>> queue could be specified when the IPC endpoint is created but I copied
>> the packet_io API where it was possible.
>>
>> > and they are not linked with an outside pktio. Earlier you said that
>> > you propose
>> > to send some packets to some internal tap device.
>> No, not packets, messages. And the reference to "tap" was in a general
>> sense, not related to Linux tap interfaces.
>>
>> > And in that case the tap
>> > device can be a pktio. And then this pktio-tap is connected to the ipc
>> > object to dispatch events from it.
>>
>> >
>> > And probably we need better names for our ipc implementations. Mine can
>> > be
>> > odp_pktio_ipc and yours can be odp_ipcmsg_ because the purpose is
>> > different.
>> Actually I don't understand why your packet pipes need to change the
>> ODP API. They are just pktio interfaces with different semantics. By
>> using different pktio device names (which anyway are platform
>> specific), you create packet pipes instead of opening physical network
>> interfaces. But they behave the same from the user perspective. Or?
>>
>
> Yes, in my case it is only pipes and it's just a pktio with a different
> name. No difference to the application in how to use this pktio.
>
>>
>> >
>> >
>> >
>> >
>> >>
>> >> +/**
>> >> + * Get printable value for an odp_ipc_t
>> >> + *
>> >> + * @param ipc  IPC handle to be printed
>> >> + * @return     uint64_t value that can be used to print/display this
>> >> + *             handle
>> >> + *
>> >> + * @note This routine is intended to be used for diagnostic purposes
>> >> + * to enable applications to generate a printable value that represents
>> >> + * an odp_ipc_t handle.
>> >> + */
>> >> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>> >> +
>> >> +/**
>> >> + * Get printable value for an odp_ipc_msg_t
>> >> + *
>> >> + * @param msg  Message handle to be printed
>> >> + * @return     uint64_t value that can be used to print/display this
>> >> + *             handle
>> >> + *
>> >> + * @note This routine is intended to be used for diagnostic purposes
>> >> + * to enable applications to generate a printable value that represents
>> >> + * an odp_ipc_msg_t handle.
>> >> + */
>> >> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * @}
>> >> + */
>> >> +
>> >> +#ifdef __cplusplus
>> >> +}
>> >> +#endif
>> >> +
>> >> +#endif
>> >> --
>> >> 1.9.1
>> >
>> >
>> >
>> > After writing my reply I came up with an idea: why not build one pktio
>> > based on another
>> > pktio?
>> > In that case eth0_pktio = odp_create_pktio("eth0") can be used for
>> > external
>> > packet exchange and
>> > ipc_eth0_pktio = odp_create_pktio(eth0_pktio, ..).
>> > So a read from eth0_pktio will get only outside packets and a read from
>> > ipc_eth0_pktio can deliver only
>> > ipc packets.
>> There are no IPC packets, just IPC messages.
>>
>> > In that case you do not need to implement a transport layer and can
>> > just build the functionality above
>> > on the current pktio.
>> IPC/message passing has different semantics (e.g. reliable/in-order)
>> from network interfaces.
>> IPC/message passing is for communication between local entities (e.g.
>> applications on the same host), not for communicating with external
>> entities over some physical network.
>> So I see several serious incompatibilities and I see no reason to
>> build IPC/message passing on top of some physical network interface
>> and its representation in ODP.
>>
>
> Ok, I understand that it's an exchange of messages on one machine. I am
> just not sure why the control plane and data plane would be on different
> hosts. Like ODL, where OVS is on one machine and the controller on another.
The OpenFlow controller can be anywhere on the network (typically
*not* on the same machine as the OpenFlow dataplane) so needs to use a
proper transport protocol that works over the Internet. This is a
different use case from the applications on a host which together
implement some network function (e.g. eNodeB).

>
> As I understand it, you do not want to transfer arbitrary odp events, just
> ipc messages. Btw, what is inside those messages? What does a message look like?
The messages exchanged between control and data plane applications are
requests and replies for changing configuration, adding/removing
routes, setting up/tearing down bearers and tunnels, IPsec/IKE updates
etc. Also for viewing the current state of applications (e.g. reading out
statistics and alarms).

So messages are normally quite small, maybe up to 100-200 bytes and
typically well structured. Messages could be expressed as C structs.
Each message will typically have a type which identifies the purpose
and structure of the message. I left this out of the IPC API as I
don't think it is strictly needed; it could just be a convention to
store the type first in the message. The IPC mechanism itself does not
use the type.
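To make that convention concrete, here is a minimal self-contained C sketch. All names here are hypothetical, nothing below is part of the proposed API; the header would simply occupy the first bytes of the buffer returned by odp_ipc_data():

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout: the type is stored first in the message
 * data purely by application convention; the IPC mechanism itself never
 * looks at it. */
typedef struct {
	uint32_t type;  /* application-defined message type */
	uint32_t seqno; /* e.g. for matching replies to requests */
} app_msg_hdr_t;

#define APP_MSG_ADD_ROUTE 1u

/* Write a header into a raw message buffer (as would be returned by a
 * call like odp_ipc_data()). */
static void app_msg_init(void *data, uint32_t type, uint32_t seqno)
{
	app_msg_hdr_t hdr = { .type = type, .seqno = seqno };

	memcpy(data, &hdr, sizeof hdr); /* type-specific struct follows */
}

/* Read the type back on the receive side before dispatching. */
static uint32_t app_msg_type(const void *data)
{
	app_msg_hdr_t hdr;

	memcpy(&hdr, data, sizeof hdr);
	return hdr.type;
}
```

The receiver would switch on app_msg_type() and then cast the remaining bytes to the C struct that corresponds to that type.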

>
> Maxim.
>
>
>
>>
>> >
>> > Thanks,
>> > Maxim.
>> >
>> >
>> >
>> >
>> >
>> >>
>> >>
>> >> _______________________________________________
>> >> lng-odp mailing list
>> >> lng-odp@lists.linaro.org
>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>> >
>> >
>
>
Ola Liljedahl May 20, 2015, 12:33 p.m. | #9
On 20 May 2015 at 13:29, Benoît Ganne <bganne@kalray.eu> wrote:
> Hi Ola,
>
> thanks for your feedback. I think part of the problem is that today we have
> 2 things dubbed "IPC" trying to solve different problems (I think): yours vs
> Maxim and me. Maybe we should clarify that.
Yes. Maybe all of us should stay away from the phrase IPC. And message
passing (which is what I am looking for) is just one type of IPC
(useful for shared nothing and distributed systems); in the past, many
did IPC using shared memory, mutexes and semaphores. The horror.


> From my understanding, what Maxim is proposing is closer to what I was
> trying to achieve. The main differences of my proposal vs Maxim's proposal
> were:
>  - use a more "POSIX namespace" approach for naming resources (eg.
> "/ipc/..." vs "ipc_...")
I see the names of pktio interfaces as platform specific so each
platform can use whatever syntax it wants.

>  - extend pktio to allow unidirectional communication to save HW resources
A slight tweak to the current packet_io API. Post a patch.

> I agree your need is different and need a different API. Maybe we could call
> it "message bus" or "odp bus" or something like that to disambiguate?
Yes. Need a nice three-letter-acronym as well...

>
> Other comments inline.
>
>>>   - use it for control/data plane signaling but also to exchange packets
>>> to
>>> be allowed to build packet processing pipelines
>>
>> Control/data plane signaling must be reliable (per the definition in
>> my RFC). But do you have the same requirements for packet transfer
>> between pipeline stages (on different cores)? Or it just happens to be
>> reliable on your hardware? Is reliability (guaranteed delivery) an
>> inherent requirement for pipelined packet processing? (this would be
>> contrary to my experience).
>
>
> Right, it won't be reliable, especially if the rx is too slow to consume
> messages, tx should get EAGAIN and need to retry later. But reliability can
> be built on top of that (buffering on tx side, spinning...).
Well if the sender gets a notice that send didn't succeed and the data
is still around (so it can be resent later), I still see this as
reliable. Unreliable is when data goes missing without notification.
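A sketch of that distinction, with a mock transport standing in for the real non-blocking send (mock_send and send_reliably are invented names, not proposed API; the point is only that on EAGAIN the sender still owns the data and can retry):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mock transport that fails twice with EAGAIN before accepting,
 * standing in for a non-blocking send such as a pktio-style IPC tx. */
static int attempts;

static int mock_send(const char *data)
{
	(void)data;
	if (++attempts < 3) {
		errno = EAGAIN;
		return -1;
	}
	return 0;
}

/* "Reliable" from the sender's point of view: on EAGAIN the buffer is
 * still around and can be resent; data never silently disappears. */
static bool send_reliably(const char *data, int max_retries)
{
	for (int i = 0; i <= max_retries; i++) {
		if (mock_send(data) == 0)
			return true;
		if (errno != EAGAIN)
			return false; /* hard error: give up */
		/* in real code: back off, buffer on the tx side, or spin */
	}
	return false;
}
```

Buffering or spinning on the tx side, as mentioned above, is just a policy inside the retry loop.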

>
>> Also couldn't you use ODP queues as the abstraction for transferring
>> packets between cores/pipeline stages? Per your pktio proposal below,
>> if you connect input and output queues to your "ipc" pktio instances,
>> applications can just enqueue and dequeue packets from these queues
>> and that will cause packets to be transferred between pipeline stages
>> (on different cores/AS's).
>
>
> Sure. But as far as I understand, queues are not associated to pools by
> themselves. We still need a way to move data from one AS to another, and it
> means from one pool to another. I need a way to identify the rx pool.
Yes a queue by itself is not enough.

An egress queue leads to a (transmit-only) pktio interface which then
can magically transport packets to another (receive-only) pktio
interface in another AS. That receive pktio interface uses a pool in
the AS to allocate packet buffers from. Received buffers can be put on
an ingress queue. So the application code just sees the queues; only
the main logic needs to handle the "packet pipe" pktio interfaces.
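A toy model of that plumbing, in plain C: malloc stands in for the receiver's per-AS pool and a trivial ring stands in for ODP queues. Nothing here is the proposed API; it only shows that the application touches queues while the "pipe" does the cross-AS copy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define QLEN 8

/* Trivial FIFO standing in for an ODP queue. */
typedef struct {
	void *buf[QLEN];
	unsigned head, tail;
} queue_t;

static int enqueue(queue_t *q, void *b)
{
	if (q->tail - q->head == QLEN)
		return -1; /* full */
	q->buf[q->tail++ % QLEN] = b;
	return 0;
}

static void *dequeue(queue_t *q)
{
	if (q->head == q->tail)
		return NULL; /* empty */
	return q->buf[q->head++ % QLEN];
}

/* The "packet pipe" between two address spaces: copy the payload from
 * the sender's buffer into a fresh buffer allocated from the receiver's
 * own pool (modelled by malloc) and deliver it to the ingress queue. */
static void pipe_transfer(queue_t *egress, queue_t *ingress, size_t len)
{
	void *src;

	while ((src = dequeue(egress)) != NULL) {
		void *dst = malloc(len); /* receiver-side pool allocation */

		memcpy(dst, src, len);
		enqueue(ingress, dst);
		free(src); /* sender-side buffer returned to its pool */
	}
}
```

The sender only ever calls enqueue() on the egress queue and the receiver only dequeue() on the ingress queue; which pool backs the receive side is the pipe's business.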

>
>> This use case does not want or need the ability to specify the
>> destination address for each individual send operation and this would
>> likely just add overhead to a performance critical operation.
>
>
> It needs to identify the destination pool, as we are moving from one AS to
> another, and pools are defined per-AS.
But you don't specify the destination pool on a per-packet basis. And
the producer (sender) of packets doesn't care which pool is used. Just
enqueue the packet on a queue.

>
>>>   - IPC ops should be mapped to our NoC HW as much as possible,
>>> especially
>>> for send/recv operations
>>
>> Is there anything in the proposed API that would limit your
>> implementation freedom?
>
>
> Not in the case of an application messaging bus as you propose. I just
> wanted to highlight the fact that we need a lower-level IPC mechanism where
> I can do unidirectional communications (to save HW resources) without any
> centralized logic.
>
>>>  From your proposal, it seems to me that what you proposed is more like
>>> an
>>> application messaging bus such as d-bus (especially the deferred lookup()
>>> and monitor()),
>>
>> Perhaps d-bus can be used as the implementation; that could save me
>> some work.
>
>
> I didn't mean to use D-Bus nor CORBA :)
> It was just to better understand what you were trying to achieve. That said,
> I think we can learn a few things from D-Bus in this case. For example, you
> did mention your need for delivery reliability, but what about message
> ordering and broadcast/multicast?
Messages are ordered on a per source/destination basis.
So far, I haven't seen any absolute need for broadcast or multicast. A
sender normally wants to know which endpoints it is communicating
with; perhaps it is expecting replies.
What is probably more useful is a way to subscribe for all new endpoints.
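A receiver-side sketch of what the per source/destination ordering guarantee means in practice. The helper is a hypothetical application-level check, not part of the API: if the application stamps each message with a per-sender sequence number, in-order delivery implies the numbers arrive contiguously, so a gap signals either a broken guarantee or a peer that died and reappeared with a new address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One checker per remote endpoint address (per source/destination
 * pair, since ordering is only guaranteed per pair). */
typedef struct {
	uint32_t next_expected;
} order_check_t;

/* Returns true if seqno is the next message in the stream, i.e. all
 * messages < seqno have already been delivered. */
static bool on_delivery(order_check_t *oc, uint32_t seqno)
{
	if (seqno != oc->next_expected)
		return false; /* gap: ordering/reliability violated */
	oc->next_expected++;
	return true;
}
```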

>
>>> But in that case you need to be able to identify the endpoints. What I
>>> proposed in precedent thread, was to use a hierarchical device naming:
>>>   - "/dev/<device>" for real devices
>>>   - "/ipc/<identifier>" for IPC
>>> What I had in mind so far for our platform was to used {AS identifier +
>>> pool
>>> name in the AS} as IPC identifier, which would avoid the need of a
>>> centralized rendez-vous point.
>>
>> The semantics of the IPC endpoint names are not defined in my
>> proposal. I think it is a mistake to impose meaning on the names.
>
>
> Agreed. I was just proposing a naming scheme for pktio in my case to ease
> readability.
>
>> Also I don't envision any (active) rendez-vous point, it's just a
>> global shared table in my implementation for linux-generic. This is
>> needed since names are user-defined and arbitrary.
>
>
> Hmm a global shared table looks like a rendez-vous point to me. You cannot
> implement that this way in a 0-sharing architecture.
The global shared table is an implementation detail. A true shared
nothing architecture will have to implement it in some other way.
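For illustration, a minimal sketch of such a table (all names invented; linux-generic could do something along these lines, while a shared-nothing platform would replace it with a name-service protocol). It also shows the address-lifetime rule from the RFC: a recreated endpoint gets a fresh address, since addresses are never reused:

```c
#include <stdint.h>
#include <string.h>

#define MAX_EP 8
#define ADDR_SIZE 8 /* stand-in for ODP_IPC_ADDR_SIZE */

typedef struct {
	char name[32];
	uint8_t addr[ADDR_SIZE];
	int used;
} ep_entry_t;

static ep_entry_t table[MAX_EP];
static uint64_t next_addr = 1; /* monotonically increasing, never reused */

/* Called when an endpoint is created: allocate a fresh, unique address
 * valid only for this incarnation of the endpoint. */
static const uint8_t *ep_register(const char *name)
{
	for (int i = 0; i < MAX_EP; i++) {
		if (!table[i].used) {
			table[i].used = 1;
			strncpy(table[i].name, name,
				sizeof table[i].name - 1);
			memcpy(table[i].addr, &next_addr, sizeof next_addr);
			next_addr++;
			return table[i].addr;
		}
	}
	return NULL;
}

/* Backing for odp_ipc_resolve(): look the name up; a real
 * implementation would park the request and answer later (with a
 * user-defined message) when a matching endpoint appears. */
static const uint8_t *ep_lookup(const char *name)
{
	for (int i = 0; i < MAX_EP; i++)
		if (table[i].used && strcmp(table[i].name, name) == 0)
			return table[i].addr;
	return NULL;
}
```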

> But anyway, I
> completely see the value of a messaging bus with discovery service, the
> implementation will use whatever suits it for this.
>
>> Why are these pktio instances of type IPC? They could just as well be
>> network interfaces on which you send and/or receive packets with the
>> normal semantics. No need for the IPC API I proposed. What stops you
>> from implementing this today?
>
>
> A specific type is still useful in my opinion:
>  - it eases readability
>  - real pktio may have specific characteristics missing for IPC pktio (eg.
> in our case, we have HW acceleration for packet classification/extraction
> for real pktio, but not for IPC pktio)
>
>> So we need to amend the API spec that it is OK to specify
>> ODP_POOL_INVALID for the pool parameter to the odp_pktio_open() call
>> and this will indicate that the pktio interface is write-only. Is
>> there anything similar we need to do for read-only interfaces? You
>> would like to have the indication of read-only/write-only/read-write
>> at the time of pktio_open. Perhaps we need a new parameter for
>> odp_pktio_open(), the pool parameter is not enough for this purpose.
>
>
> OK. What is the best way to make a proposal? An RFC patch series for
> linux-generic proposing such an implementation?
Or just vanilla patches. These are just minor tweaks to the API.

>
>> Your needs are real but I think reusing the ODP pktio concept is a
>> better solution for you, not trying to hijack the application
>> messaging API which is intended for a very different use case. IPC
>> seems to mean different things to different people so perhaps should
>> be avoided in order not to give people the wrong ideas.
>
>
> Agreed. We need to define better names. I personally like IPC pktio for my
> case and application message bus (odp_bus?) for your case but if you have
> better ideas, I would be happy to hear from you.
I have used "IPC" to mean application messaging network (it's more
than a simple bus, works for distributed systems as well) for twenty
years so yours and Maxim's use of IPC is confusing to me. I think none
of us should use "IPC". You both need packet pipes; I need a messaging
network.


>
>
> ben
>
>>> On 05/19/2015 12:03 AM, Ola Liljedahl wrote:
>>>> [original RFC patch quoted in full; snipped]
>>> --
>>> Benoît GANNE
>>> Field Application Engineer, Kalray
>>> +33 (0)648 125 843
>
>
>
> --
> Benoît GANNE
> Field Application Engineer, Kalray
> +33 (0)648 125 843
Maxim Uvarov May 20, 2015, 1:29 p.m. | #10
On 05/20/2015 15:33, Ola Liljedahl wrote:
> On 20 May 2015 at 13:29, Benoît Ganne <bganne@kalray.eu> wrote:
>> Hi Ola,
>>
>> thanks for your feedback. I think part of the problem is that today we have
>> 2 things dubbed "IPC" trying to solve different problems (I think): yours vs
>> Maxim and me. Maybe we should clarify that.
> Yes. Maybe all of us should stay away from the phrase IPC. And message
> passing (which is what I am looking for) is just one type of IPC
> (useful for shared nothing and distributed systems), in the past many
> did IPC using shared memory and mutexes and semaphores. The horror.
>
yes, IPC is a very wide term.
>>  From my understanding, what Maxim is proposing is closer to what I was
>> trying to achieve. The main differences of my proposal vs Maxim's proposal
>> were:
>>   - use a more "POSIX namespace" approach for naming resources (eg.
>> "/ipc/..." vs "ipc_...")
> I see the names of pktio interfaces as platform specific so each
> platform can use whatever syntax it wants.
yes, it's platform specific. In your case you can use those names. Or send
a patch when my version is merged, to have the same names in linux-generic.
>>   - extend pktio to allow unidirectional communication to save HW resources
> A slight tweak to the current packet_io API. Post a patch.

v4 was unidirectional; then I racked my brain over how to make it
bidirectional. If we need a single-direction pktio then you can just not
implement recv or send in your specific packet i/o and don't call the
corresponding functions from the application. Maybe flags are not even
needed to say whether it's single or bidirectional.

Maxim.


>> I agree your need is different and need a different API. Maybe we could call
>> it "message bus" or "odp bus" or something like that to disambiguate?
> Yes. Need a nice three-letter-acronym as well...
>
>> Other comments inline.
>>
>>>>    - use it for control/data plane signaling but also to exchange packets
>>>> to
>>>> be allowed to build packet processing pipelines
>>> Control/data plane signaling must be reliable (per the definition in
>>> my RFC). But do you have the same requirements for packet transfer
>>> between pipeline stages (on different cores)? Or it just happens to be
>>> reliable on your hardware? Is reliability (guaranteed delivery) an
>>> inherent requirement for pipelined packet processing? (this would be
>>> contrary to my experience).
>>
>> Right, it won't be reliable, especially if the rx is too slow to consume
>> messages, tx should get EAGAIN and need to retry later. But reliability can
>> be built on top of that (buffering on tx side, spinning...).
> Well if the sender gets a notice that send didn't succeed and the data
> is still around (so can be resent later), I still see this as
> reliable. Unreliable is when data goes missing without notification.
>
>>> Also couldn't you use ODP queues as the abstraction for transferring
>>> packets between cores/pipeline stages? Per your pktio proposal below,
>>> if you connect input and output queues to your "ipc" pktio instances,
>>> applications can just enqueue and dequeue packets from these queues
>>> and that will cause packets to be transferred between pipeline stages
>>> (on different cores/AS's).
>>
>> Sure. But as far as I understand, queues are not associated to pools by
>> themselves. We still need a way to move data from one AS to another, and it
>> means from one pool to another. I need a way to identify the rx pool.
> Yes a queue by itself is not enough.
>
> An egress queue leads to a (transmit-only) pktio interface which then
> can magically transport packet to another (receive-only) pktio
> interface in another AS. That receive pktio interface uses a pool in
> the AS to allocate packet buffers from. Received buffers can be put on
> an ingress queue. So the application code just sees the queues, only
> the main logic needs handle the "packet pipe" pktio interfaces.
>
>>> This use case does not want or need the ability to specify the
>>> destination address for each individual send operation and this would
>>> likely just add overhead to a performance critical operation.
>>
>> It needs to identify the destination pool, as we are moving from one AS to
>> another, and pools are defined per-AS.
> But you don't specify the destination pool on a per-packet basis. And
> the producer (sender) of packets doesn't care which pool is used. Just
> enqueue the packet on a queue.
>
>>>>    - IPC ops should be mapped to our NoC HW as much as possible,
>>>> especially
>>>> for send/recv operations
>>> Is there anything in the proposed API that would limit your
>>> implementation freedom?
>>
>> Not in the case of an application messaging bus as you propose. I just
>> wanted to highlight the fact that we need a lower-level IPC mechanism where
>> I can do unidirectional communications (to save HW resources) without any
>> centralized logic.
>>
>>>>   From your proposal, it seems to me that what you proposed is more like
>>>> an
>>>> application messaging bus such as d-bus (especially the deferred lookup()
>>>> and monitor()),
>>> Perhaps d-bus can be used as the implementation, that could save me
>>> some work.
>>
>> I didn't mean to use D-Bus nor CORBA :)
>> It was just to better understand what you were trying to achieve. That said,
>> I think we can learn a few things from D-Bus in this case. For example, you
>> did mention your need for delivering reliability, but what about message
>> ordering and broadcast/multicast?
> Messages are ordered on a per source/destination basis.
> So far, I haven't seen any absolute need for broadcast or multicast. A
> sender normally wants to know which endpoints it is communicating
> with, perhaps it is expecting replies.
> What is probably more useful is a way to subscribe for all new endpoints.
>
>>>> But in that case you need to be able to identify the endpoints. What I
>>>> proposed in a previous thread was to use a hierarchical device naming:
>>>>    - "/dev/<device>" for real devices
>>>>    - "/ipc/<identifier>" for IPC
>>>> What I had in mind so far for our platform was to use {AS identifier +
>>>> pool
>>>> name in the AS} as IPC identifier, which would avoid the need of a
>>>> centralized rendez-vous point.
>>> The semantics of the IPC endpoint names are not defined in my
>>> proposal. I think it is a mistake to impose meaning on the names.
>>
>> Agreed. I was just proposing a naming scheme for pktio in my case to ease
>> readability.
>>
>>> Also I don't envision any (active) rendez-vous point, it's just a
>>> global shared table in my implementation for linux-generic. This is
>>> needed since names are user-defined and arbitrary.
>>
>> Hmm a global shared table looks like a rendez-vous point to me. You cannot
>> implement that this way in a 0-sharing architecture.
> The global shared table is an implementation detail. A true shared
> nothing architecture will have to implement it in some other way.
>
>> But anyway, I
>> completely see the value of a messaging bus with discovery service, the
>> implementation will use whatever suits it for this.
>>
>>> Why are these pktio instances of type IPC? They could just as well be
>>> network interfaces on which you send and/or receive packets with the
>>> normal semantics. No need for the IPC API I proposed. What stops you
>>> from implementing this today?
>>
>> A specific type is still useful in my opinion:
>>   - it eases readability
>>   - real pktio may have specific characteristics missing for IPC pktio (eg.
>> in our case, we have HW acceleration for packet classification/extraction
>> for real pktio, but not for IPC pktio)
>>
>>> So we need to amend the API spec that it is OK to specify
>>> ODP_POOL_INVALID for the pool parameter to the opd_pktio_open() call
>>> and this will indicate that the pktio interface is write-only. Is
>>> there anything similar we need to do for read-only interfaces? You
>>> would like to have the indication of read-only/write-only/read-write
>>> at the time of pktio_open. Perhaps we need a new parameter for
>>> odp_pktio_open(), the pool parameter is not enough for this purpose.
>>
>> OK. What is the best way to make a proposal? An RFC patch series for
>> linux-generic proposing such an implementation?
> Or just vanilla patches. These are just minor tweaks to the API.
>
>>> Your needs are real but I think reusing the ODP pktio concept is a
>>> better solution for you, not trying to hijack the application
>>> messaging API which is intended for a very different use case. IPC
>>> seems to mean different things to different people so perhaps should
>>> be avoided in order not to give people the wrong ideas.
>>
>> Agreed. We need to define better names. I personally like IPC pktio for my
>> case and application message bus (odp_bus?) for your case but if you have
>> better ideas, I would be happy to hear from you.
> I have used "IPC" to mean application messaging network (it's more
> than a simple bus, works for distributed systems as well) for twenty
> years so yours and Maxim's use of IPC is confusing to me. I think none
> of us should use "IPC". You both need packet pipes, I need a messaging
> network.
>
>
>>
>> ben
>>
>>>> On 05/19/2015 12:03 AM, Ola Liljedahl wrote:
>>>>>
>>>>> As promised, here is my first attempt at a standalone API for IPC -
>>>>> inter
>>>>> process communication in a shared nothing architecture (message passing
>>>>> between processes which do not share memory).
>>>>>
>>>>> Currently all definitions are in the file ipc.h but it is possible to
>>>>> break out some message/event related definitions (everything from
>>>>> odp_ipc_sender) in a separate file message.h. This would mimic the
>>>>> packet_io.h/packet.h separation.
>>>>>
>>>>> The semantics of message passing is that sending a message to an
>>>>> endpoint
>>>>> will always look like it succeeds. The appearance of endpoints is
>>>>> explicitly
>>>>> notified through user-defined messages specified in the
>>>>> odp_ipc_resolve()
>>>>> call. Similarly, the disappearance (e.g. death or otherwise lost
>>>>> connection)
>>>>> is also explicitly notified through user-defined messages specified in
>>>>> the
>>>>> odp_ipc_monitor() call. The send call does not fail because the
>>>>> addressed
>>>>> endpoint has disappeared.
>>>>>
>>>>> Messages (from endpoint A to endpoint B) are delivered in order. If
>>>>> message
>>>>> N sent to an endpoint is delivered, then all messages <N have also been
>>>>> delivered. Message delivery does not guarantee actual processing by the
>>>>> recipient. End-to-end acknowledgements (using messages) should be used
>>>>> if
>>>>> this guarantee is important to the user.
>>>>>
>>>>> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>>>>> multidrop network where each endpoint has a unique address which is only
>>>>> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
>>>>> and then recreated (with the same name), the new endpoint will have a
>>>>> new address (eventually endpoint addresses will have to be recycled but
>>>>> not for a very long time). Endpoint names do not necessarily have to be
>>>>> unique.
>>>>>
>>>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>> ---
>>>>> (This document/code contribution attached is provided under the terms of
>>>>> agreement LES-LTM-21309)
>>>>>
>>>>>     include/odp/api/ipc.h | 261
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>     1 file changed, 261 insertions(+)
>>>>>     create mode 100644 include/odp/api/ipc.h
>>>>>
>>>>> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>>>>> new file mode 100644
>>>>> index 0000000..3395a34
>>>>> --- /dev/null
>>>>> +++ b/include/odp/api/ipc.h
>>>>> @@ -0,0 +1,261 @@
>>>>> +/* Copyright (c) 2015, Linaro Limited
>>>>> + * All rights reserved.
>>>>> + *
>>>>> + * SPDX-License-Identifier:     BSD-3-Clause
>>>>> + */
>>>>> +
>>>>> +
>>>>> +/**
>>>>> + * @file
>>>>> + *
>>>>> + * ODP IPC API
>>>>> + */
>>>>> +
>>>>> +#ifndef ODP_API_IPC_H_
>>>>> +#define ODP_API_IPC_H_
>>>>> +
>>>>> +#ifdef __cplusplus
>>>>> +extern "C" {
>>>>> +#endif
>>>>> +
>>>>> +/** @defgroup odp_ipc ODP IPC
>>>>> + *  @{
>>>>> + */
>>>>> +
>>>>> +/**
>>>>> + * @typedef odp_ipc_t
>>>>> + * ODP IPC handle
>>>>> + */
>>>>> +
>>>>> +/**
>>>>> + * @typedef odp_ipc_msg_t
>>>>> + * ODP IPC message handle
>>>>> + */
>>>>> +
>>>>> +
>>>>> +/**
>>>>> + * @def ODP_IPC_ADDR_SIZE
>>>>> + * Size of the address of an IPC endpoint
>>>>> + */
>>>>> +
>>>>> +/**
>>>>> + * Create IPC endpoint
>>>>> + *
>>>>> + * @param name Name of local IPC endpoint
>>>>> + * @param pool Pool for incoming messages
>>>>> + *
>>>>> + * @return IPC handle on success
>>>>> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>> + */
>>>>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>> +
>>>>> +/**
>>>>> + * Destroy IPC endpoint
>>>>> + *
>>>>> + * @param ipc IPC handle
>>>>> + *
>>>>> + * @retval 0 on success
>>>>> + * @retval <0 on failure
>>>>> + */
>>>>> +int odp_ipc_destroy(odp_ipc_t ipc);
>>>>> +
>>>>> +/**
>>>>> + * Set the default input queue for an IPC endpoint
>>>>> + *
>>>>> + * @param ipc   IPC handle
>>>>> + * @param queue Queue handle
>>>>> + *
>>>>> + * @retval  0 on success
>>>>> + * @retval <0 on failure
>>>>> + */
>>>>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>> +
>>>>> +/**
>>>>> + * Remove the default input queue
>>>>> + *
>>>>> + * Remove (disassociate) the default input queue from an IPC endpoint.
>>>>> + * The queue itself is not touched.
>>>>> + *
>>>>> + * @param ipc  IPC handle
>>>>> + *
>>>>> + * @retval 0 on success
>>>>> + * @retval <0 on failure
>>>>> + */
>>>>> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>>>>> +
>>>>> +/**
>>>>> + * Resolve endpoint by name
>>>>> + *
>>>>> + * Look up an existing or future endpoint by name.
>>>>> + * When the endpoint exists, return the specified message with the
>>>>> endpoint
>>>>> + * as the sender.
>>>>> + *
>>>>> + * @param ipc IPC handle
>>>>> + * @param name Name to resolve
>>>>> + * @param msg Message to return
>>>>> + */
>>>>> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>> +                    const char *name,
>>>>> +                    odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * Monitor endpoint
>>>>> + *
>>>>> + * Monitor an existing (potentially already dead) endpoint.
>>>>> + * When the endpoint is dead, return the specified message with the
>>>>> endpoint
>>>>> + * as the sender.
>>>>> + *
>>>>> + * Unrecognized or invalid endpoint addresses are treated as dead
>>>>> endpoints.
>>>>> + *
>>>>> + * @param ipc IPC handle
>>>>> + * @param addr Address of monitored endpoint
>>>>> + * @param msg Message to return
>>>>> + */
>>>>> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>> +                    odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * Send message
>>>>> + *
>>>>> + * Send a message to an endpoint (which may already be dead).
>>>>> + * Message delivery is ordered and reliable. All (accepted) messages
>>>>> will
>>>>> be
>>>>> + * delivered up to the point of endpoint death or lost connection.
>>>>> + * Actual reception and processing is not guaranteed (use end-to-end
>>>>> + * acknowledgements for that).
>>>>> + * Monitor the remote endpoint to detect death or lost connection.
>>>>> + *
>>>>> + * @param ipc IPC handle
>>>>> + * @param msg Message to send
>>>>> + * @param addr Address of remote endpoint
>>>>> + *
>>>>> + * @retval 0 on success
>>>>> + * @retval <0 on error
>>>>> + */
>>>>> +int odp_ipc_send(odp_ipc_t ipc,
>>>>> +                odp_ipc_msg_t msg,
>>>>> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>> +
>>>>> +/**
>>>>> + * Get address of sender (source) of message
>>>>> + *
>>>>> + * @param msg Message handle
>>>>> + * @param addr Address of sender endpoint
>>>>> + */
>>>>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>> +
>>>>> +/**
>>>>> + * Message data pointer
>>>>> + *
>>>>> + * Return a pointer to the message data
>>>>> + *
>>>>> + * @param msg Message handle
>>>>> + *
>>>>> + * @return Pointer to the message data
>>>>> + */
>>>>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * Message data length
>>>>> + *
>>>>> + * Return length of the message data.
>>>>> + *
>>>>> + * @param msg Message handle
>>>>> + *
>>>>> + * @return Message length
>>>>> + */
>>>>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * Set message length
>>>>> + *
>>>>> + * Set length of the message data.
>>>>> + *
>>>>> + * @param msg Message handle
>>>>> + * @param len New length
>>>>> + *
>>>>> + * @retval 0 on success
>>>>> + * @retval <0 on error
>>>>> + */
>>>>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>> +
>>>>> +/**
>>>>> + * Allocate message
>>>>> + *
>>>>> + * Allocate a message of a specific size.
>>>>> + *
>>>>> + * @param pool Message pool to allocate message from
>>>>> + * @param len Length of the allocated message
>>>>> + *
>>>>> + * @return IPC message handle on success
>>>>> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>>>>> + */
>>>>> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>>>>> +
>>>>> +/**
>>>>> + * Free message
>>>>> + *
>>>>> + * Free message back to the message pool it was allocated from.
>>>>> + *
>>>>> + * @param msg Handle of message to free
>>>>> + */
>>>>> +void odp_ipc_free(odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * Get message handle from event
>>>>> + *
>>>>> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>>>>> + *
>>>>> + * @param ev   Event handle
>>>>> + *
>>>>> + * @return Message handle
>>>>> + *
>>>>> + * @see odp_event_type()
>>>>> + */
>>>>> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>>>>> +
>>>>> +/**
>>>>> + * Convert message handle to event
>>>>> + *
>>>>> + * @param msg  Message handle
>>>>> + *
>>>>> + * @return Event handle
>>>>> + */
>>>>> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * Get printable value for an odp_ipc_t
>>>>> + *
>>>>> + * @param ipc  IPC handle to be printed
>>>>> + * @return     uint64_t value that can be used to print/display this
>>>>> + *             handle
>>>>> + *
>>>>> + * @note This routine is intended to be used for diagnostic purposes
>>>>> + * to enable applications to generate a printable value that represents
>>>>> + * an odp_ipc_t handle.
>>>>> + */
>>>>> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>>>>> +
>>>>> +/**
>>>>> + * Get printable value for an odp_ipc_msg_t
>>>>> + *
>>>>> + * @param msg  Message handle to be printed
>>>>> + * @return     uint64_t value that can be used to print/display this
>>>>> + *             handle
>>>>> + *
>>>>> + * @note This routine is intended to be used for diagnostic purposes
>>>>> + * to enable applications to generate a printable value that represents
>>>>> + * an odp_ipc_msg_t handle.
>>>>> + */
>>>>> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>>>>> +
>>>>> +/**
>>>>> + * @}
>>>>> + */
>>>>> +
>>>>> +#ifdef __cplusplus
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> +#endif
>>>>>
>>>>
>>>> --
>>>> Benoît GANNE
>>>> Field Application Engineer, Kalray
>>>> +33 (0)648 125 843
>>
>>
>> --
>> Benoît GANNE
>> Field Application Engineer, Kalray
>> +33 (0)648 125 843
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
Ola Liljedahl May 20, 2015, 5:15 p.m. | #11
On 20 May 2015 at 13:29, Benoît Ganne <bganne@kalray.eu> wrote:
> Hi Ola,
>
> thanks for your feedback. I think part of the problem is that today we have
> 2 things dubbed "IPC" trying to solve different problems (I think): yours vs
> Maxim and me. Maybe we should clarify that.
> From my understanding, what Maxim is proposing is closer to what I was
> trying to achieve. The main differences between my proposal and Maxim's
> were:
>  - use a more "POSIX namespace" approach for naming resources (e.g.
> "/ipc/..." vs "ipc_...")
>  - extend pktio to allow unidirectional communication to save HW resources
> I agree your need is different and need a different API. Maybe we could call
> it "message bus" or "odp bus" or something like that to disambiguate?
>
> Other comments inline.
>
>>>   - use it for control/data plane signaling but also to exchange packets
>>> in order to build packet processing pipelines
>>
>> Control/data plane signaling must be reliable (per the definition in
>> my RFC). But do you have the same requirements for packet transfer
>> between pipeline stages (on different cores)? Or it just happens to be
>> reliable on your hardware? Is reliability (guaranteed delivery) an
>> inherent requirement for pipelined packet processing? (this would be
>> contrary to my experience).
>
>
> Right, it won't be reliable, especially if the rx is too slow to consume
> messages; tx should get EAGAIN and need to retry later. But reliability can
> be built on top of that (buffering on tx side, spinning...).
>
>> Also couldn't you use ODP queues as the abstraction for transferring
>> packets between cores/pipeline stages? Per your pktio proposal below,
>> if you connect input and output queues to your "ipc" pktio instances,
>> applications can just enqueue and dequeue packets from these queues
>> and that will cause packets to be transferred between pipeline stages
>> (on different cores/AS's).
>
>
> Sure. But as far as I understand, queues are not associated to pools by
> themselves. We still need a way to move data from one AS to another, and it
> means from one pool to another. I need a way to identify the rx pool.
>
>> This use case does not want or need the ability to specify the
>> destination address for each individual send operation and this would
>> likely just add overhead to a performance critical operation.
>
>
> It needs to identify the destination pool, as we are moving from one AS to
> another, and pools are defined per-AS.
>
>>>   - IPC ops should be mapped to our NoC HW as much as possible,
>>> especially
>>> for send/recv operations
>>
>> Is there anything in the proposed API that would limit your
>> implementation freedom?
>
>
> Not in the case of an application messaging bus as you propose. I just
> wanted to highlight the fact that we need a lower-level IPC mechanism where
> I can do unidirectional communications (to save HW resources) without any
> centralized logic.
>
>>>  From your proposal, it seems to me that what you proposed is more like
>>> an
>>> application messaging bus such as d-bus (especially the deferred lookup()
>>> and monitor()),
>>
>> Perhaps d-bus can be used as the implementation, that could save me
>> some work.
>
>
> I didn't mean to use D-Bus nor CORBA :)
CORBA, ain't it dead yet?

Actually kdbus (D-bus re-implemented in the kernel) looks interesting:
https://lwn.net/Articles/580194/
All kernels need proper IPC/message passing support.

> It was just to better understand what you were trying to achieve. That said,
> I think we can learn a few things from D-Bus in this case. For example, you
> did mention your need for delivering reliability, but what about message
> ordering and broadcast/multicast?
>
>>> But in that case you need to be able to identify the endpoints. What I
>>> proposed in a previous thread was to use a hierarchical device naming:
>>>   - "/dev/<device>" for real devices
>>>   - "/ipc/<identifier>" for IPC
>>> What I had in mind so far for our platform was to use {AS identifier +
>>> pool
>>> name in the AS} as IPC identifier, which would avoid the need of a
>>> centralized rendez-vous point.
>>
>> The semantics of the IPC endpoint names are not defined in my
>> proposal. I think it is a mistake to impose meaning on the names.
>
>
> Agreed. I was just proposing a naming scheme for pktio in my case to ease
> readability.
>
>> Also I don't envision any (active) rendez-vous point, it's just a
>> global shared table in my implementation for linux-generic. This is
>> needed since names are user-defined and arbitrary.
>
>
> Hmm a global shared table looks like a rendez-vous point to me. You cannot
> implement that this way in a 0-sharing architecture. But anyway, I
> completely see the value of a messaging bus with discovery service, the
> implementation will use whatever suits it for this.
>
>> Why are these pktio instances of type IPC? They could just as well be
>> network interfaces on which you send and/or receive packets with the
>> normal semantics. No need for the IPC API I proposed. What stops you
>> from implementing this today?
>
>
> A specific type is still useful in my opinion:
>  - it eases readability
>  - real pktio may have specific characteristics missing for IPC pktio (eg.
> in our case, we have HW acceleration for packet classification/extraction
> for real pktio, but not for IPC pktio)
>
>> So we need to amend the API spec that it is OK to specify
>> ODP_POOL_INVALID for the pool parameter to the opd_pktio_open() call
>> and this will indicate that the pktio interface is write-only. Is
>> there anything similar we need to do for read-only interfaces? You
>> would like to have the indication of read-only/write-only/read-write
>> at the time of pktio_open. Perhaps we need a new parameter for
>> odp_pktio_open(), the pool parameter is not enough for this purpose.
>
>
> OK. What is the best way to make a proposal? An RFC patch series for
> linux-generic proposing such an implementation?
>
>> Your needs are real but I think reusing the ODP pktio concept is a
>> better solution for you, not trying to hijack the application
>> messaging API which is intended for a very different use case. IPC
>> seems to mean different things to different people so perhaps should
>> be avoided in order not to give people the wrong ideas.
>
>
> Agreed. We need to define better names. I personally like IPC pktio for my
> case and application message bus (odp_bus?) for your case but if you have
> better ideas, I would be happy to hear from you.
>
>
> ben
>
>>> On 05/19/2015 12:03 AM, Ola Liljedahl wrote:
>>>>
>>>>
>>>> As promised, here is my first attempt at a standalone API for IPC -
>>>> inter
>>>> process communication in a shared nothing architecture (message passing
>>>> between processes which do not share memory).
>>>>
>>>> Currently all definitions are in the file ipc.h but it is possible to
>>>> break out some message/event related definitions (everything from
>>>> odp_ipc_sender) in a separate file message.h. This would mimic the
>>>> packet_io.h/packet.h separation.
>>>>
>>>> The semantics of message passing is that sending a message to an
>>>> endpoint
>>>> will always look like it succeeds. The appearance of endpoints is
>>>> explicitly
>>>> notified through user-defined messages specified in the
>>>> odp_ipc_resolve()
>>>> call. Similarly, the disappearance (e.g. death or otherwise lost
>>>> connection)
>>>> is also explicitly notified through user-defined messages specified in
>>>> the
>>>> odp_ipc_monitor() call. The send call does not fail because the
>>>> addressed
>>>> endpoints has disappeared.
>>>>
>>>> Messages (from endpoint A to endpoint B) are delivered in order. If
>>>> message
>>>> N sent to an endpoint is delivered, then all messages <N have also been
>>>> delivered. Message delivery does not guarantee actual processing by the
>>>> recipient. End-to-end acknowledgements (using messages) should be used
>>>> if
>>>> this guarantee is important to the user.
>>>>
>>>> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>>>> multidrop network where each endpoint has a unique address which is only
>>>> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
>>>> and then recreated (with the same name), the new endpoint will have a
>>>> new address (eventually endpoints addresses will have to be recycled but
>>>> not for a very long time). Endpoints names do not necessarily have to be
>>>> unique.
>>>>
>>>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>> ---
>>>> (This document/code contribution attached is provided under the terms of
>>>> agreement LES-LTM-21309)
>>>>
>>>>    include/odp/api/ipc.h | 261
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 261 insertions(+)
>>>>    create mode 100644 include/odp/api/ipc.h
>>>>
>>>> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>>>> new file mode 100644
>>>> index 0000000..3395a34
>>>> --- /dev/null
>>>> +++ b/include/odp/api/ipc.h
>>>> @@ -0,0 +1,261 @@
>>>> +/* Copyright (c) 2015, Linaro Limited
>>>> + * All rights reserved.
>>>> + *
>>>> + * SPDX-License-Identifier:     BSD-3-Clause
>>>> + */
>>>> +
>>>> +
>>>> +/**
>>>> + * @file
>>>> + *
>>>> + * ODP IPC API
>>>> + */
>>>> +
>>>> +#ifndef ODP_API_IPC_H_
>>>> +#define ODP_API_IPC_H_
>>>> +
>>>> +#ifdef __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +/** @defgroup odp_ipc ODP IPC
>>>> + *  @{
>>>> + */
>>>> +
>>>> +/**
>>>> + * @typedef odp_ipc_t
>>>> + * ODP IPC handle
>>>> + */
>>>> +
>>>> +/**
>>>> + * @typedef odp_ipc_msg_t
>>>> + * ODP IPC message handle
>>>> + */
>>>> +
>>>> +
>>>> +/**
>>>> + * @def ODP_IPC_ADDR_SIZE
>>>> + * Size of the address of an IPC endpoint
>>>> + */
>>>> +
>>>> +/**
>>>> + * Create IPC endpoint
>>>> + *
>>>> + * @param name Name of local IPC endpoint
>>>> + * @param pool Pool for incoming messages
>>>> + *
>>>> + * @return IPC handle on success
>>>> + * @retval ODP_IPC_INVALID on failure and errno set
>>>> + */
>>>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>> +
>>>> +/**
>>>> + * Destroy IPC endpoint
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on failure
>>>> + */
>>>> +int odp_ipc_destroy(odp_ipc_t ipc);
>>>> +
>>>> +/**
>>>> + * Set the default input queue for an IPC endpoint
>>>> + *
>>>> + * @param ipc   IPC handle
>>>> + * @param queue Queue handle
>>>> + *
>>>> + * @retval  0 on success
>>>> + * @retval <0 on failure
>>>> + */
>>>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>> +
>>>> +/**
>>>> + * Remove the default input queue
>>>> + *
>>>> + * Remove (disassociate) the default input queue from an IPC endpoint.
>>>> + * The queue itself is not touched.
>>>> + *
>>>> + * @param ipc  IPC handle
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on failure
>>>> + */
>>>> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>>>> +
>>>> +/**
>>>> + * Resolve endpoint by name
>>>> + *
>>>> + * Look up an existing or future endpoint by name.
>>>> + * When the endpoint exists, return the specified message with the
>>>> endpoint
>>>> + * as the sender.
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + * @param name Name to resolve
>>>> + * @param msg Message to return
>>>> + */
>>>> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>> +                    const char *name,
>>>> +                    odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Monitor endpoint
>>>> + *
>>>> + * Monitor an existing (potentially already dead) endpoint.
>>>> + * When the endpoint is dead, return the specified message with the
>>>> endpoint
>>>> + * as the sender.
>>>> + *
>>>> + * Unrecognized or invalid endpoint addresses are treated as dead
>>>> endpoints.
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + * @param addr Address of monitored endpoint
>>>> + * @param msg Message to return
>>>> + */
>>>> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>> +                    odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Send message
>>>> + *
>>>> + * Send a message to an endpoint (which may already be dead).
>>>> + * Message delivery is ordered and reliable. All (accepted) messages
>>>> will
>>>> be
>>>> + * delivered up to the point of endpoint death or lost connection.
>>>> + * Actual reception and processing is not guaranteed (use end-to-end
>>>> + * acknowledgements for that).
>>>> + * Monitor the remote endpoint to detect death or lost connection.
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + * @param msg Message to send
>>>> + * @param addr Address of remote endpoint
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on error
>>>> + */
>>>> +int odp_ipc_send(odp_ipc_t ipc,
>>>> +                odp_ipc_msg_t msg,
>>>> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>> +
>>>> +/**
>>>> + * Get address of sender (source) of message
>>>> + *
>>>> + * @param msg Message handle
>>>> + * @param addr Address of sender endpoint
>>>> + */
>>>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>> +
>>>> +/**
>>>> + * Message data pointer
>>>> + *
>>>> + * Return a pointer to the message data
>>>> + *
>>>> + * @param msg Message handle
>>>> + *
>>>> + * @return Pointer to the message data
>>>> + */
>>>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Message data length
>>>> + *
>>>> + * Return length of the message data.
>>>> + *
>>>> + * @param msg Message handle
>>>> + *
>>>> + * @return Message length
>>>> + */
>>>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Set message length
>>>> + *
>>>> + * Set length of the message data.
>>>> + *
>>>> + * @param msg Message handle
>>>> + * @param len New length
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on error
>>>> + */
>>>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>> +
>>>> +/**
>>>> + * Allocate message
>>>> + *
>>>> + * Allocate a message of a specific size.
>>>> + *
>>>> + * @param pool Message pool to allocate message from
>>>> + * @param len Length of the allocated message
>>>> + *
>>>> + * @return IPC message handle on success
>>>> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>>>> + */
>>>> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>>>> +
>>>> +/**
>>>> + * Free message
>>>> + *
>>>> + * Free message back to the message pool it was allocated from.
>>>> + *
>>>> + * @param msg Handle of message to free
>>>> + */
>>>> +void odp_ipc_free(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Get message handle from event
>>>> + *
>>>> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>>>> + *
>>>> + * @param ev   Event handle
>>>> + *
>>>> + * @return Message handle
>>>> + *
>>>> + * @see odp_event_type()
>>>> + */
>>>> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>>>> +
>>>> +/**
>>>> + * Convert message handle to event
>>>> + *
>>>> + * @param msg  Message handle
>>>> + *
>>>> + * @return Event handle
>>>> + */
>>>> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Get printable value for an odp_ipc_t
>>>> + *
>>>> + * @param ipc  IPC handle to be printed
>>>> + * @return     uint64_t value that can be used to print/display this
>>>> + *             handle
>>>> + *
>>>> + * @note This routine is intended to be used for diagnostic purposes
>>>> + * to enable applications to generate a printable value that represents
>>>> + * an odp_ipc_t handle.
>>>> + */
>>>> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>>>> +
>>>> +/**
>>>> + * Get printable value for an odp_ipc_msg_t
>>>> + *
>>>> + * @param msg  Message handle to be printed
>>>> + * @return     uint64_t value that can be used to print/display this
>>>> + *             handle
>>>> + *
>>>> + * @note This routine is intended to be used for diagnostic purposes
>>>> + * to enable applications to generate a printable value that represents
>>>> + * an odp_ipc_msg_t handle.
>>>> + */
>>>> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * @}
>>>> + */
>>>> +
>>>> +#ifdef __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>> +#endif
>>>>
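The accessor trio quoted above (odp_ipc_data(), odp_ipc_length(), odp_ipc_reset()) implies a simple contract: a data pointer, a current length, and a bounded length update. A minimal stand-alone sketch of that contract, using a plain struct in place of the real, implementation-defined message type. All names below are illustrative and not part of the proposed API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for an implementation-defined message:
 * a fixed buffer plus the current data length. */
#define MSG_BUF_SIZE 256

typedef struct {
    uint8_t  data[MSG_BUF_SIZE];
    uint32_t len;               /* current data length */
} msg_t;

/* Mimics odp_ipc_data(): pointer to the message data. */
static void *msg_data(msg_t *msg)
{
    return msg->data;
}

/* Mimics odp_ipc_length(): length of the message data. */
static uint32_t msg_length(const msg_t *msg)
{
    return msg->len;
}

/* Mimics odp_ipc_reset(): set a new data length.
 * Returns <0 if the length exceeds the underlying buffer,
 * matching the "@retval <0 on error" contract above. */
static int msg_reset(msg_t *msg, uint32_t len)
{
    if (len > MSG_BUF_SIZE)
        return -1;
    msg->len = len;
    return 0;
}
```

An application would set the length first, then fill the data area through the data pointer before sending.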
>>>
>>>
>>> --
>>> Benoît GANNE
>>> Field Application Engineer, Kalray
>>> +33 (0)648 125 843
Ola Liljedahl May 21, 2015, 11:12 a.m. | #12
On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
<petri.savolainen@nokia.com> wrote:
>
>
>> -----Original Message-----
>> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of ext
>> Ola Liljedahl
>> Sent: Tuesday, May 19, 2015 1:04 AM
>> To: lng-odp@lists.linaro.org
>> Subject: [lng-odp] [RFC] Add ipc.h
>>
>> As promised, here is my first attempt at a standalone API for IPC - inter
>> process communication in a shared nothing architecture (message passing
>> between processes which do not share memory).
>>
>> Currently all definitions are in the file ipc.h but it is possible to
>> break out some message/event related definitions (everything from
>> odp_ipc_sender) in a separate file message.h. This would mimic the
>> packet_io.h/packet.h separation.
>>
>> The semantics of message passing is that sending a message to an endpoint
>> will always look like it succeeds. The appearance of endpoints is
>> explicitly
>> notified through user-defined messages specified in the odp_ipc_resolve()
>> call. Similarly, the disappearance (e.g. death or otherwise lost
>> connection)
>> is also explicitly notified through user-defined messages specified in the
>> odp_ipc_monitor() call. The send call does not fail because the addressed
>> endpoints has disappeared.
>>
>> Messages (from endpoint A to endpoint B) are delivered in order. If
>> message
>> N sent to an endpoint is delivered, then all messages <N have also been
>> delivered. Message delivery does not guarantee actual processing by the
>
> Ordered is an OK requirement, but "all messages <N have also been delivered" means in practice lossless delivery (== retries, retransmission windows, etc.). Lossy vs lossless link should be a configuration option.
I am just targeting internal communication, which I expect to be
reliable. There is no physical "link" involved. If an
implementation chooses to use some unreliable medium, then it will need
to take some countermeasures. Any message loss could be detected
using sequence numbers (and timeouts) and handled by (temporary)
disconnection (so that no more messages will be delivered should one
go missing).
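The scheme described here, sequence numbers plus disconnection on the first gap, can be sketched as follows. This is an assumption about one possible implementation over unreliable media, not part of the proposed API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-peer receive state for an implementation that transports
 * messages over an unreliable medium. */
typedef struct {
    uint32_t next_seq;      /* next expected sequence number */
    bool     disconnected;  /* set on the first detected gap */
} peer_rx_t;

/* Accept a received message carrying sequence number 'seq'.
 * Returns true if the message may be delivered to the application.
 * On a gap (a lost message) the peer is marked disconnected so that
 * no later messages are delivered, preserving the rule that if
 * message N is delivered, all messages <N were delivered too. */
static bool peer_rx_accept(peer_rx_t *p, uint32_t seq)
{
    if (p->disconnected)
        return false;
    if (seq != p->next_seq) {
        p->disconnected = true; /* gap => (temporary) disconnection */
        return false;
    }
    p->next_seq++;
    return true;
}
```

Reconnection (not shown) would reset the state and be signalled to the application through the monitor mechanism.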

I am OK with adding the lossless/lossy configuration to the API as
long as the lossless option is always implemented. Is this a configuration
made when creating the local IPC endpoint or when sending a message to
another endpoint?

>
> Also, what does "delivered" mean?
>
> Message:
>  - transmitted successfully over the link ?
>  - is now under control of the remote node (post office) ?
>  - delivered into application input queue ?
Probably this one, but I am not sure the exact definition matters: "has
been delivered" or "will eventually be delivered unless connection to
the destination is lost". Maybe there is a better word than
"delivered"?

"Made available into the destination (recipient) address space"?
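Under the "delivered into application input queue" reading, the send contract can be modelled as below. A toy sketch only, with illustrative names; it shows why the send call always "looks like it succeeds" and why end-to-end acknowledgements are still needed for processing guarantees:

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_DEPTH 8

/* Toy endpoint: an alive flag plus an input queue. "Delivered" here
 * means the message landed in this queue; whether the application
 * ever dequeues and processes it is a separate question. */
typedef struct {
    bool     alive;
    int      msgs[QUEUE_DEPTH];
    unsigned head, tail;
} endpoint_t;

/* Mimics the odp_ipc_send() contract: the call itself reports
 * success regardless of endpoint death; the death is reported
 * asynchronously via monitor messages, not via the return value. */
static int ep_send(endpoint_t *dst, int msg)
{
    if (dst->alive && dst->tail - dst->head < QUEUE_DEPTH)
        dst->msgs[dst->tail++ % QUEUE_DEPTH] = msg;
    /* dead endpoint: message silently dropped */
    return 0; /* success either way */
}

/* Number of messages delivered but not yet dequeued. */
static unsigned ep_pending(const endpoint_t *ep)
{
    return ep->tail - ep->head;
}
```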

>  - has been dequeued from application queue ?
>
>
>> recipient. End-to-end acknowledgements (using messages) should be used if
>> this guarantee is important to the user.
>>
>> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>> multidrop network where each endpoint has a unique address which is only
>> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
>> and then recreated (with the same name), the new endpoint will have a
>> new address (eventually endpoints addresses will have to be recycled but
>> not for a very long time). Endpoints names do not necessarily have to be
>> unique.
>
> How widely these addresses are unique: inside one VM, multiple VMs under the same host, multiple devices on a LAN (VLAN), ...
Currently, the scope of the name and address space is defined by the
implementation. Perhaps we should define it? My current interest is
within an OS instance (bare metal or virtualised). Between different
OS instances, I expect something based on IP to be used (because you
don't know where those different OS/VM instances will be deployed so
you need topology-independent addressing).

Based on other feedback, I have dropped the contended usage of "IPC"
and now call it "message bus" (MBUS).

"MBUS endpoints can be seen as interfaces (taps) to an OS-internal
reliable multidrop network"...
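The "same name, new address" rule from the RFC text is straightforward to sketch: addresses can come from a monotonic counter, so a recreated endpoint never reuses the address of its dead predecessor. Illustrative only, assuming a 64-bit address space that is not recycled "for a very long time":

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADDR_SIZE 8 /* stand-in for ODP_IPC_ADDR_SIZE */

/* Monotonic allocator: every created endpoint, even one reusing
 * an earlier name, gets a fresh address for its lifetime. */
static uint64_t next_addr = 1;

typedef struct {
    char    name[32];
    uint8_t addr[ADDR_SIZE];
} endpoint_t;

static void ep_create(endpoint_t *ep, const char *name)
{
    uint64_t a = next_addr++;

    strncpy(ep->name, name, sizeof(ep->name) - 1);
    ep->name[sizeof(ep->name) - 1] = '\0';
    memcpy(ep->addr, &a, ADDR_SIZE);
}
```

Names need not be unique, so resolution by name may match several endpoints, but each address identifies exactly one endpoint lifetime.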

>
>
>>
>> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> ---
>> (This document/code contribution attached is provided under the terms of
>> agreement LES-LTM-21309)
>>
>
>
>> +/**
>> + * Create IPC endpoint
>> + *
>> + * @param name Name of local IPC endpoint
>> + * @param pool Pool for incoming messages
>> + *
>> + * @return IPC handle on success
>> + * @retval ODP_IPC_INVALID on failure and errno set
>> + */
>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>
> This creates (implicitly) the local end point address.
>
>
>> +
>> +/**
>> + * Set the default input queue for an IPC endpoint
>> + *
>> + * @param ipc   IPC handle
>> + * @param queue Queue handle
>> + *
>> + * @retval  0 on success
>> + * @retval <0 on failure
>> + */
>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>
> Multiple input queues are likely needed for different priority messages.
>
>> +
>> +/**
>> + * Resolve endpoint by name
>> + *
>> + * Look up an existing or future endpoint by name.
>> + * When the endpoint exists, return the specified message with the
>> endpoint
>> + * as the sender.
>> + *
>> + * @param ipc IPC handle
>> + * @param name Name to resolve
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_resolve(odp_ipc_t ipc,
>> +                  const char *name,
>> +                  odp_ipc_msg_t msg);
>
> How widely these names are visible? Inside one VM, multiple VMs under the same host, multiple devices on a LAN (VLAN), ...
>
> I think name service (or address resolution) are better handled in middleware layer. If ODP provides unique addresses and message passing mechanism, additional services can be built on top.
>
>
>> +
>> +/**
>> + * Monitor endpoint
>> + *
>> + * Monitor an existing (potentially already dead) endpoint.
>> + * When the endpoint is dead, return the specified message with the
>> endpoint
>> + * as the sender.
>> + *
>> + * Unrecognized or invalid endpoint addresses are treated as dead
>> endpoints.
>> + *
>> + * @param ipc IPC handle
>> + * @param addr Address of monitored endpoint
>> + * @param msg Message to return
>> + */
>> +void odp_ipc_monitor(odp_ipc_t ipc,
>> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>> +                  odp_ipc_msg_t msg);
>
> Again, I'd see node health monitoring and alarms as middleware services.
>
>> +
>> +/**
>> + * Send message
>> + *
>> + * Send a message to an endpoint (which may already be dead).
>> + * Message delivery is ordered and reliable. All (accepted) messages will
>> be
>> + * delivered up to the point of endpoint death or lost connection.
>> + * Actual reception and processing is not guaranteed (use end-to-end
>> + * acknowledgements for that).
>> + * Monitor the remote endpoint to detect death or lost connection.
>> + *
>> + * @param ipc IPC handle
>> + * @param msg Message to send
>> + * @param addr Address of remote endpoint
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_send(odp_ipc_t ipc,
>> +              odp_ipc_msg_t msg,
>> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>
> This would be used to send a message to an address, but normal odp_queue_enq() could be used to circulate this event inside an application (ODP instance).
>
>> +
>> +/**
>> + * Get address of sender (source) of message
>> + *
>> + * @param msg Message handle
>> + * @param addr Address of sender endpoint
>> + */
>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> +
>> +/**
>> + * Message data pointer
>> + *
>> + * Return a pointer to the message data
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Pointer to the message data
>> + */
>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Message data length
>> + *
>> + * Return length of the message data.
>> + *
>> + * @param msg Message handle
>> + *
>> + * @return Message length
>> + */
>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>> +
>> +/**
>> + * Set message length
>> + *
>> + * Set length of the message data.
>> + *
>> + * @param msg Message handle
>> + * @param len New length
>> + *
>> + * @retval 0 on success
>> + * @retval <0 on error
>> + */
>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>
> When data ptr or data len is modified: push/pull head, push/pull tail would be analogies from packet API
>
>
> -Petri
>
>
Alexandru Badicioiu May 21, 2015, 11:22 a.m. | #13
Hi,
would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit the
purpose of ODP IPC (within a single OS instance)?

Thanks,
Alex

On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
> <petri.savolainen@nokia.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of
> ext
> >> Ola Liljedahl
> >> Sent: Tuesday, May 19, 2015 1:04 AM
> >> To: lng-odp@lists.linaro.org
> >> Subject: [lng-odp] [RFC] Add ipc.h
> >>
> >> As promised, here is my first attempt at a standalone API for IPC -
> inter
> >> process communication in a shared nothing architecture (message passing
> >> between processes which do not share memory).
> >>
> >> Currently all definitions are in the file ipc.h but it is possible to
> >> break out some message/event related definitions (everything from
> >> odp_ipc_sender) in a separate file message.h. This would mimic the
> >> packet_io.h/packet.h separation.
> >>
> >> The semantics of message passing is that sending a message to an
> endpoint
> >> will always look like it succeeds. The appearance of endpoints is
> >> explicitly
> >> notified through user-defined messages specified in the
> odp_ipc_resolve()
> >> call. Similarly, the disappearance (e.g. death or otherwise lost
> >> connection)
> >> is also explicitly notified through user-defined messages specified in
> the
> >> odp_ipc_monitor() call. The send call does not fail because the
> addressed
> >> endpoints has disappeared.
> >>
> >> Messages (from endpoint A to endpoint B) are delivered in order. If
> >> message
> >> N sent to an endpoint is delivered, then all messages <N have also been
> >> delivered. Message delivery does not guarantee actual processing by the
> >
> > Ordered is OK requirement, but "all messages <N have also been
> delivered" means in practice loss less delivery (== re-tries and
> retransmission windows, etc). Lossy vs loss less link should be an
> configuration option.
> I am just targeting internal communication which I expect to be
> reliable. There is not any physical "link" involved. If an
> implementation chooses to use some unreliable media, then it will need
> to take some counter measures. Any loss of message could be detected
> using sequence numbers (and timeouts) and handled by (temporary)
> disconnection (so that no more messages will be delivered should one
> go missing).
>
> I am OK with adding the lossless/lossy configuration to the API as
> long as lossless option is always implemented. Is this a configuration
> when creating the local  IPC endpoint or when sending a message to
> another endpoint?
>
> >
> > Also what "delivered" means?'
> >
> > Message:
> >  - transmitted successfully over the link ?
> >  - is now under control of the remote node (post office) ?
> >  - delivered into application input queue ?
> Probably this one but I am not sure the exact definition matters, "has
> been delivered" or "will eventually be delivered unless connection to
> the destination is lost". Maybe there is a better word than
> "delivered?
>
> "Made available into the destination (recipient) address space"?
>
> >  - has been dequeued from application queue ?
> >
> >
> >> recipient. End-to-end acknowledgements (using messages) should be used
> if
> >> this guarantee is important to the user.
> >>
> >> IPC endpoints can be seen as interfaces (taps) to an internal reliable
> >> multidrop network where each endpoint has a unique address which is only
> >> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
> >> and then recreated (with the same name), the new endpoint will have a
> >> new address (eventually endpoints addresses will have to be recycled but
> >> not for a very long time). Endpoints names do not necessarily have to be
> >> unique.
> >
> > How widely these addresses are unique: inside one VM, multiple VMs under
> the same host, multiple devices on a LAN (VLAN), ...
> Currently, the scope of the name and address space is defined by the
> implementation. Perhaps we should define it? My current interest is
> within an OS instance (bare metal or virtualised). Between different
> OS instances, I expect something based on IP to be used (because you
> don't know where those different OS/VM instances will be deployed so
> you need topology-independent addressing).
>
> Based on other feedback, I have dropped the contented usage of "IPC"
> and now call it "message bus" (MBUS).
>
> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
> reliable multidrop network"...
>
> >
> >
> >>
> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> >> ---
> >> (This document/code contribution attached is provided under the terms of
> >> agreement LES-LTM-21309)
> >>
> >
> >
> >> +/**
> >> + * Create IPC endpoint
> >> + *
> >> + * @param name Name of local IPC endpoint
> >> + * @param pool Pool for incoming messages
> >> + *
> >> + * @return IPC handle on success
> >> + * @retval ODP_IPC_INVALID on failure and errno set
> >> + */
> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
> >
> > This creates (implicitly) the local end point address.
> >
> >
> >> +
> >> +/**
> >> + * Set the default input queue for an IPC endpoint
> >> + *
> >> + * @param ipc   IPC handle
> >> + * @param queue Queue handle
> >> + *
> >> + * @retval  0 on success
> >> + * @retval <0 on failure
> >> + */
> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
> >
> > Multiple input queues are likely needed for different priority messages.
> >
> >> +
> >> +/**
> >> + * Resolve endpoint by name
> >> + *
> >> + * Look up an existing or future endpoint by name.
> >> + * When the endpoint exists, return the specified message with the
> >> endpoint
> >> + * as the sender.
> >> + *
> >> + * @param ipc IPC handle
> >> + * @param name Name to resolve
> >> + * @param msg Message to return
> >> + */
> >> +void odp_ipc_resolve(odp_ipc_t ipc,
> >> +                  const char *name,
> >> +                  odp_ipc_msg_t msg);
> >
> > How widely these names are visible? Inside one VM, multiple VMs under
> the same host, multiple devices on a LAN (VLAN), ...
> >
> > I think name service (or address resolution) are better handled in
> middleware layer. If ODP provides unique addresses and message passing
> mechanism, additional services can be built on top.
> >
> >
> >> +
> >> +/**
> >> + * Monitor endpoint
> >> + *
> >> + * Monitor an existing (potentially already dead) endpoint.
> >> + * When the endpoint is dead, return the specified message with the
> >> endpoint
> >> + * as the sender.
> >> + *
> >> + * Unrecognized or invalid endpoint addresses are treated as dead
> >> endpoints.
> >> + *
> >> + * @param ipc IPC handle
> >> + * @param addr Address of monitored endpoint
> >> + * @param msg Message to return
> >> + */
> >> +void odp_ipc_monitor(odp_ipc_t ipc,
> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
> >> +                  odp_ipc_msg_t msg);
> >
> > Again, I'd see node health monitoring and alarms as middleware services.
> >
> >> +
> >> +/**
> >> + * Send message
> >> + *
> >> + * Send a message to an endpoint (which may already be dead).
> >> + * Message delivery is ordered and reliable. All (accepted) messages
> will
> >> be
> >> + * delivered up to the point of endpoint death or lost connection.
> >> + * Actual reception and processing is not guaranteed (use end-to-end
> >> + * acknowledgements for that).
> >> + * Monitor the remote endpoint to detect death or lost connection.
> >> + *
> >> + * @param ipc IPC handle
> >> + * @param msg Message to send
> >> + * @param addr Address of remote endpoint
> >> + *
> >> + * @retval 0 on success
> >> + * @retval <0 on error
> >> + */
> >> +int odp_ipc_send(odp_ipc_t ipc,
> >> +              odp_ipc_msg_t msg,
> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
> >
> > This would be used to send a message to an address, but normal
> odp_queue_enq() could be used to circulate this event inside an application
> (ODP instance).
> >
> >> +
> >> +/**
> >> + * Get address of sender (source) of message
> >> + *
> >> + * @param msg Message handle
> >> + * @param addr Address of sender endpoint
> >> + */
> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
> >> +
> >> +/**
> >> + * Message data pointer
> >> + *
> >> + * Return a pointer to the message data
> >> + *
> >> + * @param msg Message handle
> >> + *
> >> + * @return Pointer to the message data
> >> + */
> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
> >> +
> >> +/**
> >> + * Message data length
> >> + *
> >> + * Return length of the message data.
> >> + *
> >> + * @param msg Message handle
> >> + *
> >> + * @return Message length
> >> + */
> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
> >> +
> >> +/**
> >> + * Set message length
> >> + *
> >> + * Set length of the message data.
> >> + *
> >> + * @param msg Message handle
> >> + * @param len New length
> >> + *
> >> + * @retval 0 on success
> >> + * @retval <0 on error
> >> + */
> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
> >
> > When data ptr or data len is modified: push/pull head, push/pull tail
> would be analogies from packet API
> >
> >
> > -Petri
> >
> >
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
>
Ola Liljedahl May 21, 2015, 12:55 p.m. | #14
On 21 May 2015 at 13:22, Alexandru Badicioiu <alexandru.badicioiu@linaro.org>
wrote:
> Hi,
> would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit the
purpose
> of ODP IPC (within a single OS instance)?
I interpret this as a question whether Netlink would be fit as an
implementation of the ODP IPC (now called message bus because "IPC" is so
contended and imbued with different meanings).

It is perhaps possible. Netlink seems a bit focused on intra-kernel and
kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
(application-to-application).

I see a couple of primary requirements:

   - Support communication (message exchange) between user space processes.
   - Support arbitrary user-defined messages.
   - Ordered, reliable delivery of messages.


From the little I can quickly read up on Netlink, the first two
requirements do not seem supported. But perhaps someone with more intimate
knowledge of Netlink can prove me wrong. Or maybe Netlink could be extended
to support u2u and user-defined messages; the current specialization (e.g.
specialized addressing, specialized message formats) seems contrary to the
goal of providing generic mechanisms in the kernel that can be used for
different things.

My IPC/MBUS reference implementation for linux-generic builds upon POSIX
message queues. One of my issues is that I want the message queue
associated with a process to go away when the process goes away. The
message queues are not independent entities.

-- Ola

>
> Thanks,
> Alex
>
> On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>
>> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>> <petri.savolainen@nokia.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of
>> >> ext
>> >> Ola Liljedahl
>> >> Sent: Tuesday, May 19, 2015 1:04 AM
>> >> To: lng-odp@lists.linaro.org
>> >> Subject: [lng-odp] [RFC] Add ipc.h
>> >>
>> >> As promised, here is my first attempt at a standalone API for IPC -
>> >> inter
>> >> process communication in a shared nothing architecture (message
passing
>> >> between processes which do not share memory).
>> >>
>> >> Currently all definitions are in the file ipc.h but it is possible to
>> >> break out some message/event related definitions (everything from
>> >> odp_ipc_sender) in a separate file message.h. This would mimic the
>> >> packet_io.h/packet.h separation.
>> >>
>> >> The semantics of message passing is that sending a message to an
>> >> endpoint
>> >> will always look like it succeeds. The appearance of endpoints is
>> >> explicitly
>> >> notified through user-defined messages specified in the
>> >> odp_ipc_resolve()
>> >> call. Similarly, the disappearance (e.g. death or otherwise lost
>> >> connection)
>> >> is also explicitly notified through user-defined messages specified in
>> >> the
>> >> odp_ipc_monitor() call. The send call does not fail because the
>> >> addressed
>> >> endpoints has disappeared.
>> >>
>> >> Messages (from endpoint A to endpoint B) are delivered in order. If
>> >> message
>> >> N sent to an endpoint is delivered, then all messages <N have also
been
>> >> delivered. Message delivery does not guarantee actual processing by
the
>> >
>> > Ordered is OK requirement, but "all messages <N have also been
>> > delivered" means in practice loss less delivery (== re-tries and
>> > retransmission windows, etc). Lossy vs loss less link should be an
>> > configuration option.
>> I am just targeting internal communication which I expect to be
>> reliable. There is not any physical "link" involved. If an
>> implementation chooses to use some unreliable media, then it will need
>> to take some counter measures. Any loss of message could be detected
>> using sequence numbers (and timeouts) and handled by (temporary)
>> disconnection (so that no more messages will be delivered should one
>> go missing).
>>
>> I am OK with adding the lossless/lossy configuration to the API as
>> long as lossless option is always implemented. Is this a configuration
>> when creating the local  IPC endpoint or when sending a message to
>> another endpoint?
>>
>> >
>> > Also what "delivered" means?'
>> >
>> > Message:
>> >  - transmitted successfully over the link ?
>> >  - is now under control of the remote node (post office) ?
>> >  - delivered into application input queue ?
>> Probably this one but I am not sure the exact definition matters, "has
>> been delivered" or "will eventually be delivered unless connection to
>> the destination is lost". Maybe there is a better word than
>> "delivered?
>>
>> "Made available into the destination (recipient) address space"?
>>
>> >  - has been dequeued from application queue ?
>> >
>> >
>> >> recipient. End-to-end acknowledgements (using messages) should be used
>> >> if
>> >> this guarantee is important to the user.
>> >>
>> >> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>> >> multidrop network where each endpoint has a unique address which is
>> >> only
>> >> valid for the lifetime of the endpoint. I.e. if an endpoint is
>> >> destroyed
>> >> and then recreated (with the same name), the new endpoint will have a
>> >> new address (eventually endpoints addresses will have to be recycled
>> >> but
>> >> not for a very long time). Endpoints names do not necessarily have to
>> >> be
>> >> unique.
>> >
>> > How widely these addresses are unique: inside one VM, multiple VMs
under
>> > the same host, multiple devices on a LAN (VLAN), ...
>> Currently, the scope of the name and address space is defined by the
>> implementation. Perhaps we should define it? My current interest is
>> within an OS instance (bare metal or virtualised). Between different
>> OS instances, I expect something based on IP to be used (because you
>> don't know where those different OS/VM instances will be deployed so
>> you need topology-independent addressing).
>>
>> Based on other feedback, I have dropped the contented usage of "IPC"
>> and now call it "message bus" (MBUS).
>>
>> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
>> reliable multidrop network"...
>>
>> >
>> >
>> >>
>> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>> >> ---
>> >> (This document/code contribution attached is provided under the terms
>> >> of
>> >> agreement LES-LTM-21309)
>> >>
>> >
>> >
>> >> +/**
>> >> + * Create IPC endpoint
>> >> + *
>> >> + * @param name Name of local IPC endpoint
>> >> + * @param pool Pool for incoming messages
>> >> + *
>> >> + * @return IPC handle on success
>> >> + * @retval ODP_IPC_INVALID on failure and errno set
>> >> + */
>> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>> >
>> > This creates (implicitly) the local end point address.
>> >
>> >
>> >> +
>> >> +/**
>> >> + * Set the default input queue for an IPC endpoint
>> >> + *
>> >> + * @param ipc   IPC handle
>> >> + * @param queue Queue handle
>> >> + *
>> >> + * @retval  0 on success
>> >> + * @retval <0 on failure
>> >> + */
>> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>> >
>> > Multiple input queues are likely needed for different priority
messages.
>> >
>> >> +
>> >> +/**
>> >> + * Resolve endpoint by name
>> >> + *
>> >> + * Look up an existing or future endpoint by name.
>> >> + * When the endpoint exists, return the specified message with the
>> >> endpoint
>> >> + * as the sender.
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + * @param name Name to resolve
>> >> + * @param msg Message to return
>> >> + */
>> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>> >> +                  const char *name,
>> >> +                  odp_ipc_msg_t msg);
>> >
>> > How widely these names are visible? Inside one VM, multiple VMs under
>> > the same host, multiple devices on a LAN (VLAN), ...
>> >
>> > I think name service (or address resolution) are better handled in
>> > middleware layer. If ODP provides unique addresses and message passing
>> > mechanism, additional services can be built on top.
>> >
>> >
>> >> +
>> >> +/**
>> >> + * Monitor endpoint
>> >> + *
>> >> + * Monitor an existing (potentially already dead) endpoint.
>> >> + * When the endpoint is dead, return the specified message with the
>> >> endpoint
>> >> + * as the sender.
>> >> + *
>> >> + * Unrecognized or invalid endpoint addresses are treated as dead
>> >> endpoints.
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + * @param addr Address of monitored endpoint
>> >> + * @param msg Message to return
>> >> + */
>> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>> >> +                  odp_ipc_msg_t msg);
>> >
>> > Again, I'd see node health monitoring and alarms as middleware
services.
>> >
>> >> +
>> >> +/**
>> >> + * Send message
>> >> + *
>> >> + * Send a message to an endpoint (which may already be dead).
>> >> + * Message delivery is ordered and reliable. All (accepted) messages
>> >> will
>> >> be
>> >> + * delivered up to the point of endpoint death or lost connection.
>> >> + * Actual reception and processing is not guaranteed (use end-to-end
>> >> + * acknowledgements for that).
>> >> + * Monitor the remote endpoint to detect death or lost connection.
>> >> + *
>> >> + * @param ipc IPC handle
>> >> + * @param msg Message to send
>> >> + * @param addr Address of remote endpoint
>> >> + *
>> >> + * @retval 0 on success
>> >> + * @retval <0 on error
>> >> + */
>> >> +int odp_ipc_send(odp_ipc_t ipc,
>> >> +              odp_ipc_msg_t msg,
>> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> >
>> > This would be used to send a message to an address, but normal
>> > odp_queue_enq() could be used to circulate this event inside an
application
>> > (ODP instance).
>> >
>> >> +
>> >> +/**
>> >> + * Get address of sender (source) of message
>> >> + *
>> >> + * @param msg Message handle
>> >> + * @param addr Address of sender endpoint
>> >> + */
>> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>> >> +
>> >> +/**
>> >> + * Message data pointer
>> >> + *
>> >> + * Return a pointer to the message data
>> >> + *
>> >> + * @param msg Message handle
>> >> + *
>> >> + * @return Pointer to the message data
>> >> + */
>> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Message data length
>> >> + *
>> >> + * Return length of the message data.
>> >> + *
>> >> + * @param msg Message handle
>> >> + *
>> >> + * @return Message length
>> >> + */
>> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>> >> +
>> >> +/**
>> >> + * Set message length
>> >> + *
>> >> + * Set length of the message data.
>> >> + *
>> >> + * @param msg Message handle
>> >> + * @param len New length
>> >> + *
>> >> + * @retval 0 on success
>> >> + * @retval <0 on error
>> >> + */
>> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>> >
>> > When data ptr or data len is modified: push/pull head, push/pull tail
>> > would be analogies from packet API
>> >
>> >
>> > -Petri
>> >
>> >
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> https://lists.linaro.org/mailman/listinfo/lng-odp
>
>
Alexandru Badicioiu May 21, 2015, 1:05 p.m. | #15
I was referring to the Netlink protocol itself, as a model for ODP MBUS
(or IPC).

The interaction between the FEC and the CPC, in the Netlink context,
   defines a protocol.  Netlink provides mechanisms for the CPC
   (residing in user space) and the FEC (residing in kernel space) to
   have their own protocol definition -- *kernel space and user space
   just mean different protection domains*.  Therefore, a wire protocol
   is needed to communicate.  The wire protocol is normally provided by
   some privileged service that is able to copy between multiple
   protection domains.  We will refer to this service as the Netlink
   service.  The Netlink service can also be encapsulated in a different
   transport layer, if the CPC executes on a different node than the
   FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
   a reliable protocol between each other.  By default, however, Netlink
   provides an unreliable communication.

   Note that the FEC and CPC can both live in the same memory protection
   domain and use the connect() system call to create a path to the peer
   and talk to each other.  We will not discuss this mechanism further
   other than to say that it is available. Throughout this document, we
   will refer interchangeably to the FEC to mean kernel space and the
   CPC to mean user space.  This denomination is not meant, however, to
   restrict the two components to these protection domains or to the
   same compute node.



On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> On 21 May 2015 at 13:22, Alexandru Badicioiu <
> alexandru.badicioiu@linaro.org> wrote:
> > Hi,
> > would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit the
> purpose
> > of ODP IPC (within a single OS instance)?
> I interpret this as a question whether Netlink would be fit as an
> implementation of the ODP IPC (now called message bus because "IPC" is so
> contended and imbued with different meanings).
>
> It is perhaps possible. Netlink seems a bit focused on intra-kernel and
> kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
> (application-to-application).
>
> I see a couple of primary requirements:
>
>    - Support communication (message exchange) between user space
>    processes.
> >    - Support arbitrary user-defined messages.
>    - Ordered, reliable delivery of messages.
>
>
> From the little I can quickly read up on Netlink, the first two
> requirements do not seem supported. But perhaps someone with more intimate
> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
> to support u2u and user-defined messages, the current specialization (e.g.
> specialized addressing, specialized message formats) seems contrary to the
> goals of providing generic mechanisms in the kernel that can be used for
> different things.
>
> My IPC/MBUS reference implementation for linux-generic builds upon POSIX
> message queues. One of my issues is that I want the message queue
> associated with a process to go away when the process goes away. The
> message queues are not independent entities.
>
> -- Ola
> [...]
Ola Liljedahl May 21, 2015, 1:23 p.m. | #16
On 21 May 2015 at 15:05, Alexandru Badicioiu <alexandru.badicioiu@linaro.org> wrote:

> I was referring to the  Netlink protocol in itself, as a model for ODP
> MBUS (or IPC).
>
Isn't the Netlink protocol what the endpoints send between them? This is
not specified by the ODP IPC/MBUS API, applications can define or re-use
whatever protocol they like. The protocol definition is heavily dependent
on what you actually use the IPC for and we shouldn't force ODP users to
use some specific predefined protocol.

Also the "wire protocol" is left undefined, this is up to the
implementation to define and each platform can have its own definition.

And Netlink isn't even reliable. I know this creates problems, e.g. making it
impossible to get a clean and complete snapshot of the routing table.
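
To make "applications can define whatever protocol they like" concrete, here is one way a user might lay an application-level header over the opaque MBUS payload. Everything here (the header layout, the helper names) is invented for this sketch; nothing is mandated by the proposed API, which only moves bytes obtained via odp_ipc_data().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical application-defined protocol header, carried first in the
 * payload of each MBUS message. The direct memcpy of the struct assumes
 * both endpoints share the same ABI, which holds within the single OS
 * instance that is the stated scope of MBUS. */
struct app_msg_hdr {
    uint32_t msg_type;    /* application-defined message type */
    uint32_t seqno;       /* application-level sequence number */
    uint32_t payload_len; /* number of payload bytes after the header */
};

/* Serialize the header into a buffer, e.g. one returned by odp_ipc_data().
 * Returns the number of bytes written. */
size_t app_hdr_pack(uint8_t *buf, const struct app_msg_hdr *hdr)
{
    memcpy(buf, hdr, sizeof *hdr);
    return sizeof *hdr;
}

/* Parse the header back out on the receiving side.
 * Returns the number of bytes consumed. */
size_t app_hdr_unpack(const uint8_t *buf, struct app_msg_hdr *hdr)
{
    memcpy(hdr, buf, sizeof *hdr);
    return sizeof *hdr;
}
```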


> [...]
Alexandru Badicioiu May 21, 2015, 1:56 p.m. | #17
I got the impression that the ODP MBUS API would define a transport
protocol/API between an ODP application and a control plane application,
like TCP is the transport protocol for HTTP applications (e.g. the Web).
Netlink defines exactly that - a transport protocol for configuration
messages.
Maxim asked about the messages - should applications define the message
format and/or the message content? Wouldn't it be an easier task for the
application to define only the content and let ODP define the format?
Reliability could be an issue, but the Netlink spec says how applications
can create reliable protocols:


One could create a reliable protocol between an FEC and a CPC by
   using the combination of sequence numbers, ACKs, and retransmit
   timers.  Both sequence numbers and ACKs are provided by Netlink;
   timers are provided by Linux.

   One could create a heartbeat protocol between the FEC and CPC by
   using the ECHO flags and the NLMSG_NOOP message.
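
The recipe in the quoted paragraph - sequence numbers, ACKs, and a retransmit timer - can be sketched as a few lines of sender-side state. This is a toy illustration with all names invented for the sketch; a real protocol would also buffer the unacked messages for retransmission when the timer fires.

```c
#include <stdbool.h>
#include <stdint.h>

#define WIN_SIZE 8 /* retransmission window: max outstanding messages */

/* Toy sender-side state for a windowed reliable protocol. */
struct rel_sender {
    uint32_t next_seq;  /* seqno to assign to the next message sent */
    uint32_t acked_seq; /* number of messages cumulatively acked */
};

/* May we transmit, or is the window full (then we wait for an ACK,
 * or for the retransmit timer to fire)? */
bool rel_can_send(const struct rel_sender *s)
{
    return s->next_seq - s->acked_seq < WIN_SIZE;
}

/* Assign a sequence number to an outgoing message. */
uint32_t rel_send(struct rel_sender *s)
{
    return s->next_seq++;
}

/* Process a cumulative ACK covering everything up to ack_seq.
 * Returns how many outstanding messages the ACK releases;
 * stale or duplicate ACKs release nothing. */
uint32_t rel_ack(struct rel_sender *s, uint32_t ack_seq)
{
    uint32_t released = ack_seq - s->acked_seq;
    if ((int32_t)released < 0)
        return 0; /* stale/duplicate ACK */
    s->acked_seq = ack_seq;
    return released;
}
```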







On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> On 21 May 2015 at 15:05, Alexandru Badicioiu <
> alexandru.badicioiu@linaro.org> wrote:
>
>> I was referring to the  Netlink protocol in itself, as a model for ODP
>> MBUS (or IPC).
>>
> Isn't the Netlink protocol what the endpoints send between them? This is
> not specified by the ODP IPC/MBUS API, applications can define or re-use
> whatever protocol they like. The protocol definition is heavily dependent
> on what you actually use the IPC for and we shouldn't force ODP users to
> use some specific predefined protocol.
>
> Also the "wire protocol" is left undefined, this is up to the
> implementation to define and each platform can have its own definition.
>
> And netlink isn't even reliable. I know that that creates problems, e.g.
> impossible to get a clean and complete snapshot of e.g. the routing table.
>
>
>> The interaction between the FEC and the CPC, in the Netlink context,
>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>    (residing in user space) and the FEC (residing in kernel space) to
>>    have their own protocol definition -- *kernel space and user space
>>    just mean different protection domains*.  Therefore, a wire protocol
>>    is needed to communicate.  The wire protocol is normally provided by
>>    some privileged service that is able to copy between multiple
>>    protection domains.  We will refer to this service as the Netlink
>>    service.  The Netlink service can also be encapsulated in a different
>>    transport layer, if the CPC executes on a different node than the
>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>    a reliable protocol between each other.  By default, however, Netlink
>>    provides an unreliable communication.
>>
>>    Note that the FEC and CPC can both live in the same memory protection
>>    domain and use the connect() system call to create a path to the peer
>>    and talk to each other.  We will not discuss this mechanism further
>>    other than to say that it is available. Throughout this document, we
>>    will refer interchangeably to the FEC to mean kernel space and the
>>    CPC to mean user space.  This denomination is not meant, however, to
>>    restrict the two components to these protection domains or to the
>>    same compute node.
>>
>>
>>
>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>
>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>> alexandru.badicioiu@linaro.org> wrote:
>>> > Hi,
>>> > would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit the
>>> purpose
>>> > of ODP IPC (within a single OS instance)?
>>> I interpret this as a question whether Netlink would be fit as an
>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>> contended and imbued with different meanings).
>>>
>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel and
>>> kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
>>> (application-to-application).
>>>
>>> I see a couple of primary requirements:
>>>
>>>    - Support communication (message exchange) between user space
>>>    processes.
>>>    - Support arbitrary used-defined messages.
>>>    - Ordered, reliable delivery of messages.
>>>
>>>
>>> From the little I can quickly read up on Netlink, the first two
>>> requirements do not seem supported. But perhaps someone with more intimate
>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>> to support u2u and user-defined messages, the current specialization (e.g.
>>> specialized addressing, specialized message formats) seems contrary to the
>>> goals of providing generic mechanisms in the kernel that can be used for
>>> different things.
>>>
>>> My IPC/MBUS reference implementation for linux-generic builds upon POSIX
>>> message queues. One of my issues is that I want the message queue
>>> associated with a process to go away when the process goes away. The
>>> message queues are not independent entities.
>>>
>>> -- Ola
>>>
>>> >
>>> > Thanks,
>>> > Alex
>>> >
>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org>
>>> wrote:
>>> >>
>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>> >> <petri.savolainen@nokia.com> wrote:
>>> >> >
>>> >> >
>>> >> >> -----Original Message-----
>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf
>>> Of
>>> >> >> ext
>>> >> >> Ola Liljedahl
>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>> >> >> To: lng-odp@lists.linaro.org
>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>> >> >>
>>> >> >> As promised, here is my first attempt at a standalone API for IPC -
>>> >> >> inter-process communication in a shared nothing architecture (message
>>> >> >> passing between processes which do not share memory).
>>> >> >>
>>> >> >> Currently all definitions are in the file ipc.h but it is possible to
>>> >> >> break out some message/event related definitions (everything from
>>> >> >> odp_ipc_sender) in a separate file message.h. This would mimic the
>>> >> >> packet_io.h/packet.h separation.
>>> >> >>
>>> >> >> The semantics of message passing is that sending a message to an
>>> >> >> endpoint will always look like it succeeds. The appearance of endpoints
>>> >> >> is explicitly notified through user-defined messages specified in the
>>> >> >> odp_ipc_resolve() call. Similarly, the disappearance (e.g. death or an
>>> >> >> otherwise lost connection) is also explicitly notified through
>>> >> >> user-defined messages specified in the odp_ipc_monitor() call. The send
>>> >> >> call does not fail because the addressed endpoint has disappeared.
>>> >> >>
>>> >> >> Messages (from endpoint A to endpoint B) are delivered in order. If
>>> >> >> message N sent to an endpoint is delivered, then all messages <N have
>>> >> >> also been delivered. Message delivery does not guarantee actual
>>> >> >> processing by the
>>> >> >
>>> >> > Ordering is an OK requirement, but "all messages <N have also been
>>> >> > delivered" in practice means lossless delivery (== retries,
>>> >> > retransmission windows, etc.). Lossy vs. lossless link should be a
>>> >> > configuration option.
>>> >> I am just targeting internal communication which I expect to be
>>> >> reliable. There is not any physical "link" involved. If an
>>> >> implementation chooses to use some unreliable media, then it will need
>>> >> to take some countermeasures. Any message loss could be detected
>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>> >> disconnection (so that no more messages will be delivered should one
>>> >> go missing).
>>> >>
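
[Editorial note: the loss-detection scheme sketched above (per-endpoint
sequence numbers, with disconnection on a gap) can be illustrated with a
small helper. The function name and the 32-bit sequence width are assumptions
for the example, not part of the proposed API:]

```c
#include <stdint.h>

/* Compute how many messages were lost between the last in-order
 * message seen and an incoming one. A non-zero result means the
 * implementation should (temporarily) disconnect the peer so that
 * the in-order delivery guarantee is preserved. Unsigned arithmetic
 * handles sequence number wraparound; old or duplicate messages
 * (delta 0 or "in the past") report no gap. */
static uint32_t seq_gap(uint32_t last_seen, uint32_t incoming)
{
    uint32_t delta = incoming - last_seen;

    return (delta == 0 || delta > UINT32_MAX / 2) ? 0 : delta - 1;
}
```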
>>> >> I am OK with adding the lossless/lossy configuration to the API as
>>> >> long as lossless option is always implemented. Is this a configuration
>>> >> when creating the local IPC endpoint or when sending a message to
>>> >> another endpoint?
>>> >>
>>> >> >
>>> >> > Also, what does "delivered" mean?
>>> >> >
>>> >> > Message:
>>> >> >  - transmitted successfully over the link ?
>>> >> >  - is now under control of the remote node (post office) ?
>>> >> >  - delivered into application input queue ?
>>> >> Probably this one, but I am not sure the exact definition matters: "has
>>> >> been delivered" or "will eventually be delivered unless connection to
>>> >> the destination is lost". Maybe there is a better word than
>>> >> "delivered"?
>>> >>
>>> >> "Made available into the destination (recipient) address space"?
>>> >>
>>> >> >  - has been dequeued from application queue ?
>>> >> >
>>> >> >
>>> >> >> recipient. End-to-end acknowledgements (using messages) should be used
>>> >> >> if this guarantee is important to the user.
>>> >> >>
>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>>> >> >> multidrop network where each endpoint has a unique address which is only
>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
>>> >> >> and then recreated (with the same name), the new endpoint will have a
>>> >> >> new address (eventually endpoint addresses will have to be recycled, but
>>> >> >> not for a very long time). Endpoint names do not necessarily have to be
>>> >> >> unique.
>>> >> >
>>> >> > How widely are these addresses unique: inside one VM, multiple VMs
>>> >> > under the same host, multiple devices on a LAN (VLAN), ...?
>>> >> Currently, the scope of the name and address space is defined by the
>>> >> implementation. Perhaps we should define it? My current interest is
>>> >> within an OS instance (bare metal or virtualised). Between different
>>> >> OS instances, I expect something based on IP to be used (because you
>>> >> don't know where those different OS/VM instances will be deployed so
>>> >> you need topology-independent addressing).
>>> >>
>>> >> Based on other feedback, I have dropped the contested usage of "IPC"
>>> >> and now call it "message bus" (MBUS).
>>> >>
>>> >> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
>>> >> reliable multidrop network"...
>>> >>
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>> >> >> ---
>>> >> >> (This document/code contribution attached is provided under the terms
>>> >> >> of agreement LES-LTM-21309)
>>> >> >>
>>> >> >
>>> >> >
>>> >> >> +/**
>>> >> >> + * Create IPC endpoint
>>> >> >> + *
>>> >> >> + * @param name Name of local IPC endpoint
>>> >> >> + * @param pool Pool for incoming messages
>>> >> >> + *
>>> >> >> + * @return IPC handle on success
>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>> >> >> + */
>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>> >> >
>>> >> > This creates (implicitly) the local end point address.
>>> >> >
>>> >> >
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Set the default input queue for an IPC endpoint
>>> >> >> + *
>>> >> >> + * @param ipc   IPC handle
>>> >> >> + * @param queue Queue handle
>>> >> >> + *
>>> >> >> + * @retval  0 on success
>>> >> >> + * @retval <0 on failure
>>> >> >> + */
>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>> >> >
>>> >> > Multiple input queues are likely needed for different priority
>>> >> > messages.
>>> >> >
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Resolve endpoint by name
>>> >> >> + *
>>> >> >> + * Look up an existing or future endpoint by name.
>>> >> >> + * When the endpoint exists, return the specified message with the
>>> >> >> + * endpoint as the sender.
>>> >> >> + *
>>> >> >> + * @param ipc IPC handle
>>> >> >> + * @param name Name to resolve
>>> >> >> + * @param msg Message to return
>>> >> >> + */
>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>> >> >> +                  const char *name,
>>> >> >> +                  odp_ipc_msg_t msg);
>>> >> >
>>> >> > How widely are these names visible? Inside one VM, multiple VMs
>>> >> > under the same host, multiple devices on a LAN (VLAN), ...?
>>> >> >
>>> >> > I think name service (or address resolution) is better handled in a
>>> >> > middleware layer. If ODP provides unique addresses and a message
>>> >> > passing mechanism, additional services can be built on top.
>>> >> >
>>> >> >
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Monitor endpoint
>>> >> >> + *
>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>> >> >> + * When the endpoint is dead, return the specified message with the
>>> >> >> + * endpoint as the sender.
>>> >> >> + *
>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as dead
>>> >> >> + * endpoints.
>>> >> >> + *
>>> >> >> + * @param ipc IPC handle
>>> >> >> + * @param addr Address of monitored endpoint
>>> >> >> + * @param msg Message to return
>>> >> >> + */
>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>> >> >> +                  odp_ipc_msg_t msg);
>>> >> >
>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>> >> > services.
>>> >> >
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Send message
>>> >> >> + *
>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>> >> >> + * Message delivery is ordered and reliable. All (accepted) messages
>>> >> >> + * will be delivered up to the point of endpoint death or lost
>>> >> >> + * connection.
>>> >> >> + * Actual reception and processing is not guaranteed (use end-to-end
>>> >> >> + * acknowledgements for that).
>>> >> >> + * Monitor the remote endpoint to detect death or lost connection.
>>> >> >> + *
>>> >> >> + * @param ipc IPC handle
>>> >> >> + * @param msg Message to send
>>> >> >> + * @param addr Address of remote endpoint
>>> >> >> + *
>>> >> >> + * @retval 0 on success
>>> >> >> + * @retval <0 on error
>>> >> >> + */
>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>> >> >> +              odp_ipc_msg_t msg,
>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>> >> >
>>> >> > This would be used to send a message to an address, but normal
>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>> >> > application (ODP instance).
>>> >> >
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Get address of sender (source) of message
>>> >> >> + *
>>> >> >> + * @param msg Message handle
>>> >> >> + * @param addr Address of sender endpoint
>>> >> >> + */
>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Message data pointer
>>> >> >> + *
>>> >> >> + * Return a pointer to the message data
>>> >> >> + *
>>> >> >> + * @param msg Message handle
>>> >> >> + *
>>> >> >> + * @return Pointer to the message data
>>> >> >> + */
>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Message data length
>>> >> >> + *
>>> >> >> + * Return length of the message data.
>>> >> >> + *
>>> >> >> + * @param msg Message handle
>>> >> >> + *
>>> >> >> + * @return Message length
>>> >> >> + */
>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>> >> >> +
>>> >> >> +/**
>>> >> >> + * Set message length
>>> >> >> + *
>>> >> >> + * Set length of the message data.
>>> >> >> + *
>>> >> >> + * @param msg Message handle
>>> >> >> + * @param len New length
>>> >> >> + *
>>> >> >> + * @retval 0 on success
>>> >> >> + * @retval <0 on error
>>> >> >> + */
>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>> >> >
>>> >> > When data ptr or data len is modified: push/pull head, push/pull tail
>>> >> > would be analogies from the packet API.
>>> >> >
>>> >> >
>>> >> > -Petri
>>> >> >
>>> >> >
>>> >> _______________________________________________
>>> >> lng-odp mailing list
>>> >> lng-odp@lists.linaro.org
>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>>> >
>>> >
>>>
>>>
>>
>
Ola Liljedahl May 21, 2015, 2:46 p.m. | #18
On 21 May 2015 at 15:56, Alexandru Badicioiu <alexandru.badicioiu@linaro.org> wrote:

> I got the impression that ODP MBUS API would define a transport
> protocol/API between an ODP
>
No, the MBUS API is just an API for message passing (think of the OSE IPC
API) and doesn't specify use cases or content. Just like the ODP packet API
doesn't specify what the content in a packet means or the format of the
content.
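
[Editorial note: to illustrate this payload-is-opaque point, an application
can define its own message format entirely on top of the opaque data area
(e.g. the buffer a call like odp_ipc_data() would return); the bus itself
never inspects it. The header layout and function name below are invented
for the example:]

```c
#include <stdint.h>
#include <string.h>

/* Application-defined message format carried as opaque payload.
 * The message bus only sees bytes; type and content are purely an
 * application-level convention. */
struct app_msg_hdr {
    uint32_t msg_type;    /* application-defined message type */
    uint32_t payload_len; /* bytes of payload following the header */
};

/* Serialize header + payload into the buffer the bus transports.
 * Returns total bytes written, or 0 if the buffer is too small. */
static uint32_t app_msg_pack(uint8_t *buf, uint32_t buf_len,
                             uint32_t msg_type,
                             const void *payload, uint32_t len)
{
    struct app_msg_hdr hdr = { .msg_type = msg_type, .payload_len = len };

    if (buf_len < sizeof(hdr) + len)
        return 0;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), payload, len);
    return (uint32_t)sizeof(hdr) + len;
}
```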


> application and a control plane application, like TCP is the transport
> protocol for HTTP applications (e.g Web). Netlink defines exactly that -
> transport protocol for configuration messages.
> Maxim asked about the messages - should applications define the message
> format and/or the message content? Wouldn't it be an easier task for the
> application to define only the content and let ODP to define a format?
>
How can you define a format when you don't know what the messages are used
for and what data needs to be transferred? Why should the MBUS API or
implementations care about the message format? It's just payload and none
of their business.

If you want to, you can specify formats for specific purposes, e.g. reuse
Netlink formats for the functions that Netlink supports. Some ODP
applications may use this, others not (because they use some other protocol
or they implement some other functionality).



> Reliability could be an issue, but the Netlink spec says how applications can
> create reliable protocols:
>
>
> One could create a reliable protocol between an FEC and a CPC by
>    using the combination of sequence numbers, ACKs, and retransmit
>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>    timers are provided by Linux.
>
And you could do the same in ODP, but I prefer not to; this adds a level of
complexity to the application code that I do not want. Perhaps the actual MBUS
implementation has to do this, but then it is hidden from the applications.
Just like TCP reliability, ordering etc. are hidden from the applications that
just do read and write.
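
[Editorial note: if an implementation does have to run MBUS over an
unreliable medium, the retransmission machinery can stay entirely inside the
implementation, much as TCP hides it behind read and write. A sketch of the
hidden sender-side state; the window size, names, and cumulative-ACK scheme
are all assumptions of this example, not part of the proposed API:]

```c
#include <stdint.h>

#define WIN_SIZE 64 /* illustrative retransmit window */

/* Sender-side bookkeeping kept inside the MBUS implementation:
 * unacknowledged messages stay buffered until a cumulative ACK
 * releases them; on timeout they would be retransmitted. */
struct tx_window {
    uint32_t next_seq; /* sequence number for the next send */
    uint32_t acked;    /* highest cumulatively ACKed sequence */
};

/* Returns the sequence number assigned to the message, or -1 if the
 * window is full and the send must wait (back-pressure). */
static int64_t tx_send(struct tx_window *w)
{
    if (w->next_seq - w->acked >= WIN_SIZE)
        return -1;                 /* window full */
    return (int64_t)w->next_seq++;
}

/* Process a cumulative ACK: everything up to 'seq' may be freed.
 * Stale or out-of-range ACKs are ignored. */
static void tx_ack(struct tx_window *w, uint32_t seq)
{
    if (seq - w->acked <= w->next_seq - w->acked)
        w->acked = seq;
}
```

None of this state is visible through the API; the application still sees
only "send always looks like it succeeds" plus an explicit disconnection
notification if delivery ultimately fails.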


>    One could create a heartbeat protocol between the FEC and CPC by
>    using the ECHO flags and the NLMSG_NOOP message.
>
>
>
>
>
>
>
> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>
>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>> alexandru.badicioiu@linaro.org> wrote:
>>
>>> I was referring to the Netlink protocol in itself, as a model for ODP
>>> MBUS (or IPC).
>>>
>> Isn't the Netlink protocol what the endpoints send between them? This is
>> not specified by the ODP IPC/MBUS API, applications can define or re-use
>> whatever protocol they like. The protocol definition is heavily dependent
>> on what you actually use the IPC for and we shouldn't force ODP users to
>> use some specific predefined protocol.
>>
>> Also the "wire protocol" is left undefined, this is up to the
>> implementation to define and each platform can have its own definition.
>>
>> And Netlink isn't even reliable. I know that this creates problems; e.g. it
>> is impossible to get a clean and complete snapshot of the routing table.
>>
>>
>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>    have their own protocol definition -- *kernel space and user space
>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>    is needed to communicate.  The wire protocol is normally provided by
>>>    some privileged service that is able to copy between multiple
>>>    protection domains.  We will refer to this service as the Netlink
>>>    service.  The Netlink service can also be encapsulated in a different
>>>    transport layer, if the CPC executes on a different node than the
>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>    a reliable protocol between each other.  By default, however, Netlink
>>>    provides an unreliable communication.
>>>
>>>    Note that the FEC and CPC can both live in the same memory protection
>>>    domain and use the connect() system call to create a path to the peer
>>>    and talk to each other.  We will not discuss this mechanism further
>>>    other than to say that it is available. Throughout this document, we
>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>    restrict the two components to these protection domains or to the
>>>    same compute node.
>>>
Maxim Uvarov May 21, 2015, 3:45 p.m. | #19
From RFC 3549, Netlink looks like a good protocol for communication between
the data plane and the control plane, and the messages are defined by that
protocol as well. At least we should do something similar.

Maxim.

>>>>>
>>>>>    - Support communication (message exchange) between user space
>>>>>    processes.
>>>>>    - Support arbitrary user-defined messages.
>>>>>    - Ordered, reliable delivery of messages.
>>>>>
>>>>>
>>>>> From the little I can quickly read up on Netlink, the first two
>>>>> requirements do not seem supported. But perhaps someone with more intimate
>>>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>>>> to support u2u and user-defined messages, the current specialization (e.g.
>>>>> specialized addressing, specialized message formats) seems contrary to the
>>>>> goals of providing generic mechanisms in the kernel that can be used for
>>>>> different things.
>>>>>
>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>> POSIX message queues. One of my issues is that I want the message queue
>>>>> associated with a process to go away when the process goes away. The
>>>>> message queues are not independent entities.
>>>>>
>>>>> -- Ola
>>>>>
>>>>> >
>>>>> > Thanks,
>>>>> > Alex
>>>>> >
>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>> wrote:
>>>>> >>
>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>> >> <petri.savolainen@nokia.com> wrote:
>>>>> >> >
>>>>> >> >
>>>>> >> >> -----Original Message-----
>>>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On
>>>>> Behalf Of
>>>>> >> >> ext
>>>>> >> >> Ola Liljedahl
>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>> >> >>
>>>>> >> >> As promised, here is my first attempt at a standalone API for
>>>>> IPC -
>>>>> >> >> inter
>>>>> >> >> process communication in a shared nothing architecture (message
>>>>> passing
>>>>> >> >> between processes which do not share memory).
>>>>> >> >>
>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>> possible to
>>>>> >> >> break out some message/event related definitions (everything from
>>>>> >> >> odp_ipc_sender) in a separate file message.h. This would mimic
>>>>> the
>>>>> >> >> packet_io.h/packet.h separation.
>>>>> >> >>
>>>>> >> >> The semantics of message passing is that sending a message to an
>>>>> >> >> endpoint
>>>>> >> >> will always look like it succeeds. The appearance of endpoints is
>>>>> >> >> explicitly
>>>>> >> >> notified through user-defined messages specified in the
>>>>> >> >> odp_ipc_resolve()
>>>>> >> >> call. Similarly, the disappearance (e.g. death or otherwise lost
>>>>> >> >> connection)
>>>>> >> >> is also explicitly notified through user-defined messages
>>>>> specified in
>>>>> >> >> the
>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because the
>>>>> >> >> addressed
>>>>> >> >> endpoint has disappeared.
>>>>> >> >>
>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in order.
>>>>> If
>>>>> >> >> message
>>>>> >> >> N sent to an endpoint is delivered, then all messages <N have
>>>>> also been
>>>>> >> >> delivered. Message delivery does not guarantee actual processing
>>>>> by the
>>>>> >> >
>>>>> >> > Ordered is an OK requirement, but "all messages <N have also been
>>>>> >> > delivered" means in practice lossless delivery (== re-tries and
>>>>> >> > retransmission windows, etc). Lossy vs lossless link should be a
>>>>> >> > configuration option.
>>>>> >> I am just targeting internal communication which I expect to be
>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>> >> implementation chooses to use some unreliable media, then it will
>>>>> need
>>>>> >> to take some counter measures. Any loss of message could be detected
>>>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>>>> >> disconnection (so that no more messages will be delivered should one
>>>>> >> go missing).
>>>>> >>
>>>>> >> I am OK with adding the lossless/lossy configuration to the API as
>>>>> >> long as lossless option is always implemented. Is this a
>>>>> configuration
>>>>> >> when creating the local  IPC endpoint or when sending a message to
>>>>> >> another endpoint?
>>>>> >>
>>>>> >> >
>>>>> >> > Also what "delivered" means?'
>>>>> >> >
>>>>> >> > Message:
>>>>> >> >  - transmitted successfully over the link ?
>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>> >> >  - delivered into application input queue ?
>>>>> >> Probably this one but I am not sure the exact definition matters,
>>>>> "has
>>>>> >> been delivered" or "will eventually be delivered unless connection
>>>>> to
>>>>> >> the destination is lost". Maybe there is a better word than
>>>>> >> "delivered"?
>>>>> >>
>>>>> >> "Made available into the destination (recipient) address space"?
>>>>> >>
>>>>> >> >  - has been dequeued from application queue ?
>>>>> >> >
>>>>> >> >
>>>>> >> >> recipient. End-to-end acknowledgements (using messages) should
>>>>> be used
>>>>> >> >> if
>>>>> >> >> this guarantee is important to the user.
>>>>> >> >>
>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>> reliable
>>>>> >> >> multidrop network where each endpoint has a unique address which
>>>>> is
>>>>> >> >> only
>>>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint is
>>>>> >> >> destroyed
>>>>> >> >> and then recreated (with the same name), the new endpoint will
>>>>> have a
>>>>> >> >> new address (eventually endpoint addresses will have to be
>>>>> recycled
>>>>> >> >> but
>>>>> >> >> not for a very long time). Endpoint names do not necessarily
>>>>> have to
>>>>> >> >> be
>>>>> >> >> unique.
>>>>> >> >
>>>>> >> > How widely these addresses are unique: inside one VM, multiple
>>>>> VMs under
>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>> >> Currently, the scope of the name and address space is defined by the
>>>>> >> implementation. Perhaps we should define it? My current interest is
>>>>> >> within an OS instance (bare metal or virtualised). Between different
>>>>> >> OS instances, I expect something based on IP to be used (because you
>>>>> >> don't know where those different OS/VM instances will be deployed so
>>>>> >> you need topology-independent addressing).
>>>>> >>
>>>>> >> Based on other feedback, I have dropped the contended usage of "IPC"
>>>>> >> and now call it "message bus" (MBUS).
>>>>> >>
>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
>>>>> >> reliable multidrop network"...
>>>>> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >>
>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>> >> >> ---
>>>>> >> >> (This document/code contribution attached is provided under the
>>>>> terms
>>>>> >> >> of
>>>>> >> >> agreement LES-LTM-21309)
>>>>> >> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >> +/**
>>>>> >> >> + * Create IPC endpoint
>>>>> >> >> + *
>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>> >> >> + * @param pool Pool for incoming messages
>>>>> >> >> + *
>>>>> >> >> + * @return IPC handle on success
>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>> >> >> + */
>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>> >> >
>>>>> >> > This creates (implicitly) the local end point address.
>>>>> >> >
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>> >> >> + *
>>>>> >> >> + * @param ipc   IPC handle
>>>>> >> >> + * @param queue Queue handle
>>>>> >> >> + *
>>>>> >> >> + * @retval  0 on success
>>>>> >> >> + * @retval <0 on failure
>>>>> >> >> + */
>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>> >> >
>>>>> >> > Multiple input queues are likely needed for different priority
>>>>> messages.
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Resolve endpoint by name
>>>>> >> >> + *
>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>> >> >> + * When the endpoint exists, return the specified message with
>>>>> the
>>>>> >> >> endpoint
>>>>> >> >> + * as the sender.
>>>>> >> >> + *
>>>>> >> >> + * @param ipc IPC handle
>>>>> >> >> + * @param name Name to resolve
>>>>> >> >> + * @param msg Message to return
>>>>> >> >> + */
>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>> >> >> +                  const char *name,
>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>> >> >
>>>>> >> > How widely these names are visible? Inside one VM, multiple VMs
>>>>> under
>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>> >> >
>>>>> >> > I think name service (or address resolution) are better handled in
>>>>> >> > middleware layer. If ODP provides unique addresses and message
>>>>> passing
>>>>> >> > mechanism, additional services can be built on top.
>>>>> >> >
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Monitor endpoint
>>>>> >> >> + *
>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>> >> >> + * When the endpoint is dead, return the specified message with
>>>>> the
>>>>> >> >> endpoint
>>>>> >> >> + * as the sender.
>>>>> >> >> + *
>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as
>>>>> dead
>>>>> >> >> endpoints.
>>>>> >> >> + *
>>>>> >> >> + * @param ipc IPC handle
>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>> >> >> + * @param msg Message to return
>>>>> >> >> + */
>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>> >> >
>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>> services.
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Send message
>>>>> >> >> + *
>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>> messages
>>>>> >> >> will
>>>>> >> >> be
>>>>> >> >> + * delivered up to the point of endpoint death or lost
>>>>> connection.
>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>> end-to-end
>>>>> >> >> + * acknowledgements for that).
>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>> connection.
>>>>> >> >> + *
>>>>> >> >> + * @param ipc IPC handle
>>>>> >> >> + * @param msg Message to send
>>>>> >> >> + * @param addr Address of remote endpoint
>>>>> >> >> + *
>>>>> >> >> + * @retval 0 on success
>>>>> >> >> + * @retval <0 on error
>>>>> >> >> + */
>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>> >> >
>>>>> >> > This would be used to send a message to an address, but normal
>>>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>>>> application
>>>>> >> > (ODP instance).
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Get address of sender (source) of message
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + * @param addr Address of sender endpoint
>>>>> >> >> + */
>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Message data pointer
>>>>> >> >> + *
>>>>> >> >> + * Return a pointer to the message data
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + *
>>>>> >> >> + * @return Pointer to the message data
>>>>> >> >> + */
>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Message data length
>>>>> >> >> + *
>>>>> >> >> + * Return length of the message data.
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + *
>>>>> >> >> + * @return Message length
>>>>> >> >> + */
>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Set message length
>>>>> >> >> + *
>>>>> >> >> + * Set length of the message data.
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + * @param len New length
>>>>> >> >> + *
>>>>> >> >> + * @retval 0 on success
>>>>> >> >> + * @retval <0 on error
>>>>> >> >> + */
>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>> >> >
>>>>> >> > When data ptr or data len is modified: push/pull head, push/pull
>>>>> tail
>>>>> >> > would be analogies from packet API
>>>>> >> >
>>>>> >> >
>>>>> >> > -Petri
>>>>> >> >
>>>>> >> >
>>>>> >> _______________________________________________
>>>>> >> lng-odp mailing list
>>>>> >> lng-odp@lists.linaro.org
>>>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>
>
Ola Liljedahl May 21, 2015, 9:09 p.m. | #20
On 21 May 2015 at 17:45, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:

> From RFC 3549, Netlink looks like a good protocol for communicating between
> the data plane and the control plane, and messages are defined by that
> protocol as well. At least we should do something similar.
>
Netlink seems limited to the specific functionality already present in the
Linux kernel. An ODP IPC/message passing mechanism must be extensible and
support user-defined messages. There's no reason for ODP MBUS to impose any
message format.

Any (set of) applications can model their message formats on Netlink.

I don't understand how Netlink can be used to communicate between any two
applications. Please enlighten me.

-- Ola


>
>
> Maxim.
>
> On 21 May 2015 at 17:46, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>
>> On 21 May 2015 at 15:56, Alexandru Badicioiu <
>> alexandru.badicioiu@linaro.org> wrote:
>>
>>> I got the impression that ODP MBUS API would define a transport
>>> protocol/API between an ODP
>>>
>> No the MBUS API is just an API for message passing (think of the OSE IPC
>> API) and doesn't specify use cases or content. Just like the ODP packet API
>> doesn't specify what the content in a packet means or the format of the
>> content.
>>
>>
>>> application and a control plane application, like TCP is the transport
>>> protocol for HTTP applications (e.g Web). Netlink defines exactly that -
>>> transport protocol for configuration messages.
>>> Maxim asked about the messages - should applications define the message
>>> format and/or the message content? Wouldn't be an easier task for the
>>> application to define only the content and let ODP to define a format?
>>>
>> How can you define a format when you don't know what the messages are
>> used for and what data needs to be transferred? Why should the MBUS API or
>> implementations care about the message format? It's just payload and none
>> of their business.
>>
>> If you want to, you can specify formats for specific purposes, e.g. reuse
>> Netlink formats for the functions that Netlink supports. Some ODP
>> applications may use this, other not (because they use some other protocol
>> or they implement some other functionality).
>>
>>
>>
>>> Reliability could be an issue but Netlink spec says how applications can
>>> create reliable protocols:
>>>
>>>
>>> One could create a reliable protocol between an FEC and a CPC by
>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>>>    timers are provided by Linux.
>>>
>>> And you could do the same in ODP but I prefer not to, this adds a level
>> of complexity to the application code I do not want. Perhaps the actual
>> MBUS implementation has to do this but then hidden from the applications.
>> Just like TCP reliability and ordering etc is hidden from the applications
>> that just do read and write.
>>
>>    One could create a heartbeat protocol between the FEC and CPC by
>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>>
>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>
>>>>> I was referring to the  Netlink protocol in itself, as a model for ODP
>>>>> MBUS (or IPC).
>>>>>
>>>> Isn't the Netlink protocol what the endpoints send between them? This
>>>> is not specified by the ODP IPC/MBUS API, applications can define or re-use
>>>> whatever protocol they like. The protocol definition is heavily dependent
>>>> on what you actually use the IPC for and we shouldn't force ODP users to
>>>> use some specific predefined protocol.
>>>>
>>>> Also the "wire protocol" is left undefined, this is up to the
>>>> implementation to define and each platform can have its own definition.
>>>>
>>>> And netlink isn't even reliable. I know that that creates problems,
>>>> e.g. impossible to get a clean and complete snapshot of e.g. the routing
>>>> table.
>>>>
>>>>
>>>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>>>    have their own protocol definition -- *kernel space and user space
>>>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>>>    is needed to communicate.  The wire protocol is normally provided by
>>>>>    some privileged service that is able to copy between multiple
>>>>>    protection domains.  We will refer to this service as the Netlink
>>>>>    service.  The Netlink service can also be encapsulated in a different
>>>>>    transport layer, if the CPC executes on a different node than the
>>>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>>>    a reliable protocol between each other.  By default, however, Netlink
>>>>>    provides an unreliable communication.
>>>>>
>>>>>    Note that the FEC and CPC can both live in the same memory protection
>>>>>    domain and use the connect() system call to create a path to the peer
>>>>>    and talk to each other.  We will not discuss this mechanism further
>>>>>    other than to say that it is available. Throughout this document, we
>>>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>>>    restrict the two components to these protection domains or to the
>>>>>    same compute node.
>>>>>
>>>>>
>>>>>
>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>> wrote:
>>>>>
>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>> > Hi,
>>>>>> > would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit
>>>>>> the purpose
>>>>>> > of ODP IPC (within a single OS instance)?
>>>>>> I interpret this as a question whether Netlink would be fit as an
>>>>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>>>>> contended and imbued with different meanings).
>>>>>>
>>>>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel
>>>>>> and kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
>>>>>> (application-to-application).
>>>>>>
>>>>>> I see a couple of primary requirements:
>>>>>>
>>>>>>    - Support communication (message exchange) between user space
>>>>>>    processes.
>>>>>>    - Support arbitrary user-defined messages.
>>>>>>    - Ordered, reliable delivery of messages.
>>>>>>
>>>>>>
>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>> requirements do not seem supported. But perhaps someone with more intimate
>>>>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>>>>> to support u2u and user-defined messages, the current specialization (e.g.
>>>>>> specialized addressing, specialized message formats) seems contrary to the
>>>>>> goals of providing generic mechanisms in the kernel that can be used for
>>>>>> different things.
>>>>>>
>>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>>> POSIX message queues. One of my issues is that I want the message queue
>>>>>> associated with a process to go away when the process goes away. The
>>>>>> message queues are not independent entities.
>>>>>>
>>>>>> -- Ola
>>>>>>
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Alex
>>>>>> >
>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>> >> <petri.savolainen@nokia.com> wrote:
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >> -----Original Message-----
>>>>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On
>>>>>> Behalf Of
>>>>>> >> >> ext
>>>>>> >> >> Ola Liljedahl
>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>> >> >>
>>>>>> >> >> As promised, here is my first attempt at a standalone API for
>>>>>> IPC -
>>>>>> >> >> inter
>>>>>> >> >> process communication in a shared nothing architecture (message
>>>>>> passing
>>>>>> >> >> between processes which do not share memory).
>>>>>> >> >>
>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>> possible to
>>>>>> >> >> break out some message/event related definitions (everything
>>>>>> from
>>>>>> >> >> odp_ipc_sender) in a separate file message.h. This would mimic
>>>>>> the
>>>>>> >> >> packet_io.h/packet.h separation.
>>>>>> >> >>
>>>>>> >> >> The semantics of message passing is that sending a message to an
>>>>>> >> >> endpoint
>>>>>> >> >> will always look like it succeeds. The appearance of endpoints
>>>>>> is
>>>>>> >> >> explicitly
>>>>>> >> >> notified through user-defined messages specified in the
>>>>>> >> >> odp_ipc_resolve()
>>>>>> >> >> call. Similarly, the disappearance (e.g. death or otherwise lost
>>>>>> >> >> connection)
>>>>>> >> >> is also explicitly notified through user-defined messages
>>>>>> specified in
>>>>>> >> >> the
>>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because the
>>>>>> >> >> addressed
>>>>>> >> >> endpoint has disappeared.
>>>>>> >> >>
>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>> order. If
>>>>>> >> >> message
>>>>>> >> >> N sent to an endpoint is delivered, then all messages <N have
>>>>>> also been
>>>>>> >> >> delivered. Message delivery does not guarantee actual
>>>>>> processing by the
>>>>>> >> >
>>>>>> >> > Ordered is an OK requirement, but "all messages <N have also been
>>>>>> >> > delivered" means in practice lossless delivery (== re-tries and
>>>>>> >> > retransmission windows, etc). Lossy vs lossless link should be
>>>>>> >> > a
>>>>>> >> > configuration option.
>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>>> >> implementation chooses to use some unreliable media, then it will
>>>>>> need
>>>>>> >> to take some counter measures. Any loss of message could be
>>>>>> detected
>>>>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>>>>> >> disconnection (so that no more messages will be delivered should
>>>>>> one
>>>>>> >> go missing).
>>>>>> >>
>>>>>> >> I am OK with adding the lossless/lossy configuration to the API as
>>>>>> >> long as lossless option is always implemented. Is this a
>>>>>> configuration
>>>>>> >> when creating the local  IPC endpoint or when sending a message to
>>>>>> >> another endpoint?
>>>>>> >>
>>>>>> >> >
>>>>>> >> > Also what "delivered" means?'
>>>>>> >> >
>>>>>> >> > Message:
>>>>>> >> >  - transmitted successfully over the link ?
>>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>>> >> >  - delivered into application input queue ?
>>>>>> >> Probably this one but I am not sure the exact definition matters,
>>>>>> "has
>>>>>> >> been delivered" or "will eventually be delivered unless connection
>>>>>> to
>>>>>> >> the destination is lost". Maybe there is a better word than
>>>>>> >> "delivered"?
>>>>>> >>
>>>>>> >> "Made available into the destination (recipient) address space"?
>>>>>> >>
>>>>>> >> >  - has been dequeued from application queue ?
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >> recipient. End-to-end acknowledgements (using messages) should
>>>>>> be used
>>>>>> >> >> if
>>>>>> >> >> this guarantee is important to the user.
>>>>>> >> >>
>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>>> reliable
>>>>>> >> >> multidrop network where each endpoint has a unique address
>>>>>> which is
>>>>>> >> >> only
>>>>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint is
>>>>>> >> >> destroyed
>>>>>> >> >> and then recreated (with the same name), the new endpoint will
>>>>>> have a
>>>>>> >> >> new address (eventually endpoint addresses will have to be
>>>>>> recycled
>>>>>> >> >> but
>>>>>> >> >> not for a very long time). Endpoint names do not necessarily
>>>>>> have to
>>>>>> >> >> be
>>>>>> >> >> unique.
>>>>>> >> >
>>>>>> >> > How widely these addresses are unique: inside one VM, multiple
>>>>>> VMs under
>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>> >> Currently, the scope of the name and address space is defined by
>>>>>> the
>>>>>> >> implementation. Perhaps we should define it? My current interest is
>>>>>> >> within an OS instance (bare metal or virtualised). Between
>>>>>> different
>>>>>> >> OS instances, I expect something based on IP to be used (because
>>>>>> you
>>>>>> >> don't know where those different OS/VM instances will be deployed
>>>>>> so
>>>>>> >> you need topology-independent addressing).
>>>>>> >>
>>>>>> >> Based on other feedback, I have dropped the contended usage of
>>>>>> "IPC"
>>>>>> >> and now call it "message bus" (MBUS).
>>>>>> >>
>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
>>>>>> >> reliable multidrop network"...
>>>>>> >>
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >>
>>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>> >> >> ---
>>>>>> >> >> (This document/code contribution attached is provided under the
>>>>>> terms
>>>>>> >> >> of
>>>>>> >> >> agreement LES-LTM-21309)
>>>>>> >> >>
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >> +/**
>>>>>> >> >> + * Create IPC endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>> >> >> + *
>>>>>> >> >> + * @return IPC handle on success
>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>> >> >> + */
>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>>> >> >
>>>>>> >> > This creates (implicitly) the local end point address.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc   IPC handle
>>>>>> >> >> + * @param queue Queue handle
>>>>>> >> >> + *
>>>>>> >> >> + * @retval  0 on success
>>>>>> >> >> + * @retval <0 on failure
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>> >> >
>>>>>> >> > Multiple input queues are likely needed for different priority
>>>>>> messages.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Resolve endpoint by name
>>>>>> >> >> + *
>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>> >> >> + * When the endpoint exists, return the specified message with
>>>>>> the
>>>>>> >> >> endpoint
>>>>>> >> >> + * as the sender.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc IPC handle
>>>>>> >> >> + * @param name Name to resolve
>>>>>> >> >> + * @param msg Message to return
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>> >> >> +                  const char *name,
>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>> >> >
>>>>>> >> > How widely these names are visible? Inside one VM, multiple VMs
>>>>>> under
>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>> >> >
>>>>>> >> > I think name service (or address resolution) are better handled
>>>>>> in
>>>>>> >> > middleware layer. If ODP provides unique addresses and message
>>>>>> passing
>>>>>> >> > mechanism, additional services can be built on top.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Monitor endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>> with the
>>>>>> >> >> endpoint
>>>>>> >> >> + * as the sender.
>>>>>> >> >> + *
>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as
>>>>>> dead
>>>>>> >> >> endpoints.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc IPC handle
>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>> >> >> + * @param msg Message to return
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>> >> >
>>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>>> services.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Send message
>>>>>> >> >> + *
>>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>>> messages
>>>>>> >> >> will
>>>>>> >> >> be
>>>>>> >> >> + * delivered up to the point of endpoint death or lost
>>>>>> connection.
>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>> end-to-end
>>>>>> >> >> + * acknowledgements for that).
>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>> connection.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc IPC handle
>>>>>> >> >> + * @param msg Message to send
>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on error
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>> >> >
>>>>>> >> > This would be used to send a message to an address, but normal
>>>>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>>>>> application
>>>>>> >> > (ODP instance).
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Get address of sender (source) of message
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Message data pointer
>>>>>> >> >> + *
>>>>>> >> >> + * Return a pointer to the message data
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + *
>>>>>> >> >> + * @return Pointer to the message data
>>>>>> >> >> + */
>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Message data length
>>>>>> >> >> + *
>>>>>> >> >> + * Return length of the message data.
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + *
>>>>>> >> >> + * @return Message length
>>>>>> >> >> + */
>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Set message length
>>>>>> >> >> + *
>>>>>> >> >> + * Set length of the message data.
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + * @param len New length
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on error
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>> >> >
>>>>>> >> > When data ptr or data len is modified: push/pull head, push/pull
>>>>>> tail
>>>>>> >> > would be analogies from packet API
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > -Petri
>>>>>> >> >
>>>>>> >> >
>>>>>> >> _______________________________________________
>>>>>> >> lng-odp mailing list
>>>>>> >> lng-odp@lists.linaro.org
>>>>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> lng-odp@lists.linaro.org
>> https://lists.linaro.org/mailman/listinfo/lng-odp
>>
>>
>
Alexandru Badicioiu May 22, 2015, 6:14 a.m. | #21
On 22 May 2015 at 00:09, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> On 21 May 2015 at 17:45, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>
>> From the rfc 3549 netlink looks like good protocol to communicate between
>> data plane and control plane. And messages are defined by that protocol
>> also. At least we should do something the same.
>>
> Netlink seems limited to the specific functionality already present in the
> Linux kernel. An ODP IPC/message passing mechanism must be extensible and
> support user-defined messages. There's no reason for ODP MBUS to impose any
> message format.
>
Netlink is extensively implemented in the Linux kernel, but the RFC
explicitly doesn't limit it to that scope.
Netlink messages have a header, defined by the Netlink protocol, and a
payload which contains user-defined messages in TLV format (e.g. RTM_XXX
messages for routing control). Doesn't the TLV format suffice for the needs
of ODP applications?
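For illustration, a TLV scheme of the kind being discussed can be layered by the application on top of an opaque message payload without the message-bus API knowing about it. The sketch below is invented for illustration; neither the record layout nor the function names are part of the proposed ODP API or of Netlink:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application-defined TLV record. The proposed ODP MBUS
 * API treats the payload as opaque bytes, so this layout is purely an
 * application convention. */
struct tlv {
	uint16_t type;   /* application-defined message type */
	uint16_t length; /* length of the value in bytes */
	uint8_t  value[];
};

/* Append one TLV record to a payload buffer.
 * Returns the new write offset, or 0 on overflow. */
static size_t tlv_put(uint8_t *buf, size_t off, size_t size,
		      uint16_t type, const void *val, uint16_t len)
{
	struct tlv hdr = { .type = type, .length = len };

	if (off + sizeof hdr + len > size)
		return 0;
	memcpy(buf + off, &hdr, sizeof hdr);
	memcpy(buf + off + sizeof hdr, val, len);
	return off + sizeof hdr + len;
}

/* Read back the TLV header at a given offset (copied out to avoid
 * unaligned access). Returns 0 on success, -1 on truncated buffer. */
static int tlv_peek(const uint8_t *buf, size_t off, size_t size,
		    struct tlv *hdr)
{
	if (off + sizeof *hdr > size)
		return -1;
	memcpy(hdr, buf + off, sizeof *hdr);
	return 0;
}
```

An application would build such records into the buffer returned by odp_ipc_data() and set the total length with the length-setting call; the implementation never needs to parse them.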

>
> Any (set of) applications can model their message formats on Netlink.
>
> I don't understand how Netlink can be used to communicate between (any
> two) two applications. Please enlighten me.
>
Netlink is not limited to user-kernel communication; only some of the
current services are, like RTM_XXX for routing configuration. For example,
Generic Netlink allows users in both kernel and user space -
https://lwn.net/Articles/208755/:

"When looking at figure #1 it is important to note that any Generic Netlink
user can communicate with any other user over the bus using the same API
regardless of where the user resides in relation to the kernel/userspace
boundary."


> -- Ola
>
>
>>
>>
>> Maxim.
>>
>> On 21 May 2015 at 17:46, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>
>>> On 21 May 2015 at 15:56, Alexandru Badicioiu <
>>> alexandru.badicioiu@linaro.org> wrote:
>>>
>>>> I got the impression that ODP MBUS API would define a transport
>>>> protocol/API between an ODP
>>>>
>>> No the MBUS API is just an API for message passing (think of the OSE IPC
>>> API) and doesn't specify use cases or content. Just like the ODP packet API
>>> doesn't specify what the content in a packet means or the format of the
>>> content.
>>>
>>>
>>>> application and a control plane application, like TCP is the transport
>>>> protocol for HTTP applications (e.g Web). Netlink defines exactly that -
>>>> transport protocol for configuration messages.
>>>> Maxim asked about the messages - should applications define the message
>>>> format and/or the message content? Wouldn't be an easier task for the
>>>> application to define only the content and let ODP to define a format?
>>>>
>>> How can you define a format when you don't know what the messages are
>>> used for and what data needs to be transferred? Why should the MBUS API or
>>> implementations care about the message format? It's just payload and none
>>> of their business.
>>>
>>> If you want to, you can specify formats for specific purposes, e.g.
>>> reuse Netlink formats for the functions that Netlink supports. Some ODP
>>> applications may use this, other not (because they use some other protocol
>>> or they implement some other functionality).
>>>
>>>
>>>
>>>> Reliability could be an issue but Netlink spec says how applications
>>>> can create reliable protocols:
>>>>
>>>>
>>>> One could create a reliable protocol between an FEC and a CPC by
>>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>>>>    timers are provided by Linux.
>>>>
>>>> And you could do the same in ODP but I prefer not to, this adds a level
>>> of complexity to the application code I do not want. Perhaps the actual
>>> MBUS implementation has to do this but then hidden from the applications.
>>> Just like TCP reliability and ordering etc is hidden from the applications
>>> that just do read and write.
>>>
>>>    One could create a heartbeat protocol between the FEC and CPC by
>>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>> wrote:
>>>>
>>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>
>>>>>> I was referring to the  Netlink protocol in itself, as a model for
>>>>>> ODP MBUS (or IPC).
>>>>>>
>>>>> Isn't the Netlink protocol what the endpoints send between them? This
>>>>> is not specified by the ODP IPC/MBUS API, applications can define or re-use
>>>>> whatever protocol they like. The protocol definition is heavily dependent
>>>>> on what you actually use the IPC for and we shouldn't force ODP users to
>>>>> use some specific predefined protocol.
>>>>>
>>>>> Also the "wire protocol" is left undefined, this is up to the
>>>>> implementation to define and each platform can have its own definition.
>>>>>
>>>>> And netlink isn't even reliable. I know that that creates problems,
>>>>> e.g. impossible to get a clean and complete snapshot of e.g. the routing
>>>>> table.
>>>>>
>>>>>
>>>>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>>>>    have their own protocol definition -- *kernel space and user space
>>>>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>>>>    is needed to communicate.  The wire protocol is normally provided by
>>>>>>    some privileged service that is able to copy between multiple
>>>>>>    protection domains.  We will refer to this service as the Netlink
>>>>>>    service.  The Netlink service can also be encapsulated in a different
>>>>>>    transport layer, if the CPC executes on a different node than the
>>>>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>>>>    a reliable protocol between each other.  By default, however, Netlink
>>>>>>    provides an unreliable communication.
>>>>>>
>>>>>>    Note that the FEC and CPC can both live in the same memory protection
>>>>>>    domain and use the connect() system call to create a path to the peer
>>>>>>    and talk to each other.  We will not discuss this mechanism further
>>>>>>    other than to say that it is available. Throughout this document, we
>>>>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>>>>    restrict the two components to these protection domains or to the
>>>>>>    same compute node.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>> wrote:
>>>>>>
>>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>> > Hi,
>>>>>>> > would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit
>>>>>>> the purpose
>>>>>>> > of ODP IPC (within a single OS instance)?
>>>>>>> I interpret this as a question whether Netlink would be fit as an
>>>>>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>>>>>> contended and imbued with different meanings).
>>>>>>>
>>>>>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel
>>>>>>> and kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
>>>>>>> (application-to-application).
>>>>>>>
>>>>>>> I see a couple of primary requirements:
>>>>>>>
>>>>>>>    - Support communication (message exchange) between user space
>>>>>>>    processes.
>>>>>>>    - Support arbitrary user-defined messages.
>>>>>>>    - Ordered, reliable delivery of messages.
>>>>>>>
>>>>>>>
>>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>>> requirements do not seem supported. But perhaps someone with more intimate
>>>>>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>>>>>> to support u2u and user-defined messages, the current specialization (e.g.
>>>>>>> specialized addressing, specialized message formats) seems contrary to the
>>>>>>> goals of providing generic mechanisms in the kernel that can be used for
>>>>>>> different things.
>>>>>>>
>>>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>>>> POSIX message queues. One of my issues is that I want the message queue
>>>>>>> associated with a process to go away when the process goes away. The
>>>>>>> message queues are not independent entities.
>>>>>>>
>>>>>>> -- Ola
>>>>>>>
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Alex
>>>>>>> >
>>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>>> >> <petri.savolainen@nokia.com> wrote:
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >> -----Original Message-----
>>>>>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On
>>>>>>> Behalf Of
>>>>>>> >> >> ext
>>>>>>> >> >> Ola Liljedahl
>>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>>> >> >>
>>>>>>> >> >> As promised, here is my first attempt at a standalone API for
>>>>>>> IPC -
>>>>>>> >> >> inter
>>>>>>> >> >> process communication in a shared nothing architecture
>>>>>>> (message passing
>>>>>>> >> >> between processes which do not share memory).
>>>>>>> >> >>
>>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>>> possible to
>>>>>>> >> >> break out some message/event related definitions (everything
>>>>>>> from
>>>>>>> >> >> odp_ipc_sender) in a separate file message.h. This would mimic
>>>>>>> the
>>>>>>> >> >> packet_io.h/packet.h separation.
>>>>>>> >> >>
>>>>>>> >> >> The semantics of message passing is that sending a message to
>>>>>>> an
>>>>>>> >> >> endpoint
>>>>>>> >> >> will always look like it succeeds. The appearance of endpoints
>>>>>>> is
>>>>>>> >> >> explicitly
>>>>>>> >> >> notified through user-defined messages specified in the
>>>>>>> >> >> odp_ipc_resolve()
>>>>>>> >> >> call. Similarly, the disappearance (e.g. death or otherwise
>>>>>>> lost
>>>>>>> >> >> connection)
>>>>>>> >> >> is also explicitly notified through user-defined messages
>>>>>>> specified in
>>>>>>> >> >> the
>>>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because the
>>>>>>> >> >> addressed
>>>>>>> >> >> endpoints has disappeared.
>>>>>>> >> >>
>>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>>> order. If
>>>>>>> >> >> message
>>>>>>> >> >> N sent to an endpoint is delivered, then all messages <N have
>>>>>>> also been
>>>>>>> >> >> delivered. Message delivery does not guarantee actual
>>>>>>> processing by the
>>>>>>> >> >
>>>>>>> >> > Ordered is OK requirement, but "all messages <N have also been
>>>>>>> >> > delivered" means in practice loss less delivery (== re-tries and
>>>>>>> >> > retransmission windows, etc). Lossy vs loss less link should be
>>>>>>> an
>>>>>>> >> > configuration option.
>>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>>>> >> implementation chooses to use some unreliable media, then it will
>>>>>>> need
>>>>>>> >> to take some counter measures. Any loss of message could be
>>>>>>> detected
>>>>>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>>>>>> >> disconnection (so that no more messages will be delivered should
>>>>>>> one
>>>>>>> >> go missing).
>>>>>>> >>
>>>>>>> >> I am OK with adding the lossless/lossy configuration to the API as
>>>>>>> >> long as lossless option is always implemented. Is this a
>>>>>>> configuration
>>>>>>> >> when creating the local  IPC endpoint or when sending a message to
>>>>>>> >> another endpoint?
>>>>>>> >>
>>>>>>> >> >
>>>>>>> >> > Also what "delivered" means?'
>>>>>>> >> >
>>>>>>> >> > Message:
>>>>>>> >> >  - transmitted successfully over the link ?
>>>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>>>> >> >  - delivered into application input queue ?
>>>>>>> >> Probably this one but I am not sure the exact definition matters,
>>>>>>> "has
>>>>>>> >> been delivered" or "will eventually be delivered unless
>>>>>>> connection to
>>>>>>> >> the destination is lost". Maybe there is a better word than
>>>>>>> >> "delivered?
>>>>>>> >>
>>>>>>> >> "Made available into the destination (recipient) address space"?
>>>>>>> >>
>>>>>>> >> >  - has been dequeued from application queue ?
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >> recipient. End-to-end acknowledgements (using messages) should
>>>>>>> be used
>>>>>>> >> >> if
>>>>>>> >> >> this guarantee is important to the user.
>>>>>>> >> >>
>>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>>>> reliable
>>>>>>> >> >> multidrop network where each endpoint has a unique address
>>>>>>> which is
>>>>>>> >> >> only
>>>>>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint is
>>>>>>> >> >> destroyed
>>>>>>> >> >> and then recreated (with the same name), the new endpoint will
>>>>>>> have a
>>>>>>> >> >> new address (eventually endpoints addresses will have to be
>>>>>>> recycled
>>>>>>> >> >> but
>>>>>>> >> >> not for a very long time). Endpoints names do not necessarily
>>>>>>> have to
>>>>>>> >> >> be
>>>>>>> >> >> unique.
>>>>>>> >> >
>>>>>>> >> > How widely these addresses are unique: inside one VM, multiple
>>>>>>> VMs under
>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>> >> Currently, the scope of the name and address space is defined by
>>>>>>> the
>>>>>>> >> implementation. Perhaps we should define it? My current interest
>>>>>>> is
>>>>>>> >> within an OS instance (bare metal or virtualised). Between
>>>>>>> different
>>>>>>> >> OS instances, I expect something based on IP to be used (because
>>>>>>> you
>>>>>>> >> don't know where those different OS/VM instances will be deployed
>>>>>>> so
>>>>>>> >> you need topology-independent addressing).
>>>>>>> >>
>>>>>>> >> Based on other feedback, I have dropped the contented usage of
>>>>>>> "IPC"
>>>>>>> >> and now call it "message bus" (MBUS).
>>>>>>> >>
>>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
>>>>>>> >> reliable multidrop network"...
>>>>>>> >>
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >>
>>>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>> >> >> ---
>>>>>>> >> >> (This document/code contribution attached is provided under
>>>>>>> the terms
>>>>>>> >> >> of
>>>>>>> >> >> agreement LES-LTM-21309)
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Create IPC endpoint
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>>> >> >> + *
>>>>>>> >> >> + * @return IPC handle on success
>>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>>> >> >> + */
>>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>>>> >> >
>>>>>>> >> > This creates (implicitly) the local end point address.
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param ipc   IPC handle
>>>>>>> >> >> + * @param queue Queue handle
>>>>>>> >> >> + *
>>>>>>> >> >> + * @retval  0 on success
>>>>>>> >> >> + * @retval <0 on failure
>>>>>>> >> >> + */
>>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>>> >> >
>>>>>>> >> > Multiple input queues are likely needed for different priority
>>>>>>> messages.
>>>>>>> >> >
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Resolve endpoint by name
>>>>>>> >> >> + *
>>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>>> >> >> + * When the endpoint exists, return the specified message
>>>>>>> with the
>>>>>>> >> >> endpoint
>>>>>>> >> >> + * as the sender.
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>> >> >> + * @param name Name to resolve
>>>>>>> >> >> + * @param msg Message to return
>>>>>>> >> >> + */
>>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>>> >> >> +                  const char *name,
>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>> >> >
>>>>>>> >> > How widely these names are visible? Inside one VM, multiple VMs
>>>>>>> under
>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>> >> >
>>>>>>> >> > I think name service (or address resolution) are better handled
>>>>>>> in
>>>>>>> >> > middleware layer. If ODP provides unique addresses and message
>>>>>>> passing
>>>>>>> >> > mechanism, additional services can be built on top.
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Monitor endpoint
>>>>>>> >> >> + *
>>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>>> with the
>>>>>>> >> >> endpoint
>>>>>>> >> >> + * as the sender.
>>>>>>> >> >> + *
>>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as
>>>>>>> dead
>>>>>>> >> >> endpoints.
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>>> >> >> + * @param msg Message to return
>>>>>>> >> >> + */
>>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>> >> >
>>>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>>>> services.
>>>>>>> >> >
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Send message
>>>>>>> >> >> + *
>>>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>>>> messages
>>>>>>> >> >> will
>>>>>>> >> >> be
>>>>>>> >> >> + * delivered up to the point of endpoint death or lost
>>>>>>> connection.
>>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>>> end-to-end
>>>>>>> >> >> + * acknowledgements for that).
>>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>>> connection.
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>> >> >> + * @param msg Message to send
>>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>>> >> >> + *
>>>>>>> >> >> + * @retval 0 on success
>>>>>>> >> >> + * @retval <0 on error
>>>>>>> >> >> + */
>>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>> >> >
>>>>>>> >> > This would be used to send a message to an address, but normal
>>>>>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>>>>>> application
>>>>>>> >> > (ODP instance).
>>>>>>> >> >
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Get address of sender (source) of message
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param msg Message handle
>>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>>> >> >> + */
>>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Message data pointer
>>>>>>> >> >> + *
>>>>>>> >> >> + * Return a pointer to the message data
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param msg Message handle
>>>>>>> >> >> + *
>>>>>>> >> >> + * @return Pointer to the message data
>>>>>>> >> >> + */
>>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Message data length
>>>>>>> >> >> + *
>>>>>>> >> >> + * Return length of the message data.
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param msg Message handle
>>>>>>> >> >> + *
>>>>>>> >> >> + * @return Message length
>>>>>>> >> >> + */
>>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>>> >> >> +
>>>>>>> >> >> +/**
>>>>>>> >> >> + * Set message length
>>>>>>> >> >> + *
>>>>>>> >> >> + * Set length of the message data.
>>>>>>> >> >> + *
>>>>>>> >> >> + * @param msg Message handle
>>>>>>> >> >> + * @param len New length
>>>>>>> >> >> + *
>>>>>>> >> >> + * @retval 0 on success
>>>>>>> >> >> + * @retval <0 on error
>>>>>>> >> >> + */
>>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>>> >> >
>>>>>>> >> > When data ptr or data len is modified: push/pull head,
>>>>>>> push/pull tail
>>>>>>> >> > would be analogies from packet API
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > -Petri
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> _______________________________________________
>>>>>>> >> lng-odp mailing list
>>>>>>> >> lng-odp@lists.linaro.org
>>>>>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>
>>>
>>
>
Ola Liljedahl May 22, 2015, 9:10 a.m. | #22
On 22 May 2015 at 08:14, Alexandru Badicioiu <alexandru.badicioiu@linaro.org
> wrote:

>
>
> On 22 May 2015 at 00:09, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>
>> On 21 May 2015 at 17:45, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>>
>>> From the rfc 3549 netlink looks like good protocol to communicate
>>> between data plane and control plane. And messages are defined by that
>>> protocol also. At least we should do something the same.
>>>
>> Netlink seems limited to the specific functionality already present in
>> the Linux kernel. An ODP IPC/message passing mechanism must be extensible
>> and support user-defined messages. There's no reason for ODP MBUS to impose
>> any message format.
>>
> Netlink is extensively implemented in Linux kernel but the RFC explicitly
> doesn't limit it to this scope.
> Netlink messages have a  header , defined by Netlink protocol and a
> payload which contains user-defined messages in TLV format (e.g - RTM_XXX
> messages for routing control). Doesn't TLV format suffice for the need of
> ODP applications?
>
Why should we impose any message format on ODP applications?
An ODP MBUS implementation could perhaps use Netlink as the mechanism to
connect to other endpoints and transfer messages in both directions. By not
specifying irrelevant details in the MBUS API, we give more freedom to
implementations. I doubt Netlink will always be available or will be the
best choice on all platforms where people are trying to implement ODP.

Since the ODP implementation will control the definition of the message
event type, it can reserve memory for necessary (implementation specific)
headers preceding the user-defined payload.
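As a sketch of what such a reserved, implementation-specific header might look like (the struct, field names, and sizes below are invented for illustration; the RFC does not specify any of this), the event backing a message handle could place the user-visible data after a hidden header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPC_ADDR_SIZE 8 /* placeholder for ODP_IPC_ADDR_SIZE */

/* Hypothetical internal layout of a message event. Only 'payload'
 * would ever be exposed to the application; the preceding fields are
 * private to the MBUS implementation. */
struct ipc_msg_hdr {
	uint8_t  src[IPC_ADDR_SIZE]; /* sender address, filled in on send */
	uint32_t seqno;              /* internal, for ordered delivery */
	uint32_t length;             /* user payload length */
	uint8_t  payload[];          /* user-defined message content */
};

/* What a data-pointer accessor could return for such a layout */
static void *msg_data(struct ipc_msg_hdr *m)
{
	return m->payload;
}
```

With this kind of layout the implementation can stamp addressing and sequencing information on send without copying or reallocating the user's payload.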


>
>> Any (set of) applications can model their message formats on Netlink.
>>
>> I don't understand how Netlink can be used to communicate between (any
>> two) two applications. Please enlighten me.
>>
> Netlink is not limited to user-kernel communication, only some of the
> current services like RTM_XXX for routing configuration. For example ,
> Generic Netlink allows users in both kernel and userspace -
> https://lwn.net/Articles/208755/:
>
> "When looking at figure #1 it is important to note that any Generic Netlink
> user can communicate with any other user over the bus using the same API
> regardless of where the user resides in relation to the kernel/userspace
> boundary."
>
Another claim, but no description of or examples of how this is actually
accomplished.
All the examples in this article are from the kernel perspective. Not very
useful for a user-to-user messaging mechanism.
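Since, as discussed earlier in the thread, delivery does not guarantee actual processing by the recipient, an application that needs that guarantee can layer end-to-end acknowledgements on top of the user-defined messages. A minimal sketch of the sender-side bookkeeping, with all names invented for illustration (the sequence number would travel inside the message payload):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical end-to-end acknowledgement state kept by a sender.
 * Acks are cumulative: ACK(n) covers all messages with seqno <= n. */
struct ack_window {
	uint32_t next_seqno; /* last sequence number assigned */
	uint32_t acked_upto; /* all messages <= this have been acked */
};

/* Assign a sequence number to stamp into an outgoing message */
static uint32_t ack_send(struct ack_window *w)
{
	return ++w->next_seqno;
}

/* Process a cumulative ACK received from the peer */
static void ack_recv(struct ack_window *w, uint32_t n)
{
	if (n > w->acked_upto)
		w->acked_upto = n;
}

/* True while some sent message has not yet been acknowledged */
static bool ack_pending(const struct ack_window *w)
{
	return w->next_seqno != w->acked_upto;
}
```

A real protocol would also retransmit or abort on timeout, but that policy belongs to the application, not to the message bus.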


>
>> -- Ola
>>
>>
>>>
>>>
>>> Maxim.
>>>
>>> On 21 May 2015 at 17:46, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>>
>>>> On 21 May 2015 at 15:56, Alexandru Badicioiu <
>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>
>>>>> I got the impression that ODP MBUS API would define a transport
>>>>> protocol/API between an ODP
>>>>>
>>>> No the MBUS API is just an API for message passing (think of the OSE
>>>> IPC API) and doesn't specify use cases or content. Just like the ODP packet
>>>> API doesn't specify what the content in a packet means or the format of the
>>>> content.
>>>>
>>>>
>>>>> application and a control plane application, like TCP is the transport
>>>>> protocol for HTTP applications (e.g Web). Netlink defines exactly that -
>>>>> transport protocol for configuration messages.
>>>>> Maxim asked about the messages - should applications define the
>>>>> message format and/or the message content? Wouldn't be an easier task for
>>>>> the application to define only the content and let ODP to define a format?
>>>>>
>>>> How can you define a format when you don't know what the messages are
>>>> used for and what data needs to be transferred? Why should the MBUS API or
>>>> implementations care about the message format? It's just payload and none
>>>> of their business.
>>>>
>>>> If you want to, you can specify formats for specific purposes, e.g.
>>>> reuse Netlink formats for the functions that Netlink supports. Some ODP
>>>> applications may use this, other not (because they use some other protocol
>>>> or they implement some other functionality).
>>>>
>>>>
>>>>
>>>>> Reliability could be an issue but Netlink spec says how applications
>>>>> can create reliable protocols:
>>>>>
>>>>>
>>>>> One could create a reliable protocol between an FEC and a CPC by
>>>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>>>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>>>>>    timers are provided by Linux.
>>>>>
>>>>> And you could do the same in ODP but I prefer not to, this adds a
>>>> level of complexity to the application code I do not want. Perhaps the
>>>> actual MBUS implementation has to do this but then hidden from the
>>>> applications. Just like TCP reliability and ordering etc is hidden from the
>>>> applications that just do read and write.
>>>>
>>>>    One could create a heartbeat protocol between the FEC and CPC by
>>>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>> wrote:
>>>>>
>>>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>
>>>>>>> I was referring to the  Netlink protocol in itself, as a model for
>>>>>>> ODP MBUS (or IPC).
>>>>>>>
>>>>>> Isn't the Netlink protocol what the endpoints send between them? This
>>>>>> is not specified by the ODP IPC/MBUS API, applications can define or re-use
>>>>>> whatever protocol they like. The protocol definition is heavily dependent
>>>>>> on what you actually use the IPC for and we shouldn't force ODP users to
>>>>>> use some specific predefined protocol.
>>>>>>
>>>>>> Also the "wire protocol" is left undefined, this is up to the
>>>>>> implementation to define and each platform can have its own definition.
>>>>>>
>>>>>> And netlink isn't even reliable. I know that that creates problems,
>>>>>> e.g. impossible to get a clean and complete snapshot of e.g. the routing
>>>>>> table.
>>>>>>
>>>>>>
>>>>>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>>>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>>>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>>>>>    have their own protocol definition -- *kernel space and user space
>>>>>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>>>>>    is needed to communicate.  The wire protocol is normally provided by
>>>>>>>    some privileged service that is able to copy between multiple
>>>>>>>    protection domains.  We will refer to this service as the Netlink
>>>>>>>    service.  The Netlink service can also be encapsulated in a different
>>>>>>>    transport layer, if the CPC executes on a different node than the
>>>>>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>>>>>    a reliable protocol between each other.  By default, however, Netlink
>>>>>>>    provides an unreliable communication.
>>>>>>>
>>>>>>>    Note that the FEC and CPC can both live in the same memory protection
>>>>>>>    domain and use the connect() system call to create a path to the peer
>>>>>>>    and talk to each other.  We will not discuss this mechanism further
>>>>>>>    other than to say that it is available. Throughout this document, we
>>>>>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>>>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>>>>>    restrict the two components to these protection domains or to the
>>>>>>>    same compute node.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>>> > Hi,
>>>>>>>> > would Netlink protocol (https://tools.ietf.org/html/rfc3549) fit
>>>>>>>> the purpose
>>>>>>>> > of ODP IPC (within a single OS instance)?
>>>>>>>> I interpret this as a question whether Netlink would be fit as an
>>>>>>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>>>>>>> contended and imbued with different meanings).
>>>>>>>>
>>>>>>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel
>>>>>>>> and kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
>>>>>>>> (application-to-application).
>>>>>>>>
>>>>>>>> I see a couple of primary requirements:
>>>>>>>>
>>>>>>>>    - Support communication (message exchange) between user space
>>>>>>>>    processes.
>>>>>>>>    - Support arbitrary user-defined messages.
>>>>>>>>    - Ordered, reliable delivery of messages.
>>>>>>>>
>>>>>>>>
>>>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>>>> requirements do not seem supported. But perhaps someone with more intimate
>>>>>>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>>>>>>> to support u2u and user-defined messages, but the current specialization (e.g.
>>>>>>>> specialized addressing, specialized message formats) seems contrary to the
>>>>>>>> goals of providing generic mechanisms in the kernel that can be used for
>>>>>>>> different things.
>>>>>>>>
>>>>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>>>>> POSIX message queues. One of my issues is that I want the message queue
>>>>>>>> associated with a process to go away when the process goes away. The
>>>>>>>> message queues are not independent entities.
>>>>>>>>
>>>>>>>> -- Ola
>>>>>>>>
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Alex
>>>>>>>> >
>>>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>>>> >> <petri.savolainen@nokia.com> wrote:
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >> -----Original Message-----
>>>>>>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On
>>>>>>>> Behalf Of
>>>>>>>> >> >> ext
>>>>>>>> >> >> Ola Liljedahl
>>>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>>>> >> >>
>>>>>>>> >> >> As promised, here is my first attempt at a standalone API for
>>>>>>>> IPC -
>>>>>>>> >> >> inter
>>>>>>>> >> >> process communication in a shared nothing architecture
>>>>>>>> (message passing
>>>>>>>> >> >> between processes which do not share memory).
>>>>>>>> >> >>
>>>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>>>> possible to
>>>>>>>> >> >> break out some message/event related definitions (everything
>>>>>>>> from
>>>>>>>> >> >> odp_ipc_sender) in a separate file message.h. This would
>>>>>>>> mimic the
>>>>>>>> >> >> packet_io.h/packet.h separation.
>>>>>>>> >> >>
>>>>>>>> >> >> The semantics of message passing is that sending a message to
>>>>>>>> an
>>>>>>>> >> >> endpoint
>>>>>>>> >> >> will always look like it succeeds. The appearance of
>>>>>>>> endpoints is
>>>>>>>> >> >> explicitly
>>>>>>>> >> >> notified through user-defined messages specified in the
>>>>>>>> >> >> odp_ipc_resolve()
>>>>>>>> >> >> call. Similarly, the disappearance (e.g. death or otherwise
>>>>>>>> lost
>>>>>>>> >> >> connection)
>>>>>>>> >> >> is also explicitly notified through user-defined messages
>>>>>>>> specified in
>>>>>>>> >> >> the
>>>>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because
>>>>>>>> the
>>>>>>>> >> >> addressed
>>>>>>>> >> >> endpoint has disappeared.
>>>>>>>> >> >>
>>>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>>>> order. If
>>>>>>>> >> >> message
>>>>>>>> >> >> N sent to an endpoint is delivered, then all messages <N have
>>>>>>>> also been
>>>>>>>> >> >> delivered. Message delivery does not guarantee actual
>>>>>>>> processing by the
>>>>>>>> >> >
>>>>>>>> >> > Ordered is an OK requirement, but "all messages <N have also
>>>>>>>> >> > been delivered" means in practice lossless delivery (== retries
>>>>>>>> >> > and retransmission windows, etc.). Lossy vs. lossless link
>>>>>>>> >> > should be a configuration option.
>>>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>>>>> >> implementation chooses to use some unreliable media, then it
>>>>>>>> will need
>>>>>>>> >> to take some counter measures. Any loss of message could be
>>>>>>>> detected
>>>>>>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>>>>>>> >> disconnection (so that no more messages will be delivered should
>>>>>>>> one
>>>>>>>> >> go missing).
>>>>>>>> >>
>>>>>>>> >> I am OK with adding the lossless/lossy configuration to the API
>>>>>>>> as
>>>>>>>> >> long as the lossless option is always implemented. Is this a
>>>>>>>> configuration
>>>>>>>> >> when creating the local  IPC endpoint or when sending a message
>>>>>>>> to
>>>>>>>> >> another endpoint?
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >> > Also, what does "delivered" mean?
>>>>>>>> >> >
>>>>>>>> >> > Message:
>>>>>>>> >> >  - transmitted successfully over the link ?
>>>>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>>>>> >> >  - delivered into application input queue ?
>>>>>>>> >> Probably this one but I am not sure the exact definition
>>>>>>>> matters, "has
>>>>>>>> >> been delivered" or "will eventually be delivered unless
>>>>>>>> connection to
>>>>>>>> >> the destination is lost". Maybe there is a better word than
>>>>>>>> >> "delivered"?
>>>>>>>> >>
>>>>>>>> >> "Made available into the destination (recipient) address space"?
>>>>>>>> >>
>>>>>>>> >> >  - has been dequeued from application queue ?
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >> recipient. End-to-end acknowledgements (using messages)
>>>>>>>> should be used
>>>>>>>> >> >> if
>>>>>>>> >> >> this guarantee is important to the user.
>>>>>>>> >> >>
>>>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>>>>> reliable
>>>>>>>> >> >> multidrop network where each endpoint has a unique address
>>>>>>>> which is
>>>>>>>> >> >> only
>>>>>>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint is
>>>>>>>> >> >> destroyed
>>>>>>>> >> >> and then recreated (with the same name), the new endpoint
>>>>>>>> will have a
>>>>>>>> >> >> new address (eventually endpoints addresses will have to be
>>>>>>>> recycled
>>>>>>>> >> >> but
>>>>>>>> >> >> not for a very long time). Endpoint names do not necessarily
>>>>>>>> have to
>>>>>>>> >> >> be
>>>>>>>> >> >> unique.
>>>>>>>> >> >
>>>>>>>> >> > How widely are these addresses unique: inside one VM, multiple
>>>>>>>> VMs under
>>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>>> >> Currently, the scope of the name and address space is defined by
>>>>>>>> the
>>>>>>>> >> implementation. Perhaps we should define it? My current interest
>>>>>>>> is
>>>>>>>> >> within an OS instance (bare metal or virtualised). Between
>>>>>>>> different
>>>>>>>> >> OS instances, I expect something based on IP to be used (because
>>>>>>>> you
>>>>>>>> >> don't know where those different OS/VM instances will be
>>>>>>>> deployed so
>>>>>>>> >> you need topology-independent addressing).
>>>>>>>> >>
>>>>>>>> >> Based on other feedback, I have dropped the contended usage of
>>>>>>>> "IPC"
>>>>>>>> >> and now call it "message bus" (MBUS).
>>>>>>>> >>
>>>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an
>>>>>>>> OS-internal
>>>>>>>> >> reliable multidrop network"...
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >>
>>>>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>> >> >> ---
>>>>>>>> >> >> (This document/code contribution attached is provided under
>>>>>>>> the terms
>>>>>>>> >> >> of
>>>>>>>> >> >> agreement LES-LTM-21309)
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Create IPC endpoint
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @return IPC handle on success
>>>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>>>> >> >> + */
>>>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>>>>> >> >
>>>>>>>> >> > This creates (implicitly) the local end point address.
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param ipc   IPC handle
>>>>>>>> >> >> + * @param queue Queue handle
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @retval  0 on success
>>>>>>>> >> >> + * @retval <0 on failure
>>>>>>>> >> >> + */
>>>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>>>> >> >
>>>>>>>> >> > Multiple input queues are likely needed for different priority
>>>>>>>> messages.
>>>>>>>> >> >
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Resolve endpoint by name
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>>>> >> >> + * When the endpoint exists, return the specified message
>>>>>>>> with the
>>>>>>>> >> >> endpoint
>>>>>>>> >> >> + * as the sender.
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>> >> >> + * @param name Name to resolve
>>>>>>>> >> >> + * @param msg Message to return
>>>>>>>> >> >> + */
>>>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>>>> >> >> +                  const char *name,
>>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>>> >> >
>>>>>>>> >> > How widely are these names visible? Inside one VM, multiple
>>>>>>>> VMs under
>>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>>> >> >
>>>>>>>> >> > I think name service (or address resolution) is better
>>>>>>>> handled in
>>>>>>>> >> > a middleware layer. If ODP provides unique addresses and message
>>>>>>>> passing
>>>>>>>> >> > mechanism, additional services can be built on top.
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Monitor endpoint
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>>>> with the
>>>>>>>> >> >> endpoint
>>>>>>>> >> >> + * as the sender.
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as
>>>>>>>> dead
>>>>>>>> >> >> endpoints.
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>>>> >> >> + * @param msg Message to return
>>>>>>>> >> >> + */
>>>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>>> >> >
>>>>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>>>>> services.
>>>>>>>> >> >
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Send message
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>>>>> messages
>>>>>>>> >> >> will
>>>>>>>> >> >> be
>>>>>>>> >> >> + * delivered up to the point of endpoint death or lost
>>>>>>>> connection.
>>>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>>>> end-to-end
>>>>>>>> >> >> + * acknowledgements for that).
>>>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>>>> connection.
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>> >> >> + * @param msg Message to send
>>>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @retval 0 on success
>>>>>>>> >> >> + * @retval <0 on error
>>>>>>>> >> >> + */
>>>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>>> >> >
>>>>>>>> >> > This would be used to send a message to an address, but normal
>>>>>>>> >> > odp_queue_enq() could be used to circulate this event inside
>>>>>>>> an application
>>>>>>>> >> > (ODP instance).
>>>>>>>> >> >
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Get address of sender (source) of message
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>>>> >> >> + */
>>>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Message data pointer
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Return a pointer to the message data
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @return Pointer to the message data
>>>>>>>> >> >> + */
>>>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Message data length
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Return length of the message data.
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @return Message length
>>>>>>>> >> >> + */
>>>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>>>> >> >> +
>>>>>>>> >> >> +/**
>>>>>>>> >> >> + * Set message length
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * Set length of the message data.
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>> >> >> + * @param len New length
>>>>>>>> >> >> + *
>>>>>>>> >> >> + * @retval 0 on success
>>>>>>>> >> >> + * @retval <0 on error
>>>>>>>> >> >> + */
>>>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>>>> >> >
>>>>>>>> >> > When data ptr or data len is modified: push/pull head,
>>>>>>>> push/pull tail
>>>>>>>> >> > would be analogies from packet API
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > -Petri
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> _______________________________________________
>>>>>>>> >> lng-odp mailing list
>>>>>>>> >> lng-odp@lists.linaro.org
>>>>>>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> lng-odp mailing list
>>>> lng-odp@lists.linaro.org
>>>> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>
>>>>
>>>
>>
>
Alexandru Badicioiu May 22, 2015, 10:13 a.m. | #23
On 22 May 2015 at 12:10, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:

> On 22 May 2015 at 08:14, Alexandru Badicioiu <
> alexandru.badicioiu@linaro.org> wrote:
>
>>
>>
>> On 22 May 2015 at 00:09, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>
>>> On 21 May 2015 at 17:45, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>>>
>>>> From the rfc 3549 netlink looks like good protocol to communicate
>>>> between data plane and control plane. And messages are defined by that
>>>> protocol also. At least we should do something the same.
>>>>
>>> Netlink seems limited to the specific functionality already present in
>>> the Linux kernel. An ODP IPC/message passing mechanism must be extensible
>>> and support user-defined messages. There's no reason for ODP MBUS to impose
>>> any message format.
>>>
>> Netlink is extensively implemented in Linux kernel but the RFC explicitly
>> doesn't limit it to this scope.
>> Netlink messages have a  header , defined by Netlink protocol and a
>> payload which contains user-defined messages in TLV format (e.g - RTM_XXX
>> messages for routing control). Doesn't TLV format suffice for the need of
>> ODP applications?
>>
> Why should we impose any message format on ODP applications?
>
A message format, in this case TLV, seems adequate for the purpose of
dataplane-control plane communication. I see it more as a useful thing
than a constraint. Isn't dataplane-control plane communication the
purpose of ODP MBUS? Or is it more general?

> An ODP MBUS implementation could perhaps use Netlink as the mechanism to
> connect to other endpoints and transfer messages in both directions. By not
> specifying irrelevant details in the MBUS API, we give more freedom to
> implementations. I doubt Netlink will always be available or will be the
> best choice on all platforms where people are trying to implement ODP.
>
You see Linux Netlink as a possible implementation for ODP MBUS, I see
Netlink as the protocol for ODP MBUS. ODP implementation must provide the
Netlink protocol, applications will use the MBUS API to build and send
messages (libnl is an example). Particular implementations can use Linux
kernel Netlink , others can do a complete userspace implementation even
with HW acceleration (DMA copy for example).

>
> Since the ODP implementation will control the definition of the message
> event type, it can reserve memory for necessary (implementation specific)
> headers preceding the user-defined payload.
>
>
>>
>>> Any (set of) applications can model their message formats on Netlink.
>>>
>>> I don't understand how Netlink can be used to communicate between (any
>>> two) two applications. Please enlighten me.
>>>
>> Netlink is not limited to user-kernel communication, only some of the
>> current services like RTM_XXX for routing configuration. For example ,
>> Generic Netlink allows users in both kernel and userspace -
>> https://lwn.net/Articles/208755/:
>>
>> "When looking at figure #1 it is important to note that any Generic Netlink
>> user can communicate with any other user over the bus using the same API
>> regardless of where the user resides in relation to the kernel/userspace
>> boundary."
>>
> Another claim, but no description or examples of how this is actually
> accomplished.
> All the examples in this articles are from the kernel perspective. Not
> very useful for a user-to-user messaging mechanism.
>
This is accomplished by means of socket communication. The Netlink
protocol works over sockets like any other socket-based protocol
(UDP/TCP). An AF_NETLINK address has a pid member which identifies the
destination process (http://man7.org/linux/man-pages/man7/netlink.7.html
- Address formats paragraph).

>
>
>>
>>> -- Ola
>>>
>>>
>>>>
>>>>
>>>> Maxim.
>>>>
>>>> On 21 May 2015 at 17:46, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>> wrote:
>>>>
>>>>> On 21 May 2015 at 15:56, Alexandru Badicioiu <
>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>
>>>>>> I got the impression that ODP MBUS API would define a transport
>>>>>> protocol/API between an ODP
>>>>>>
>>>>> No the MBUS API is just an API for message passing (think of the OSE
>>>>> IPC API) and doesn't specify use cases or content. Just like the ODP packet
>>>>> API doesn't specify what the content in a packet means or the format of the
>>>>> content.
>>>>>
>>>>>
>>>>>> application and a control plane application, like TCP is the
>>>>>> transport protocol for HTTP applications (e.g Web). Netlink defines exactly
>>>>>> that - transport protocol for configuration messages.
>>>>>> Maxim asked about the messages - should applications define the
>>>>>> message format and/or the message content? Wouldn't be an easier task for
>>>>>> the application to define only the content and let ODP to define a format?
>>>>>>
>>>>> How can you define a format when you don't know what the messages are
>>>>> used for and what data needs to be transferred? Why should the MBUS API or
>>>>> implementations care about the message format? It's just payload and none
>>>>> of their business.
>>>>>
>>>>> If you want to, you can specify formats for specific purposes, e.g.
>>>>> reuse Netlink formats for the functions that Netlink supports. Some ODP
>>>>> applications may use this, other not (because they use some other protocol
>>>>> or they implement some other functionality).
>>>>>
>>>>>
>>>>>
>>>>>> Reliability could be an issue but Netlink spec says how applications
>>>>>> can create reliable protocols:
>>>>>>
>>>>>>
>>>>>> One could create a reliable protocol between an FEC and a CPC by
>>>>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>>>>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>>>>>>    timers are provided by Linux.
>>>>>>
>>>>>> And you could do the same in ODP but I prefer not to, this adds a
>>>>> level of complexity to the application code I do not want. Perhaps the
>>>>> actual MBUS implementation has to do this but then hidden from the
>>>>> applications. Just like TCP reliability and ordering etc is hidden from the
>>>>> applications that just do read and write.
>>>>>
>>>>>    One could create a heartbeat protocol between the FEC and CPC by
>>>>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>> wrote:
>>>>>>
>>>>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>>
>>>>>>>> I was referring to the  Netlink protocol in itself, as a model for
>>>>>>>> ODP MBUS (or IPC).
>>>>>>>>
>>>>>>> Isn't the Netlink protocol what the endpoints send between them?
>>>>>>> This is not specified by the ODP IPC/MBUS API, applications can define or
>>>>>>> re-use whatever protocol they like. The protocol definition is heavily
>>>>>>> dependent on what you actually use the IPC for and we shouldn't force ODP
>>>>>>> users to use some specific predefined protocol.
>>>>>>>
>>>>>>> Also the "wire protocol" is left undefined, this is up to the
>>>>>>> implementation to define and each platform can have its own definition.
>>>>>>>
>>>>>>> And netlink isn't even reliable. I know that that creates problems,
>>>>>>> e.g. impossible to get a clean and complete snapshot of e.g. the routing
>>>>>>> table.
>>>>>>>
>>>>>>>
>>>>>>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>>>>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>>>>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>>>>>>    have their own protocol definition -- *kernel space and user space
>>>>>>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>>>>>>    is needed to communicate.  The wire protocol is normally provided by
>>>>>>>>    some privileged service that is able to copy between multiple
>>>>>>>>    protection domains.  We will refer to this service as the Netlink
>>>>>>>>    service.  The Netlink service can also be encapsulated in a different
>>>>>>>>    transport layer, if the CPC executes on a different node than the
>>>>>>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>>>>>>    a reliable protocol between each other.  By default, however, Netlink
>>>>>>>>    provides an unreliable communication.
>>>>>>>>
>>>>>>>>    Note that the FEC and CPC can both live in the same memory protection
>>>>>>>>    domain and use the connect() system call to create a path to the peer
>>>>>>>>    and talk to each other.  We will not discuss this mechanism further
>>>>>>>>    other than to say that it is available. Throughout this document, we
>>>>>>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>>>>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>>>>>>    restrict the two components to these protection domains or to the
>>>>>>>>    same compute node.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>>>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>>>> > Hi,
>>>>>>>>> > would Netlink protocol (https://tools.ietf.org/html/rfc3549)
>>>>>>>>> fit the purpose
>>>>>>>>> > of ODP IPC (within a single OS instance)?
>>>>>>>>> I interpret this as a question whether Netlink would be fit as an
>>>>>>>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>>>>>>>> contended and imbued with different meanings).
>>>>>>>>>
>>>>>>>>> It is perhaps possible. Netlink seems a bit focused on
>>>>>>>>> intra-kernel and kernel-to-user while the ODP IPC-MBUS is focused on
>>>>>>>>> user-to-user (application-to-application).
>>>>>>>>>
>>>>>>>>> I see a couple of primary requirements:
>>>>>>>>>
>>>>>>>>>    - Support communication (message exchange) between user space
>>>>>>>>>    processes.
>>>>>>>>>    - Support arbitrary user-defined messages.
>>>>>>>>>    - Ordered, reliable delivery of messages.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>>>>> requirements do not seem supported. But perhaps someone with more intimate
>>>>>>>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>>>>>>>> to support u2u and user-defined messages, but the current specialization (e.g.
>>>>>>>>> specialized addressing, specialized message formats) seems contrary to the
>>>>>>>>> goals of providing generic mechanisms in the kernel that can be used for
>>>>>>>>> different things.
>>>>>>>>>
>>>>>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>>>>>> POSIX message queues. One of my issues is that I want the message queue
>>>>>>>>> associated with a process to go away when the process goes away. The
>>>>>>>>> message queues are not independent entities.
>>>>>>>>>
>>>>>>>>> -- Ola
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Alex
>>>>>>>>> >
>>>>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>>> wrote:
>>>>>>>>> >>
>>>>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>>>>> >> <petri.savolainen@nokia.com> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >> -----Original Message-----
>>>>>>>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On
>>>>>>>>> Behalf Of
>>>>>>>>> >> >> ext
>>>>>>>>> >> >> Ola Liljedahl
>>>>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>>>>> >> >>
>>>>>>>>> >> >> As promised, here is my first attempt at a standalone API
>>>>>>>>> for IPC -
>>>>>>>>> >> >> inter
>>>>>>>>> >> >> process communication in a shared nothing architecture
>>>>>>>>> (message passing
>>>>>>>>> >> >> between processes which do not share memory).
>>>>>>>>> >> >>
>>>>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>>>>> possible to
>>>>>>>>> >> >> break out some message/event related definitions (everything
>>>>>>>>> from
>>>>>>>>> >> >> odp_ipc_sender) in a separate file message.h. This would
>>>>>>>>> mimic the
>>>>>>>>> >> >> packet_io.h/packet.h separation.
>>>>>>>>> >> >>
>>>>>>>>> >> >> The semantics of message passing is that sending a message
>>>>>>>>> to an
>>>>>>>>> >> >> endpoint
>>>>>>>>> >> >> will always look like it succeeds. The appearance of
>>>>>>>>> endpoints is
>>>>>>>>> >> >> explicitly
>>>>>>>>> >> >> notified through user-defined messages specified in the
>>>>>>>>> >> >> odp_ipc_resolve()
>>>>>>>>> >> >> call. Similarly, the disappearance (e.g. death or otherwise
>>>>>>>>> lost
>>>>>>>>> >> >> connection)
>>>>>>>>> >> >> is also explicitly notified through user-defined messages
>>>>>>>>> specified in
>>>>>>>>> >> >> the
>>>>>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because
>>>>>>>>> the
>>>>>>>>> >> >> addressed
>>>>>>>>> >> >> endpoint has disappeared.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>>>>> order. If
>>>>>>>>> >> >> message
>>>>>>>>> >> >> N sent to an endpoint is delivered, then all messages <N
>>>>>>>>> have also been
>>>>>>>>> >> >> delivered. Message delivery does not guarantee actual
>>>>>>>>> processing by the
>>>>>>>>> >> >
>>>>>>>>> >> > Ordered is an OK requirement, but "all messages <N have also
>>>>>>>>> >> > been delivered" means in practice lossless delivery (== retries
>>>>>>>>> >> > and retransmission windows, etc.). Lossy vs. lossless link
>>>>>>>>> >> > should be a configuration option.
>>>>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>>>>>> >> implementation chooses to use some unreliable media, then it
>>>>>>>>> will need
>>>>>>>>> >> to take some counter measures. Any loss of message could be
>>>>>>>>> detected
>>>>>>>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>>>>>>>> >> disconnection (so that no more messages will be delivered
>>>>>>>>> should one
>>>>>>>>> >> go missing).
>>>>>>>>> >>
>>>>>>>>> >> I am OK with adding the lossless/lossy configuration to the API
>>>>>>>>> as
>>>>>>>>> >> long as the lossless option is always implemented. Is this a
>>>>>>>>> configuration
>>>>>>>>> >> when creating the local  IPC endpoint or when sending a message
>>>>>>>>> to
>>>>>>>>> >> another endpoint?
>>>>>>>>> >>
>>>>>>>>> >> >
>>>>>>>>> >> > Also, what does "delivered" mean?
>>>>>>>>> >> >
>>>>>>>>> >> > Message:
>>>>>>>>> >> >  - transmitted successfully over the link ?
>>>>>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>>>>>> >> >  - delivered into application input queue ?
>>>>>>>>> >> Probably this one but I am not sure the exact definition
>>>>>>>>> matters, "has
>>>>>>>>> >> been delivered" or "will eventually be delivered unless
>>>>>>>>> connection to
>>>>>>>>> >> the destination is lost". Maybe there is a better word than
>>>>>>>>> >> "delivered"?
>>>>>>>>> >>
>>>>>>>>> >> "Made available into the destination (recipient) address space"?
>>>>>>>>> >>
>>>>>>>>> >> >  - has been dequeued from application queue ?
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >> recipient. End-to-end acknowledgements (using messages)
>>>>>>>>> should be used
>>>>>>>>> >> >> if
>>>>>>>>> >> >> this guarantee is important to the user.
>>>>>>>>> >> >>
>>>>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an
>>>>>>>>> internal reliable
>>>>>>>>> >> >> multidrop network where each endpoint has a unique address
>>>>>>>>> which is
>>>>>>>>> >> >> only
>>>>>>>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint
>>>>>>>>> is
>>>>>>>>> >> >> destroyed
>>>>>>>>> >> >> and then recreated (with the same name), the new endpoint
>>>>>>>>> will have a
>>>>>>>>> >> >> new address (eventually endpoints addresses will have to be
>>>>>>>>> recycled
>>>>>>>>> >> >> but
>>>>>>>>> >> >> not for a very long time). Endpoint names do not
>>>>>>>>> necessarily have to
>>>>>>>>> >> >> be
>>>>>>>>> >> >> unique.
>>>>>>>>> >> >
>>>>>>>>> >> > How widely are these addresses unique: inside one VM,
>>>>>>>>> multiple VMs under
>>>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>>>> >> Currently, the scope of the name and address space is defined
>>>>>>>>> by the
>>>>>>>>> >> implementation. Perhaps we should define it? My current
>>>>>>>>> interest is
>>>>>>>>> >> within an OS instance (bare metal or virtualised). Between
>>>>>>>>> different
>>>>>>>>> >> OS instances, I expect something based on IP to be used
>>>>>>>>> (because you
>>>>>>>>> >> don't know where those different OS/VM instances will be
>>>>>>>>> deployed so
>>>>>>>>> >> you need topology-independent addressing).
>>>>>>>>> >>
>>>>>>>>> >> Based on other feedback, I have dropped the contended usage of
>>>>>>>>> "IPC"
>>>>>>>>> >> and now call it "message bus" (MBUS).
>>>>>>>>> >>
>>>>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an
>>>>>>>>> OS-internal
>>>>>>>>> >> reliable multidrop network"...
>>>>>>>>> >>
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >>
>>>>>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>>> >> >> ---
>>>>>>>>> >> >> (This document/code contribution attached is provided under
>>>>>>>>> the terms
>>>>>>>>> >> >> of
>>>>>>>>> >> >> agreement LES-LTM-21309)
>>>>>>>>> >> >>
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Create IPC endpoint
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @return IPC handle on success
>>>>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>>>>>> >> >
>>>>>>>>> >> > This creates (implicitly) the local end point address.
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param ipc   IPC handle
>>>>>>>>> >> >> + * @param queue Queue handle
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @retval  0 on success
>>>>>>>>> >> >> + * @retval <0 on failure
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>>>>> >> >
>>>>>>>>> >> > Multiple input queues are likely needed for different
>>>>>>>>> priority messages.
>>>>>>>>> >> >
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Resolve endpoint by name
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>>>>> >> >> + * When the endpoint exists, return the specified message
>>>>>>>>> with the
>>>>>>>>> >> >> endpoint
>>>>>>>>> >> >> + * as the sender.
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>>> >> >> + * @param name Name to resolve
>>>>>>>>> >> >> + * @param msg Message to return
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>>>>> >> >> +                  const char *name,
>>>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>>>> >> >
>>>>>>>>> >> > How widely these names are visible? Inside one VM, multiple
>>>>>>>>> VMs under
>>>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>>>> >> >
>>>>>>>>> >> > I think name service (or address resolution) are better
>>>>>>>>> handled in
>>>>>>>>> >> > middleware layer. If ODP provides unique addresses and
>>>>>>>>> message passing
>>>>>>>>> >> > mechanism, additional services can be built on top.
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Monitor endpoint
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>>>>> with the
>>>>>>>>> >> >> endpoint
>>>>>>>>> >> >> + * as the sender.
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated
>>>>>>>>> as dead
>>>>>>>>> >> >> endpoints.
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>>>>> >> >> + * @param msg Message to return
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>>>> >> >
>>>>>>>>> >> > Again, I'd see node health monitoring and alarms as
>>>>>>>>> middleware services.
>>>>>>>>> >> >
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Send message
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Send a message to an endpoint (which may already be
>>>>>>>>> dead).
>>>>>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>>>>>> messages
>>>>>>>>> >> >> will
>>>>>>>>> >> >> be
>>>>>>>>> >> >> + * delivered up to the point of endpoint death or lost
>>>>>>>>> connection.
>>>>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>>>>> end-to-end
>>>>>>>>> >> >> + * acknowledgements for that).
>>>>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>>>>> connection.
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>>> >> >> + * @param msg Message to send
>>>>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @retval 0 on success
>>>>>>>>> >> >> + * @retval <0 on error
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>>>> >> >
>>>>>>>>> >> > This would be used to send a message to an address, but normal
>>>>>>>>> >> > odp_queue_enq() could be used to circulate this event inside
>>>>>>>>> an application
>>>>>>>>> >> > (ODP instance).
>>>>>>>>> >> >
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Get address of sender (source) of message
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Message data pointer
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Return a pointer to the message data
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @return Pointer to the message data
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Message data length
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Return length of the message data.
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @return Message length
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>>>>> >> >> +
>>>>>>>>> >> >> +/**
>>>>>>>>> >> >> + * Set message length
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * Set length of the message data.
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>> >> >> + * @param len New length
>>>>>>>>> >> >> + *
>>>>>>>>> >> >> + * @retval 0 on success
>>>>>>>>> >> >> + * @retval <0 on error
>>>>>>>>> >> >> + */
>>>>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>>>>> >> >
>>>>>>>>> >> > When data ptr or data len is modified: push/pull head,
>>>>>>>>> push/pull tail
>>>>>>>>> >> > would be analogies from packet API
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > -Petri
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> _______________________________________________
>>>>>>>>> >> lng-odp mailing list
>>>>>>>>> >> lng-odp@lists.linaro.org
>>>>>>>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
Ola Liljedahl May 22, 2015, 10:47 a.m. | #24
On 22 May 2015 at 12:13, Alexandru Badicioiu <alexandru.badicioiu@linaro.org
> wrote:

>
>
> On 22 May 2015 at 12:10, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>
>> On 22 May 2015 at 08:14, Alexandru Badicioiu <
>> alexandru.badicioiu@linaro.org> wrote:
>>
>>>
>>>
>>> On 22 May 2015 at 00:09, Ola Liljedahl <ola.liljedahl@linaro.org> wrote:
>>>
>>>> On 21 May 2015 at 17:45, Maxim Uvarov <maxim.uvarov@linaro.org> wrote:
>>>>
>>>>> From the rfc 3549 netlink looks like good protocol to communicate
>>>>> between data plane and control plane. And messages are defined by that
>>>>> protocol also. At least we should do something the same.
>>>>>
>>>> Netlink seems limited to the specific functionality already present in
>>>> the Linux kernel. An ODP IPC/message passing mechanism must be extensible
>>>> and support user-defined messages. There's no reason for ODP MBUS to impose
>>>> any message format.
>>>>
>>> Netlink is extensively implemented in Linux kernel but the RFC
>>> explicitly doesn't limit it to this scope.
>>> Netlink messages have a header, defined by the Netlink protocol, and a
>>> payload which contains user-defined messages in TLV format (e.g. RTM_XXX
>>> messages for routing control). Doesn't the TLV format suffice for the
>>> needs of ODP applications?
>>>
>> Why should we impose any message format on ODP applications?
>>
> A message format, in this case TLV, seems adequate for the purpose of
> dataplane-control plane communication.
>
Possibly it is adequate for *some* use cases. But for *all* use cases?


> I see it more as a useful thing than as a constraint.
>
Applications can, if they so choose, use the TLV message format. But why
should this be imposed on applications by ODP MBUS? Why should the MBUS API
or implementation care about the message format of the payload?

Does Linux care about the format of data in your files?


> Isn't dataplane-control plane communication the purpose of ODP MBUS?
>
Yes. But this is an open-ended definition. We are not limiting what ODP can
be used for so we have no idea what control and dataplane would mean in
every possible case where ODP is used.


>   Or is it more general?
>
>> An ODP MBUS implementation could perhaps use Netlink as the mechanism to
>> connect to other endpoints and transfer messages in both directions. By not
>> specifying irrelevant details in the MBUS API, we give more freedom to
>> implementations. I doubt Netlink will always be available or will be the
>> best choice on all platforms where people are trying to implement ODP.
>>
> You see Linux Netlink as a possible implementation for ODP MBUS, I see
> Netlink as the protocol for ODP MBUS. ODP implementation must provide the
> Netlink protocol, applications will use the MBUS API to build and send
> messages (libnl is an example). Particular implementations can use Linux
> kernel Netlink , others can do a complete userspace implementation even
> with HW acceleration (DMA copy for example).
>
So how do users benefit from forcing all of them to use Netlink message
formats? And how do the implementations benefit?

If you are introducing limitations, there has to be good reasons for them.
I have seen none so far.


>
>> Since the ODP implementation will control the definition of the message
>> event type, it can reserve memory for necessary (implementation specific)
>> headers preceding the user-defined payload.
>>
>>
>>>
>>>> Any (set of) applications can model their message formats on Netlink.
>>>>
>>>> I don't understand how Netlink can be used to communicate between (any
>>>> two) two applications. Please enlighten me.
>>>>
>>> Netlink is not limited to user-kernel communication, only some of the
>>> current services like RTM_XXX for routing configuration. For example ,
>>> Generic Netlink allows users in both kernel and userspace -
>>> https://lwn.net/Articles/208755/:
>>>
>>> "When looking at figure #1 it is important to note that any Generic Netlink
>>> user can communicate with any other user over the bus using the same API
>>> regardless of where the user resides in relation to the kernel/userspace
>>> boundary."
>>>
>>>>>>>>> >> Another claim, but no description of or examples of how this is
>>>>>>>>> >> actually accomplished.
>>>>>>>>> >> All the examples in this article are from the kernel perspective. Not
>>>>>>>>> >> very useful for a user-to-user messaging mechanism.
>>
> This is accomplished by means of socket communication. The Netlink
> protocol works over sockets like any other socket-based protocol
> (UDP/TCP). The AF_NETLINK address has a pid member which identifies the
> destination process (http://man7.org/linux/man-pages/man7/netlink.7.html
> - Address formats paragraph).
>
Sockets, my favourite API. Not.


>

>>
>>>
>>>> -- Ola
>>>>
>>>>
>>>>>
>>>>>
>>>>> Maxim.
>>>>>
>>>>> On 21 May 2015 at 17:46, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>> wrote:
>>>>>
>>>>>> On 21 May 2015 at 15:56, Alexandru Badicioiu <
>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>
>>>>>>> I got the impression that ODP MBUS API would define a transport
>>>>>>> protocol/API between an ODP
>>>>>>>
>>>>>> No the MBUS API is just an API for message passing (think of the OSE
>>>>>> IPC API) and doesn't specify use cases or content. Just like the ODP packet
>>>>>> API doesn't specify what the content in a packet means or the format of the
>>>>>> content.
>>>>>>
>>>>>>
>>>>>>> application and a control plane application, like TCP is the
>>>>>>> transport protocol for HTTP applications (e.g Web). Netlink defines exactly
>>>>>>> that - transport protocol for configuration messages.
>>>>>>> Maxim asked about the messages - should applications define the
>>>>>>> message format and/or the message content? Wouldn't it be an easier task
>>>>>>> for the application to define only the content and let ODP define a format?
>>>>>>>
>>>>>> How can you define a format when you don't know what the messages are
>>>>>> used for and what data needs to be transferred? Why should the MBUS API or
>>>>>> implementations care about the message format? It's just payload and none
>>>>>> of their business.
>>>>>>
>>>>>> If you want to, you can specify formats for specific purposes, e.g.
>>>>>> reuse Netlink formats for the functions that Netlink supports. Some ODP
>>>>>> applications may use this, other not (because they use some other protocol
>>>>>> or they implement some other functionality).
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Reliability could be an issue but Netlink spec says how applications
>>>>>>> can create reliable protocols:
>>>>>>>
>>>>>>>
>>>>>>> One could create a reliable protocol between an FEC and a CPC by
>>>>>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>>>>>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>>>>>>>    timers are provided by Linux.
>>>>>>>
>>>>>>> And you could do the same in ODP but I prefer not to, this adds a
>>>>>> level of complexity to the application code I do not want. Perhaps the
>>>>>> actual MBUS implementation has to do this but then hidden from the
>>>>>> applications. Just like TCP reliability and ordering etc is hidden from the
>>>>>> applications that just do read and write.
>>>>>>
>>>>>>    One could create a heartbeat protocol between the FEC and CPC by
>>>>>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>>>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>>>
>>>>>>>>> I was referring to the Netlink protocol itself, as a model for
>>>>>>>>> ODP MBUS (or IPC).
>>>>>>>>>
>>>>>>>> Isn't the Netlink protocol what the endpoints send between them?
>>>>>>>> This is not specified by the ODP IPC/MBUS API, applications can define or
>>>>>>>> re-use whatever protocol they like. The protocol definition is heavily
>>>>>>>> dependent on what you actually use the IPC for and we shouldn't force ODP
>>>>>>>> users to use some specific predefined protocol.
>>>>>>>>
>>>>>>>> Also the "wire protocol" is left undefined, this is up to the
>>>>>>>> implementation to define and each platform can have its own definition.
>>>>>>>>
>>>>>>>> And netlink isn't even reliable. I know that that creates problems,
>>>>>>>> e.g. impossible to get a clean and complete snapshot of e.g. the routing
>>>>>>>> table.
>>>>>>>>
>>>>>>>>
>>>>>>>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>>>>>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>>>>>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>>>>>>>    have their own protocol definition -- *kernel space and user space
>>>>>>>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>>>>>>>    is needed to communicate.  The wire protocol is normally provided by
>>>>>>>>>    some privileged service that is able to copy between multiple
>>>>>>>>>    protection domains.  We will refer to this service as the Netlink
>>>>>>>>>    service.  The Netlink service can also be encapsulated in a different
>>>>>>>>>    transport layer, if the CPC executes on a different node than the
>>>>>>>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>>>>>>>    a reliable protocol between each other.  By default, however, Netlink
>>>>>>>>>    provides an unreliable communication.
>>>>>>>>>
>>>>>>>>>    Note that the FEC and CPC can both live in the same memory protection
>>>>>>>>>    domain and use the connect() system call to create a path to the peer
>>>>>>>>>    and talk to each other.  We will not discuss this mechanism further
>>>>>>>>>    other than to say that it is available. Throughout this document, we
>>>>>>>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>>>>>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>>>>>>>    restrict the two components to these protection domains or to the
>>>>>>>>>    same compute node.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>>>>>>>>> alexandru.badicioiu@linaro.org> wrote:
>>>>>>>>>> > Hi,
>>>>>>>>>> > would Netlink protocol (https://tools.ietf.org/html/rfc3549)
>>>>>>>>>> fit the purpose
>>>>>>>>>> > of ODP IPC (within a single OS instance)?
>>>>>>>>>> I interpret this as a question whether Netlink would be fit as an
>>>>>>>>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>>>>>>>>> contended and imbued with different meanings).
>>>>>>>>>>
>>>>>>>>>> It is perhaps possible. Netlink seems a bit focused on
>>>>>>>>>> intra-kernel and kernel-to-user while the ODP IPC-MBUS is focused on
>>>>>>>>>> user-to-user (application-to-application).
>>>>>>>>>>
>>>>>>>>>> I see a couple of primary requirements:
>>>>>>>>>>
>>>>>>>>>>    - Support communication (message exchange) between user space
>>>>>>>>>>    processes.
>>>>>>>>>>    - Support arbitrary user-defined messages.
>>>>>>>>>>    - Ordered, reliable delivery of messages.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>>>>>> requirements do not seem supported. But perhaps someone with more intimate
>>>>>>>>>> knowledge of Netlink can prove me wrong. Or maybe Netlink can be extended
>>>>>>>>>> to support u2u and user-defined messages, the current specialization (e.g.
>>>>>>>>>> specialized addressing, specialized message formats) seems contrary to the
>>>>>>>>>> goals of providing generic mechanisms in the kernel that can be used for
>>>>>>>>>> different things.
>>>>>>>>>>
>>>>>>>>>> My IPC/MBUS reference implementation for linux-generic builds
>>>>>>>>>> upon POSIX message queues. One of my issues is that I want the message
>>>>>>>>>> queue associated with a process to go away when the process goes away. The
>>>>>>>>>> message queues are not independent entities.
>>>>>>>>>>
>>>>>>>>>> -- Ola
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Alex
>>>>>>>>>> >
>>>>>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <
>>>>>>>>>> ola.liljedahl@linaro.org> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>>>>>> >> <petri.savolainen@nokia.com> wrote:
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >> -----Original Message-----
>>>>>>>>>> >> >> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On
>>>>>>>>>> Behalf Of
>>>>>>>>>> >> >> ext
>>>>>>>>>> >> >> Ola Liljedahl
>>>>>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> As promised, here is my first attempt at a standalone API
>>>>>>>>>> for IPC -
>>>>>>>>>> >> >> inter
>>>>>>>>>> >> >> process communication in a shared nothing architecture
>>>>>>>>>> (message passing
>>>>>>>>>> >> >> between processes which do not share memory).
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>>>>>> possible to
>>>>>>>>>> >> >> break out some message/event related definitions
>>>>>>>>>> (everything from
>>>>>>>>>> >> >> odp_ipc_sender) in a separate file message.h. This would
>>>>>>>>>> mimic the
>>>>>>>>>> >> >> packet_io.h/packet.h separation.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> The semantics of message passing is that sending a message
>>>>>>>>>> to an
>>>>>>>>>> >> >> endpoint
>>>>>>>>>> >> >> will always look like it succeeds. The appearance of
>>>>>>>>>> endpoints is
>>>>>>>>>> >> >> explicitly
>>>>>>>>>> >> >> notified through user-defined messages specified in the
>>>>>>>>>> >> >> odp_ipc_resolve()
>>>>>>>>>> >> >> call. Similarly, the disappearance (e.g. death or otherwise
>>>>>>>>>> lost
>>>>>>>>>> >> >> connection)
>>>>>>>>>> >> >> is also explicitly notified through user-defined messages
>>>>>>>>>> specified in
>>>>>>>>>> >> >> the
>>>>>>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because
>>>>>>>>>> the
>>>>>>>>>> >> >> addressed
>>>>>>>>>> >> >> endpoint has disappeared.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>>>>>> order. If
>>>>>>>>>> >> >> message
>>>>>>>>>> >> >> N sent to an endpoint is delivered, then all messages <N
>>>>>>>>>> have also been
>>>>>>>>>> >> >> delivered. Message delivery does not guarantee actual
>>>>>>>>>> processing by the
>>>>>>>>>> >> >
>>>>>>>>>> >> > Ordered is an OK requirement, but "all messages <N have also
>>>>>>>>>> >> > been delivered" means in practice lossless delivery (== retries,
>>>>>>>>>> >> > retransmission windows, etc). Lossy vs lossless link should be a
>>>>>>>>>> >> > configuration option.
>>>>>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>>>>>>> >> implementation chooses to use some unreliable media, then it
>>>>>>>>>> will need
>>>>>>>>>> >> to take some counter measures. Any loss of message could be
>>>>>>>>>> detected
>>>>>>>>>> >> using sequence numbers (and timeouts) and handled by
>>>>>>>>>> (temporary)
>>>>>>>>>> >> disconnection (so that no more messages will be delivered
>>>>>>>>>> should one
>>>>>>>>>> >> go missing).
>>>>>>>>>> >>
>>>>>>>>>> >> I am OK with adding the lossless/lossy configuration to the
>>>>>>>>>> API as
>>>>>>>>>> >> long as lossless option is always implemented. Is this a
>>>>>>>>>> configuration
>>>>>>>>>> >> when creating the local IPC endpoint or when sending a
>>>>>>>>>> message to
>>>>>>>>>> >> another endpoint?
>>>>>>>>>> >>
>>>>>>>>>> >> >
>>>>>>>>>> >> > Also what "delivered" means?'
>>>>>>>>>> >> >
>>>>>>>>>> >> > Message:
>>>>>>>>>> >> >  - transmitted successfully over the link ?
>>>>>>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>>>>>>> >> >  - delivered into application input queue ?
>>>>>>>>>> >> Probably this one but I am not sure the exact definition
>>>>>>>>>> matters, "has
>>>>>>>>>> >> been delivered" or "will eventually be delivered unless
>>>>>>>>>> connection to
>>>>>>>>>> >> the destination is lost". Maybe there is a better word than
>>>>>>>>>> >> "delivered"?
>>>>>>>>>> >>
>>>>>>>>>> >> "Made available into the destination (recipient) address
>>>>>>>>>> space"?
>>>>>>>>>> >>
>>>>>>>>>> >> >  - has been dequeued from application queue ?
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >> recipient. End-to-end acknowledgements (using messages)
>>>>>>>>>> should be used
>>>>>>>>>> >> >> if
>>>>>>>>>> >> >> this guarantee is important to the user.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an
>>>>>>>>>> internal reliable
>>>>>>>>>> >> >> multidrop network where each endpoint has a unique address
>>>>>>>>>> which is
>>>>>>>>>> >> >> only
>>>>>>>>>> >> >> valid for the lifetime of the endpoint. I.e. if an endpoint
>>>>>>>>>> is
>>>>>>>>>> >> >> destroyed
>>>>>>>>>> >> >> and then recreated (with the same name), the new endpoint
>>>>>>>>>> will have a
>>>>>>>>>> >> >> new address (eventually endpoints addresses will have to be
>>>>>>>>>> recycled
>>>>>>>>>> >> >> but
>>>>>>>>>> >> >> not for a very long time). Endpoints names do not
>>>>>>>>>> necessarily have to
>>>>>>>>>> >> >> be
>>>>>>>>>> >> >> unique.
>>>>>>>>>> >> >
>>>>>>>>>> >> > How widely these addresses are unique: inside one VM,
>>>>>>>>>> multiple VMs under
>>>>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>>>>> >> Currently, the scope of the name and address space is defined
>>>>>>>>>> by the
>>>>>>>>>> >> implementation. Perhaps we should define it? My current
>>>>>>>>>> interest is
>>>>>>>>>> >> within an OS instance (bare metal or virtualised). Between
>>>>>>>>>> different
>>>>>>>>>> >> OS instances, I expect something based on IP to be used
>>>>>>>>>> (because you
>>>>>>>>>> >> don't know where those different OS/VM instances will be
>>>>>>>>>> deployed so
>>>>>>>>>> >> you need topology-independent addressing).
>>>>>>>>>> >>
>>>>>>>>>> >> Based on other feedback, I have dropped the contended usage of
>>>>>>>>>> "IPC"
>>>>>>>>>> >> and now call it "message bus" (MBUS).
>>>>>>>>>> >>
>>>>>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an
>>>>>>>>>> OS-internal
>>>>>>>>>> >> reliable multidrop network"...
>>>>>>>>>> >>
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
>>>>>>>>>> >> >> ---
>>>>>>>>>> >> >> (This document/code contribution attached is provided under
>>>>>>>>>> the terms
>>>>>>>>>> >> >> of
>>>>>>>>>> >> >> agreement LES-LTM-21309)
>>>>>>>>>> >> >>
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Create IPC endpoint
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @return IPC handle on success
>>>>>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t
>>>>>>>>>> pool);
>>>>>>>>>> >> >
>>>>>>>>>> >> > This creates (implicitly) the local end point address.
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param ipc   IPC handle
>>>>>>>>>> >> >> + * @param queue Queue handle
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @retval  0 on success
>>>>>>>>>> >> >> + * @retval <0 on failure
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>>>>>> >> >
>>>>>>>>>> >> > Multiple input queues are likely needed for different
>>>>>>>>>> priority messages.
>>>>>>>>>> >> >
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Resolve endpoint by name
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>>>>>> >> >> + * When the endpoint exists, return the specified message
>>>>>>>>>> with the
>>>>>>>>>> >> >> endpoint
>>>>>>>>>> >> >> + * as the sender.
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>>>> >> >> + * @param name Name to resolve
>>>>>>>>>> >> >> + * @param msg Message to return
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>>>>>> >> >> +                  const char *name,
>>>>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>>>>> >> >
>>>>>>>>>> >> > How widely these names are visible? Inside one VM, multiple
>>>>>>>>>> VMs under
>>>>>>>>>> >> > the same host, multiple devices on a LAN (VLAN), ...
>>>>>>>>>> >> >
>>>>>>>>>> >> > I think name service (or address resolution) are better
>>>>>>>>>> handled in
>>>>>>>>>> >> > middleware layer. If ODP provides unique addresses and
>>>>>>>>>> message passing
>>>>>>>>>> >> > mechanism, additional services can be built on top.
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Monitor endpoint
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>>>>>> with the
>>>>>>>>>> >> >> endpoint
>>>>>>>>>> >> >> + * as the sender.
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated
>>>>>>>>>> as dead
>>>>>>>>>> >> >> endpoints.
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>>>>>> >> >> + * @param msg Message to return
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>>>>>>> >> >
>>>>>>>>>> >> > Again, I'd see node health monitoring and alarms as
>>>>>>>>>> middleware services.
>>>>>>>>>> >> >
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Send message
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Send a message to an endpoint (which may already be
>>>>>>>>>> dead).
>>>>>>>>>> >> >> + * Message delivery is ordered and reliable. All
>>>>>>>>>> (accepted) messages
>>>>>>>>>> >> >> will
>>>>>>>>>> >> >> be
>>>>>>>>>> >> >> + * delivered up to the point of endpoint death or lost
>>>>>>>>>> connection.
>>>>>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>>>>>> end-to-end
>>>>>>>>>> >> >> + * acknowledgements for that).
>>>>>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>>>>>> connection.
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param ipc IPC handle
>>>>>>>>>> >> >> + * @param msg Message to send
>>>>>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @retval 0 on success
>>>>>>>>>> >> >> + * @retval <0 on error
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>>>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>>>>> >> >
>>>>>>>>>> >> > This would be used to send a message to an address, but
>>>>>>>>>> normal
>>>>>>>>>> >> > odp_queue_enq() could be used to circulate this event inside
>>>>>>>>>> an application
>>>>>>>>>> >> > (ODP instance).
>>>>>>>>>> >> >
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Get address of sender (source) of message
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Message data pointer
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Return a pointer to the message data
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @return Pointer to the message data
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Message data length
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Return length of the message data.
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @return Message length
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>>>>>> >> >> +
>>>>>>>>>> >> >> +/**
>>>>>>>>>> >> >> + * Set message length
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * Set length of the message data.
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @param msg Message handle
>>>>>>>>>> >> >> + * @param len New length
>>>>>>>>>> >> >> + *
>>>>>>>>>> >> >> + * @retval 0 on success
>>>>>>>>>> >> >> + * @retval <0 on error
>>>>>>>>>> >> >> + */
>>>>>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>>>>>> >> >
>>>>>>>>>> >> > When data ptr or data len is modified: push/pull head,
>>>>>>>>>> push/pull tail
>>>>>>>>>> >> > would be analogies from packet API
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> > -Petri
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> _______________________________________________
>>>>>>>>>> >> lng-odp mailing list
>>>>>>>>>> >> lng-odp@lists.linaro.org
>>>>>>>>>> >> https://lists.linaro.org/mailman/listinfo/lng-odp
Ola Liljedahl May 22, 2015, 11:19 a.m. | #25
On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo) <
petri.savolainen@nokia.com> wrote:

>
>
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of ext
> > Ola Liljedahl
> > Sent: Tuesday, May 19, 2015 1:04 AM
> > To: lng-odp@lists.linaro.org
> > Subject: [lng-odp] [RFC] Add ipc.h
> >
> > As promised, here is my first attempt at a standalone API for IPC - inter
> > process communication in a shared nothing architecture (message passing
> > between processes which do not share memory).
> >
> > Currently all definitions are in the file ipc.h but it is possible to
> > break out some message/event related definitions (everything from
> > odp_ipc_sender) in a separate file message.h. This would mimic the
> > packet_io.h/packet.h separation.
> >
> > The semantics of message passing is that sending a message to an endpoint
> > will always look like it succeeds. The appearance of endpoints is
> > explicitly
> > notified through user-defined messages specified in the odp_ipc_resolve()
> > call. Similarly, the disappearance (e.g. death or otherwise lost
> > connection)
> > is also explicitly notified through user-defined messages specified in
> the
> > odp_ipc_monitor() call. The send call does not fail because the addressed
> > endpoint has disappeared.
> >
> > Messages (from endpoint A to endpoint B) are delivered in order. If
> > message
> > N sent to an endpoint is delivered, then all messages <N have also been
> > delivered. Message delivery does not guarantee actual processing by the
>
> Ordered is an OK requirement, but "all messages <N have also been delivered"
> means in practice lossless delivery (== re-tries and retransmission
> windows, etc). Lossy vs lossless link should be a configuration option.
>
> Also what "delivered" means?
>
> Message:
>  - transmitted successfully over the link ?
>  - is now under control of the remote node (post office) ?
>  - delivered into application input queue ?
>  - has been dequeued from application queue ?
>
>
> > recipient. End-to-end acknowledgements (using messages) should be used if
> > this guarantee is important to the user.
> >
> > IPC endpoints can be seen as interfaces (taps) to an internal reliable
> > multidrop network where each endpoint has a unique address which is only
> > valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
> > and then recreated (with the same name), the new endpoint will have a
> > new address (eventually endpoints addresses will have to be recycled but
> > not for a very long time). Endpoints names do not necessarily have to be
> > unique.
>
> How widely these addresses are unique: inside one VM, multiple VMs under
> the same host, multiple devices on a LAN (VLAN), ...
>
I have added that the scope is expected to be an OS instance (e.g. VM).

>
>
> >
> > Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> > ---
> > (This document/code contribution attached is provided under the terms of
> > agreement LES-LTM-21309)
> >
>
>
> > +/**
> > + * Create IPC endpoint
> > + *
> > + * @param name Name of local IPC endpoint
> > + * @param pool Pool for incoming messages
> > + *
> > + * @return IPC handle on success
> > + * @retval ODP_IPC_INVALID on failure and errno set
> > + */
> > +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>
> This creates (implicitly) the local end point address.
>
Yes. Does that have to be described?


>
>
> > +
> > +/**
> > + * Set the default input queue for an IPC endpoint
> > + *
> > + * @param ipc   IPC handle
> > + * @param queue Queue handle
> > + *
> > + * @retval  0 on success
> > + * @retval <0 on failure
> > + */
> > +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>
> Multiple input queues are likely needed for different priority messages.
>
I have added priorities (copied from queue.h SCHED priorities) and a
priority parameter to the send() call.

packet_io.h doesn't have any API for associating a list of queues with the
different (packet) priorities so there is no template to follow. I could
invent a new call for doing this on MBUS endpoints.
E.g.
int odp_mbus_inq_set(odp_mbus_t mbus, odp_mbus_prio_t prio, odp_queue_t
queue);
Call once for each priority. I think this is better than having a call
which specifies all queues at once (the number of priorities is
implementation specific).

I now think that the default queue should be specified when the endpoint is
created. Messages could start pouring in immediately and might have to be
enqueued somewhere (in certain implementations, I did not experience this
problem in my prototype so did not think about it).


> > +
> > +/**
> > + * Resolve endpoint by name
> > + *
> > + * Look up an existing or future endpoint by name.
> > + * When the endpoint exists, return the specified message with the
> > endpoint
> > + * as the sender.
> > + *
> > + * @param ipc IPC handle
> > + * @param name Name to resolve
> > + * @param msg Message to return
> > + */
> > +void odp_ipc_resolve(odp_ipc_t ipc,
> > +                  const char *name,
> > +                  odp_ipc_msg_t msg);
>
> How widely these names are visible? Inside one VM, multiple VMs under the
> same host, multiple devices on a LAN (VLAN), ...
>
> I think name service (or address resolution) are better handled in
> middleware layer. If ODP provides unique addresses and message passing
> mechanism, additional services can be built on top.
>
We still need an API for it. What should that API look like, and where
should it be declared?
I am suggesting a definition above and that it be located in the ODP
mbus.h. Please suggest an actual alternative.


>
>
> > +
> > +/**
> > + * Monitor endpoint
> > + *
> > + * Monitor an existing (potentially already dead) endpoint.
> > + * When the endpoint is dead, return the specified message with the
> > endpoint
> > + * as the sender.
> > + *
> > + * Unrecognized or invalid endpoint addresses are treated as dead
> > endpoints.
> > + *
> > + * @param ipc IPC handle
> > + * @param addr Address of monitored endpoint
> > + * @param msg Message to return
> > + */
> > +void odp_ipc_monitor(odp_ipc_t ipc,
> > +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
> > +                  odp_ipc_msg_t msg);
>
> Again, I'd see node health monitoring and alarms as middleware services.
>
Same comment as for resolve/lookup.


> > +
> > +/**
> > + * Send message
> > + *
> > + * Send a message to an endpoint (which may already be dead).
> > + * Message delivery is ordered and reliable. All (accepted) messages
> will
> > be
> > + * delivered up to the point of endpoint death or lost connection.
> > + * Actual reception and processing is not guaranteed (use end-to-end
> > + * acknowledgements for that).
> > + * Monitor the remote endpoint to detect death or lost connection.
> > + *
> > + * @param ipc IPC handle
> > + * @param msg Message to send
> > + * @param addr Address of remote endpoint
> > + *
> > + * @retval 0 on success
> > + * @retval <0 on error
> > + */
> > +int odp_ipc_send(odp_ipc_t ipc,
> > +              odp_ipc_msg_t msg,
>
Message priority parameter added.

> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>
> This would be used to send a message to an address, but normal
> odp_queue_enq() could be used to circulate this event inside an application
> (ODP instance).
>
Yes. Messages are events.


>
> > +
> > +/**
> > + * Get address of sender (source) of message
> > + *
> > + * @param msg Message handle
> > + * @param addr Address of sender endpoint
> > + */
> > +void odp_ipc_sender(odp_ipc_msg_t msg,
> > +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
> > +
> > +/**
> > + * Message data pointer
> > + *
> > + * Return a pointer to the message data
> > + *
> > + * @param msg Message handle
> > + *
> > + * @return Pointer to the message data
> > + */
> > +void *odp_ipc_data(odp_ipc_msg_t msg);
> > +
> > +/**
> > + * Message data length
> > + *
> > + * Return length of the message data.
> > + *
> > + * @param msg Message handle
> > + *
> > + * @return Message length
> > + */
> > +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
> > +
> > +/**
> > + * Set message length
> > + *
> > + * Set length of the message data.
> > + *
> > + * @param msg Message handle
> > + * @param len New length
> > + *
> > + * @retval 0 on success
> > + * @retval <0 on error
> > + */
> > +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>
Per Maxim's suggestion, I renamed this call to odp_message_length_set()
(there is also an odp_message_length() call which gets the length).


>
> When data ptr or data len is modified: push/pull head, push/pull tail
> would be analogies from packet API
>
Messages are not packets that you add and remove headers from. Or?

If we are going to replicate the whole packet.h API, perhaps we should just
use packets for messages. Indeed this was in my original prototype but then
I wasn't sure this was abstract and implementation independent enough. I do
envision messages to be more like buffers but with a per-buffer size
(length).



>
>
> -Petri
>
>
>
Ola Liljedahl May 22, 2015, 2:13 p.m. | #26
On 22 May 2015 at 15:16, Savolainen, Petri (Nokia - FI/Espoo) <
petri.savolainen@nokia.com> wrote:

>  Hi,
>
>
>
> Instead of message bus (mbus), I’d use terms message and message IO
> (similar to packets and packet IO).
>
That could work as well. But the concepts named and described here actually
match quite nicely (a subset of) those of kdbus
https://code.google.com/p/d-bus/source/browse/kdbus.txt?name=policy.

Anyway suggestion accepted.



>
>
> odp_msg_t == message event
>
But packet events are called odp_packet_t, not odp_pkt_t, and timeout
events are called odp_timeout_t, not odp_tmo_t. I prefer the long name
odp_message_t. Do you still insist?

 odp_msgio_t == message io port/interface/tap/socket/mailbox/…
>
>
>
> // create msg io port
>
> odp_msgio_t odp_msgio_create(…);
>
>
>
> // msg io port local address
>
> odp_msgio_addr_t odp_msgio_addr(odp_msgio_t msgio);
>
>
>
>
>
> more comments inlined …
>
Your inlined comments are difficult to find. They seem to be more indented
than the text they comment on.


>
>
>
>
> *From:* ext Ola Liljedahl [mailto:ola.liljedahl@linaro.org]
> *Sent:* Friday, May 22, 2015 2:20 PM
> *To:* Savolainen, Petri (Nokia - FI/Espoo)
> *Cc:* lng-odp@lists.linaro.org
> *Subject:* Re: [lng-odp] [RFC] Add ipc.h
>
>
>
> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo) <
> petri.savolainen@nokia.com> wrote:
>
>
>
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of ext
> > Ola Liljedahl
> > Sent: Tuesday, May 19, 2015 1:04 AM
> > To: lng-odp@lists.linaro.org
> > Subject: [lng-odp] [RFC] Add ipc.h
> >
> > As promised, here is my first attempt at a standalone API for IPC - inter
> > process communication in a shared nothing architecture (message passing
> > between processes which do not share memory).
> >
> > Currently all definitions are in the file ipc.h but it is possible to
> > break out some message/event related definitions (everything from
> > odp_ipc_sender) in a separate file message.h. This would mimic the
> > packet_io.h/packet.h separation.
> >
> > The semantics of message passing is that sending a message to an endpoint
> > will always look like it succeeds. The appearance of endpoints is
> > explicitly
> > notified through user-defined messages specified in the odp_ipc_resolve()
> > call. Similarly, the disappearance (e.g. death or otherwise lost
> > connection)
> > is also explicitly notified through user-defined messages specified in
> the
> > odp_ipc_monitor() call. The send call does not fail because the addressed
> > endpoint has disappeared.
> >
> > Messages (from endpoint A to endpoint B) are delivered in order. If
> > message
> > N sent to an endpoint is delivered, then all messages <N have also been
> > delivered. Message delivery does not guarantee actual processing by the
>
> Ordered is an OK requirement, but "all messages <N have also been delivered"
> means in practice lossless delivery (== re-tries and retransmission
> windows, etc). Lossy vs lossless link should be a configuration option.
>
> Also what "delivered" means?
>
> Message:
>  - transmitted successfully over the link ?
>  - is now under control of the remote node (post office) ?
>  - delivered into application input queue ?
>  - has been dequeued from application queue ?
>
>
> > recipient. End-to-end acknowledgements (using messages) should be used if
> > this guarantee is important to the user.
> >
> > IPC endpoints can be seen as interfaces (taps) to an internal reliable
> > multidrop network where each endpoint has a unique address which is only
> > valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
> > and then recreated (with the same name), the new endpoint will have a
> > new address (eventually endpoints addresses will have to be recycled but
> > not for a very long time). Endpoints names do not necessarily have to be
> > unique.
>
> How widely these addresses are unique: inside one VM, multiple VMs under
> the same host, multiple devices on a LAN (VLAN), ...
>
> I have added that the scope is expected to be an OS instance (e.g. VM).
>
>
>
> OK, it’s likely the scope that is mostly needed anyway.
>
>
>
> Still need to define if addressing (and protocol) is implementation
> specific or standardized.
>
> Implementation specific for now.

> I think you are suggesting implementation specific, which is fine, but need
> to note that it’s suitable only between two ODP instances of the same _
> *implementation*_ version. E.g. messaging between linux-generic and
> odp-dpdk, or between odp-dpdk-1.1.0.0 and odp-dpdk-1.1.0.1 would not
> necessarily work. A protocol spec (with version numbering) would be needed
> to guarantee intra-implementation-version communication.
>
I don't want to walk into the tar pit of defining a binary ("on-the-wire")
protocol. But this can be done later.

Message formats, API and transport are separate things. I am only trying to
define the API here.

The incompatibility issues you describe should be noted.


>
>
> Implementation specific messaging would be sufficient for SW (ODP
> instance) coming from single SW vendor, but integration of SW from multiple
> vendors would need packets or proper “IPC” protocol.
>
> I assume you mean "message format" here. There are so many ways to define
these (Netlink has already been suggested, I suspect some users may prefer
e.g. NETCONF). I think any standardization here is a separate activity. The
question is whether something needs to be handled on the ODP MSGIO level.

The transport protocol (not seen by the applications) is defined by the ODP
implementation and may very well have HW dependencies (e.g. using HW
buffers/queues/DMA).


>
> >
> > Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
> > ---
> > (This document/code contribution attached is provided under the terms of
> > agreement LES-LTM-21309)
> >
>
>
> > +/**
> > + * Create IPC endpoint
> > + *
> > + * @param name Name of local IPC endpoint
> > + * @param pool Pool for incoming messages
> > + *
> > + * @return IPC handle on success
> > + * @retval ODP_IPC_INVALID on failure and errno set
> > + */
> > +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>
> This creates (implicitly) the local end point address.
>
>  Yes. Does that have to be described?
>
>
>
> Maybe to highlight that “name” is not the address.
>
OK


>
>
>
>
> > +
> > +/**
> > + * Set the default input queue for an IPC endpoint
> > + *
> > + * @param ipc   IPC handle
> > + * @param queue Queue handle
> > + *
> > + * @retval  0 on success
> > + * @retval <0 on failure
> > + */
> > +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>
> Multiple input queues are likely needed for different priority messages.
>
>  I have added priorities (copied from queue.h SCHED priorities) and a
> priority parameter to the send() call.
>
>
>
> packet_io.h doesn't have any API for associating a list of queues with the
> different (packet) priorities so there is no template to follow. I could
> invent a new call for doing this on MBUS endpoints.
>
> E.g.
>
> int odp_mbus_inq_set(odp_mbus_t mbus, odp_mbus_prio_t prio, odp_queue_t
> queue);
>
> Call once for each priority, I think this is better than having a call
> which specifies all queues at once (the number of priorities is
> implementation specific).
>
>
>
> I now think that the default queue should be specified when the endpoint
> is created. Messages could start pouring in immediately and might have to
> be enqueued somewhere (in certain implementations, I did not experience
> this problem in my prototype so did not think about it).
>
>
>
> I think it’s better to create all input queues (or actually let the
> implementation create queues) before the “port” is activated, so that
> already the very first incoming message goes to the right queue (priority
> level).
>
But then the implementation will need to know the type (polled, scheduled),
etc., of the queues. Pass the same parameters as to odp_queue_create(). In
the end, odp_mbus_create() will have a million parameters.

Or perhaps there should be "start" and "stop" calls so that endpoint does
not get active immediately. The application can create the endpoint, do
other initialization stuff (create and attach input queues) and then
activate the endpoint (connect it to the message bus). Seems more flexible
to me.


>
> BTW, odp_pktio_inq_xxx() is likely removed and handled through
> classification API (and let implementation return queue handles).
>
OK. Then I should not mimic this API.


>
>
>
>
>
> > +
> > +/**
> > + * Resolve endpoint by name
> > + *
> > + * Look up an existing or future endpoint by name.
> > + * When the endpoint exists, return the specified message with the
> > endpoint
> > + * as the sender.
> > + *
> > + * @param ipc IPC handle
> > + * @param name Name to resolve
> > + * @param msg Message to return
> > + */
> > +void odp_ipc_resolve(odp_ipc_t ipc,
> > +                  const char *name,
> > +                  odp_ipc_msg_t msg);
>
> How widely these names are visible? Inside one VM, multiple VMs under the
> same host, multiple devices on a LAN (VLAN), ...
>
> I think name service (or address resolution) are better handled in
> middleware layer. If ODP provides unique addresses and message passing
> mechanism, additional services can be built on top.
>
> We still need an API for it. What should that API look like, and where
> should it be declared?
>
> I am suggesting a definition above and that it be located in the ODP
> mbus.h. Please suggest an actual alternative.
>
>
>
> It’s not necessarily an API call. Application could send a “name resolution
> request” message to the middleware name server, etc. The name server
> address can be delivered to application in many ways (command line, config
> file, init message, etc).
>
My original prototype used special message numbers (but special addresses
could also be used, message numbers are not part of the API) to indicate
resolve/lookup and monitor requests. This can still be done by the
implementation. Why can't we hide this behind a standardized API?

Freedom for the implementation is good but I don't think it is good to
expose the application to all those different potential solutions.




>
>
>
>
>
>
> > +
> > +/**
> > + * Monitor endpoint
> > + *
> > + * Monitor an existing (potentially already dead) endpoint.
> > + * When the endpoint is dead, return the specified message with the
> > endpoint
> > + * as the sender.
> > + *
> > + * Unrecognized or invalid endpoint addresses are treated as dead
> > endpoints.
> > + *
> > + * @param ipc IPC handle
> > + * @param addr Address of monitored endpoint
> > + * @param msg Message to return
> > + */
> > +void odp_ipc_monitor(odp_ipc_t ipc,
> > +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
> > +                  odp_ipc_msg_t msg);
>
> Again, I'd see node health monitoring and alarms as middleware services.
>
>  Same comment as for resolve/lookup.
>
>
>
> Again it could a message interface between application and the middleware
> alarm service, etc.
>
>
>
>
>
>
> > +
> > +/**
> > + * Send message
> > + *
> > + * Send a message to an endpoint (which may already be dead).
> > + * Message delivery is ordered and reliable. All (accepted) messages
> will
> > be
> > + * delivered up to the point of endpoint death or lost connection.
> > + * Actual reception and processing is not guaranteed (use end-to-end
> > + * acknowledgements for that).
> > + * Monitor the remote endpoint to detect death or lost connection.
> > + *
> > + * @param ipc IPC handle
> > + * @param msg Message to send
> > + * @param addr Address of remote endpoint
> > + *
> > + * @retval 0 on success
> > + * @retval <0 on error
> > + */
> > +int odp_ipc_send(odp_ipc_t ipc,
> > +              odp_ipc_msg_t msg,
>
>  Message priority parameter added.
>
>
>
> > +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>
> This would be used to send a message to an address, but normal
> odp_queue_enq() could be used to circulate this event inside an application
> (ODP instance).
>
>  Yes. Messages are events.
>
>
>
>
> > +
> > +/**
> > + * Get address of sender (source) of message
> > + *
> > + * @param msg Message handle
> > + * @param addr Address of sender endpoint
> > + */
> > +void odp_ipc_sender(odp_ipc_msg_t msg,
> > +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
> > +
> > +/**
> > + * Message data pointer
> > + *
> > + * Return a pointer to the message data
> > + *
> > + * @param msg Message handle
> > + *
> > + * @return Pointer to the message data
> > + */
> > +void *odp_ipc_data(odp_ipc_msg_t msg);
> > +
> > +/**
> > + * Message data length
> > + *
> > + * Return length of the message data.
> > + *
> > + * @param msg Message handle
> > + *
> > + * @return Message length
> > + */
> > +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
> > +
> > +/**
> > + * Set message length
> > + *
> > + * Set length of the message data.
> > + *
> > + * @param msg Message handle
> > + * @param len New length
> > + *
> > + * @retval 0 on success
> > + * @retval <0 on error
> > + */
> > +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>
>  Per Maxim's suggestion, I renamed this call to odp_message_length_set()
> (there is also an odp_message_length() call which gets the length).
>
>
>
> New length need to be in limits of the buffer size.
>
Yes, that's why the call can fail. But perhaps this limitation needs to be
documented.


> Agree that buffer size is constant, but it can be large (e.g. multiple kB
> at maximum). If message delivery copies data, the actual data length needs
> to be defined (for low overhead).
>
Which is why we have the odp_message_length() and odp_message_length_set()
calls.


>
>
>
>
> When data ptr or data len is modified: push/pull head, push/pull tail
> would be analogies from packet API
>
>  Messages are not packets that you add and remove headers from. Or?
>
>
>
> Messages have structure and may very well have layers. E.g. middleware
> message header is in front. The data pointer and length are updated when
> message travels between ODP msgio, middleware and application.
>
OK, so push/pull head is needed. Any real need to push/pull the tail?
We need to introduce the concept of headroom as well, I guess. More and more
like packets...


>
>
> If we are going to replicate the whole packet.h API, perhaps we should
> just use packets for messages. Indeed this was in my original prototype but
> then I wasn't sure this was abstract and implementation independent enough.
> I do envision messages to be more like buffers but with a per-buffer size
> (length).
>
>
>
> Certainly all protocol flags/features are not needed and preferably no
> segmentation. But dynamic data ptr/len would bring in concepts of head,
> tail, etc.
>
>
>
> -Petri
>
>
>
>
>

Patch

diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
new file mode 100644
index 0000000..3395a34
--- /dev/null
+++ b/include/odp/api/ipc.h
@@ -0,0 +1,261 @@ 
+/* Copyright (c) 2015, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier:     BSD-3-Clause
+ */
+
+
+/**
+ * @file
+ *
+ * ODP IPC API
+ */
+
+#ifndef ODP_API_IPC_H_
+#define ODP_API_IPC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** @defgroup odp_ipc ODP IPC
+ *  @{
+ */
+
+/**
+ * @typedef odp_ipc_t
+ * ODP IPC handle
+ */
+
+/**
+ * @typedef odp_ipc_msg_t
+ * ODP IPC message handle
+ */
+
+
+/**
+ * @def ODP_IPC_ADDR_SIZE
+ * Size of the address of an IPC endpoint
+ */
+
+/**
+ * Create IPC endpoint
+ *
+ * @param name Name of local IPC endpoint
+ * @param pool Pool for incoming messages
+ *
+ * @return IPC handle on success
+ * @retval ODP_IPC_INVALID on failure and errno set
+ */
+odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
+
+/**
+ * Destroy IPC endpoint
+ *
+ * @param ipc IPC handle
+ *
+ * @retval 0 on success
+ * @retval <0 on failure
+ */
+int odp_ipc_destroy(odp_ipc_t ipc);
+
+/**
+ * Set the default input queue for an IPC endpoint
+ *
+ * @param ipc   IPC handle
+ * @param queue Queue handle
+ *
+ * @retval  0 on success
+ * @retval <0 on failure
+ */
+int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
+
+/**
+ * Remove the default input queue
+ *
+ * Remove (disassociate) the default input queue from an IPC endpoint.
+ * The queue itself is not touched.
+ *
+ * @param ipc  IPC handle
+ *
+ * @retval 0 on success
+ * @retval <0 on failure
+ */
+int odp_ipc_inq_remdef(odp_ipc_t ipc);
+
+/**
+ * Resolve endpoint by name
+ *
+ * Look up an existing or future endpoint by name.
+ * When the endpoint exists, return the specified message with the endpoint
+ * as the sender.
+ *
+ * @param ipc IPC handle
+ * @param name Name to resolve
+ * @param msg Message to return
+ */
+void odp_ipc_resolve(odp_ipc_t ipc,
+		     const char *name,
+		     odp_ipc_msg_t msg);
+
+/**
+ * Monitor endpoint
+ *
+ * Monitor an existing (potentially already dead) endpoint.
+ * When the endpoint is dead, return the specified message with the endpoint
+ * as the sender.
+ * 
+ * Unrecognized or invalid endpoint addresses are treated as dead endpoints.
+ *
+ * @param ipc IPC handle
+ * @param addr Address of monitored endpoint
+ * @param msg Message to return
+ */
+void odp_ipc_monitor(odp_ipc_t ipc,
+		     const uint8_t addr[ODP_IPC_ADDR_SIZE],
+		     odp_ipc_msg_t msg);
+
+/**
+ * Send message
+ *
+ * Send a message to an endpoint (which may already be dead).
+ * Message delivery is ordered and reliable. All (accepted) messages will be
+ * delivered up to the point of endpoint death or lost connection.
+ * Actual reception and processing is not guaranteed (use end-to-end
+ * acknowledgements for that).
+ * Monitor the remote endpoint to detect death or lost connection.
+ *
+ * @param ipc IPC handle
+ * @param msg Message to send
+ * @param addr Address of remote endpoint
+ *
+ * @retval 0 on success
+ * @retval <0 on error
+ */
+int odp_ipc_send(odp_ipc_t ipc,
+		 odp_ipc_msg_t msg,
+		 const uint8_t addr[ODP_IPC_ADDR_SIZE]);
+
+/**
+ * Get address of sender (source) of message
+ *
+ * @param msg Message handle
+ * @param addr Address of sender endpoint
+ */
+void odp_ipc_sender(odp_ipc_msg_t msg,
+		    uint8_t addr[ODP_IPC_ADDR_SIZE]);
+
+/**
+ * Message data pointer
+ *
+ * Return a pointer to the message data
+ *
+ * @param msg Message handle
+ *
+ * @return Pointer to the message data
+ */
+void *odp_ipc_data(odp_ipc_msg_t msg);
+
+/**
+ * Message data length
+ *
+ * Return length of the message data.
+ *
+ * @param msg Message handle
+ *
+ * @return Message length
+ */
+uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
+
+/**
+ * Set message length
+ *
+ * Set length of the message data.
+ *
+ * @param msg Message handle
+ * @param len New length
+ *
+ * @retval 0 on success
+ * @retval <0 on error
+ */
+int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
+
+/**
+ * Allocate message
+ *
+ * Allocate a message of a specific size.
+ *
+ * @param pool Message pool to allocate message from
+ * @param len Length of the allocated message
+ *
+ * @return IPC message handle on success
+ * @retval ODP_IPC_MSG_INVALID on failure and errno set
+ */
+odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
+
+/**
+ * Free message
+ *
+ * Free message back to the message pool it was allocated from.
+ *
+ * @param msg Handle of message to free
+ */
+void odp_ipc_free(odp_ipc_msg_t msg);
+
+/**
+ * Get message handle from event
+ *
+ * Converts an ODP_EVENT_MESSAGE type event to a message.
+ *
+ * @param ev   Event handle
+ *
+ * @return Message handle
+ *
+ * @see odp_event_type()
+ */
+odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
+
+/**
+ * Convert message handle to event
+ *
+ * @param msg  Message handle
+ *
+ * @return Event handle
+ */
+odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
+
+/**
+ * Get printable value for an odp_ipc_t
+ *
+ * @param ipc  IPC handle to be printed
+ * @return     uint64_t value that can be used to print/display this
+ *             handle
+ *
+ * @note This routine is intended to be used for diagnostic purposes
+ * to enable applications to generate a printable value that represents
+ * an odp_ipc_t handle.
+ */
+uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
+
+/**
+ * Get printable value for an odp_ipc_msg_t
+ *
+ * @param msg  Message handle to be printed
+ * @return     uint64_t value that can be used to print/display this
+ *             handle
+ *
+ * @note This routine is intended to be used for diagnostic purposes
+ * to enable applications to generate a printable value that represents
+ * an odp_ipc_msg_t handle.
+ */
+uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
+
+/**
+ * @}
+ */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif