example: timer: delete races while termination

Message ID 1435917364-22865-1-git-send-email-ivan.khoronzhuk@linaro.org
State New

Commit Message

Ivan Khoronzhuk July 3, 2015, 9:56 a.m. UTC
The current implementation has at least two races that lead to several issues:

- gbls->remain can underflow. One thread can decrement the remain counter to 0,
and then another thread can decrement it once more, so it wraps and is > 0
again. After that the thread keeps looping for a very long time.

- Several threads can terminate the same timer and, as a result, the same event.
After leaving the main loop, each thread terminates the last timer it used.
But the last timer saved in ttp by one thread can be received by another
thread, so after leaving the main loop two threads can hold the same timer.

- Some timers cannot be freed because several threads try to delete the same
timer; as a result, one of the timers/tmos is left unfreed after termination.

- The test can send more events than requested. Receiving the requested
number of tmos doesn't mean the test sent the same number; rather, it sends
more.

This patch is intended to fix the above drawbacks.
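
The first issue can be reproduced in isolation. Below is a minimal sketch that
replays the interleaving serially; it uses C11 atomics as a stand-in for
odp_atomic_u32_t, and the names remain_sketch and replay_double_decrement are
hypothetical, not part of the test.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for gbls->remain (C11 atomics, not the ODP type). */
static _Atomic uint32_t remain_sketch;

/*
 * Replays the racy interleaving serially: two threads both passed the
 * loop condition while the counter was 1, then each decremented once.
 * Returns the final counter value.
 */
static uint32_t replay_double_decrement(void)
{
	atomic_store(&remain_sketch, 1);
	atomic_fetch_sub(&remain_sketch, 1); /* thread A: 1 -> 0 */
	atomic_fetch_sub(&remain_sketch, 1); /* thread B: 0 wraps around */
	return atomic_load(&remain_sketch);
}
```

Because the counter is unsigned, the second decrement is well defined and
leaves it at UINT32_MAX instead of stopping at 0.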

Termination must obey the following rules:

- An event can be in one of the following places: in the timer (waiting to be
scheduled), in some queue (waiting to be scheduled to some thread), or received
in the main loop.

- Each event "holds" its timer, so when we receive an event we can delete the
timer.

- A thread cannot delete a timer without its event, as it doesn't know who owns
the event (and hence the timer).

- A thread shouldn't send more events than requested.

- All threads have to be "held" in the loop until the last event is received.
The scheduler can assign an event to any of the threads, so one thread can,
for example, receive the last two events.

Based on the above, this patch makes several improvements:
- don't send more timeouts than are expected to be received
- free the timer and tmo for each of the last received tmos (their number
  equals the number of threads)
- leave the main loop only if the last tmo/timer is free.
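
The counting behind these improvements can be checked with a small model. The
following is only a serial sketch, assuming C11 atomics in place of the ODP
ones and one timer per worker; classify(), simulate() and struct sim_result are
hypothetical names used for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Action for a received timeout, chosen from the counter value
 * observed *before* the decrement (fetch-and-decrement). */
enum tmo_action {
	TMO_UNEXPECTED,	/* counter was already 0: more events than requested */
	TMO_REARM,	/* more than num_workers left: set the timer again */
	TMO_FREE	/* one of the last num_workers events: free tmo/timer */
};

static enum tmo_action classify(uint32_t prev, uint32_t num_workers)
{
	if (prev == 0)
		return TMO_UNEXPECTED;
	if (prev > num_workers)
		return TMO_REARM;
	return TMO_FREE;
}

struct sim_result {
	uint32_t sent;		/* number of times a timer was set */
	uint32_t freed;		/* number of timers freed */
	uint32_t unexpected;	/* events received after the counter hit 0 */
};

/* Serial model: every worker sets its timer once before its first receive,
 * then each received event is classified as above. */
static struct sim_result simulate(uint32_t num_workers, uint32_t tmo_count)
{
	_Atomic uint32_t remain;
	struct sim_result r = { num_workers, 0, 0 };
	uint32_t in_flight = num_workers; /* events in a timer or a queue */

	atomic_init(&remain, num_workers * tmo_count);

	while (in_flight > 0) {
		uint32_t prev;

		in_flight--; /* some worker receives one event */
		prev = atomic_fetch_sub(&remain, 1);

		switch (classify(prev, num_workers)) {
		case TMO_REARM:
			r.sent++;
			in_flight++;
			break;
		case TMO_FREE:
			r.freed++;
			break;
		case TMO_UNEXPECTED:
			r.unexpected++;
			break;
		}
	}
	return r;
}
```

In this model, simulate(4, 10) sets timers exactly 40 times, frees 4 timers and
sees no unexpected event, so the number of events sent equals the number
requested.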

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 example/timer/odp_timer_test.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

Comments

Ivan Khoronzhuk July 3, 2015, 10 a.m. UTC | #1
On 03.07.15 12:56, Ivan Khoronzhuk wrote:
> diff --git a/example/timer/odp_timer_test.c b/example/timer/odp_timer_test.c
> index 5e4306e..e832e35 100644
> --- a/example/timer/odp_timer_test.c
> +++ b/example/timer/odp_timer_test.c
> @@ -47,6 +47,7 @@ typedef struct {
>   	odp_timer_pool_t tp;		/**< Timer pool handle*/
>   	odp_atomic_u32_t remain;	/**< Number of timeouts to receive*/
>   	struct test_timer tt[256];	/**< Array of all timer helper structs*/
> +	int num_workers;		/**< Number of threads */
>   } test_globals_t;
>
>   /** @private Timer set status ASCII strings */
> @@ -139,16 +140,18 @@ static void test_abs_timeouts(int thr, test_globals_t *gbls)
>   	ttp->ev = odp_timeout_to_event(tmo);
>   	tick = odp_timer_current_tick(gbls->tp);
>
> -	while ((int)odp_atomic_load_u32(&gbls->remain) > 0) {
> +	while (1) {
>   		odp_event_t ev;
>   		odp_timer_set_t rc;
>
> -		tick += period;
> -		rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
> -		if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
> -			/* Too early or too late timeout requested */
> -			EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
> -				      timerset2str(rc));
> +		if (ttp) {
> +			tick += period;
> +			rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
> +			if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
> +				/* Too early or too late timeout requested */
> +				EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
> +					      timerset2str(rc));
> +			}
>   		}
>
>   		/* Get the next expired timeout.
> @@ -185,18 +188,17 @@ static void test_abs_timeouts(int thr, test_globals_t *gbls)
>   		}
>   		EXAMPLE_DBG("  [%i] timeout, tick %"PRIu64"\n", thr, tick);
>
> -		odp_atomic_dec_u32(&gbls->remain);
> -	}
> +		int rx_num = odp_atomic_fetch_dec_u32(&gbls->remain);

Forgot one thing: rx_num should be uint32_t here, not int:
int -> uint32_t
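
A minimal sketch of why the type matters, assuming C11 atomics as a stand-in
for the ODP call (fetch_dec_u32_sketch and should_rearm are hypothetical
names). The fetch-and-decrement returns the old value as uint32_t, so keeping
it unsigned avoids an implementation-defined conversion for values above
INT_MAX:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for odp_atomic_fetch_dec_u32(): returns the
 * value the counter held before the decrement. */
static uint32_t fetch_dec_u32_sketch(_Atomic uint32_t *ctr)
{
	return atomic_fetch_sub(ctr, 1);
}

/* Returns 1 when the worker should keep re-arming its timer.
 * num_workers is assumed to be non-negative. */
static int should_rearm(_Atomic uint32_t *remain, int num_workers)
{
	uint32_t rx_num = fetch_dec_u32_sketch(remain);

	/* Compare in the unsigned domain on purpose. */
	return rx_num > (uint32_t)num_workers;
}
```

With rx_num declared as int, a counter value such as 3000000000 would not be
representable and the comparison could go the wrong way.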

> +		if (!rx_num)
> +			EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick %"PRIu64")\n",
> +				      ttp->tim, tick);
> +		else if (rx_num > gbls->num_workers)
> +			continue;
>
> -	/* Cancel and free last timer used */
> -	(void)odp_timer_cancel(ttp->tim, &ttp->ev);
> -	if (ttp->ev != ODP_EVENT_INVALID)
>   		odp_timeout_free(odp_timeout_from_event(ttp->ev));
> -	else
> -		EXAMPLE_ERR("Lost timeout event at timer cancel\n");
> -	/* Since we have cancelled the timer, there is no timeout event to
> -	 * return from odp_timer_free() */
> -	(void)odp_timer_free(ttp->tim);
> +		odp_timer_free(ttp->tim);
> +		ttp = NULL;
> +	}
>
>   	/* Remove any prescheduled events */
>   	remove_prescheduled_events();
> @@ -483,6 +485,8 @@ int main(int argc, char *argv[])
>
>   	printf("\n");
>
> +	gbls->num_workers = num_workers;
> +
>   	/* Initialize number of timeouts to receive */
>   	odp_atomic_init_u32(&gbls->remain, gbls->args.tmo_count * num_workers);
>
>
Ivan Khoronzhuk July 3, 2015, 12:04 p.m. UTC | #2
I've just decided to add the ability to indicate that some TMO was lost.
It has already helped me track down an issue in odp_schedule in the
keystone implementation.

So please see this patch in the new series:
  "[lng-odp] [Patch 0/2] example: timer: fix/improve test"


Patch

diff --git a/example/timer/odp_timer_test.c b/example/timer/odp_timer_test.c
index 5e4306e..e832e35 100644
--- a/example/timer/odp_timer_test.c
+++ b/example/timer/odp_timer_test.c
@@ -47,6 +47,7 @@  typedef struct {
 	odp_timer_pool_t tp;		/**< Timer pool handle*/
 	odp_atomic_u32_t remain;	/**< Number of timeouts to receive*/
 	struct test_timer tt[256];	/**< Array of all timer helper structs*/
+	int num_workers;		/**< Number of threads */
 } test_globals_t;
 
 /** @private Timer set status ASCII strings */
@@ -139,16 +140,18 @@  static void test_abs_timeouts(int thr, test_globals_t *gbls)
 	ttp->ev = odp_timeout_to_event(tmo);
 	tick = odp_timer_current_tick(gbls->tp);
 
-	while ((int)odp_atomic_load_u32(&gbls->remain) > 0) {
+	while (1) {
 		odp_event_t ev;
 		odp_timer_set_t rc;
 
-		tick += period;
-		rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
-		if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
-			/* Too early or too late timeout requested */
-			EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
-				      timerset2str(rc));
+		if (ttp) {
+			tick += period;
+			rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
+			if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
+				/* Too early or too late timeout requested */
+				EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
+					      timerset2str(rc));
+			}
 		}
 
 		/* Get the next expired timeout.
@@ -185,18 +188,17 @@  static void test_abs_timeouts(int thr, test_globals_t *gbls)
 		}
 		EXAMPLE_DBG("  [%i] timeout, tick %"PRIu64"\n", thr, tick);
 
-		odp_atomic_dec_u32(&gbls->remain);
-	}
+		int rx_num = odp_atomic_fetch_dec_u32(&gbls->remain);
+		if (!rx_num)
+			EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick %"PRIu64")\n",
+				      ttp->tim, tick);
+		else if (rx_num > gbls->num_workers)
+			continue;
 
-	/* Cancel and free last timer used */
-	(void)odp_timer_cancel(ttp->tim, &ttp->ev);
-	if (ttp->ev != ODP_EVENT_INVALID)
 		odp_timeout_free(odp_timeout_from_event(ttp->ev));
-	else
-		EXAMPLE_ERR("Lost timeout event at timer cancel\n");
-	/* Since we have cancelled the timer, there is no timeout event to
-	 * return from odp_timer_free() */
-	(void)odp_timer_free(ttp->tim);
+		odp_timer_free(ttp->tim);
+		ttp = NULL;
+	}
 
 	/* Remove any prescheduled events */
 	remove_prescheduled_events();
@@ -483,6 +485,8 @@  int main(int argc, char *argv[])
 
 	printf("\n");
 
+	gbls->num_workers = num_workers;
+
 	/* Initialize number of timeouts to receive */
 	odp_atomic_init_u32(&gbls->remain, gbls->args.tmo_count * num_workers);