diff mbox

[v2,1/2] example: timer: delete races while termination

Message ID 1435927118-12397-2-git-send-email-ivan.khoronzhuk@linaro.org
State New
Headers show

Commit Message

Ivan Khoronzhuk July 3, 2015, 12:38 p.m. UTC
Current implementation has at least two races that lead to several issues:

- gbls->remain can overflow. One thread can decrement remain counter to 0.
While another can decrement it once again and it will be > 0. After
what some thread will loop very long time ...

- Several threads can terminate the same timer and as result the same event.
After out from the main loop a thread terminates a last timer it used.
But a last timer saved in ttp for a thread can be received in another
thread. So after leaving the main loop two threads can hold the same timer.

- Some timer cannot be freed as several threads try to delete the same
timer, as result one of the timer/tmo stay not freed after termination.

- The test can send more events that requested. The receiving of requested
number of tmos doesn't mean the test sent the same number. It rather sent more.

This patch is intended to fix above drawbacks.

The termination path must follow the next things:

- An event can be in the following places: in a timer (waiting to be
scheduled), in a queue for a thread to be scheduled, received in the
main loop.

- An event "holds" a timer, so when we receive an event we can delete it's
timer.

- a thread cannot delete timer w/o an event as it doesn't know who is owner of
the event (and obvious the timer).

- a thread shouldn't send events more than requested.

-  all threads have to be "held" in the loop till a last received event.
The scheduler can assign event for any of the threads, so one thread can
receive two last events for example.

According to above, added several improvements:
- don't send more timeouts that supposed to receive
- free timer and tmo for a last received tmos = num of threads.
- leave the main loop only if a last tmo/timer is free.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 example/timer/odp_timer_test.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

Comments

Mike Holmes July 23, 2015, 5:39 p.m. UTC | #1
the EXAMPLE_ABORT one might be fine, we usually accept prints that are > 80
if there is no nice way to split them.


Using patch:
lng-odp_Patch_v2_1-2_example_timer_delete_races_while_termination.mbox
  Trying to apply patch
  Patch applied
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars
per line)
#12:
- Several threads can terminate the same timer and as result the same event.

WARNING: Missing a blank line after declarations
#98: FILE: example/timer/odp_timer_test.c:168:
+        uint32_t rx_num = odp_atomic_fetch_dec_u32(&gbls->remain);
+        if (!rx_num)

WARNING: line over 80 characters
#99: FILE: example/timer/odp_timer_test.c:169:
+            EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick
%"PRIu64")\n",

CHECK: Concatenated strings should use spaces between elements
#99: FILE: example/timer/odp_timer_test.c:169:
+            EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick
%"PRIu64")\n",

total: 0 errors, 3 warnings, 1 checks, 67 lines checked

NOTE: Ignored message types: DEPRECATED_VARIABLE NEW_TYPEDEFS

0001-example-timer-delete-races-while-termination.patch has style problems,
please review.

On 3 July 2015 at 08:38, Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> Current implementation has at least two races that lead to several issues:
>
> - gbls->remain can overflow. One thread can decrement remain counter to 0.
> While another can decrement it once again and it will be > 0. After
> what some thread will loop very long time ...
>
> - Several threads can terminate the same timer and as result the same
> event.
> After out from the main loop a thread terminates a last timer it used.
> But a last timer saved in ttp for a thread can be received in another
> thread. So after leaving the main loop two threads can hold the same timer.
>
> - Some timer cannot be freed as several threads try to delete the same
> timer, as result one of the timer/tmo stay not freed after termination.
>
> - The test can send more events that requested. The receiving of requested
> number of tmos doesn't mean the test sent the same number. It rather sent
> more.
>
> This patch is intended to fix above drawbacks.
>
> The termination path must follow the next things:
>
> - An event can be in the following places: in a timer (waiting to be
> scheduled), in a queue for a thread to be scheduled, received in the
> main loop.
>
> - An event "holds" a timer, so when we receive an event we can delete it's
> timer.
>
> - a thread cannot delete timer w/o an event as it doesn't know who is
> owner of
> the event (and obvious the timer).
>
> - a thread shouldn't send events more than requested.
>
> -  all threads have to be "held" in the loop till a last received event.
> The scheduler can assign event for any of the threads, so one thread can
> receive two last events for example.
>
> According to above, added several improvements:
> - don't send more timeouts that supposed to receive
> - free timer and tmo for a last received tmos = num of threads.
> - leave the main loop only if a last tmo/timer is free.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  example/timer/odp_timer_test.c | 38 +++++++++++++++++++++-----------------
>  1 file changed, 21 insertions(+), 17 deletions(-)
>
> diff --git a/example/timer/odp_timer_test.c
> b/example/timer/odp_timer_test.c
> index 5e4306e..584a581 100644
> --- a/example/timer/odp_timer_test.c
> +++ b/example/timer/odp_timer_test.c
> @@ -47,6 +47,7 @@ typedef struct {
>         odp_timer_pool_t tp;            /**< Timer pool handle*/
>         odp_atomic_u32_t remain;        /**< Number of timeouts to
> receive*/
>         struct test_timer tt[256];      /**< Array of all timer helper
> structs*/
> +       uint32_t num_workers;           /**< Number of threads */
>  } test_globals_t;
>
>  /** @private Timer set status ASCII strings */
> @@ -139,16 +140,18 @@ static void test_abs_timeouts(int thr,
> test_globals_t *gbls)
>         ttp->ev = odp_timeout_to_event(tmo);
>         tick = odp_timer_current_tick(gbls->tp);
>
> -       while ((int)odp_atomic_load_u32(&gbls->remain) > 0) {
> +       while (1) {
>                 odp_event_t ev;
>                 odp_timer_set_t rc;
>
> -               tick += period;
> -               rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
> -               if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
> -                       /* Too early or too late timeout requested */
> -                       EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
> -                                     timerset2str(rc));
> +               if (ttp) {
> +                       tick += period;
> +                       rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
> +                       if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
> +                               /* Too early or too late timeout requested
> */
> +                               EXAMPLE_ABORT("odp_timer_set_abs() failed:
> %s\n",
> +                                             timerset2str(rc));
> +                       }
>                 }
>
>                 /* Get the next expired timeout.
> @@ -185,18 +188,17 @@ static void test_abs_timeouts(int thr,
> test_globals_t *gbls)
>                 }
>                 EXAMPLE_DBG("  [%i] timeout, tick %"PRIu64"\n", thr, tick);
>
> -               odp_atomic_dec_u32(&gbls->remain);
> -       }
> +               uint32_t rx_num = odp_atomic_fetch_dec_u32(&gbls->remain);
> +               if (!rx_num)
> +                       EXAMPLE_ABORT("Unexpected timeout received (timer
> %x, tick %"PRIu64")\n",
> +                                     ttp->tim, tick);
> +               else if (rx_num > gbls->num_workers)
> +                       continue;
>
> -       /* Cancel and free last timer used */
> -       (void)odp_timer_cancel(ttp->tim, &ttp->ev);
> -       if (ttp->ev != ODP_EVENT_INVALID)
>                 odp_timeout_free(odp_timeout_from_event(ttp->ev));
> -       else
> -               EXAMPLE_ERR("Lost timeout event at timer cancel\n");
> -       /* Since we have cancelled the timer, there is no timeout event to
> -        * return from odp_timer_free() */
> -       (void)odp_timer_free(ttp->tim);
> +               odp_timer_free(ttp->tim);
> +               ttp = NULL;
> +       }
>
>         /* Remove any prescheduled events */
>         remove_prescheduled_events();
> @@ -483,6 +485,8 @@ int main(int argc, char *argv[])
>
>         printf("\n");
>
> +       gbls->num_workers = num_workers;
> +
>         /* Initialize number of timeouts to receive */
>         odp_atomic_init_u32(&gbls->remain, gbls->args.tmo_count *
> num_workers);
>
> --
> 1.9.1
>
> _______________________________________________
> lng-odp mailing list
> lng-odp@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
>
Ivan Khoronzhuk July 23, 2015, 6:01 p.m. UTC | #2
I will resend.
Just created this patch before had added hook in git.
Thanks.

On 23.07.15 20:39, Mike Holmes wrote:
> the EXAMPLE_ABORT one might be fine, we usually accept prints that are >
> 80 if there is no nice way to split them.
>
>
> Using patch:
> lng-odp_Patch_v2_1-2_example_timer_delete_races_while_termination.mbox
>    Trying to apply patch
>    Patch applied
> WARNING: Possible unwrapped commit description (prefer a maximum 75
> chars per line)
> #12:
> - Several threads can terminate the same timer and as result the same event.
>
> WARNING: Missing a blank line after declarations
> #98: FILE: example/timer/odp_timer_test.c:168:
> +        uint32_t rx_num = odp_atomic_fetch_dec_u32(&gbls->remain);
> +        if (!rx_num)
>
> WARNING: line over 80 characters
> #99: FILE: example/timer/odp_timer_test.c:169:
> +            EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick
> %"PRIu64")\n",
>
> CHECK: Concatenated strings should use spaces between elements
> #99: FILE: example/timer/odp_timer_test.c:169:
> +            EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick
> %"PRIu64")\n",
>
> total: 0 errors, 3 warnings, 1 checks, 67 lines checked
>
> NOTE: Ignored message types: DEPRECATED_VARIABLE NEW_TYPEDEFS
>
> 0001-example-timer-delete-races-while-termination.patch has style
> problems, please review.
>
> On 3 July 2015 at 08:38, Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org
> <mailto:ivan.khoronzhuk@linaro.org>> wrote:
>
>     Current implementation has at least two races that lead to several
>     issues:
>
>     - gbls->remain can overflow. One thread can decrement remain counter
>     to 0.
>     While another can decrement it once again and it will be > 0. After
>     what some thread will loop very long time ...
>
>     - Several threads can terminate the same timer and as result the
>     same event.
>     After out from the main loop a thread terminates a last timer it used.
>     But a last timer saved in ttp for a thread can be received in another
>     thread. So after leaving the main loop two threads can hold the same
>     timer.
>
>     - Some timer cannot be freed as several threads try to delete the same
>     timer, as result one of the timer/tmo stay not freed after termination.
>
>     - The test can send more events that requested. The receiving of
>     requested
>     number of tmos doesn't mean the test sent the same number. It rather
>     sent more.
>
>     This patch is intended to fix above drawbacks.
>
>     The termination path must follow the next things:
>
>     - An event can be in the following places: in a timer (waiting to be
>     scheduled), in a queue for a thread to be scheduled, received in the
>     main loop.
>
>     - An event "holds" a timer, so when we receive an event we can
>     delete it's
>     timer.
>
>     - a thread cannot delete timer w/o an event as it doesn't know who
>     is owner of
>     the event (and obvious the timer).
>
>     - a thread shouldn't send events more than requested.
>
>     -  all threads have to be "held" in the loop till a last received event.
>     The scheduler can assign event for any of the threads, so one thread can
>     receive two last events for example.
>
>     According to above, added several improvements:
>     - don't send more timeouts that supposed to receive
>     - free timer and tmo for a last received tmos = num of threads.
>     - leave the main loop only if a last tmo/timer is free.
>
>     Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org
>     <mailto:ivan.khoronzhuk@linaro.org>>
>     ---
>       example/timer/odp_timer_test.c | 38
>     +++++++++++++++++++++-----------------
>       1 file changed, 21 insertions(+), 17 deletions(-)
>
>     diff --git a/example/timer/odp_timer_test.c
>     b/example/timer/odp_timer_test.c
>     index 5e4306e..584a581 100644
>     --- a/example/timer/odp_timer_test.c
>     +++ b/example/timer/odp_timer_test.c
>     @@ -47,6 +47,7 @@ typedef struct {
>              odp_timer_pool_t tp;            /**< Timer pool handle*/
>              odp_atomic_u32_t remain;        /**< Number of timeouts to
>     receive*/
>              struct test_timer tt[256];      /**< Array of all timer
>     helper structs*/
>     +       uint32_t num_workers;           /**< Number of threads */
>       } test_globals_t;
>
>       /** @private Timer set status ASCII strings */
>     @@ -139,16 +140,18 @@ static void test_abs_timeouts(int thr,
>     test_globals_t *gbls)
>              ttp->ev = odp_timeout_to_event(tmo);
>              tick = odp_timer_current_tick(gbls->tp);
>
>     -       while ((int)odp_atomic_load_u32(&gbls->remain) > 0) {
>     +       while (1) {
>                      odp_event_t ev;
>                      odp_timer_set_t rc;
>
>     -               tick += period;
>     -               rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
>     -               if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
>     -                       /* Too early or too late timeout requested */
>     -                       EXAMPLE_ABORT("odp_timer_set_abs() failed:
>     %s\n",
>     -                                     timerset2str(rc));
>     +               if (ttp) {
>     +                       tick += period;
>     +                       rc = odp_timer_set_abs(ttp->tim, tick,
>     &ttp->ev);
>     +                       if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
>     +                               /* Too early or too late timeout
>     requested */
>     +                               EXAMPLE_ABORT("odp_timer_set_abs()
>     failed: %s\n",
>     +                                             timerset2str(rc));
>     +                       }
>                      }
>
>                      /* Get the next expired timeout.
>     @@ -185,18 +188,17 @@ static void test_abs_timeouts(int thr,
>     test_globals_t *gbls)
>                      }
>                      EXAMPLE_DBG("  [%i] timeout, tick %"PRIu64"\n",
>     thr, tick);
>
>     -               odp_atomic_dec_u32(&gbls->remain);
>     -       }
>     +               uint32_t rx_num =
>     odp_atomic_fetch_dec_u32(&gbls->remain);
>     +               if (!rx_num)
>     +                       EXAMPLE_ABORT("Unexpected timeout received
>     (timer %x, tick %"PRIu64")\n",
>     +                                     ttp->tim, tick);
>     +               else if (rx_num > gbls->num_workers)
>     +                       continue;
>
>     -       /* Cancel and free last timer used */
>     -       (void)odp_timer_cancel(ttp->tim, &ttp->ev);
>     -       if (ttp->ev != ODP_EVENT_INVALID)
>                      odp_timeout_free(odp_timeout_from_event(ttp->ev));
>     -       else
>     -               EXAMPLE_ERR("Lost timeout event at timer cancel\n");
>     -       /* Since we have cancelled the timer, there is no timeout
>     event to
>     -        * return from odp_timer_free() */
>     -       (void)odp_timer_free(ttp->tim);
>     +               odp_timer_free(ttp->tim);
>     +               ttp = NULL;
>     +       }
>
>              /* Remove any prescheduled events */
>              remove_prescheduled_events();
>     @@ -483,6 +485,8 @@ int main(int argc, char *argv[])
>
>              printf("\n");
>
>     +       gbls->num_workers = num_workers;
>     +
>              /* Initialize number of timeouts to receive */
>              odp_atomic_init_u32(&gbls->remain, gbls->args.tmo_count *
>     num_workers);
>
>     --
>     1.9.1
>
>     _______________________________________________
>     lng-odp mailing list
>     lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org>
>     https://lists.linaro.org/mailman/listinfo/lng-odp
>
>
>
>
> --
> Mike Holmes
> Technical Manager - Linaro Networking Group
> Linaro.org <http://www.linaro.org/>***│ *Open source software for ARM SoCs
>
> __
>
>
diff mbox

Patch

diff --git a/example/timer/odp_timer_test.c b/example/timer/odp_timer_test.c
index 5e4306e..584a581 100644
--- a/example/timer/odp_timer_test.c
+++ b/example/timer/odp_timer_test.c
@@ -47,6 +47,7 @@  typedef struct {
 	odp_timer_pool_t tp;		/**< Timer pool handle*/
 	odp_atomic_u32_t remain;	/**< Number of timeouts to receive*/
 	struct test_timer tt[256];	/**< Array of all timer helper structs*/
+	uint32_t num_workers;		/**< Number of threads */
 } test_globals_t;
 
 /** @private Timer set status ASCII strings */
@@ -139,16 +140,18 @@  static void test_abs_timeouts(int thr, test_globals_t *gbls)
 	ttp->ev = odp_timeout_to_event(tmo);
 	tick = odp_timer_current_tick(gbls->tp);
 
-	while ((int)odp_atomic_load_u32(&gbls->remain) > 0) {
+	while (1) {
 		odp_event_t ev;
 		odp_timer_set_t rc;
 
-		tick += period;
-		rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
-		if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
-			/* Too early or too late timeout requested */
-			EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
-				      timerset2str(rc));
+		if (ttp) {
+			tick += period;
+			rc = odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
+			if (odp_unlikely(rc != ODP_TIMER_SUCCESS)) {
+				/* Too early or too late timeout requested */
+				EXAMPLE_ABORT("odp_timer_set_abs() failed: %s\n",
+					      timerset2str(rc));
+			}
 		}
 
 		/* Get the next expired timeout.
@@ -185,18 +188,17 @@  static void test_abs_timeouts(int thr, test_globals_t *gbls)
 		}
 		EXAMPLE_DBG("  [%i] timeout, tick %"PRIu64"\n", thr, tick);
 
-		odp_atomic_dec_u32(&gbls->remain);
-	}
+		uint32_t rx_num = odp_atomic_fetch_dec_u32(&gbls->remain);
+		if (!rx_num)
+			EXAMPLE_ABORT("Unexpected timeout received (timer %x, tick %"PRIu64")\n",
+				      ttp->tim, tick);
+		else if (rx_num > gbls->num_workers)
+			continue;
 
-	/* Cancel and free last timer used */
-	(void)odp_timer_cancel(ttp->tim, &ttp->ev);
-	if (ttp->ev != ODP_EVENT_INVALID)
 		odp_timeout_free(odp_timeout_from_event(ttp->ev));
-	else
-		EXAMPLE_ERR("Lost timeout event at timer cancel\n");
-	/* Since we have cancelled the timer, there is no timeout event to
-	 * return from odp_timer_free() */
-	(void)odp_timer_free(ttp->tim);
+		odp_timer_free(ttp->tim);
+		ttp = NULL;
+	}
 
 	/* Remove any prescheduled events */
 	remove_prescheduled_events();
@@ -483,6 +485,8 @@  int main(int argc, char *argv[])
 
 	printf("\n");
 
+	gbls->num_workers = num_workers;
+
 	/* Initialize number of timeouts to receive */
 	odp_atomic_init_u32(&gbls->remain, gbls->args.tmo_count * num_workers);