validation: scheduler: increase delay tolerance

Message ID 1456768144-18699-1-git-send-email-ivan.khoronzhuk@linaro.org
State Accepted
Commit 7098c6e9652c1f3a0fbce56db79777832ca4fd82

Commit Message

Ivan Khoronzhuk Feb. 29, 2016, 5:49 p.m. UTC
On systems sensitive to time deviation it is better to have some
headroom when testing scheduler timing, so increase the tolerance
to 3 jiffies.

https://bugs.linaro.org/show_bug.cgi?id=2076

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 test/validation/scheduler/scheduler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bill Fischofer Feb. 29, 2016, 9:39 p.m. UTC | #1
On Mon, Feb 29, 2016 at 11:49 AM, Ivan Khoronzhuk <
ivan.khoronzhuk@linaro.org> wrote:

> On systems sensitive to time deviation it is better to have some
> headroom when testing scheduler timing, so increase the tolerance
> to 3 jiffies.
>
> https://bugs.linaro.org/show_bug.cgi?id=2076
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  test/validation/scheduler/scheduler.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/test/validation/scheduler/scheduler.c b/test/validation/scheduler/scheduler.c
> index dcf01c0..cb04209 100644
> --- a/test/validation/scheduler/scheduler.c
> +++ b/test/validation/scheduler/scheduler.c
> @@ -48,7 +48,7 @@
>  #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n)
>  #define CHAOS_WAIT_FAIL     (5 * ODP_TIME_SEC_IN_NS)
>
> -#define ODP_WAIT_TOLERANCE     (20 * ODP_TIME_MSEC_IN_NS)
> +#define ODP_WAIT_TOLERANCE     (60 * ODP_TIME_MSEC_IN_NS)

Do we know that this is a fix or is it just a guess at a "better" number?
The original code used unconditional waits. If the concern is simply to
avoid the possibility of indefinite stalls then why try to cut things so
close?  We could agree that a wait of one minute is sufficient to say that
something definitely isn't right, but do we care what sort of jitter we may
see on a run-by-run basis here?

A one minute timeout would mean that tests would always get a result.
Implementations that observe waits of that magnitude would clearly be in
need of investigation while others would still pass this functional
validation.  Other tests generate performance numbers and if scheduling
waits are unacceptably large they'd be better covered in that context.


Ivan Khoronzhuk Feb. 29, 2016, 10:22 p.m. UTC | #2
On 29.02.16 23:39, Bill Fischofer wrote:
>
>
> On Mon, Feb 29, 2016 at 11:49 AM, Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>
>     On systems sensitive to time deviation it is better to have some
>     headroom when testing scheduler timing, so increase the tolerance
>     to 3 jiffies.
>
>     https://bugs.linaro.org/show_bug.cgi?id=2076
>
>     Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>     ---
>       test/validation/scheduler/scheduler.c | 2 +-
>       1 file changed, 1 insertion(+), 1 deletion(-)
>
>     diff --git a/test/validation/scheduler/scheduler.c b/test/validation/scheduler/scheduler.c
>     index dcf01c0..cb04209 100644
>     --- a/test/validation/scheduler/scheduler.c
>     +++ b/test/validation/scheduler/scheduler.c
>     @@ -48,7 +48,7 @@
>       #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n)
>       #define CHAOS_WAIT_FAIL     (5 * ODP_TIME_SEC_IN_NS)
>
>     -#define ODP_WAIT_TOLERANCE     (20 * ODP_TIME_MSEC_IN_NS)
>     +#define ODP_WAIT_TOLERANCE     (60 * ODP_TIME_MSEC_IN_NS)
>
>
> Do we know that this is a fix or is it just a guess at a "better" number?
It's based on the jiffy length. It should definitely pass on "normal" platforms without interference from the Linux scheduler.
Since we cannot predict the load on the system, it is better to have some headroom; practice shows that 1 jiffy is not enough in some cases.
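To make the jiffy reasoning concrete, here is a minimal sketch; it is not part of the patch, which simply hard-codes 60 ms, and note that sysconf(_SC_CLK_TCK) reports USER_HZ (typically 100), which only approximates the kernel tick rate:

#include <stdint.h>
#include <unistd.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical helper: derive the slack from the tick length instead
 * of hard-coding it. Because _SC_CLK_TCK is USER_HZ rather than the
 * kernel HZ, this illustrates the reasoning rather than reproducing
 * the patch's fixed 60 ms value. */
static uint64_t wait_tolerance_ns(void)
{
        uint64_t jiffy_ns = NS_PER_SEC / (uint64_t)sysconf(_SC_CLK_TCK);

        return 3 * jiffy_ns; /* three jiffies of headroom */
}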


> The original code used unconditional waits.
> If the concern is simply to avoid the possibility of indefinite stalls then why try to cut things so close?
No. The intention here is not simply to catch indefinite stalls; the intention is to check that the scheduler's sense of time stays within normal ranges.
It can be off by a lot in case of incorrect initialization or calculation, and this test is meant to catch that.
In fact, the test did catch this bug. It is not a bug in the test; it is a bug in the linux-generic implementation, where the scheduler timeout is
corrupted by the Linux kernel scheduler, and it is good to see that captured here. In the same way it will catch issues on "real" boards, where such
a large deviation can only come from incorrect timings.

>  We could agree that a wait of one minute is sufficient to say that something definitely isn't right, but do we care what sort of jitter we may see on a run-by-run basis here?
One minute is a huge amount of time. Here I only increased the tolerance from 20 ms to 60 ms on a 5 second wait. Is that a very small error? It is about 1.2% (0.06 s / 5 s = 0.012).
The tolerance is currently based not on a percentage but on the slices into which the Linux kernel scheduler splits time; if you prefer, it could be bound to a 10% error instead.
I cannot test or predict this value; no one can say what delays may occur on non-real-time systems.
If a platform cannot pass this test, it definitely should improve its timing. For instance, if an application decides to wait no more than 100 ms but the scheduler waits 150 ms, is that normal?
Maybe it is normal for linux-generic, but 50 ms of wasted time is a lot for normal cases.
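The percentage-based alternative mentioned above could look like the sketch below; the names are illustrative only, and this is not what the merged patch does, which keeps a fixed 60 ms:

/* Illustrative only: scale the tolerance with the wait length instead
 * of fixing it. A 100 ms wait would then allow 10 ms of deviation,
 * versus the fixed 60 ms the patch applies to every wait. */
#define WAIT_TOLERANCE_PERCENT 10
#define WAIT_TOLERANCE_NS(wait_ns) \
        (((uint64_t)(wait_ns) * WAIT_TOLERANCE_PERCENT) / 100)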

Bill Fischofer March 1, 2016, 1:25 a.m. UTC | #3
OK, thanks for the explanation.  With that:

Reviewed-by: Bill Fischofer <bill.fischofer@linaro.org>


Maxim Uvarov March 1, 2016, 3:51 p.m. UTC | #4
Merged,
Maxim.


Patch

diff --git a/test/validation/scheduler/scheduler.c b/test/validation/scheduler/scheduler.c
index dcf01c0..cb04209 100644
--- a/test/validation/scheduler/scheduler.c
+++ b/test/validation/scheduler/scheduler.c
@@ -48,7 +48,7 @@ 
 #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n)
 #define CHAOS_WAIT_FAIL     (5 * ODP_TIME_SEC_IN_NS)
 
-#define ODP_WAIT_TOLERANCE	(20 * ODP_TIME_MSEC_IN_NS)
+#define ODP_WAIT_TOLERANCE	(60 * ODP_TIME_MSEC_IN_NS)
 
 /* Test global variables */
 typedef struct {
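
For context, ODP_WAIT_TOLERANCE bounds how far a measured scheduler wait may deviate from the requested one. Below is a minimal sketch of that kind of check, assuming the standard ODP time and schedule calls; the real test in scheduler.c is more involved, and SCHED_WAIT_NS is an illustrative constant:

#include <odp.h>
#include "CUnit/Basic.h"

#define SCHED_WAIT_NS (100 * ODP_TIME_MSEC_IN_NS)

static void check_schedule_wait_accuracy(void)
{
        uint64_t sched_wait = odp_schedule_wait_time(SCHED_WAIT_NS);
        odp_time_t start, end;
        uint64_t diff_ns;

        start = odp_time_local();
        (void)odp_schedule(NULL, sched_wait); /* no events queued: waits the full time */
        end = odp_time_local();

        diff_ns = odp_time_to_ns(odp_time_diff(end, start));

        /* The observed wait must land inside the tolerance window. */
        CU_ASSERT(diff_ns >= SCHED_WAIT_NS - ODP_WAIT_TOLERANCE);
        CU_ASSERT(diff_ns <= SCHED_WAIT_NS + ODP_WAIT_TOLERANCE);
}

The patch widens only the tolerance window; the 5 second CHAOS_WAIT_FAIL ceiling above it still catches outright stalls.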