Message ID | 1456768144-18699-1-git-send-email-ivan.khoronzhuk@linaro.org |
---|---|
State | Accepted |
Commit | 7098c6e9652c1f3a0fbce56db79777832ca4fd82 |
Headers | show |
On Mon, Feb 29, 2016 at 11:49 AM, Ivan Khoronzhuk < ivan.khoronzhuk@linaro.org> wrote: > For some systems sensitive to time deviation, would be better > to have some time backup while testing scheduler time, so increase > it to 3 jiffies. > > https://bugs.linaro.org/show_bug.cgi?id=2076 > > Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> > --- > test/validation/scheduler/scheduler.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/test/validation/scheduler/scheduler.c > b/test/validation/scheduler/scheduler.c > index dcf01c0..cb04209 100644 > --- a/test/validation/scheduler/scheduler.c > +++ b/test/validation/scheduler/scheduler.c > @@ -48,7 +48,7 @@ > #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n) > #define CHAOS_WAIT_FAIL (5 * ODP_TIME_SEC_IN_NS) > > -#define ODP_WAIT_TOLERANCE (20 * ODP_TIME_MSEC_IN_NS) > +#define ODP_WAIT_TOLERANCE (60 * ODP_TIME_MSEC_IN_NS) > Do we know that this is a fix or is it just a guess at a "better" number? The original code used unconditional waits. If the concern is simply to avoid the possibility of indefinite stalls then why try to cut things so close? We could agree that a wait of one minute is sufficient to say that something definitely isn't right, but do we care what sort of jitter we may see on a run-by-run basis here? A one minute timeout would mean that tests would always get a result. Implementations that observe waits of that magnitude would clearly be in need of investigation while others would still pass this functional validation. Other tests generate performance numbers and if scheduling waits are unacceptably large they'd be better covered in that context. > > /* Test global variables */ > typedef struct { > -- > 1.9.1 > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > https://lists.linaro.org/mailman/listinfo/lng-odp >
On 29.02.16 23:39, Bill Fischofer wrote: > > > On Mon, Feb 29, 2016 at 11:49 AM, Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org <mailto:ivan.khoronzhuk@linaro.org>> wrote: > > For some systems sensitive to time deviation, would be better > to have some time backup while testing scheduler time, so increase > it to 3 jiffies. > > https://bugs.linaro.org/show_bug.cgi?id=2076 > > Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org <mailto:ivan.khoronzhuk@linaro.org>> > --- > test/validation/scheduler/scheduler.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/test/validation/scheduler/scheduler.c b/test/validation/scheduler/scheduler.c > index dcf01c0..cb04209 100644 > --- a/test/validation/scheduler/scheduler.c > +++ b/test/validation/scheduler/scheduler.c > @@ -48,7 +48,7 @@ > #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n) > #define CHAOS_WAIT_FAIL (5 * ODP_TIME_SEC_IN_NS) > > -#define ODP_WAIT_TOLERANCE (20 * ODP_TIME_MSEC_IN_NS) > +#define ODP_WAIT_TOLERANCE (60 * ODP_TIME_MSEC_IN_NS) > > > Do we know that this is a fix or is it just a guess at a "better" number? It's based on jiffy. It definitely should pass for "normal" platforms w/o impact of Linux scheduler. As we cannot predict the load on system better to have backup, practice says that 1 jiffy is not enough in some cases. > The original code used unconditional waits. > If the concern is simply to avoid the possibility of indefinite stalls then why try to cut things so close? Nope. Intention here not simply catch indefinite stalls, the intention to check if time sense for scheduler is working in normal ranges. It can differ greatly in case of some incorrect initialization or calculation. This test is going to catch this. Actually, the test caught this bug, this bug is not a bug of test, it's bug of linux-generic implementation when scheduler timeout is corrupted with LK scheduler and it's nice to see this captured here. In the same way it's going to catch issues on "real" boards, where such huge impact can be only in case of incorrect timings. > We could agree that a wait of one minute is sufficient to say that something definitely isn't right, but do we care what sort of jitter we may see on a run-by-run basis here? 1 minute is a very huge amount of time. Here I just increased it from 20ms on 5seconds to 60ms on 5seconds. Does it a very small error? It's about 1.2%. But it's not based on percentage currently, it's based on slices the linux kernel scheduler splits time, if you want it can be bound with 10% error for instance. I cannot test and predict this value, no one can, which delays can be on non real time systems. If platform cannot pass this test it definitely should improve it`s timing. For instance, if app decides to wait no more than 100ms, but scheduler waits 150ms, is it normal? Maybe it's normal for linux-generic, but 50ms of waste time it's big amount of time for normal cases. > > A one minute timeout would mean that tests would always get a result. Implementations that observe waits of that magnitude would clearly be in need of investigation while others would still pass this functional validation. Other tests generate performance numbers and if scheduling waits are unacceptably large they'd be better covered in that context. > > > /* Test global variables */ > typedef struct { > -- > 1.9.1 > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> > https://lists.linaro.org/mailman/listinfo/lng-odp > >
OK, thanks for the explanation. With that: Reviewed-by: Bill Fischofer <bill.fischofer@linaro.org> On Mon, Feb 29, 2016 at 4:22 PM, Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org > wrote: > > > On 29.02.16 23:39, Bill Fischofer wrote: > >> >> >> On Mon, Feb 29, 2016 at 11:49 AM, Ivan Khoronzhuk < >> ivan.khoronzhuk@linaro.org <mailto:ivan.khoronzhuk@linaro.org>> wrote: >> >> For some systems sensitive to time deviation, would be better >> to have some time backup while testing scheduler time, so increase >> it to 3 jiffies. >> >> https://bugs.linaro.org/show_bug.cgi?id=2076 >> >> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org <mailto: >> ivan.khoronzhuk@linaro.org>> >> --- >> test/validation/scheduler/scheduler.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/test/validation/scheduler/scheduler.c >> b/test/validation/scheduler/scheduler.c >> index dcf01c0..cb04209 100644 >> --- a/test/validation/scheduler/scheduler.c >> +++ b/test/validation/scheduler/scheduler.c >> @@ -48,7 +48,7 @@ >> #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n) >> #define CHAOS_WAIT_FAIL (5 * ODP_TIME_SEC_IN_NS) >> >> -#define ODP_WAIT_TOLERANCE (20 * ODP_TIME_MSEC_IN_NS) >> +#define ODP_WAIT_TOLERANCE (60 * ODP_TIME_MSEC_IN_NS) >> >> >> Do we know that this is a fix or is it just a guess at a "better" number? >> > It's based on jiffy. It definitely should pass for "normal" platforms w/o > impact of Linux scheduler. > As we cannot predict the load on system better to have backup, practice > says that 1 jiffy is not enough in some cases. > > > The original code used unconditional waits. >> If the concern is simply to avoid the possibility of indefinite stalls >> then why try to cut things so close? >> > Nope. Intention here not simply catch indefinite stalls, the intention to > check if time sense for scheduler is working in normal ranges. > It can differ greatly in case of some incorrect initialization or > calculation. This test is going to catch this. > Actually, the test caught this bug, this bug is not a bug of test, it's > bug of linux-generic implementation when scheduler timeout is corrupted > with LK scheduler and it's nice to see this captured here. In the same way > it's going to catch issues on "real" boards, where such huge impact can > be only in case of incorrect timings. > > We could agree that a wait of one minute is sufficient to say that >> something definitely isn't right, but do we care what sort of jitter we may >> see on a run-by-run basis here? >> > 1 minute is a very huge amount of time. Here I just increased it from 20ms > on 5seconds to 60ms on 5seconds. Does it a very small error? It's about > 1.2%. > But it's not based on percentage currently, it's based on slices the linux > kernel scheduler splits time, if you want it can be bound with 10% error > for instance. > I cannot test and predict this value, no one can, which delays can be on > non real time systems. > If platform cannot pass this test it definitely should improve it`s > timing. For instance, if app decides to wait no more than 100ms, but > scheduler waits 150ms, is it normal? > Maybe it's normal for linux-generic, but 50ms of waste time it's big > amount of time for normal cases. > > >> A one minute timeout would mean that tests would always get a result. >> Implementations that observe waits of that magnitude would clearly be in >> need of investigation while others would still pass this functional >> validation. Other tests generate performance numbers and if scheduling >> waits are unacceptably large they'd be better covered in that context. >> >> >> /* Test global variables */ >> typedef struct { >> -- >> 1.9.1 >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >> https://lists.linaro.org/mailman/listinfo/lng-odp >> >> >> > -- > Regards, > Ivan Khoronzhuk >
Merged, Maxim. On 03/01/16 04:25, Bill Fischofer wrote: > OK, thanks for the explanation. With that: > > Reviewed-by: Bill Fischofer <bill.fischofer@linaro.org > <mailto:bill.fischofer@linaro.org>> > > On Mon, Feb 29, 2016 at 4:22 PM, Ivan Khoronzhuk > <ivan.khoronzhuk@linaro.org <mailto:ivan.khoronzhuk@linaro.org>> wrote: > > > > On 29.02.16 23:39, Bill Fischofer wrote: > > > > On Mon, Feb 29, 2016 at 11:49 AM, Ivan Khoronzhuk > <ivan.khoronzhuk@linaro.org > <mailto:ivan.khoronzhuk@linaro.org> > <mailto:ivan.khoronzhuk@linaro.org > <mailto:ivan.khoronzhuk@linaro.org>>> wrote: > > For some systems sensitive to time deviation, would be better > to have some time backup while testing scheduler time, so > increase > it to 3 jiffies. > > https://bugs.linaro.org/show_bug.cgi?id=2076 > > Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org > <mailto:ivan.khoronzhuk@linaro.org> > <mailto:ivan.khoronzhuk@linaro.org > <mailto:ivan.khoronzhuk@linaro.org>>> > --- > test/validation/scheduler/scheduler.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/test/validation/scheduler/scheduler.c > b/test/validation/scheduler/scheduler.c > index dcf01c0..cb04209 100644 > --- a/test/validation/scheduler/scheduler.c > +++ b/test/validation/scheduler/scheduler.c > @@ -48,7 +48,7 @@ > #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n) > #define CHAOS_WAIT_FAIL (5 * ODP_TIME_SEC_IN_NS) > > -#define ODP_WAIT_TOLERANCE (20 * ODP_TIME_MSEC_IN_NS) > +#define ODP_WAIT_TOLERANCE (60 * ODP_TIME_MSEC_IN_NS) > > > Do we know that this is a fix or is it just a guess at a > "better" number? > > It's based on jiffy. It definitely should pass for "normal" > platforms w/o impact of Linux scheduler. > As we cannot predict the load on system better to have backup, > practice says that 1 jiffy is not enough in some cases. > > > The original code used unconditional waits. > If the concern is simply to avoid the possibility of > indefinite stalls then why try to cut things so close? > > Nope. Intention here not simply catch indefinite stalls, the > intention to check if time sense for scheduler is working in > normal ranges. > It can differ greatly in case of some incorrect initialization or > calculation. This test is going to catch this. > Actually, the test caught this bug, this bug is not a bug of test, > it's bug of linux-generic implementation when scheduler timeout is > corrupted > with LK scheduler and it's nice to see this captured here. In the > same way it's going to catch issues on "real" boards, where such > huge impact can > be only in case of incorrect timings. > > We could agree that a wait of one minute is sufficient to say > that something definitely isn't right, but do we care what > sort of jitter we may see on a run-by-run basis here? > > 1 minute is a very huge amount of time. Here I just increased it > from 20ms on 5seconds to 60ms on 5seconds. Does it a very small > error? It's about 1.2%. > But it's not based on percentage currently, it's based on slices > the linux kernel scheduler splits time, if you want it can be > bound with 10% error for instance. > I cannot test and predict this value, no one can, which delays can > be on non real time systems. > If platform cannot pass this test it definitely should improve > it`s timing. For instance, if app decides to wait no more than > 100ms, but scheduler waits 150ms, is it normal? > Maybe it's normal for linux-generic, but 50ms of waste time it's > big amount of time for normal cases. > > > A one minute timeout would mean that tests would always get a > result. Implementations that observe waits of that magnitude > would clearly be in need of investigation while others would > still pass this functional validation. Other tests generate > performance numbers and if scheduling waits are unacceptably > large they'd be better covered in that context. > > > /* Test global variables */ > typedef struct { > -- > 1.9.1 > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> > <mailto:lng-odp@lists.linaro.org > <mailto:lng-odp@lists.linaro.org>> > https://lists.linaro.org/mailman/listinfo/lng-odp > > > > -- > Regards, > Ivan Khoronzhuk > > > > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > https://lists.linaro.org/mailman/listinfo/lng-odp
diff --git a/test/validation/scheduler/scheduler.c b/test/validation/scheduler/scheduler.c index dcf01c0..cb04209 100644 --- a/test/validation/scheduler/scheduler.c +++ b/test/validation/scheduler/scheduler.c @@ -48,7 +48,7 @@ #define CHAOS_NDX_TO_PTR(n) ((void *)(uintptr_t)n) #define CHAOS_WAIT_FAIL (5 * ODP_TIME_SEC_IN_NS) -#define ODP_WAIT_TOLERANCE (20 * ODP_TIME_MSEC_IN_NS) +#define ODP_WAIT_TOLERANCE (60 * ODP_TIME_MSEC_IN_NS) /* Test global variables */ typedef struct {
For some systems sensitive to time deviation, would be better to have some time backup while testing scheduler time, so increase it to 3 jiffies. https://bugs.linaro.org/show_bug.cgi?id=2076 Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> --- test/validation/scheduler/scheduler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)