Calculate average latencies on the fly

Message ID CAK+oA4WFAaT4DmR=8nKqTsyKe9imDYCu5p6DG4kqbhoBPD=LXw@mail.gmail.com
State New

Commit Message

Piotr Gregor Nov. 14, 2016, 11:50 a.m.
There was an error in the previous patch. Please find the corrected patch attached.

cheers,
Piotr

Comments

Mathieu Poirier Nov. 14, 2016, 3:42 p.m. | #1
On 14 November 2016 at 04:50, Piotr Gregor <piotrgregor@rsyncme.org> wrote:
> There was an error in the previous patch. Please find the corrected patch attached.
>
> cheers,
> Piotr

Piotr,

I am fairly new to this mailing list, but if it is anything like the
kernel's, you will want to send your patch using git send-email.  It
is probably a good idea to run checkpatch.pl on it too - that way
you'll avoid all sorts of style problems, like lines over 80
characters long.

Mathieu
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Piotr Gregor Nov. 14, 2016, 4:11 p.m. | #2
Hi Mathieu,

I am most likely even newer - many thanks for your advice.

cheers,
Piotr
Clark Williams Nov. 14, 2016, 4:58 p.m. | #3
On Mon, 14 Nov 2016 16:11:58 +0000
Piotr Gregor <piotrgregor@rsyncme.org> wrote:

> Hi Mathieu,
>
> I am most likely even newer - many thanks for your advice.
>
> cheers,
> Piotr
> --

Luckily this list is *much* lower traffic than LKML and we're not quite as strict as they are, since we don't have the volume of patches to deal with. 

That being said, very good advice and I'll make a note to follow it :)

Piotr, John Kacur or I will look at adding in your patch to cyclictest shortly. Thanks for the submission.

-- 
United States Coast Guard
Ruining Natural Selection since 1790
John Kacur Nov. 16, 2016, 2:47 p.m. | #4
Please inline the patches

There are 1,000,000 microseconds per second and the default interval is
1000 us, but let's say we sampled more frequently, at 100 us:

1,000,000 us / 100 us = 10,000 samples/second * 60 sec/min * 60
min/hr * 24 hrs/day * 365 days/year * 1000 (a very pessimistic diff) =
an accumulated sum of about 3.1536x10^14 per year.

A double represents integers exactly up to about 9x10^15, and
9x10^15 / 3.1536x10^14 is roughly 28.5, so we could run for almost
30 years before losing integer precision.

So the problem with your patch is not just that the problem it proposes
to fix is one I see as mostly theoretical, but that it introduces the
potential for accumulating small inaccuracies.

Of course, there is always the chance that I made some kind of error in my 
analysis here, please feel free to point it out!

What might be more interesting is how long it would take for the cycles
counter to overflow.

Thanks

John

On Mon, 14 Nov 2016, Piotr Gregor wrote:

> There was an error in the previous patch. Please find the corrected patch attached.
>
> cheers,
> Piotr
>
Piotr Gregor Nov. 19, 2016, 11:27 p.m. | #5
Hi John, Clark,

Recently I observed jitter of > 100,000 during tests of a virtualized system.
In that case the 30 years would shrink to about 0.3 years, which is not as much.
The jitter depends on the hardware/software you test: what if someone
deliberately wishes to measure jitter larger than this, or is simply
out of luck and gets huge values?
Also, the limit will change if the type of the average changes, but if it
is calculated in the way I proposed, it will always work.
Apart from this, since the possibility exists, however benign - why not
fix it if the fix is easy?

cheers,
Piotr
Tracy Smith Nov. 20, 2016, 5:03 a.m. | #6
Piotr, can you explain what you believe to be the root cause?

> On Nov 19, 2016, at 5:27 PM, Piotr Gregor <piotrgregor@rsyncme.org> wrote:
>
> Hi John, Clark,
>
> Recently I observed jitter of > 100,000 during tests of a virtualized system.
> In that case the 30 years would shrink to about 0.3 years, which is not as much.
> The jitter depends on the hardware/software you test: what if someone
> deliberately wishes to measure jitter larger than this, or is simply
> out of luck and gets huge values?
> Also, the limit will change if the type of the average changes, but if it
> is calculated in the way I proposed, it will always work.
> Apart from this, since the possibility exists, however benign - why not
> fix it if the fix is easy?
>
> cheers,
> Piotr

John Kacur Nov. 21, 2016, 6:32 p.m. | #7
On Sat, 19 Nov 2016, Piotr Gregor wrote:

> Hi John, Clark,
>
> Recently I observed jitter of > 100,000 during tests of a virtualized system.
> In that case the 30 years would shrink to about 0.3 years, which is not as much.
> The jitter depends on the hardware/software you test: what if someone
> deliberately wishes to measure jitter larger than this, or is simply
> out of luck and gets huge values?
> Also, the limit will change if the type of the average changes, but if it
> is calculated in the way I proposed, it will always work.
> Apart from this, since the possibility exists, however benign - why not
> fix it if the fix is easy?

I already explained why. Every time you do a floating point long division
you are potentially accumulating small errors. And something is terribly
wrong if you are getting a diff of greater than 100,000; if that really
is true, you need to fix it right away instead of running cyclictest for a
long period of time.

John
John Kacur Nov. 22, 2016, 3:43 p.m. | #8
----- Original Message -----
> John,
>
> Latency of 100,000 seems to me to be a bug when doing floating point long
> division, which potentially accumulates small errors.
>
> Is the issue in the cyclictest calculation or in the kernel?  Or the FPU?
>
> If in the kernel, where do we need to trace to check for increased latency
> when doing floating point long division?
>
> Piotr needs to trace where the latency is occurring and fix it.  Any
> recommendations on a possible fix at this stage?
>
> Thx, Tracy

Sure, you make some good points: it could be a cyclictest bug, it could be in the kernel,
and since he also said he was doing this in a virtualized system, it could be somewhere in that stack.
The first step, of course, is to see if you can come up with a way to reliably reproduce it.

Thanks

John

Patch

From 5610e2b01e5302a5ecff7ca368198863a6ce9f78 Mon Sep 17 00:00:00 2001
From: Piotr Gregor <piotrgregor@rsyncme.org>
Date: Fri, 11 Nov 2016 11:28:09 +0000
Subject: [PATCH] rt-tests: cyclictest: Calculate average latencies on the fly

Prevents stat->avg variable from overflow.
---
 src/cyclictest/cyclictest.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
index 3f1bef1..0d2ddbc 100644
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -1178,7 +1178,11 @@  static void *timerthread(void *param)
 			if (refresh_on_max)
 				pthread_cond_signal(&refresh_on_max_cond);
 		}
-		stat->avg += (double) diff;
+		if (stat->cycles == 0) {
+			stat->avg = (double) diff;
+		} else {
+			stat->avg = (stat->avg + (double) diff / (double) stat->cycles) * ((double) stat->cycles / (double) (stat->cycles + 1));
+		}
 
 		if (trigger && (diff > trigger)) {
 			trigger_update(par, diff, calctime(now));
@@ -2005,7 +2009,7 @@  static void print_hist(struct thread_param *par[], int nthreads)
 	fprintf(fd, "# Avg Latencies:");
 	for (j = 0; j < nthreads; j++)
 		fprintf(fd, " %05lu", par[j]->stats->cycles ?
-		       (long)(par[j]->stats->avg/par[j]->stats->cycles) : 0);
+		       (long)(par[j]->stats->avg) : 0);
 	fprintf(fd, "\n");
 	fprintf(fd, "# Max Latencies:");
 	maxmax = 0;
@@ -2059,7 +2063,7 @@  static void print_stat(FILE *fp, struct thread_param *par, int index, int verbos
 			fprintf(fp, fmt, index, stat->tid, par->prio,
 				par->interval, stat->cycles, stat->min,
 				stat->act, stat->cycles ?
-				(long)(stat->avg/stat->cycles) : 0, stat->max);
+				(long)(stat->avg) : 0, stat->max);
 
 			if (smi)
 				fprintf(fp," SMI:%8ld", stat->smi_count);
-- 
2.7.4