[Powertop] Fix timer and work perf events timestamp tracing

Message ID	1346840555-18457-1-git-send-email-rajagopal.venkat@linaro.org
State	New
Headers	show Return-Path: <patch+caf_=linaro-patchwork=canonical.com@linaro.org> Received-SPF: neutral (google.com: 209.85.160.50 is neither permitted nor denied by best guess record for domain of rajagopal.venkat@linaro.org) client-ip=209.85.160.50; From: Rajagopal Venkat <rajagopal.venkat@linaro.org> To: powertop@lists.01.org Cc: patches@linaro.org, linaro-dev@lists.linaro.org, Rajagopal Venkat <rajagopal.venkat@linaro.org> Subject: [Powertop][PATCH] Fix timer and work perf events timestamp tracing Date: Wed, 5 Sep 2012 15:52:34 +0530 Message-Id: <1346840555-18457-1-git-send-email-rajagopal.venkat@linaro.org>

rajagopal.venkat@linaro.org Sept. 5, 2012, 10:22 a.m. UTC

Incorrect timer and work perf events timestamp tracing is one
of the reason for reporting usage over 100%. This patch will
resolve the issue by
- rejecting the events for which entry timestamp is not recorded.
Currently these events exit timestamp itself is considered as
usage period resulting in over 100% usage.
- clearing event timestamps from global map at the end of each
measurement to avoid collision with earlier recorded timestamps.

Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
---
 src/process/timer.cpp | 5 ++++-
 src/process/work.cpp  | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

Viresh Kumar Sept. 5, 2012, 11:28 a.m. UTC | #1

On Wed, Sep 5, 2012 at 3:52 PM, Rajagopal Venkat
<rajagopal.venkat@linaro.org> wrote:
> Incorrect timer and work perf events timestamp tracing is one
> of the reason for reporting usage over 100%. This patch will
> resolve the issue by
> - rejecting the events for which entry timestamp is not recorded.
> Currently these events exit timestamp itself is considered as
> usage period resulting in over 100% usage.
> - clearing event timestamps from global map at the end of each
> measurement to avoid collision with earlier recorded timestamps.

Tried it few times on Panda Board.

Tested-by: Viresh Kumar <viresh.kumar@linaro.org>

Sergey Senozhatsky Sept. 5, 2012, 12:44 p.m. UTC | #2

Hi,

On (09/05/12 15:52), Rajagopal Venkat wrote:
> Incorrect timer and work perf events timestamp tracing is one
> of the reason for reporting usage over 100%. This patch will
> resolve the issue by
> - rejecting the events for which entry timestamp is not recorded.

how is that possible?
do you mean erasing between measurements?


schematically:

measure0:

ev1.start
ev2.start
ev2.end

processing
clear


measure1:
ev3.start
ev1.end  <<<<<
ev3.end

processing
clear


if so, then we're loosing events, which is no good. reporting less than 100%
is ok, but reporting less than real is not.


p.s.
I'll try to check emails, but most probably will be off-line most of the day.

	-ss


> Currently these events exit timestamp itself is considered as
> usage period resulting in over 100% usage.
> - clearing event timestamps from global map at the end of each
> measurement to avoid collision with earlier recorded timestamps.
> 
> Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
> ---
>  src/process/timer.cpp | 5 ++++-
>  src/process/work.cpp  | 5 ++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/src/process/timer.cpp b/src/process/timer.cpp
> index 8917490..db074c4 100644
> --- a/src/process/timer.cpp
> +++ b/src/process/timer.cpp
> @@ -79,7 +79,8 @@ uint64_t timer::done(uint64_t time, uint64_t timer_struct)
>  {
>  	int64_t delta;
>  
> -	if (running_since[timer_struct] > time)
> +	if (running_since.find(timer_struct) == running_since.end() ||
> +			running_since[timer_struct] > time)
>  		return 0;
>  
>  	delta = time - running_since[timer_struct];
> @@ -147,6 +148,8 @@ void clear_timers(void)
>  		all_timers.erase(it);
>  		it = all_timers.begin();
>  	}
> +
> +	running_since.clear();
>  }
>  
>  bool timer::is_deferred(void)
> diff --git a/src/process/work.cpp b/src/process/work.cpp
> index 82f13a2..e436643 100644
> --- a/src/process/work.cpp
> +++ b/src/process/work.cpp
> @@ -56,7 +56,8 @@ uint64_t work::done(uint64_t time, uint64_t work_struct)
>  {
>  	int64_t delta;
>  
> -	if (running_since[work_struct] > time)
> +	if (running_since.find(work_struct) == running_since.end() ||
> +			running_since[work_struct] > time)
>  		return 0;
>  
>  	delta = time - running_since[work_struct];
> @@ -102,6 +103,8 @@ void clear_work(void)
>  		all_work.erase(it);
>  		it = all_work.begin();
>  	}
> +
> +	running_since.clear();
>  }
>  
>  
> -- 
> 1.7.11.3
> 
> _______________________________________________
> PowerTop mailing list
> PowerTop@lists.01.org
> https://lists.01.org/mailman/listinfo/powertop
>

Chris Ferron Sept. 5, 2012, 3:12 p.m. UTC | #3

On 09/05/2012 05:44 AM, Sergey Senozhatsky wrote:
> Hi,
>
> On (09/05/12 15:52), Rajagopal Venkat wrote:
>> Incorrect timer and work perf events timestamp tracing is one
>> of the reason for reporting usage over 100%. This patch will
>> resolve the issue by
>> - rejecting the events for which entry timestamp is not recorded.
> how is that possible?
> do you mean erasing between measurements?
>
>
> schematically:
>
> measure0:
>
> ev1.start
> ev2.start
> ev2.end
>
> processing
> clear
>
>
> measure1:
> ev3.start
> ev1.end  <<<<<
> ev3.end
>
> processing
> clear
>
>
> if so, then we're loosing events, which is no good. reporting less than 100%
> is ok, but reporting less than real is not.
>
>
> p.s.
> I'll try to check emails, but most probably will be off-line most of the day.
>
> 	-ss
Yes! This is a hard issues. Some report say CPU usage over 100% which in 
some cases is correct. For instance for IA, a CPU can run over 100% of 
it's listed frequency. For instance listed freq could be 3.4 but in 
turbo mode the freq govonor can request max and be given 3.8Ghz for a 
duration. Also overall if all processor cores are running at 100% you 
essentially can have a overall usage of over 100%.
There have been some cases reported where CPU usage is WAY over 100% and 
that is an issue, but I have has a hard time reproducing the issues.

This can be tricky business, so care is needed.

-Chris


>
>
>> Currently these events exit timestamp itself is considered as
>> usage period resulting in over 100% usage.
>> - clearing event timestamps from global map at the end of each
>> measurement to avoid collision with earlier recorded timestamps.
>>
>> Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
>> ---
>>   src/process/timer.cpp | 5 ++++-
>>   src/process/work.cpp  | 5 ++++-
>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/process/timer.cpp b/src/process/timer.cpp
>> index 8917490..db074c4 100644
>> --- a/src/process/timer.cpp
>> +++ b/src/process/timer.cpp
>> @@ -79,7 +79,8 @@ uint64_t timer::done(uint64_t time, uint64_t timer_struct)
>>   {
>>   	int64_t delta;
>>   
>> -	if (running_since[timer_struct] > time)
>> +	if (running_since.find(timer_struct) == running_since.end() ||
>> +			running_since[timer_struct] > time)
>>   		return 0;
>>   
>>   	delta = time - running_since[timer_struct];
>> @@ -147,6 +148,8 @@ void clear_timers(void)
>>   		all_timers.erase(it);
>>   		it = all_timers.begin();
>>   	}
>> +
>> +	running_since.clear();
>>   }
>>   
>>   bool timer::is_deferred(void)
>> diff --git a/src/process/work.cpp b/src/process/work.cpp
>> index 82f13a2..e436643 100644
>> --- a/src/process/work.cpp
>> +++ b/src/process/work.cpp
>> @@ -56,7 +56,8 @@ uint64_t work::done(uint64_t time, uint64_t work_struct)
>>   {
>>   	int64_t delta;
>>   
>> -	if (running_since[work_struct] > time)
>> +	if (running_since.find(work_struct) == running_since.end() ||
>> +			running_since[work_struct] > time)
>>   		return 0;
>>   
>>   	delta = time - running_since[work_struct];
>> @@ -102,6 +103,8 @@ void clear_work(void)
>>   		all_work.erase(it);
>>   		it = all_work.begin();
>>   	}
>> +
>> +	running_since.clear();
>>   }
>>   
>>   
>> -- 
>> 1.7.11.3
>>
>> _______________________________________________
>> PowerTop mailing list
>> PowerTop@lists.01.org
>> https://lists.01.org/mailman/listinfo/powertop
>>
> _______________________________________________
> PowerTop mailing list
> PowerTop@lists.01.org
> https://lists.01.org/mailman/listinfo/powertop

rajagopal.venkat@linaro.org Sept. 5, 2012, 4:56 p.m. UTC | #4

On 5 September 2012 18:14, Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
> Hi,
>
> On (09/05/12 15:52), Rajagopal Venkat wrote:
>> Incorrect timer and work perf events timestamp tracing is one
>> of the reason for reporting usage over 100%. This patch will
>> resolve the issue by
>> - rejecting the events for which entry timestamp is not recorded.
>
> how is that possible?
> do you mean erasing between measurements?
>
>
> schematically:
>
> measure0:
>
> ev1.start
> ev2.start
> ev2.end
>
> processing
> clear
>
>
> measure1:
> ev3.start
> ev1.end  <<<<<

evX.end  <<<<<
These events are causing numbers to go wrong.

delta = time - running_since[timer_struct];
accumulated_runtime += delta

Since running_since[timer_struct] returns zero, event timestamp
itself gets added to accumulated_runtime, causing usage to go
high something like 2693%.

> ev3.end
>
> processing
> clear
>
>
> if so, then we're loosing events, which is no good. reporting less than 100%
> is ok, but reporting less than real is not.

I did thought of it. Yes, agree that, we are loosing events for which
start timestamp
is not recorded. I believe correct solution would be to consider these
events end
timestamp relative to first_stamp(src/process/do_process.cpp).

>
>
> p.s.
> I'll try to check emails, but most probably will be off-line most of the day.
>
>         -ss
>
>
>> Currently these events exit timestamp itself is considered as
>> usage period resulting in over 100% usage.
>> - clearing event timestamps from global map at the end of each
>> measurement to avoid collision with earlier recorded timestamps.
>>
>> Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
>> ---
>>  src/process/timer.cpp | 5 ++++-
>>  src/process/work.cpp  | 5 ++++-
>>  2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/process/timer.cpp b/src/process/timer.cpp
>> index 8917490..db074c4 100644
>> --- a/src/process/timer.cpp
>> +++ b/src/process/timer.cpp
>> @@ -79,7 +79,8 @@ uint64_t timer::done(uint64_t time, uint64_t timer_struct)
>>  {
>>       int64_t delta;
>>
>> -     if (running_since[timer_struct] > time)
>> +     if (running_since.find(timer_struct) == running_since.end() ||
>> +                     running_since[timer_struct] > time)
>>               return 0;
>>
>>       delta = time - running_since[timer_struct];
>> @@ -147,6 +148,8 @@ void clear_timers(void)
>>               all_timers.erase(it);
>>               it = all_timers.begin();
>>       }
>> +
>> +     running_since.clear();
>>  }
>>
>>  bool timer::is_deferred(void)
>> diff --git a/src/process/work.cpp b/src/process/work.cpp
>> index 82f13a2..e436643 100644
>> --- a/src/process/work.cpp
>> +++ b/src/process/work.cpp
>> @@ -56,7 +56,8 @@ uint64_t work::done(uint64_t time, uint64_t work_struct)
>>  {
>>       int64_t delta;
>>
>> -     if (running_since[work_struct] > time)
>> +     if (running_since.find(work_struct) == running_since.end() ||
>> +                     running_since[work_struct] > time)
>>               return 0;
>>
>>       delta = time - running_since[work_struct];
>> @@ -102,6 +103,8 @@ void clear_work(void)
>>               all_work.erase(it);
>>               it = all_work.begin();
>>       }
>> +
>> +     running_since.clear();
>>  }
>>
>>
>> --
>> 1.7.11.3
>>
>> _______________________________________________
>> PowerTop mailing list
>> PowerTop@lists.01.org
>> https://lists.01.org/mailman/listinfo/powertop
>>

Arjan van de Ven Sept. 5, 2012, 5:09 p.m. UTC | #5

On 9/5/2012 9:56 AM, Rajagopal Venkat wrote:
>> measure1:
>> ev3.start
>> ev1.end  <<<<<
> 
> evX.end  <<<<<
> These events are causing numbers to go wrong.

but out of a 20 second window.. this is a tiny tiny window...
if you see 100.1% I'd buy this reasoning.
but you're seeing much more than that.

>>
>> if so, then we're loosing events, which is no good. reporting less than 100%
>> is ok, but reporting less than real is not.
> 
> I did thought of it. Yes, agree that, we are loosing events for which
> start timestamp

we can't lose those!
those are the events that give us the initial CPU frequency in the window etc....

rajagopal.venkat@linaro.org Sept. 5, 2012, 5:19 p.m. UTC | #6

On 5 September 2012 22:39, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 9/5/2012 9:56 AM, Rajagopal Venkat wrote:
>>> measure1:
>>> ev3.start
>>> ev1.end  <<<<<
>>
>> evX.end  <<<<<
>> These events are causing numbers to go wrong.
>
> but out of a 20 second window.. this is a tiny tiny window...
> if you see 100.1% I'd buy this reasoning.
> but you're seeing much more than that.

How about generating a report for 1sec duration?
Since timestamp itself is added to accumulated_runtime, the usage
percentage is really dependent on event end timestamp value.

>
>
>>>
>>> if so, then we're loosing events, which is no good. reporting less than 100%
>>> is ok, but reporting less than real is not.
>>
>> I did thought of it. Yes, agree that, we are loosing events for which
>> start timestamp
>
> we can't lose those!
> those are the events that give us the initial CPU frequency in the window etc....
>

Yes agree. I will submit the next version patch considering those events end
timestamp relative to first_stamp(src/process/do_process.cpp).

Arjan van de Ven Sept. 5, 2012, 5:22 p.m. UTC | #7

On 9/5/2012 10:19 AM, Rajagopal Venkat wrote:
> On 5 September 2012 22:39, Arjan van de Ven <arjan@linux.intel.com> wrote:
>> On 9/5/2012 9:56 AM, Rajagopal Venkat wrote:
>>>> measure1:
>>>> ev3.start
>>>> ev1.end  <<<<<
>>>
>>> evX.end  <<<<<
>>> These events are causing numbers to go wrong.
>>
>> but out of a 20 second window.. this is a tiny tiny window...
>> if you see 100.1% I'd buy this reasoning.
>> but you're seeing much more than that.
> 
> How about generating a report for 1sec duration?

even for 1 second... still it's miniscule compared to this whole 1 second
the amount of setup/teardown time just is not that huge.

rajagopal.venkat@linaro.org Sept. 5, 2012, 5:45 p.m. UTC | #8

On 5 September 2012 22:52, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 9/5/2012 10:19 AM, Rajagopal Venkat wrote:
>> On 5 September 2012 22:39, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>> On 9/5/2012 9:56 AM, Rajagopal Venkat wrote:
>>>>> measure1:
>>>>> ev3.start
>>>>> ev1.end  <<<<<
>>>>
>>>> evX.end  <<<<<
>>>> These events are causing numbers to go wrong.
>>>
>>> but out of a 20 second window.. this is a tiny tiny window...
>>> if you see 100.1% I'd buy this reasoning.
>>> but you're seeing much more than that.
>>
>> How about generating a report for 1sec duration?
>
> even for 1 second... still it's miniscule compared to this whole 1 second
> the amount of setup/teardown time just is not that huge.
>
Here are some perf timestamps,
(3979299431)
(3979303554)
(4079217947)
(4091306943)
(4091322535)
(4091336882)
When 1sec report is generated and if above timestamp gets
added to timer accumulated_runtime, no wonder why such
huge usage is reported.

Arjan van de Ven Sept. 5, 2012, 6:06 p.m. UTC | #9

On 9/5/2012 10:45 AM, Rajagopal Venkat wrote:
> On 5 September 2012 22:52, Arjan van de Ven <arjan@linux.intel.com> wrote:
>> On 9/5/2012 10:19 AM, Rajagopal Venkat wrote:
>>> On 5 September 2012 22:39, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>> On 9/5/2012 9:56 AM, Rajagopal Venkat wrote:
>>>>>> measure1:
>>>>>> ev3.start
>>>>>> ev1.end  <<<<<
>>>>>
>>>>> evX.end  <<<<<
>>>>> These events are causing numbers to go wrong.
>>>>
>>>> but out of a 20 second window.. this is a tiny tiny window...
>>>> if you see 100.1% I'd buy this reasoning.
>>>> but you're seeing much more than that.
>>>
>>> How about generating a report for 1sec duration?
>>
>> even for 1 second... still it's miniscule compared to this whole 1 second
>> the amount of setup/teardown time just is not that huge.
>>
> Here are some perf timestamps,
> (3979299431)
> (3979303554)
> (4079217947)
> (4091306943)
> (4091322535)
> (4091336882)
> When 1sec report is generated and if above timestamp gets
> added to timer accumulated_runtime, no wonder why such
> huge usage is reported.

question is... how did these get here?
is the kernel reporting garbage time ????

rajagopal.venkat@linaro.org Sept. 6, 2012, 1:42 p.m. UTC | #10

On 5 September 2012 23:36, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 9/5/2012 10:45 AM, Rajagopal Venkat wrote:
>> On 5 September 2012 22:52, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>> On 9/5/2012 10:19 AM, Rajagopal Venkat wrote:
>>>> On 5 September 2012 22:39, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>>> On 9/5/2012 9:56 AM, Rajagopal Venkat wrote:
>>>>>>> measure1:
>>>>>>> ev3.start
>>>>>>> ev1.end  <<<<<
>>>>>>
>>>>>> evX.end  <<<<<
>>>>>> These events are causing numbers to go wrong.
>>>>>
>>>>> but out of a 20 second window.. this is a tiny tiny window...
>>>>> if you see 100.1% I'd buy this reasoning.
>>>>> but you're seeing much more than that.
>>>>
>>>> How about generating a report for 1sec duration?
>>>
>>> even for 1 second... still it's miniscule compared to this whole 1 second
>>> the amount of setup/teardown time just is not that huge.
>>>
>> Here are some perf timestamps,
>> (3979299431)
>> (3979303554)
>> (4079217947)
>> (4091306943)
>> (4091322535)
>> (4091336882)
>> When 1sec report is generated and if above timestamp gets
>> added to timer accumulated_runtime, no wonder why such
>> huge usage is reported.
>
> question is... how did these get here?
> is the kernel reporting garbage time ????
>
>
oops wrong numbers! Here is the data captured for timer::delayed_work_timer_fn
events on snowball. Prints are from timer::done() of src/process/timer.cpp.

----------------------------------------------------------------------------------------
Measurement start time(first_stamp)  - (4379576721191)
Measurement end time(last_stamp)   - (4380607421874)

accumulated         end_timestamp  	     start_timestamp
runtime
(61035)               (4379586975097)		(4379586914062)
(122071)	      (4379607116699)		(4379607055663)
(152589)	      (4379607177734)		(4379607147216)
(183106)	      (4379627105712)		(4379627075195)
(244142)	      (4379647186279)		(4379647125243)
(274660)	      (4380506896972)		(4380506866454)
(274660)	      (4380506927490)		(4380506927490)
(305177)	      (4380506958007)		(4380506927490)
(305177)	      (4380506988525)		(4380506988525)
(4380507324219)       (4380507019042)		(0)    <<<<<<<
(4380507354736)       (4380596923827)		(4380596893310)
(4380507415771)       (4380606964111)		(4380606903076)

cpu usage from total_cpu_time() - (425033.990889%)
------------------------------------------------------------------------------

As mentioned in my earlier comments, next patch will be submitted
to handle these events(for which start time is not recorded) instead
of ignoring them.

Arjan van de Ven Sept. 6, 2012, 1:51 p.m. UTC | #11

On 9/6/2012 6:42 AM, Rajagopal Venkat wrote:
> (305177)	      (4380506988525)		(4380506988525)
> (4380507324219)       (4380507019042)		(0)    <<<<<<<
> (4380507354736)       (4380596923827)		(4380596893310)
> (4380507415771)       (4380606964111)		(4380606903076)
> 
> cpu usage from total_cpu_time() - (425033.990889%)
> ------------------------------------------------------------------------------
> 
> As mentioned in my earlier comments, next patch will be submitted
> to handle these events(for which start time is not recorded) instead
> of ignoring them.
> 

now THIS explains a lot, and makes the fix (the one in the other mail you just sent) make a lot more sense
than the things from earlier ;-)

thanks for chasing this down!

[Powertop] Fix timer and work perf events timestamp tracing

Commit Message

Comments

Patch