diff mbox series

[5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth

Message ID 9bbefa3b9a62319698907d10e8b78f1b999c311b.1724970211.git.reinette.chatre@intel.com
State New
Headers show
Series selftests/resctrl: Support diverse platforms with MBM and MBA tests | expand

Commit Message

Reinette Chatre Aug. 29, 2024, 10:52 p.m. UTC
The MBA test incrementally throttles memory bandwidth, each time
followed by a comparison between the memory bandwidth observed
by the performance counters and resctrl respectively.

While a comparison between performance counters and resctrl is
generally appropriate, they do not have an identical view of
memory bandwidth. For example RAS features or memory performance
features that generate memory traffic may drive accesses that are
counted differently by performance counters and MBM respectively,
for instance generating "overhead" traffic which is not counted
against any specific RMID. As a ratio, this different view of memory
bandwidth becomes more apparent at low memory bandwidths.

It is not practical to enable/disable the various features that
may generate memory bandwidth to give performance counters and
resctrl an identical view. Instead, do not compare performance
counters and resctrl view of memory bandwidth when the memory
bandwidth is low.

Bandwidth throttling behaves differently across platforms
so it is not appropriate to drop measurement data simply based
on the throttling level. Instead, use a threshold of 750MiB
that has been observed to support adequate comparison between
performance counters and resctrl.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
 tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
 2 files changed, 13 insertions(+)

Comments

Reinette Chatre Sept. 4, 2024, 9:15 p.m. UTC | #1
Hi Ilpo,

On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>
>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>> followed by a comparison between the memory bandwidth observed
>>>> by the performance counters and resctrl respectively.
>>>>
>>>> While a comparison between performance counters and resctrl is
>>>> generally appropriate, they do not have an identical view of
>>>> memory bandwidth. For example RAS features or memory performance
>>>> features that generate memory traffic may drive accesses that are
>>>> counted differently by performance counters and MBM respectively,
>>>> for instance generating "overhead" traffic which is not counted
>>>> against any specific RMID. As a ratio, this different view of memory
>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>
>>> Interesting.
>>>
>>> I did some time back prototype with a change to MBM test such that instead
>>> of using once=false I changed fill_buf to be able to run N passes through
>>> the buffer which allowed me to know how many reads were performed by the
>>> benchmark. This yielded numerical difference between all those 3 values
>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>> didn't end up making an usable test.
>>>
>>> I guess I now have an explanation for at least a part of the differences.
>>>
>>>> It is not practical to enable/disable the various features that
>>>> may generate memory bandwidth to give performance counters and
>>>> resctrl an identical view. Instead, do not compare performance
>>>> counters and resctrl view of memory bandwidth when the memory
>>>> bandwidth is low.
>>>>
>>>> Bandwidth throttling behaves differently across platforms
>>>> so it is not appropriate to drop measurement data simply based
>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>> that has been observed to support adequate comparison between
>>>> performance counters and resctrl.
>>>>
>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>> ---
>>>>    tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>    tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
>>>>    2 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>> index cad473b81a64..204b9ac4b108 100644
>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
>>>> unsigned long *bw_resc)
>>>>      		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>    		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>> +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>> THROTTLE_THRESHOLD) {
>>>> +			ksft_print_msg("Bandwidth below threshold (%d MiB).
>>>> Dropping results from MBA schemata %u.\n",
>>>> +					THROTTLE_THRESHOLD,
>>>> +					ALLOCATION_MAX - ALLOCATION_STEP *
>>>> allocation);
>>>
>>> The second one too should be %d.
>>>
>>
>> hmmm ... I intended to have it be consistent with the ksft_print_msg() that
>> follows. Perhaps allocation can be made unsigned instead?
> 
> If you go that way, then it would be good to make the related defines and
> allocation in mba_setup() unsigned too, although the latter is a bit scary

Sure, will look into that.

> because it does allocation -= ALLOCATION_STEP which could underflow if the
> defines are ever changed.
> 

Is this not already covered in the following check:
	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
		return END_OF_TESTS;

We are talking about hardcoded constants though.

Reinette
Ilpo Järvinen Sept. 5, 2024, 11:45 a.m. UTC | #2
On Wed, 4 Sep 2024, Reinette Chatre wrote:
> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > 
> > > > > The MBA test incrementally throttles memory bandwidth, each time
> > > > > followed by a comparison between the memory bandwidth observed
> > > > > by the performance counters and resctrl respectively.
> > > > > 
> > > > > While a comparison between performance counters and resctrl is
> > > > > generally appropriate, they do not have an identical view of
> > > > > memory bandwidth. For example RAS features or memory performance
> > > > > features that generate memory traffic may drive accesses that are
> > > > > counted differently by performance counters and MBM respectively,
> > > > > for instance generating "overhead" traffic which is not counted
> > > > > against any specific RMID. As a ratio, this different view of memory
> > > > > bandwidth becomes more apparent at low memory bandwidths.
> > > > 
> > > > Interesting.
> > > > 
> > > > I did some time back prototype with a change to MBM test such that
> > > > instead
> > > > of using once=false I changed fill_buf to be able to run N passes
> > > > through
> > > > the buffer which allowed me to know how many reads were performed by the
> > > > benchmark. This yielded numerical difference between all those 3 values
> > > > (# of reads, MBM, perf) which also varied from arch to another so it
> > > > didn't end up making an usable test.
> > > > 
> > > > I guess I now have an explanation for at least a part of the
> > > > differences.
> > > > 
> > > > > It is not practical to enable/disable the various features that
> > > > > may generate memory bandwidth to give performance counters and
> > > > > resctrl an identical view. Instead, do not compare performance
> > > > > counters and resctrl view of memory bandwidth when the memory
> > > > > bandwidth is low.
> > > > > 
> > > > > Bandwidth throttling behaves differently across platforms
> > > > > so it is not appropriate to drop measurement data simply based
> > > > > on the throttling level. Instead, use a threshold of 750MiB
> > > > > that has been observed to support adequate comparison between
> > > > > performance counters and resctrl.
> > > > > 
> > > > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > > > ---
> > > > >    tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> > > > >    tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
> > > > >    2 files changed, 13 insertions(+)
> > > > > 
> > > > > diff --git a/tools/testing/selftests/resctrl/mba_test.c
> > > > > b/tools/testing/selftests/resctrl/mba_test.c
> > > > > index cad473b81a64..204b9ac4b108 100644
> > > > > --- a/tools/testing/selftests/resctrl/mba_test.c
> > > > > +++ b/tools/testing/selftests/resctrl/mba_test.c
> > > > > @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
> > > > > unsigned long *bw_resc)
> > > > >      		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> > > > >    		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> > > > > +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
> > > > > THROTTLE_THRESHOLD) {
> > > > > +			ksft_print_msg("Bandwidth below threshold (%d
> > > > > MiB).
> > > > > Dropping results from MBA schemata %u.\n",
> > > > > +					THROTTLE_THRESHOLD,
> > > > > +					ALLOCATION_MAX -
> > > > > ALLOCATION_STEP *
> > > > > allocation);
> > > > 
> > > > The second one too should be %d.
> > > > 
> > > 
> > > hmmm ... I intended to have it be consistent with the ksft_print_msg()
> > > that
> > > follows. Perhaps allocation can be made unsigned instead?
> > 
> > If you go that way, then it would be good to make the related defines and
> > allocation in mba_setup() unsigned too, although the latter is a bit scary
> 
> Sure, will look into that.
> 
> > because it does allocation -= ALLOCATION_STEP which could underflow if the
> > defines are ever changed.
> > 
> 
> Is this not already covered in the following check:
> 	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
> 		return END_OF_TESTS;
> 
> We are talking about hardcoded constants though.

Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but 
it's also very non-intuitive to let the value underflow and then check for 
that with the > operator.

You're correct that they're constants so one would need to tweak the 
source to end up into this condition in the first place.

Perhaps I'm being overly pendantic here but I in general don't like 
leaving trappy and non-obvious logic like that lying around because one 
day one of such will come back biting.

So, if a variable is unsigned and we ever do subtraction (or adding 
negative numbers to it), I'd prefer additional check to prevent ever 
underflowing it unexpectedly. Or leave them signed.
Reinette Chatre Sept. 5, 2024, 6:08 p.m. UTC | #3
Hi Ilpo,

On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>>>
>>>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>>>> followed by a comparison between the memory bandwidth observed
>>>>>> by the performance counters and resctrl respectively.
>>>>>>
>>>>>> While a comparison between performance counters and resctrl is
>>>>>> generally appropriate, they do not have an identical view of
>>>>>> memory bandwidth. For example RAS features or memory performance
>>>>>> features that generate memory traffic may drive accesses that are
>>>>>> counted differently by performance counters and MBM respectively,
>>>>>> for instance generating "overhead" traffic which is not counted
>>>>>> against any specific RMID. As a ratio, this different view of memory
>>>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>>>
>>>>> Interesting.
>>>>>
>>>>> I did some time back prototype with a change to MBM test such that
>>>>> instead
>>>>> of using once=false I changed fill_buf to be able to run N passes
>>>>> through
>>>>> the buffer which allowed me to know how many reads were performed by the
>>>>> benchmark. This yielded numerical difference between all those 3 values
>>>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>>>> didn't end up making an usable test.
>>>>>
>>>>> I guess I now have an explanation for at least a part of the
>>>>> differences.
>>>>>
>>>>>> It is not practical to enable/disable the various features that
>>>>>> may generate memory bandwidth to give performance counters and
>>>>>> resctrl an identical view. Instead, do not compare performance
>>>>>> counters and resctrl view of memory bandwidth when the memory
>>>>>> bandwidth is low.
>>>>>>
>>>>>> Bandwidth throttling behaves differently across platforms
>>>>>> so it is not appropriate to drop measurement data simply based
>>>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>>>> that has been observed to support adequate comparison between
>>>>>> performance counters and resctrl.
>>>>>>
>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>> ---
>>>>>>     tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>>>     tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
>>>>>>     2 files changed, 13 insertions(+)
>>>>>>
>>>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>>>> index cad473b81a64..204b9ac4b108 100644
>>>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
>>>>>> unsigned long *bw_resc)
>>>>>>       		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>>>     		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>>>> +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>>>> THROTTLE_THRESHOLD) {
>>>>>> +			ksft_print_msg("Bandwidth below threshold (%d
>>>>>> MiB).
>>>>>> Dropping results from MBA schemata %u.\n",
>>>>>> +					THROTTLE_THRESHOLD,
>>>>>> +					ALLOCATION_MAX -
>>>>>> ALLOCATION_STEP *
>>>>>> allocation);
>>>>>
>>>>> The second one too should be %d.
>>>>>
>>>>
>>>> hmmm ... I intended to have it be consistent with the ksft_print_msg()
>>>> that
>>>> follows. Perhaps allocation can be made unsigned instead?
>>>
>>> If you go that way, then it would be good to make the related defines and
>>> allocation in mba_setup() unsigned too, although the latter is a bit scary
>>
>> Sure, will look into that.
>>
>>> because it does allocation -= ALLOCATION_STEP which could underflow if the
>>> defines are ever changed.
>>>
>>
>> Is this not already covered in the following check:
>> 	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
>> 		return END_OF_TESTS;
>>
>> We are talking about hardcoded constants though.
> 
> Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
> it's also very non-intuitive to let the value underflow and then check for
> that with the > operator.

My understanding is that this is the traditional way of checking overflow
(or more accurately wraparound). There are several examples of this pattern
in the kernel and it is also the pattern that I understand Linus referred
to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
do checking in this way (perform the subtraction and then check if it
overflowed) [2].

> 
> You're correct that they're constants so one would need to tweak the
> source to end up into this condition in the first place.
> 
> Perhaps I'm being overly pendantic here but I in general don't like
> leaving trappy and non-obvious logic like that lying around because one
> day one of such will come back biting.

It is not clear to me what is "trappy" about this. The current checks will
catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?

> 
> So, if a variable is unsigned and we ever do subtraction (or adding
> negative numbers to it), I'd prefer additional check to prevent ever
> underflowing it unexpectedly. Or leave them signed.

To support checking at the time of subtraction we either need to change the
flow of that function since it cannot exit with failure if that subtraction
fails because of overflow/wraparound, or we need to introduce more state that
will be an additional check that the existing
"if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
would have caught anyway.

In either case, to do this checking at the time of subtraction the ideal way
would be to use check_sub_overflow() ... but it again does exactly what
you find to be non-intuitive since the implementation in
tools/include/linux/overflow.h uses the gcc intrinsics that does subtraction
first and then checks if overflow occurred.

It is not clear to me what problem you are aiming to solve here. If the
major concern is that the current logic is not obvious, perhaps it can
be clarified with a comment as below:

  	if (runs_per_allocation++ != 0)
  		return 0;
  
+	/*
+	 * Do not attempt allocation outside valid range. Safeguard
+	 * against any potential wraparound caused by subtraction.
+	 */
  	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
  		return END_OF_TESTS;
  
Reinette

[1] https://lwn.net/ml/linux-kernel/CAHk-=whS7FSbBoo1gxe+83twO2JeGNsUKMhAcfWymw9auqBvjg@mail.gmail.com/
[2] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
Reinette Chatre Sept. 7, 2024, 12:05 a.m. UTC | #4
Hi Ilpo,

On 9/6/24 1:44 AM, Ilpo Järvinen wrote:
> On Thu, 5 Sep 2024, Reinette Chatre wrote:
>> On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
>>> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>>>> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
>>>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>>>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>>>>>
>>>>>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>>>>>> followed by a comparison between the memory bandwidth observed
>>>>>>>> by the performance counters and resctrl respectively.
>>>>>>>>
>>>>>>>> While a comparison between performance counters and resctrl is
>>>>>>>> generally appropriate, they do not have an identical view of
>>>>>>>> memory bandwidth. For example RAS features or memory performance
>>>>>>>> features that generate memory traffic may drive accesses that are
>>>>>>>> counted differently by performance counters and MBM respectively,
>>>>>>>> for instance generating "overhead" traffic which is not counted
>>>>>>>> against any specific RMID. As a ratio, this different view of
>>>>>>>> memory
>>>>>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>>>>>
>>>>>>> Interesting.
>>>>>>>
>>>>>>> I did some time back prototype with a change to MBM test such that
>>>>>>> instead
>>>>>>> of using once=false I changed fill_buf to be able to run N passes
>>>>>>> through
>>>>>>> the buffer which allowed me to know how many reads were performed by
>>>>>>> the
>>>>>>> benchmark. This yielded numerical difference between all those 3
>>>>>>> values
>>>>>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>>>>>> didn't end up making an usable test.
>>>>>>>
>>>>>>> I guess I now have an explanation for at least a part of the
>>>>>>> differences.
>>>>>>>
>>>>>>>> It is not practical to enable/disable the various features that
>>>>>>>> may generate memory bandwidth to give performance counters and
>>>>>>>> resctrl an identical view. Instead, do not compare performance
>>>>>>>> counters and resctrl view of memory bandwidth when the memory
>>>>>>>> bandwidth is low.
>>>>>>>>
>>>>>>>> Bandwidth throttling behaves differently across platforms
>>>>>>>> so it is not appropriate to drop measurement data simply based
>>>>>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>>>>>> that has been observed to support adequate comparison between
>>>>>>>> performance counters and resctrl.
>>>>>>>>
>>>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>>>> ---
>>>>>>>>      tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>>>>>      tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
>>>>>>>>      2 files changed, 13 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> index cad473b81a64..204b9ac4b108 100644
>>>>>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
>>>>>>>> *bw_imc,
>>>>>>>> unsigned long *bw_resc)
>>>>>>>>        		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>>>>>      		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>>>>>> +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>>>>>> THROTTLE_THRESHOLD) {
>>>>>>>> +			ksft_print_msg("Bandwidth below threshold (%d
>>>>>>>> MiB).
>>>>>>>> Dropping results from MBA schemata %u.\n",
>>>>>>>> +					THROTTLE_THRESHOLD,
>>>>>>>> +					ALLOCATION_MAX -
>>>>>>>> ALLOCATION_STEP *
>>>>>>>> allocation);
>>>>>>>
>>>>>>> The second one too should be %d.
>>>>>>>
>>>>>>
>>>>>> hmmm ... I intended to have it be consistent with the ksft_print_msg()
>>>>>> that
>>>>>> follows. Perhaps allocation can be made unsigned instead?
>>>>>
>>>>> If you go that way, then it would be good to make the related defines
>>>>> and
>>>>> allocation in mba_setup() unsigned too, although the latter is a bit
>>>>> scary
>>>>
>>>> Sure, will look into that.
>>>>
>>>>> because it does allocation -= ALLOCATION_STEP which could underflow if
>>>>> the
>>>>> defines are ever changed.
>>>>>
>>>>
>>>> Is this not already covered in the following check:
>>>> 	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
>>>> 		return END_OF_TESTS;
>>>>
>>>> We are talking about hardcoded constants though.
>>>
>>> Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
>>> it's also very non-intuitive to let the value underflow and then check for
>>> that with the > operator.
>>
>> My understanding is that this is the traditional way of checking overflow
>> (or more accurately wraparound). There are several examples of this pattern
>> in the kernel and it is also the pattern that I understand Linus referred
>> to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
>> do checking in this way (perform the subtraction and then check if it
>> overflowed) [2].
> 
> Fair enough. I've never come across that kind of claim before.
> 
>>> You're correct that they're constants so one would need to tweak the
>>> source to end up into this condition in the first place.
>>>
>>> Perhaps I'm being overly pendantic here but I in general don't like
>>> leaving trappy and non-obvious logic like that lying around because one
>>> day one of such will come back biting.
>>
>> It is not clear to me what is "trappy" about this. The current checks will
>> catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?
>>
>>> So, if a variable is unsigned and we ever do subtraction (or adding
>>> negative numbers to it), I'd prefer additional check to prevent ever
>>> underflowing it unexpectedly. Or leave them signed.
>>
>> To support checking at the time of subtraction we either need to change the
>> flow of that function since it cannot exit with failure if that subtraction
>> fails because of overflow/wraparound, or we need to introduce more state that
>> will be an additional check that the existing
>> "if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
>> would have caught anyway.
>>
>> In either case, to do this checking at the time of subtraction the ideal way
>> would be to use check_sub_overflow() ... but it again does exactly what
>> you find to be non-intuitive since the implementation in
>> tools/include/linux/overflow.h uses the gcc intrinsics that does subtraction
>> first and then checks if overflow occurred.
> 
> It's trappy because by glance, that check looks unnecessary since
> allocation starts from max and goes downwards. Also worth to note,
> the check is not immediately after the decrement but done on the next
> iteration.

Right. This is probably what causes most confusion.

Considering that, what do you think of below that avoids wraparound entirely:

---8<---
From: Reinette Chatre <reinette.chatre@intel.com>
Subject: [PATCH] selftests/resctrl: Make wraparound handling obvious

Within mba_setup() the programmed bandwidth delay value starts
at the maximum (100, or rather ALLOCATION_MAX) and progresses
towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.

The programmed bandwidth delay should never be negative, so
representing it with an unsigned int is most appropriate. This
may introduce confusion because of the "allocation > ALLOCATION_MAX"
check used to check wraparound of the subtraction.

Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN,
and incrementally, with ALLOCATION_STEP steps, adjust the
bandwidth delay value. This avoids wraparound while making the purpose
of "allocation > ALLOCATION_MAX" clear and eliminates the
need for the "allocation < ALLOCATION_MIN" check.

Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch
---
  tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..947d5699f0c8 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
  		     const struct user_params *uparams,
  		     struct resctrl_val_param *p)
  {
-	static int runs_per_allocation, allocation = 100;
+	static unsigned int allocation = ALLOCATION_MIN;
+	static int runs_per_allocation = 0;
  	char allocation_str[64];
  	int ret;
  
@@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
  	if (runs_per_allocation++ != 0)
  		return 0;
  
-	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
+	if (allocation > ALLOCATION_MAX)
  		return END_OF_TESTS;
  
  	sprintf(allocation_str, "%d", allocation);
@@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
  	if (ret < 0)
  		return ret;
  
-	allocation -= ALLOCATION_STEP;
+	allocation += ALLOCATION_STEP;
  
  	return 0;
  }
@@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
  
  static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
  {
-	int allocation, runs;
+	unsigned int allocation;
  	bool ret = false;
+	int runs;
  
  	ksft_print_msg("Results are displayed in (MB)\n");
  	/* Memory bandwidth from 100% down to 10% */
@@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
  			       avg_diff_per > MAX_DIFF_PERCENT ?
  			       "Fail:" : "Pass:",
  			       MAX_DIFF_PERCENT,
-			       ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+			       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
  
  		ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
  		ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);


> 
> The risk here is that somebody removes allocation > ALLOCATION_MAX check.
> 
> Something called check_sub_overflow() is not subject to a similar risk
> regardless of what operations occur inside it.

Reinette
Ilpo Järvinen Sept. 9, 2024, 8:13 a.m. UTC | #5
On Fri, 6 Sep 2024, Reinette Chatre wrote:
> On 9/6/24 1:44 AM, Ilpo Järvinen wrote:
> > On Thu, 5 Sep 2024, Reinette Chatre wrote:
> > > On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
> > > > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > > > On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> > > > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > > > On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> > > > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > > > > > 
> > > > > > > > > The MBA test incrementally throttles memory bandwidth, each
> > > > > > > > > time
> > > > > > > > > followed by a comparison between the memory bandwidth observed
> > > > > > > > > by the performance counters and resctrl respectively.
> > > > > > > > > 
> > > > > > > > > While a comparison between performance counters and resctrl is
> > > > > > > > > generally appropriate, they do not have an identical view of
> > > > > > > > > memory bandwidth. For example RAS features or memory
> > > > > > > > > performance
> > > > > > > > > features that generate memory traffic may drive accesses that
> > > > > > > > > are
> > > > > > > > > counted differently by performance counters and MBM
> > > > > > > > > respectively,
> > > > > > > > > for instance generating "overhead" traffic which is not
> > > > > > > > > counted
> > > > > > > > > against any specific RMID. As a ratio, this different view of
> > > > > > > > > memory
> > > > > > > > > bandwidth becomes more apparent at low memory bandwidths.
> > > > > > > > 
> > > > > > > > Interesting.
> > > > > > > > 
> > > > > > > > I did some time back prototype with a change to MBM test such
> > > > > > > > that
> > > > > > > > instead
> > > > > > > > of using once=false I changed fill_buf to be able to run N
> > > > > > > > passes
> > > > > > > > through
> > > > > > > > the buffer which allowed me to know how many reads were
> > > > > > > > performed by
> > > > > > > > the
> > > > > > > > benchmark. This yielded numerical difference between all those 3
> > > > > > > > values
> > > > > > > > (# of reads, MBM, perf) which also varied from arch to another
> > > > > > > > so it
> > > > > > > > didn't end up making an usable test.
> > > > > > > > 
> > > > > > > > I guess I now have an explanation for at least a part of the
> > > > > > > > differences.
> > > > > > > > 
> > > > > > > > > It is not practical to enable/disable the various features
> > > > > > > > > that
> > > > > > > > > may generate memory bandwidth to give performance counters and
> > > > > > > > > resctrl an identical view. Instead, do not compare performance
> > > > > > > > > counters and resctrl view of memory bandwidth when the memory
> > > > > > > > > bandwidth is low.
> > > > > > > > > 
> > > > > > > > > Bandwidth throttling behaves differently across platforms
> > > > > > > > > so it is not appropriate to drop measurement data simply based
> > > > > > > > > on the throttling level. Instead, use a threshold of 750MiB
> > > > > > > > > that has been observed to support adequate comparison between
> > > > > > > > > performance counters and resctrl.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > > > > > > > ---
> > > > > > > > >      tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> > > > > > > > >      tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
> > > > > > > > >      2 files changed, 13 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > b/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > index cad473b81a64..204b9ac4b108 100644
> > > > > > > > > --- a/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > +++ b/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
> > > > > > > > > *bw_imc,
> > > > > > > > > unsigned long *bw_resc)
> > > > > > > > >        		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> > > > > > > > >      		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> > > > > > > > > +		if (avg_bw_imc < THROTTLE_THRESHOLD ||
> > > > > > > > > avg_bw_resc <
> > > > > > > > > THROTTLE_THRESHOLD) {
> > > > > > > > > +			ksft_print_msg("Bandwidth below
> > > > > > > > > threshold (%d
> > > > > > > > > MiB).
> > > > > > > > > Dropping results from MBA schemata %u.\n",
> > > > > > > > > +					THROTTLE_THRESHOLD,
> > > > > > > > > +					ALLOCATION_MAX -
> > > > > > > > > ALLOCATION_STEP *
> > > > > > > > > allocation);
> > > > > > > > 
> > > > > > > > The second one too should be %d.
> > > > > > > > 
> > > > > > > 
> > > > > > > hmmm ... I intended to have it be consistent with the
> > > > > > > ksft_print_msg()
> > > > > > > that
> > > > > > > follows. Perhaps allocation can be made unsigned instead?
> > > > > > 
> > > > > > If you go that way, then it would be good to make the related
> > > > > > defines
> > > > > > and
> > > > > > allocation in mba_setup() unsigned too, although the latter is a bit
> > > > > > scary
> > > > > 
> > > > > Sure, will look into that.
> > > > > 
> > > > > > because it does allocation -= ALLOCATION_STEP which could underflow
> > > > > > if
> > > > > > the
> > > > > > defines are ever changed.
> > > > > > 
> > > > > 
> > > > > Is this not already covered in the following check:
> > > > > 	if (allocation < ALLOCATION_MIN || allocation >
> > > > > ALLOCATION_MAX)
> > > > > 		return END_OF_TESTS;
> > > > > 
> > > > > We are talking about hardcoded constants though.
> > > > 
> > > > Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
> > > > it's also very non-intuitive to let the value underflow and then check
> > > > for
> > > > that with the > operator.
> > > 
> > > My understanding is that this is the traditional way of checking overflow
> > > (or more accurately wraparound). There are several examples of this
> > > pattern
> > > in the kernel and it is also the pattern that I understand Linus referred
> > > to as "traditional" in [1]. Even the compiler's intrinsic overflow
> > > checkers
> > > do checking in this way (perform the subtraction and then check if it
> > > overflowed) [2].
> > 
> > Fair enough. I've never come across that kind of claim before.
> > 
> > > > You're correct that they're constants so one would need to tweak the
> > > > source to end up into this condition in the first place.
> > > > 
> > > > Perhaps I'm being overly pendantic here but I in general don't like
> > > > leaving trappy and non-obvious logic like that lying around because one
> > > > day one of such will come back biting.
> > > 
> > > It is not clear to me what is "trappy" about this. The current checks will
> > > catch the wraparound if somebody changes ALLOCATION_STEP in your scenario,
> > > no?
> > > 
> > > > So, if a variable is unsigned and we ever do subtraction (or adding
> > > > negative numbers to it), I'd prefer additional check to prevent ever
> > > > underflowing it unexpectedly. Or leave them signed.
> > > 
> > > To support checking at the time of subtraction we either need to change
> > > the
> > > flow of that function since it cannot exit with failure if that
> > > subtraction
> > > fails because of overflow/wraparound, or we need to introduce more state
> > > that
> > > will be an additional check that the existing
> > > "if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
> > > would have caught anyway.
> > > 
> > > In either case, to do this checking at the time of subtraction the ideal
> > > way
> > > would be to use check_sub_overflow() ... but it again does exactly what
> > > you find to be non-intuitive since the implementation in
> > > tools/include/linux/overflow.h uses the gcc intrinsics that does
> > > subtraction
> > > first and then checks if overflow occurred.
> > 
> > It's trappy because by glance, that check looks unnecessary since
> > allocation starts from max and goes downwards. Also worth to note,
> > the check is not immediately after the decrement but done on the next
> > iteration.
> 
> Right. This is probably what causes most confusion.
> 
> Considering that, what do you think of below that avoids wraparound entirely:
> 
> ---8<---
> From: Reinette Chatre <reinette.chatre@intel.com>
> Subject: [PATCH] selftests/resctrl: Make wraparound handling obvious
> 
> Within mba_setup() the programmed bandwidth delay value starts
> at the maximum (100, or rather ALLOCATION_MAX) and progresses
> towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.
> 
> The programmed bandwidth delay should never be negative, so
> representing it with an unsigned int is most appropriate. This
> may introduce confusion because of the "allocation > ALLOCATION_MAX"
> check used to check wraparound of the subtraction.
> 
> Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN,
> and incrementally, with ALLOCATION_STEP steps, adjust the
> bandwidth delay value. This avoids wraparound while making the purpose
> of "allocation > ALLOCATION_MAX" clear and eliminates the
> need for the "allocation < ALLOCATION_MIN" check.
> 
> Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - New patch
> ---
>  tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/mba_test.c
> b/tools/testing/selftests/resctrl/mba_test.c
> index ab8496a4925b..947d5699f0c8 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
>  		     const struct user_params *uparams,
>  		     struct resctrl_val_param *p)
>  {
> -	static int runs_per_allocation, allocation = 100;
> +	static unsigned int allocation = ALLOCATION_MIN;
> +	static int runs_per_allocation = 0;
>  	char allocation_str[64];
>  	int ret;
>  @@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
>  	if (runs_per_allocation++ != 0)
>  		return 0;
>  -	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
> +	if (allocation > ALLOCATION_MAX)
>  		return END_OF_TESTS;
>   	sprintf(allocation_str, "%d", allocation);
> @@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
>  	if (ret < 0)
>  		return ret;
>  -	allocation -= ALLOCATION_STEP;
> +	allocation += ALLOCATION_STEP;
>   	return 0;
>  }
> @@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
>   static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
>  {
> -	int allocation, runs;
> +	unsigned int allocation;
>  	bool ret = false;
> +	int runs;
>   	ksft_print_msg("Results are displayed in (MB)\n");
>  	/* Memory bandwidth from 100% down to 10% */
> @@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned
> long *bw_resc)
>  			       avg_diff_per > MAX_DIFF_PERCENT ?
>  			       "Fail:" : "Pass:",
>  			       MAX_DIFF_PERCENT,
> -			       ALLOCATION_MAX - ALLOCATION_STEP * allocation);
> +			       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
>   		ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
>  		ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);

Looks fine.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
diff mbox series

Patch

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index cad473b81a64..204b9ac4b108 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -96,6 +96,13 @@  static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 
 		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
 		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
+			ksft_print_msg("Bandwidth below threshold (%d MiB).  Dropping results from MBA schemata %u.\n",
+					THROTTLE_THRESHOLD,
+					ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+			break;
+		}
+
 		avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
 		avg_diff_per = (int)(avg_diff * 100);
 
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 0e5456165a6a..e65c5fb76b17 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -43,6 +43,12 @@ 
 
 #define DEFAULT_SPAN		(250 * MB)
 
+/*
+ * Memory bandwidth (in MiB) below which the bandwidth comparisons
+ * between iMC and resctrl are considered unreliable.
+ */
+#define THROTTLE_THRESHOLD	750
+
 /*
  * user_params:		User supplied parameters
  * @cpu:		CPU number to which the benchmark will be bound to