mbox series

[0/5] Enable rteval on Arm based systems

Message ID 20210901080816.721731-1-punitagrawal@gmail.com
Headers show
Series Enable rteval on Arm based systems | expand

Message

Punit Agrawal Sept. 1, 2021, 8:08 a.m. UTC
Hi,

The following small set of patches were required to enable rteval on
Arm based systems. Of these, the first two patches have been posted
before[0] and have been updated to fix some issues noticed since then.

The remaining three patches are new and fix issues encountered while
running rteval on a 64bit Arm platform.

All feedback welcome.

Thanks,
Punit

[0] https://lore.kernel.org/linux-rt-users/20210224021603.446274-1-punit1.agrawal@toshiba.co.jp/

Punit Agrawal (5):
  rteval: systopology.py: Add support for systems that don't have Numa
  rteval: cyclictest.py: Update logic to get core description
  rteval: kernel.py: Add support for kthreads running with deadline
    policy
  rteval: hackbench.py: Enable running on a system with low memory
  rteval: kcompile.py: Gracefully handle missing affinity information

 rteval/misc.py                           | 23 +++++++++++++-
 rteval/modules/loads/hackbench.py        |  3 +-
 rteval/modules/loads/kcompile.py         |  6 +++-
 rteval/modules/measurement/cyclictest.py |  8 +++--
 rteval/sysinfo/kernel.py                 |  2 +-
 rteval/systopology.py                    | 38 ++++++++++++++++++++----
 6 files changed, 66 insertions(+), 14 deletions(-)

Comments

John Kacur Sept. 9, 2021, 12:34 p.m. UTC | #1
On Wed, 1 Sep 2021, Punit Agrawal wrote:

> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> 

> Certain systems such as Arm v7 do not have support for Numa nodes,

> i.e., "/sys/devices/system/node*" does not exist. Instead of erroring

> out in this situation, it would be better if rteval could use

> alternate sources to get the system topology and memory information.

> 

> Introduce the notion of a fake Numa node (as a class) which is used

> when no numa nodes are found on the system. Other than the

> constructor, it provides the same interface as the existing NumaNode

> class so existing users should work without any changes.

> 

> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> ---

>  rteval/systopology.py | 38 ++++++++++++++++++++++++++++++++------

>  1 file changed, 32 insertions(+), 6 deletions(-)

> 

> diff --git a/rteval/systopology.py b/rteval/systopology.py

> index c61ec1a58514..7ce9a8c4f707 100644

> --- a/rteval/systopology.py

> +++ b/rteval/systopology.py

> @@ -191,6 +191,31 @@ class NumaNode:

>          """ return list of cpus for this node """

>          return self.cpus.getcpulist()

>  

> +class FakeNumaNode(NumaNode):

> +    """class representing a fake NUMA node. The fake NUMA node is used on

> +    systems which don't have NUMA enabled (no

> +    /sys/devices/system/node) such as Arm v7

> +

> +    """

> +

> +    cpupath = '/sys/devices/system/cpu'

> +    mempath = '/proc/meminfo'

> +

> +    def __init__(self):

> +        self.nodeid = 0

> +        self.cpus = CpuList(sysread(FakeNumaNode.cpupath, "possible"))

> +        self.getmeminfo()

> +

> +    def getmeminfo(self):

> +        self.meminfo = {}

> +        for l in open(FakeNumaNode.mempath, "r"):

> +            elements = l.split()

> +            key = elements[0][0:-1]

> +            val = int(elements[1])

> +            if len(elements) == 3 and elements[2] == "kB":

> +                val *= 1024

> +            self.meminfo[key] = val

> +

>  #

>  # Class to abstract the system topology of numa nodes and cpus

>  #

> @@ -238,12 +263,13 @@ class SysTopology:

>  

>      def getinfo(self):

>          nodes = glob.glob(os.path.join(SysTopology.nodepath, 'node[0-9]*'))

> -        if not nodes:

> -            raise RuntimeError("No valid nodes found in %s!" % SysTopology.nodepath)

> -        nodes.sort()

> -        for n in nodes:

> -            node = int(os.path.basename(n)[4:])

> -            self.nodes[node] = NumaNode(n)

> +        if nodes:

> +            nodes.sort()

> +            for n in nodes:

> +                node = int(os.path.basename(n)[4:])

> +                self.nodes[node] = NumaNode(n)

> +        else:

> +            self.nodes[0] = FakeNumaNode()

>  

>      def getnodes(self):

>          return list(self.nodes.keys())

> -- 

> 2.32.0

> 

> 


This is quite clever. I am just going to rename Fake to Sim as short for 
simulated, but other than that.

Signed-off-by: John Kacur <jkacur@redhat.com>
John Kacur Sept. 9, 2021, 12:47 p.m. UTC | #2
On Wed, 1 Sep 2021, Punit Agrawal wrote:

> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> 

> When running rteval on a system with kthreads running with "deadline"

> policy, an exception is encountered when parsing the output of "ps".

> 

>     [DEBUG] cmd: /usr/bin/ps -eocommand,pid,policy,rtprio,comm

>     Traceback (most recent call last):

>     ...

>       File "...rteval/rteval/sysinfo/kernel.py", line 60, in kernel_get_kthreads

> 	ret_kthreads[v[0]] = {'policy' : policies[bytes.decode(v[1])],

>       KeyError: 'DLN'

> 

> The kernel uses deadline policy for "schedutil" cpufreq governor

> threads. Fix the crash in rteval by adding support for "deadline" to

> the list of policies.

> 

> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> ---

>  rteval/sysinfo/kernel.py | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/rteval/sysinfo/kernel.py b/rteval/sysinfo/kernel.py

> index 97ad9402b13e..f2e9d72ac2ef 100644

> --- a/rteval/sysinfo/kernel.py

> +++ b/rteval/sysinfo/kernel.py

> @@ -44,7 +44,7 @@ class KernelInfo:

>  

>  

>      def kernel_get_kthreads(self):

> -        policies = {'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

> +        policies = {'DLN': 'deadline', 'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

>          ret_kthreads = {}

>          self.__log(Log.DEBUG, "getting kthread status")

>          cmd = '%s -eocommand,pid,policy,rtprio,comm' % getcmdpath('ps')

> -- 

> 2.32.0

> 

> 


Signed-off-by: John Kacur <jkacur@redhat.com>
John Kacur Sept. 9, 2021, 7:23 p.m. UTC | #3
On Wed, 1 Sep 2021, Punit Agrawal wrote:

> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> 

> kcompile either sets the affinity of the workload threads to a user

> specified list of cpus (using taskset) or binds the threads to the

> local numa node (using numactl). In the absence of both the following

> hard to parse error is thrown -

> 

>     Exception in thread kcompile:

>     Traceback (most recent call last):

>     ...

>       File "...rteval/rteval/modules/loads/kcompile.py", line 55, in __init__

> 	self.binder = 'taskset -c %s' % compress_cpulist(cpulist)

>       File "...rteval/rteval/misc.py", line 62, in compress_cpulist

> 	if isinstance(cpulist[0], int):

>     TypeError: 'NoneType' object is not subscriptable

> 

> Instead, add a warning to report the missing affinity information /

> tools and continue executing the workload with no affinity - relying

> on the affinity settings being set by the kernel.

> 

> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> ---

>  rteval/modules/loads/kcompile.py | 6 +++++-

>  1 file changed, 5 insertions(+), 1 deletion(-)

> 

> diff --git a/rteval/modules/loads/kcompile.py b/rteval/modules/loads/kcompile.py

> index d1955c7aee3d..5c83b78525e5 100644

> --- a/rteval/modules/loads/kcompile.py

> +++ b/rteval/modules/loads/kcompile.py

> @@ -51,8 +51,12 @@ class KBuildJob:

>              os.mkdir(self.objdir)

>          if os.path.exists('/usr/bin/numactl') and not cpulist:

>              self.binder = 'numactl --cpunodebind %d' % int(self.node)

> -        else:

> +        elif cpulist is not None:

>              self.binder = 'taskset -c %s' % compress_cpulist(cpulist)

> +        else:

> +            self.log(Log.WARN, 'Unable to set job affinity - please install "numactl" or pass "cpulist"')

> +            self.log(Log.WARN, 'Running with default OS decided affinity')

> +            self.binder = ''

>          if cpulist:

>              self.jobs = self.calc_jobs_per_cpu() * len(cpulist)

>          else:

> -- 

> 2.32.0

> 

> 


Wow, that's some messy code! (Not yours, the original).

I think for embedded arm distros, numactl is not always available, so we 
shouldn't tell users to install it, and just make the code work when it's 
not there. Also, it should be perfectly fine to run without passing a 
cpulist, so it's not something we should warn the users about.

You've got the right idea to just make the self.binder empty.
I've got a version that cleans-up a few more things,
I've tested with no numactl and no cpulist, no numactl and a cpulist
with numactl and no cpulist and with numactl and with a cpulist.

Could you give it a test on your arm system?
Sending in a separte mail

John
Punit Agrawal Sept. 13, 2021, 2:28 a.m. UTC | #4
Hi John,

John Kacur <jkacur@redhat.com> writes:

> On Wed, 1 Sep 2021, Punit Agrawal wrote:

>

>> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

>> 

>> Certain systems such as Arm v7 do not have support for Numa nodes,

>> i.e., "/sys/devices/system/node*" does not exist. Instead of erroring

>> out in this situation, it would be better if rteval could use

>> alternate sources to get the system topology and memory information.

>> 

>> Introduce the notion of a fake Numa node (as a class) which is used

>> when no numa nodes are found on the system. Other than the

>> constructor, it provides the same interface as the existing NumaNode

>> class so existing users should work without any changes.

>> 

>> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

>> ---

>>  rteval/systopology.py | 38 ++++++++++++++++++++++++++++++++------

>>  1 file changed, 32 insertions(+), 6 deletions(-)

>> 

>> diff --git a/rteval/systopology.py b/rteval/systopology.py

>> index c61ec1a58514..7ce9a8c4f707 100644

>> --- a/rteval/systopology.py

>> +++ b/rteval/systopology.py

>> @@ -191,6 +191,31 @@ class NumaNode:

>>          """ return list of cpus for this node """

>>          return self.cpus.getcpulist()

>>  

>> +class FakeNumaNode(NumaNode):

>> +    """class representing a fake NUMA node. The fake NUMA node is used on

>> +    systems which don't have NUMA enabled (no

>> +    /sys/devices/system/node) such as Arm v7

>> +

>> +    """

>> +

>> +    cpupath = '/sys/devices/system/cpu'

>> +    mempath = '/proc/meminfo'

>> +

>> +    def __init__(self):

>> +        self.nodeid = 0

>> +        self.cpus = CpuList(sysread(FakeNumaNode.cpupath, "possible"))

>> +        self.getmeminfo()

>> +

>> +    def getmeminfo(self):

>> +        self.meminfo = {}

>> +        for l in open(FakeNumaNode.mempath, "r"):

>> +            elements = l.split()

>> +            key = elements[0][0:-1]

>> +            val = int(elements[1])

>> +            if len(elements) == 3 and elements[2] == "kB":

>> +                val *= 1024

>> +            self.meminfo[key] = val

>> +

>>  #

>>  # Class to abstract the system topology of numa nodes and cpus

>>  #

>> @@ -238,12 +263,13 @@ class SysTopology:

>>  

>>      def getinfo(self):

>>          nodes = glob.glob(os.path.join(SysTopology.nodepath, 'node[0-9]*'))

>> -        if not nodes:

>> -            raise RuntimeError("No valid nodes found in %s!" % SysTopology.nodepath)

>> -        nodes.sort()

>> -        for n in nodes:

>> -            node = int(os.path.basename(n)[4:])

>> -            self.nodes[node] = NumaNode(n)

>> +        if nodes:

>> +            nodes.sort()

>> +            for n in nodes:

>> +                node = int(os.path.basename(n)[4:])

>> +                self.nodes[node] = NumaNode(n)

>> +        else:

>> +            self.nodes[0] = FakeNumaNode()

>>  

>>      def getnodes(self):

>>          return list(self.nodes.keys())

>> -- 

>> 2.32.0

>> 

>> 

>

> This is quite clever. I am just going to rename Fake to Sim as short for 

> simulated, but other than that.

>

> Signed-off-by: John Kacur <jkacur@redhat.com>


Thanks for taking a look.

I'll take a look at the updates you sent and respond there as
appropriate.

Regards,
Punit
Punit Agrawal Sept. 15, 2021, 8:45 a.m. UTC | #5
Hi John,

John Kacur <jkacur@redhat.com> writes:

> On Wed, 1 Sep 2021, Punit Agrawal wrote:

>

>> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

>> 

>> When running rteval on a system with kthreads running with "deadline"

>> policy, an exception is encountered when parsing the output of "ps".

>> 

>>     [DEBUG] cmd: /usr/bin/ps -eocommand,pid,policy,rtprio,comm

>>     Traceback (most recent call last):

>>     ...

>>       File "...rteval/rteval/sysinfo/kernel.py", line 60, in kernel_get_kthreads

>> 	ret_kthreads[v[0]] = {'policy' : policies[bytes.decode(v[1])],

>>       KeyError: 'DLN'

>> 

>> The kernel uses deadline policy for "schedutil" cpufreq governor

>> threads. Fix the crash in rteval by adding support for "deadline" to

>> the list of policies.

>> 

>> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

>> ---

>>  rteval/sysinfo/kernel.py | 2 +-

>>  1 file changed, 1 insertion(+), 1 deletion(-)

>> 

>> diff --git a/rteval/sysinfo/kernel.py b/rteval/sysinfo/kernel.py

>> index 97ad9402b13e..f2e9d72ac2ef 100644

>> --- a/rteval/sysinfo/kernel.py

>> +++ b/rteval/sysinfo/kernel.py

>> @@ -44,7 +44,7 @@ class KernelInfo:

>>  

>>  

>>      def kernel_get_kthreads(self):

>> -        policies = {'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

>> +        policies = {'DLN': 'deadline', 'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

>>          ret_kthreads = {}

>>          self.__log(Log.DEBUG, "getting kthread status")

>>          cmd = '%s -eocommand,pid,policy,rtprio,comm' % getcmdpath('ps')

>> -- 

>> 2.32.0

>> 

>> 

>

> Signed-off-by: John Kacur <jkacur@redhat.com>


It looks like this patch and the other one converting hackbench memory
check to a warning was missed when applying the rest of the series -
John, could you please pick these as well.

Thanks,
Punit
John Kacur Sept. 15, 2021, 12:17 p.m. UTC | #6
On Wed, 15 Sep 2021, Punit Agrawal wrote:

> Hi John,

> 

> John Kacur <jkacur@redhat.com> writes:

> 

> > On Wed, 1 Sep 2021, Punit Agrawal wrote:

> >

> >> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> >> 

> >> When running rteval on a system with kthreads running with "deadline"

> >> policy, an exception is encountered when parsing the output of "ps".

> >> 

> >>     [DEBUG] cmd: /usr/bin/ps -eocommand,pid,policy,rtprio,comm

> >>     Traceback (most recent call last):

> >>     ...

> >>       File "...rteval/rteval/sysinfo/kernel.py", line 60, in kernel_get_kthreads

> >> 	ret_kthreads[v[0]] = {'policy' : policies[bytes.decode(v[1])],

> >>       KeyError: 'DLN'

> >> 

> >> The kernel uses deadline policy for "schedutil" cpufreq governor

> >> threads. Fix the crash in rteval by adding support for "deadline" to

> >> the list of policies.

> >> 

> >> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

> >> ---

> >>  rteval/sysinfo/kernel.py | 2 +-

> >>  1 file changed, 1 insertion(+), 1 deletion(-)

> >> 

> >> diff --git a/rteval/sysinfo/kernel.py b/rteval/sysinfo/kernel.py

> >> index 97ad9402b13e..f2e9d72ac2ef 100644

> >> --- a/rteval/sysinfo/kernel.py

> >> +++ b/rteval/sysinfo/kernel.py

> >> @@ -44,7 +44,7 @@ class KernelInfo:

> >>  

> >>  

> >>      def kernel_get_kthreads(self):

> >> -        policies = {'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

> >> +        policies = {'DLN': 'deadline', 'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

> >>          ret_kthreads = {}

> >>          self.__log(Log.DEBUG, "getting kthread status")

> >>          cmd = '%s -eocommand,pid,policy,rtprio,comm' % getcmdpath('ps')

> >> -- 

> >> 2.32.0

> >> 

> >> 

> >

> > Signed-off-by: John Kacur <jkacur@redhat.com>

> 

> It looks like this patch and the other one converting hackbench memory

> check to a warning was missed when applying the rest of the series -

> John, could you please pick these as well.

> 

> Thanks,

> Punit

> 


I added the patch to support deadline policy, the other one had already 
been added, I just hadn't pushed upstream yet, but I just did so, so go 
ahead and pull.

John
Punit Agrawal Sept. 16, 2021, 11:34 a.m. UTC | #7
John Kacur <jkacur@redhat.com> writes:

> On Wed, 15 Sep 2021, Punit Agrawal wrote:

>

>> Hi John,

>> 

>> John Kacur <jkacur@redhat.com> writes:

>> 

>> > On Wed, 1 Sep 2021, Punit Agrawal wrote:

>> >

>> >> From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

>> >> 

>> >> When running rteval on a system with kthreads running with "deadline"

>> >> policy, an exception is encountered when parsing the output of "ps".

>> >> 

>> >>     [DEBUG] cmd: /usr/bin/ps -eocommand,pid,policy,rtprio,comm

>> >>     Traceback (most recent call last):

>> >>     ...

>> >>       File "...rteval/rteval/sysinfo/kernel.py", line 60, in kernel_get_kthreads

>> >> 	ret_kthreads[v[0]] = {'policy' : policies[bytes.decode(v[1])],

>> >>       KeyError: 'DLN'

>> >> 

>> >> The kernel uses deadline policy for "schedutil" cpufreq governor

>> >> threads. Fix the crash in rteval by adding support for "deadline" to

>> >> the list of policies.

>> >> 

>> >> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>

>> >> ---

>> >>  rteval/sysinfo/kernel.py | 2 +-

>> >>  1 file changed, 1 insertion(+), 1 deletion(-)

>> >> 

>> >> diff --git a/rteval/sysinfo/kernel.py b/rteval/sysinfo/kernel.py

>> >> index 97ad9402b13e..f2e9d72ac2ef 100644

>> >> --- a/rteval/sysinfo/kernel.py

>> >> +++ b/rteval/sysinfo/kernel.py

>> >> @@ -44,7 +44,7 @@ class KernelInfo:

>> >>  

>> >>  

>> >>      def kernel_get_kthreads(self):

>> >> -        policies = {'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

>> >> +        policies = {'DLN': 'deadline', 'FF':'fifo', 'RR':'rrobin', 'TS':'other', '?':'unknown'}

>> >>          ret_kthreads = {}

>> >>          self.__log(Log.DEBUG, "getting kthread status")

>> >>          cmd = '%s -eocommand,pid,policy,rtprio,comm' % getcmdpath('ps')

>> >> -- 

>> >> 2.32.0

>> >> 

>> >> 

>> >

>> > Signed-off-by: John Kacur <jkacur@redhat.com>

>> 

>> It looks like this patch and the other one converting hackbench memory

>> check to a warning was missed when applying the rest of the series -

>> John, could you please pick these as well.

>> 

>> Thanks,

>> Punit

>> 

>

> I added the patch to support deadline policy, the other one had already 

> been added, I just hadn't pushed upstream yet, but I just did so, so go 

> ahead and pull.


Thanks, I see both the patches now.