From: "Gary S. Robertson"
To: mike.holmes@linaro.org, bill.fischofer@linaro.org, maxim.uvarov@linaro.org, anders.roxell@linaro.org, petri.savolainen@linaro.org
Cc: lng-odp@lists.linaro.org
Date: Tue, 16 Feb 2016 08:54:20 -0600
Subject: [lng-odp] [PATCH V2 2/2] linux-generic: Make cpu detection work with NO_HZ_FULL

sched_getaffinity() and pthread_getaffinity_np() do not return an accurate
mask of all CPUs in the machine when the kernel is compiled with NO_HZ_FULL
support. See Linaro BUG 2027 for details.

Additionally, the performance of tasks on isolated CPU cores may be
compromised if other tasks run on 'thread siblings' of those cores, which
share portions of the CPU core hardware (such as core-local cache memory).

This code replaces the 'getaffinity' based CPU discovery logic and
restricts worker CPU selection to one CPU per core when 'thread siblings'
(a.k.a. 'hyperthread CPUs') are present. Thread siblings are not an issue
for control CPUs, so they are included in the 'control' cpumask along with
their 'primary' counterpart.

The results of these changes, which address BUG 2027, are:
(1) all CPUs known to the kernel at boot time are considered for use by
    ODP regardless of the default CPU affinity masks set by the kernel
    scheduler,
(2) hyperthreaded CPUs (and CPU 0) are not used for default worker
    cpumasks, and
(3) any 'hyperthread' siblings of CPU 0 may be included in the default
    control cpumask.
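The 'one CPU per core' policy above depends on the sysfs file
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The kernel
writes that file in 'cpulist' format, which may use ranges such as "0-1"
as well as comma-separated entries such as "0,4". The following is a
minimal sketch, not part of this patch, assuming glibc's cpu_set_t macros;
the helpers parse_cpulist(), first_cpu(), is_primary_cpu() and
build_worker_mask() are hypothetical names invented for illustration. It
treats the lowest-numbered sibling of each core as its 'primary' CPU and
builds a worker mask that excludes CPU 0.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/*
 * Hypothetical helper: parse a sysfs "cpulist" string such as "0,4"
 * or "0-1,4\n" into a cpu_set_t. CPUs >= CPU_SETSIZE are ignored.
 * Returns 0 on success, -1 on a malformed entry.
 */
int parse_cpulist(const char *list, cpu_set_t *set)
{
	const char *p = list;

	CPU_ZERO(set);
	while (*p != '\0' && *p != '\n') {
		char *end;
		long first, last;

		errno = 0;
		first = strtol(p, &end, 10);
		if (errno || end == p || first < 0)
			return -1;
		last = first;
		if (*end == '-') {	/* range form "a-b" */
			p = end + 1;
			last = strtol(p, &end, 10);
			if (errno || end == p || last < first)
				return -1;
		}
		for (long cpu = first; cpu <= last && cpu < CPU_SETSIZE; cpu++)
			CPU_SET((int)cpu, set);
		p = end;
		if (*p == ',')
			p++;
		else if (*p != '\0' && *p != '\n')
			return -1;
	}
	return 0;
}

/* Hypothetical helper: lowest-numbered CPU in the set, or -1 if empty. */
int first_cpu(const cpu_set_t *set)
{
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, set))
			return cpu;
	return -1;
}

/*
 * Hypothetical helper: read cpuN's thread_siblings_list and report
 * whether cpuN is the lowest-numbered ('primary') sibling of its core.
 * Returns 1 if primary, 0 if a hyperthread sibling, -1 on error.
 */
int is_primary_cpu(int cpu)
{
	char path[128];
	char buf[256];
	cpu_set_t siblings;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
		 cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (parse_cpulist(buf, &siblings))
		return -1;
	return first_cpu(&siblings) == cpu;
}

/*
 * Hypothetical helper: one primary CPU per core, never CPU 0.
 * CPUs whose topology cannot be read are skipped.
 */
int build_worker_mask(cpu_set_t *workers, int max_cpus)
{
	CPU_ZERO(workers);
	for (int cpu = 1; cpu < max_cpus && cpu < CPU_SETSIZE; cpu++)
		if (is_primary_cpu(cpu) == 1)
			CPU_SET(cpu, workers);
	return CPU_COUNT(workers);
}
```

For example, parse_cpulist("0-1,4\n", &set) produces a set holding CPUs
0, 1 and 4, whose primary is CPU 0. Accepting the range form matters: a
parser that only splits on commas would read "0-1" as CPU 0 alone and miss
its sibling.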
This patch also:
(a) adds control and worker cpumasks to the linux-generic global data,
(b) adds logic to odp_init_global() to initialize these masks, and
(c) reduces odp_cpumask_default_control(), odp_cpumask_default_worker()
    and odp_cpumask_all_available() to derive their results directly from
    these new cpumasks without modifying them.

These changes provide prerequisite infrastructure for pending changes which
will allow ODP to accept cpumasks passed in from external entities such as
a provisioning service.

Signed-off-by: Gary S. Robertson
---
 platform/linux-generic/include/odp_internal.h |  31 ++--
 platform/linux-generic/odp_cpumask_task.c     |  45 +++--
 platform/linux-generic/odp_init.c             | 230 +++++++++++++++++++++++++-
 platform/linux-generic/odp_system_info.c      |  14 +-
 4 files changed, 272 insertions(+), 48 deletions(-)

diff --git a/platform/linux-generic/include/odp_internal.h b/platform/linux-generic/include/odp_internal.h
index e75154a..77ba6b0 100644
--- a/platform/linux-generic/include/odp_internal.h
+++ b/platform/linux-generic/include/odp_internal.h
@@ -19,6 +19,7 @@ extern "C" {
 #endif
 
 #include
+#include
 #include
 #include
 
@@ -40,26 +41,32 @@ struct odp_global_data_s {
 	odp_log_func_t log_fn;
 	odp_abort_func_t abort_fn;
 	odp_system_info_t system_info;
+	odp_cpumask_t control_cpus;
+	odp_cpumask_t worker_cpus;
 };
 
 enum init_stage {
 	NO_INIT = 0, /* No init stages completed */
-	TIME_INIT = 1,
-	SYSINFO_INIT = 2,
-	SHM_INIT = 3,
-	THREAD_INIT = 4,
-	POOL_INIT = 5,
-	QUEUE_INIT = 6,
-	SCHED_INIT = 7,
-	PKTIO_INIT = 8,
-	TIMER_INIT = 9,
-	CRYPTO_INIT = 10,
-	CLASSIFICATION_INIT = 11,
-	ALL_INIT = 12 /* All init stages completed */
+	CPUMASK_INIT = 1,
+	TIME_INIT = 2,
+	SYSINFO_INIT = 3,
+	SHM_INIT = 4,
+	THREAD_INIT = 5,
+	POOL_INIT = 6,
+	QUEUE_INIT = 7,
+	SCHED_INIT = 8,
+	PKTIO_INIT = 9,
+	TIMER_INIT = 10,
+	CRYPTO_INIT = 11,
+	CLASSIFICATION_INIT = 12,
+	ALL_INIT = 13 /* All init stages completed */
 };
 
 extern struct odp_global_data_s odp_global_data;
 
+/* Number of logical CPUs detected at boot time */
+extern int numcpus;
+
 int _odp_term_global(enum init_stage stage);
 int _odp_term_local(enum init_stage stage);
 
diff --git a/platform/linux-generic/odp_cpumask_task.c b/platform/linux-generic/odp_cpumask_task.c
index c5093e0..36a158b 100644
--- a/platform/linux-generic/odp_cpumask_task.c
+++ b/platform/linux-generic/odp_cpumask_task.c
@@ -12,55 +12,52 @@
 #include
 #include
 
+/*
+ * The following functions assume that odp_init_global() or some external
+ * logic has previously initialized the globally accessible cpumasks
+ * for ODP control and worker CPU selections.
+ */
 int odp_cpumask_default_worker(odp_cpumask_t *mask, int num)
 {
-	int ret, cpu, i;
-	cpu_set_t cpuset;
-
-	ret = pthread_getaffinity_np(pthread_self(),
-				     sizeof(cpu_set_t), &cpuset);
-	if (ret != 0)
-		ODP_ABORT("failed to read CPU affinity value\n");
-
-	odp_cpumask_zero(mask);
+	odp_cpumask_t overlap;
+	int cpu, i;
 
 	/*
 	 * If no user supplied number or it's too large, then attempt
 	 * to use all CPUs
 	 */
-	if (0 == num || CPU_SETSIZE < num)
-		num = CPU_COUNT(&cpuset);
+	cpu = odp_cpumask_count(&odp_global_data.worker_cpus);
+	if (0 == num || cpu < num)
+		num = cpu;
 
 	/* build the mask, allocating down from highest numbered CPU */
+	odp_cpumask_zero(mask);
 	for (cpu = 0, i = CPU_SETSIZE - 1; i >= 0 && cpu < num; --i) {
-		if (CPU_ISSET(i, &cpuset)) {
+		if (odp_cpumask_isset(&odp_global_data.worker_cpus, i)) {
 			odp_cpumask_set(mask, i);
 			cpu++;
 		}
 	}
 
-	if (odp_cpumask_isset(mask, 0))
-		ODP_DBG("\n\tCPU0 will be used for both control and worker threads,\n"
-			"\tthis will likely have a performance impact on the worker thread.\n");
+	odp_cpumask_and(&overlap, mask, &odp_global_data.control_cpus);
+	if (odp_cpumask_count(&overlap))
+		ODP_DBG("\n\tWorker and Control CPU selections overlap...\n"
+			"\tthis will likely have a performance impact on the worker threads.\n");
 
 	return cpu;
 }
 
 int odp_cpumask_default_control(odp_cpumask_t *mask, int num ODP_UNUSED)
 {
-	odp_cpumask_zero(mask);
-	/* By default all control threads on CPU 0 */
-	odp_cpumask_set(mask, 0);
-	return 1;
+	odp_cpumask_copy(mask, &odp_global_data.control_cpus);
+
+	return odp_cpumask_count(mask);
 }
 
 int odp_cpumask_all_available(odp_cpumask_t *mask)
 {
-	odp_cpumask_t mask_work, mask_ctrl;
-
-	odp_cpumask_default_worker(&mask_work, 0);
-	odp_cpumask_default_control(&mask_ctrl, 0);
-	odp_cpumask_or(mask, &mask_work, &mask_ctrl);
+	odp_cpumask_or(mask, &odp_global_data.worker_cpus,
+		       &odp_global_data.control_cpus);
 
 	return odp_cpumask_count(mask);
 }
diff --git a/platform/linux-generic/odp_init.c b/platform/linux-generic/odp_init.c
index 3a990d2..775be4b 100644
--- a/platform/linux-generic/odp_init.c
+++ b/platform/linux-generic/odp_init.c
@@ -4,13 +4,230 @@
  * SPDX-License-Identifier: BSD-3-Clause
  */
 
-#include
-#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include
 #include
 #include
+#include
+#include
 
 struct odp_global_data_s odp_global_data;
 
+static char pathname_buf[257];
+static char cpuname[5];
+
+/* cpumask of logical CPUs representing physical CPU cores */
+static cpu_set_t primary_cpus;
+/* cpumask of thread siblings for CPU 0 */
+static cpu_set_t hyperthread_cpus;
+
+/*
+ * Populate a cpumask of 'primary' CPUs and a cpumask of
+ * hyperthread siblings for the current 'primary' CPU.
+ */
+static inline void process_sibling_list(void)
+{
+	char *endptr;
+	char *remaining;
+	char *cur_token;
+	long int cpu_long;
+	int cpu;
+
+	cur_token = strtok_r(pathname_buf, ",", &remaining);
+	if (cur_token) {
+		errno = 0;
+		cpu_long = strtol(cur_token, &endptr, 10);
+		if (!(errno || (endptr == cur_token))) {
+			/*
+			 * Mark the first sibling as a 'primary' CPU
+			 */
+			cpu = (int)cpu_long;
+			CPU_SET(cpu, &primary_cpus);
+
+			/* Check for hyperthread siblings */
+			cur_token = strtok_r((char *)NULL, ",", &remaining);
+			/*
+			 * For default CPU availability we only care about
+			 * siblings for CPU 0 - the only 'primary' CPU
+			 * used for 'control' tasks
+			 */
+			while (cur_token && !cpu) {
+				errno = 0;
+				cpu_long = strtol(cur_token, &endptr, 10);
+				if (!(errno || (endptr == cur_token))) {
+					/*
+					 * Mark any other siblings found
+					 * as 'hyperthread' CPUs
+					 */
+					CPU_SET(cpu_long, &hyperthread_cpus);
+				}
+				cur_token = strtok_r((char *)NULL, ",",
+						     &remaining);
+			}
+		}
+	}
+}
+
+/*
+ * Examine the topology information for the current configured 'logical CPU'
+ * and populate a cpumask of 'primary' CPUs and a cpumask of
+ * hyperthread siblings for the current 'primary' CPU.
+ */
+static int process_cpu_info_dir(long int cpu_idnum)
+{
+	FILE *cpulist_file;
+	char *unused;
+
+	/* Track number of logical CPUs discovered */
+	if (numcpus < (int)(cpu_idnum + 1))
+		numcpus = (int)(cpu_idnum + 1);
+
+	if (cpu_idnum < CPU_SETSIZE) {
+		/* Build a pathname to the CPU siblings list */
+		strcpy(pathname_buf, "/sys/devices/system/cpu/cpu");
+		sprintf(cpuname, "%ld", cpu_idnum);
+		strcat(pathname_buf, cpuname);
+		strcat(pathname_buf, "/topology/thread_siblings_list");
+
+		/* Open the siblings list file */
+		cpulist_file = fopen(pathname_buf, "r");
+		if (cpulist_file) {
+			/* Read and process the thread sibling list */
+			unused = fgets(pathname_buf,
+				       (int)(sizeof(pathname_buf) - 1),
+				       cpulist_file);
+			/* Make the C compiler happy - use 'unused' */
+			if (unused)
+				unused = (char *)NULL;
+			process_sibling_list();
+			fclose(cpulist_file);
+			return 0;
+		} else {
+			return -1;
+		}
+	} else {
+		return -1;
+	}
+}
+
+/*
+ * We need to know about all CPUs which were discovered at boot time.
+ * Furthermore, on platforms with 'hyperthreading' enabled, each physical
+ * CPU core shows up as two (or more) logical CPUs despite the fact that
+ * the 'logical CPUs' share some of the hardware in the physical CPU core.
+ * Consequently performance of an isolated CPU may be compromised if its
+ * 'hyperthread CPU' siblings are also running tasks.
+ * The code below populates a cpumask of available physical CPU cores
+ * as well as a cpumask of hyperthreaded siblings for 'control' CPU 0.
+ */
+static int get_cpu_topology(void)
+{
+	char *numptr;
+	char *endptr;
+	long int cpu_idnum;
+	DIR *d;
+	struct dirent *dir;
+	int error = 0;
+
+	CPU_ZERO(&primary_cpus);
+	CPU_ZERO(&hyperthread_cpus);
+
+	/*
+	 * Scan the /sysfs pseudo-filesystem for CPU info directories.
+	 * There should be one subdirectory for each installed logical CPU
+	 */
+	d = opendir("/sys/devices/system/cpu");
+	if (d) {
+		while ((dir = readdir(d)) != NULL) {
+			cpu_idnum = CPU_SETSIZE;
+
+			/*
+			 * If the current directory entry doesn't represent
+			 * a CPU info subdirectory then skip to the next entry.
+			 */
+			if (dir->d_type == DT_DIR) {
+				if (!strncmp(dir->d_name, "cpu", 3)) {
+					/*
+					 * Directory name starts with "cpu"...
+					 * Try to extract a CPU ID number
+					 * from the remainder of the dirname.
+					 */
+					errno = 0;
+					numptr = dir->d_name;
+					numptr += 3;
+					cpu_idnum = strtol(numptr, &endptr,
+							   10);
+					if (errno || (endptr == numptr))
+						continue;
+				} else {
+					continue;
+				}
+			} else {
+				continue;
+			}
+			/*
+			 * If we get here the current directory entry specifies
+			 * a CPU info subdir for the CPU indexed by cpu_idnum.
+			 */
+			error = process_cpu_info_dir(cpu_idnum);
+			if (error)
+				break;
+		}
+		closedir(d);
+		return error;
+	} else {
+		return -1;
+	}
+}
+
+/*
+ * This function obtains system information specifying which cpus are
+ * available at boot time. These data are then used to produce cpumasks of
+ * configured CPUs which are appropriate for either isolated or 'non-isolated'
+ * task scheduling.
+ */
+static int get_available_cpus(void)
+{
+	int cpu;
+
+	/* Clear the global cpumasks for control and worker CPUs */
+	odp_cpumask_zero(&odp_global_data.control_cpus);
+	odp_cpumask_zero(&odp_global_data.worker_cpus);
+
+	/*
+	 * Derive cpumasks of configured 'primary' CPU cores and
+	 * a cpumask of thread siblings for each 'primary' CPU configured.
+	 */
+	if (get_cpu_topology())
+		return -1;
+
+	/*
+	 * First ensure that only 'primary' CPUs are considered from those
+	 * specified for the 'worker' scheduling cpumask.
+	 * Also ensure CPU 0 is not included in the worker mask.
+	 */
+	for (cpu = 1; cpu < CPU_SETSIZE; cpu++)
+		if (CPU_ISSET(cpu, &primary_cpus))
+			odp_cpumask_set(&odp_global_data.worker_cpus, cpu);
+
+	/*
+	 * Ensure CPU 0 is included in the control mask.
+	 * If CPU 0 has any hyperthread siblings, include them as well.
+	 */
+	odp_cpumask_set(&odp_global_data.control_cpus, 0);
+	for (cpu = 1; cpu < CPU_SETSIZE; cpu++)
+		if (CPU_ISSET(cpu, &hyperthread_cpus))
+			odp_cpumask_set(&odp_global_data.control_cpus, cpu);
+	return 0;
+}
+
 int odp_init_global(const odp_init_t *params,
 		    const odp_platform_init_t *platform_params ODP_UNUSED)
 {
@@ -25,6 +242,12 @@ int odp_init_global(const odp_init_t *params,
 		odp_global_data.abort_fn = params->abort_fn;
 	}
 
+	if (get_available_cpus()) {
+		ODP_ERR("ODP cpumask init failed.\n");
+		goto init_failed;
+	}
+	stage = CPUMASK_INIT;
+
 	if (odp_time_init_global()) {
 		ODP_ERR("ODP time init failed.\n");
 		goto init_failed;
@@ -187,6 +410,9 @@ int _odp_term_global(enum init_stage stage)
 		}
 		/* Fall through */
 
+	case CPUMASK_INIT:
+		/* Fall through */
+
 	case NO_INIT:
 		;
 	}
diff --git a/platform/linux-generic/odp_system_info.c b/platform/linux-generic/odp_system_info.c
index 42aef8a..11d44c8 100644
--- a/platform/linux-generic/odp_system_info.c
+++ b/platform/linux-generic/odp_system_info.c
@@ -30,21 +30,15 @@
 
 #define HUGE_PAGE_DIR "/sys/kernel/mm/hugepages"
 
+/* Number of logical CPUs detected at boot time */
+int numcpus;
 
 /*
- * Report the number of CPUs in the affinity mask of the main thread
+ * Report the number of logical CPUs detected at boot time
  */
 static int sysconf_cpu_count(void)
 {
-	cpu_set_t cpuset;
-	int ret;
-
-	ret = pthread_getaffinity_np(pthread_self(),
-				     sizeof(cpuset), &cpuset);
-	if (ret != 0)
-		return 0;
-
-	return CPU_COUNT(&cpuset);
+	return numcpus;
 }
 
 #if defined __x86_64__ || defined __i386__ || defined __OCTEON__ || \