From patchwork Thu Jan 18 17:53:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 124998 Delivered-To: patch@linaro.org Received: by 10.46.64.27 with SMTP id n27csp228435lja; Thu, 18 Jan 2018 09:54:11 -0800 (PST) X-Google-Smtp-Source: ACJfBosuPL5OBhcVZ93t1PERsNQOAu6k8MWwAj00jhnlcO5Y3EP88FMRdrEViLzsNg+nnB+bSWHS X-Received: by 10.98.172.7 with SMTP id v7mr21296524pfe.66.1516298051334; Thu, 18 Jan 2018 09:54:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516298051; cv=none; d=google.com; s=arc-20160816; b=AkSRBvZktJXKeo/bHBLeqnwXkX158uX741ZZ13sohQNR+uW8fEZbmXR+KQ1ohg6ajc f8FuM7v5BlQ2Z424w6O0O1l6VoiIkIPF63hvxjbMaqXJi1jtz83xnm7dFwFeeCYpRD4c KdlnVPEcEK0XXs15F+faUbLdncW7RrDBJ9P307Dd0D1s4gX+fQyJioyawZb8o3hHBBBe yNlEw3RJ/sklsv0W9T81Tw4/Ao5RPkImgFysQKbXLfQ81f5/Y5AZFa3n2NbRAdUyLQfv skV82cbs4B+WB2Thk0jVn2RJS2BgJfL0Ab6K9XsCMOD2Ez0VrPoaYREm4/kEKf7twpFi 2CoQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:to:from:delivered-to :sender:list-help:list-post:list-archive:list-subscribe :list-unsubscribe:list-id:precedence:mailing-list:dkim-signature :domainkey-signature:arc-authentication-results; bh=QggD3hAev+rxfF0BB0PdXJ0Q1EVQyi19vfKiLrAPRMA=; b=FB5JLHoxs3XCA/4tUFCDm3ceEtjfiaU0UD1bygym0wBJjUw/7V0RAW2dMU/kFtTZxd P+zbI/ZTYbUoJPWFsVAqqlvEVXsOryXLuD7PQ3VmGA0lbvRfNTmHCTBb6AGWth/NwqPU kIBwQAWB1Owd1vX8x22tiFrbfGjxfZfTrnlOx+m45GSTM5xqGcJQ/mki7NnX1FFP/qkg 9blpl07Qa4zeXXYCmqsDd+65XZZSTdKJJr9cpVSWgvTTiiBD/u4Uay5oGQdMP0SHNtRL MUg9vHAiSTHd5zZA5D9ZrZbRk4LDkE8bEDGoxowARwLyLdXrz0T4G+O8XMPEh+uUa9kC OpmA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=UBfLSOSw; spf=pass (google.com: domain of libc-alpha-return-89325-patch=linaro.org@sourceware.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=libc-alpha-return-89325-patch=linaro.org@sourceware.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id i1-v6si76126plt.121.2018.01.18.09.54.10 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Jan 2018 09:54:11 -0800 (PST) Received-SPF: pass (google.com: domain of libc-alpha-return-89325-patch=linaro.org@sourceware.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=UBfLSOSw; spf=pass (google.com: domain of libc-alpha-return-89325-patch=linaro.org@sourceware.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=libc-alpha-return-89325-patch=linaro.org@sourceware.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:subject:date:message-id:in-reply-to :references; q=dns; s=default; b=NTEPX+0SMAUS8K5h0yjSzSJ5BjfIqc4 pDQgieUxkaIw4fUe+hXZB0yzZBURWAKcZq7iasjBMmPJrsN+ik8CS+JmrK15+uIQ IsJo2j/jV4ZNS3lNQihWPSyZpCXOztVKBwcrFRbJs4dKROkM/ZjS/YVf8dQeMPSD ZNrYT2FYQ9aY= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:subject:date:message-id:in-reply-to :references; s=default; bh=1FkYefPP6Ogc6HU7HYQROrtX5dA=; b=UBfLS OSwoWLx5kNL/A1cpIgVeWRFS0f4tWkIQ/ahMtz4zYXhSN32SurP07SXr7/R8rvqb g5jgoLr9sb5m1JPPzqvoZQPjCHm8EZrZP2Ph3R0d9Epi7c0ZpKteg0Mp+aL7c15U xt9HW0vAYRgtimny9Fa1AjWKqqfS7kEyHpbumA= Received: (qmail 79498 invoked by alias); 18 Jan 2018 17:53:40 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 79378 invoked by uid 89); 18 Jan 2018 17:53:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=10001, positions, LOCALES X-HELO: mail-qt0-f179.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=QggD3hAev+rxfF0BB0PdXJ0Q1EVQyi19vfKiLrAPRMA=; b=icbgpPkfrZKbQI31mTXkXpFqQanqY+zw5CkWpE7d0xtO9FA75ICXw5iaLptWVSOIRz sN8OOTs47TfVJArCQIQUwSzyz3CUx6PzGm3SlHC2ceamoxorrIeMnyh4m1pxWyjYAWYB NUkEZeISxzl0Fu+OU1myYQb6M8fCkP6wNV5WY7HLorLOnz1N932+rEPOhgmWe4FdfoU/ LaTWugjjOhtwh55Nrjob/oIkZcsT2qFwHkM2IgkOJt9xiorvrEo4tRDKQAv8TzjvC8f3 JBjOfk9h2Ea5bQW1DgaIgMsMm+JqyxE2xlc4q4nFzewUDfqwtQ5TlJTua+XaZ8gEJy/3 9GKg== X-Gm-Message-State: AKwxytdzPpbp/e1cz/JVuYyV/sDutx1/Xjlom68dzhktt1hVKCPPTdNp xTNmBaJTMMzU6+sYYW3qjS7/2nD7MnU= X-Received: by 10.200.7.67 with SMTP id k3mr30694318qth.68.1516298013520; Thu, 18 Jan 2018 09:53:33 -0800 (PST) From: Adhemerval Zanella To: libc-alpha@sourceware.org Subject: [PATCH 3/7] benchtests: Add bench-qsort Date: Thu, 18 Jan 2018 15:53:18 -0200 Message-Id: <1516298002-4618-4-git-send-email-adhemerval.zanella@linaro.org> In-Reply-To: <1516298002-4618-1-git-send-email-adhemerval.zanella@linaro.org> References: <1516298002-4618-1-git-send-email-adhemerval.zanella@linaro.org> This patch adds a qsort benchmark. I tried to model the benchmark taking in consideration the possible input variation in both internal element size, element numbers, and internal state for 1. real word cases and 2. possible scenarios based on hardware characteristics. For 1. I tracked qsort usage (using a simple preload library to dump its usage and a script to pos-process it) on both GCC bootstrap and Firefox. GCC 8 bootstrap build shows 51786641 call to qsort with the following characterics: Key: number of elements: key=2 : 39.74 key=3 : 19.23 key=4 : 9.77 key=1 : 8.44 key=0 : 6.60 key=5 : 4.40 key=7 : 2.37 key=6 : 2.25 key=9 : 1.48 key=8 : 0.97 Key: element size in bytes: key=8 : 91.74 key=32 : 3.62 key=4 : 2.42 key=40 : 1.20 key=16 : 0.67 key=24 : 0.30 key=48 : 0.05 key=56 : 0.00 key=1 : 0.00 Key: total size (number of elements * element size): key=16 : 35.98 key=24 : 18.67 key=32 : 9.79 key=8 : 8.28 key=0 : 6.60 key=40 : 4.21 key=64 : 3.15 key=48 : 2.24 key=56 : 2.15 key=80 : 1.45 So for GCC: - 80% of total qsort usage are done with 10 elements of less. - All usages are done element size of maximum of 56 bytes. - 90% of calls are done with array of maximum size of 80 bytes or less. The Firefox usage was done with 2 hours of usage, with first 10 minutes activelly openning and closing different types of sites. It resulted in 21042 calls with following characteristics: Key: number of elements: key=7 : 24.40 key=1 : 10.44 key=3 : 6.33 key=4 : 5.81 key=2 : 5.46 key=6 : 4.80 key=17 : 4.54 key=0 : 3.07 key=5 : 3.05 key=9 : 2.51 key=12 : 2.06 Key: element size in bytes: key=8 : 94.49 key=28 : 4.40 key=2 : 0.70 key=16 : 0.19 key=36 : 0.07 key=12 : 0.07 key=40 : 0.07 key=24 : 0.03 Key: total size (number of elements * element size): key=56 : 24.20 key=8 : 10.27 key=24 : 6.36 key=32 : 5.86 key=16 : 5.46 key=48 : 4.80 key=476 : 3.75 key=0 : 3.07 key=40 : 3.05 key=72 : 2.50 So for Firefox: - 72% of total qsort usage are done with 18 elements of less. - All usages are done element size of maximum of 40 bytes. - 70% of calls are done with array of maximum size of 476 bytes or less. For 2. I used the idea of a machine with 3 levels of cache with sizes 32kb (L1), 256kb (L2), and 4096 (L3). It resulted in a benchmark with following traits: * It checks four types of input arrays: sorted, mostly sorted, unsorted, and repeated. For 'sorted' the array is already sorted, 'mostly sorted' the array will have a certain number of random elements with random values (current ratio used is 20%), for 'unsorted' the array will contain random elements from full range based on used type, and for 'repeated' the array will have random elements with a certain number (current ratio is 20%) of a repeated element distributed randomly. * Three elements sizes are checked: uint32_t, uint64_t, and an element with 32 bytes (but using the uint64_t comparison checks). These element sizes are used to 1. to avoid include the comparison function itself and/or memory copy in sort benchmark itself, and 2. because key of size_t are the most used for both GCC and Firefox. * Five different element numbers: 64 (which cover mostly of used element sizes for both GCC and Firefox), 4096/8192 (which cover 32 KB of L1 for 32 and 64 bits), 32768/65536 (L2 with 256 KB), and 24288/1048576 (L3 with 4 MB). The sizes are configurable by --nelem option. Checked on x86_64-linux-gnu * benchtests/Makefile (stdlib-benchset): Add qsort. * benchtests/bench-qsort.c: New file. --- benchtests/Makefile | 2 +- benchtests/bench-qsort.c | 352 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 353 insertions(+), 1 deletion(-) create mode 100644 benchtests/bench-qsort.c -- 2.7.4 diff --git a/benchtests/Makefile b/benchtests/Makefile index ff99d25..6d6a5e9 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -66,7 +66,7 @@ LOCALES := en_US.UTF-8 tr_TR.UTF-8 cs_CZ.UTF-8 fa_IR.UTF-8 fr_FR.UTF-8 \ include ../gen-locales.mk endif -stdlib-benchset := strtod +stdlib-benchset := strtod qsort stdio-common-benchset := sprintf diff --git a/benchtests/bench-qsort.c b/benchtests/bench-qsort.c new file mode 100644 index 0000000..097459b --- /dev/null +++ b/benchtests/bench-qsort.c @@ -0,0 +1,352 @@ +/* Measure qsort function. + Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include + +#include "json-lib.h" +#include "bench-timing.h" +#include "bench-util.h" + +#include +#include +#include + +#define ARRAY_SIZE(__array) (sizeof (__array) / sizeof (__array[0])) + +/* Type of inputs arrays: + - Sorted: array already sorted in placed. + - MostlySorted: sorted array with 'MostlySortedRatio * size' elements + in random positions set to random values. + - Unsorted: all elements in array set to random values. + - Repeated: random array with 'RepeatedRation' elements in random + positions set to an unique value. */ +typedef enum { + Sorted = 0, + MostlySorted = 1, + Unsorted = 2, + Repeated = 3, +} arraytype_t; + +/* Ratio of total of elements which will randomized. */ +static const double MostlySortedRatio = 0.2; + +/* Ratio of total of elements which will be repeated. */ +static const double RepeatedRatio = 0.2; + +struct array_t +{ + arraytype_t type; + const char *name; +} arraytypes[] = +{ + { Sorted, "Sorted" }, + { Unsorted, "Unsorted" }, + { MostlySorted, "MostlySorted" }, + { Repeated, "Repeated" }, +}; + + +typedef int (*cmpfunc_t)(const void *, const void *); +typedef void (*seq_element_t) (void *, size_t); + +static inline void * +arr (void *base, size_t idx, size_t size) +{ + return (void*)((uintptr_t)base + (idx * size)); +} + +static struct mt19937_64 mt; + +/* Fill the BUFFER with size SIZE in bytes with random uint64_t obtained from + the global MT state. */ +static inline void +fill_rand (void *buffer, size_t size) +{ + uint8_t *array = (uint8_t*)(buffer); + for (size_t i = 0; i < size; i++) + array[i] = uniform_uint64_distribution (mt64_rand (&mt), 0, UINT8_MAX); +} + +static void * +create_array (size_t nmemb, size_t type_size, arraytype_t type, + seq_element_t seq) +{ + size_t size = nmemb * type_size; + void *array = xmalloc (size); + + switch (type) + { + case Sorted: + for (size_t i = 0; i < nmemb; i++) + seq (arr (array, i, type_size), i); + break; + + case MostlySorted: + { + for (size_t i = 0; i < nmemb; i++) + seq (arr (array, i, type_size), i); + + /* Change UNSORTED elements (based on MostlySortedRatio ratio) + in the sorted array. */ + size_t unsorted = (size_t)(nmemb * MostlySortedRatio); + for (size_t i = 0; i < unsorted; i++) + { + size_t pos = uniform_uint64_distribution (mt64_rand (&mt), 0, + nmemb - 1); + fill_rand (arr (array, pos, type_size), type_size); + } + } + break; + + case Unsorted: + fill_rand (array, size); + break; + + case Repeated: + { + fill_rand (array, size); + + void *randelem = xmalloc (type_size); + fill_rand (randelem, type_size); + + /* Repeat REPEATED elements (based on RepeatRatio ratio) in the random + array. */ + size_t repeated = (size_t)(nmemb * RepeatedRatio); + for (size_t i = 0; i < repeated; i++) + { + size_t pos = uniform_uint64_distribution (mt64_rand (&mt), 0, + nmemb - 1); + memcpy (arr (array, pos, type_size), randelem, type_size); + } + free (randelem); + } + break; + } + + return array; +} + +/* Functions for uint32_t type. */ +static int +cmp_uint32_t (const void *a, const void *b) +{ + uint32_t ia = *(uint32_t*)a; + uint32_t ib = *(uint32_t*)b; + return (ia > ib) - (ia < ib); +} + +static void +seq_uint32_t (void *base, size_t idx) +{ + *(uint32_t *)base = idx; +} + +/* Functions for uint64_t type. */ +static int +cmp_uint64_t (const void *a, const void *b) +{ + uint64_t ia = *(uint64_t*)a; + uint64_t ib = *(uint64_t*)b; + return (ia > ib) - (ia < ib); +} + +static void +seq_uint64_t (void *base, size_t idx) +{ + *(uint64_t *)base = idx; +} + +/* Number of elements of determined type to be measured. */ +static const size_t default_elem[] = +{ + 256/sizeof(size_t), /* 64/128 which covers mostly used element number + on GCC build. */ + 32768/sizeof(size_t), /* 4096/8192 to fit on a L1 with 32 KB. */ + 262144/sizeof(size_t), /* 32768/65536 to fit on a L2 with 256 KB. */ + 4194304/sizeof(size_t), /* 524288/1048576 to fix on a L3 with 4 MB. */ +}; + + +#define OPT_NELEM 10000 +#define OPT_SEED 10001 +#define CMDLINE_OPTIONS \ + { "nelem", required_argument, NULL, OPT_NELEM }, \ + { "seed", required_argument, NULL, OPT_SEED }, + +static const size_t max_nelem = 16; +static size_t *elems = NULL; +static size_t nelem = 0; +static uint64_t seed = 0; +static bool seed_set = false; + +static void __attribute__ ((used)) +cmdline_process_function (int c) +{ + switch (c) + { + /* Handle the --nelem option to run different sizes than DEFAULT_ELEM. + The elements sizes as passed with a ':' as the delimiter, for + instance --nelem 32:128:1024 will ran 32, 128, and 1024 elements. */ + case OPT_NELEM: + { + elems = xmalloc (max_nelem * sizeof (size_t)); + nelem = 0; + + char *saveptr; + char *token; + token = strtok_r (optarg, ":", &saveptr); + if (token == NULL) + { + printf ("error: invalid --nelem value\n"); + exit (EXIT_FAILURE); + } + do + { + if (nelem == max_nelem) + { + printf ("error: invalid --nelem value (max elem)\n"); + exit (EXIT_FAILURE); + } + elems[nelem++] = atol (token); + token = strtok_r (saveptr, ":", &saveptr); + } while (token != NULL); + } + break; + + /* handle the --seed option to use a different seed than a random one. + The SEED used should be a uint64_t number. */ + case OPT_SEED: + { + unsigned long int value = strtoull (optarg, NULL, 0); + if (errno == ERANGE || value > UINT64_MAX) + { + printf ("error: seed should be a value in range of " + "[0, UINT64_MAX]\n"); + exit (EXIT_FAILURE); + } + seed = value; + seed_set = true; + } + } +} + +#define CMDLINE_PROCESS cmdline_process_function + +static const size_t inner_loop_iters = 16; + +struct run_t +{ + size_t type_size; + cmpfunc_t cmp; + seq_element_t seq; +}; +static const struct run_t runs[] = +{ + { sizeof (uint32_t), cmp_uint32_t, seq_uint32_t }, + { sizeof (uint64_t), cmp_uint64_t, seq_uint64_t }, + { 32, cmp_uint64_t, seq_uint64_t }, +}; + +static int +do_test (void) +{ + if (!seed_set) + { + /* Use default seed in case of error. */ + random_seed (&seed, sizeof (seed)); + } + mt64_seed (&mt, seed); + + json_ctx_t json_ctx; + + json_init (&json_ctx, 0, stdout); + + json_document_begin (&json_ctx); + json_attr_string (&json_ctx, "timing_type", TIMING_TYPE); + + json_attr_object_begin (&json_ctx, "functions"); + json_attr_object_begin (&json_ctx, "qsort"); + json_attr_uint (&json_ctx, "seed", seed); + + json_array_begin (&json_ctx, "results"); + + const size_t *welem = elems == NULL ? default_elem : elems; + const size_t wnelem = elems == NULL ? ARRAY_SIZE (default_elem) + : nelem; + + for (int j = 0; j < ARRAY_SIZE (runs); j++) + { + for (int i = 0; i < ARRAY_SIZE (arraytypes); i++) + { + for (int k = 0; k < wnelem; k++) + { + json_element_object_begin (&json_ctx); + + size_t nmemb = welem[k]; + size_t ts = runs[j].type_size; + size_t arraysize = nmemb * ts; + + json_attr_uint (&json_ctx, "nmemb", nmemb); + json_attr_uint (&json_ctx, "type_size", ts); + json_attr_string (&json_ctx, "property", arraytypes[i].name); + + void *base = create_array (nmemb, ts, arraytypes[i].type, runs[j].seq); + void *work = xmalloc (arraysize); + + timing_t total; + TIMING_INIT (total); + + for (int n = 0; n < inner_loop_iters; n++) + { + memcpy (work, base, arraysize); + + timing_t start, end, diff; + TIMING_NOW (start); + qsort (work, nmemb, ts, runs[j].cmp); + TIMING_NOW (end); + + TIMING_DIFF (diff, start, end); + TIMING_ACCUM (total, diff); + } + + json_attr_uint (&json_ctx, "timings", + (double) total / (double) inner_loop_iters); + json_element_object_end (&json_ctx); + + free (base); + free (work); + } + } + } + + json_array_end (&json_ctx); + + json_attr_object_end (&json_ctx); + json_attr_object_end (&json_ctx); + json_document_end (&json_ctx); + + return 0; +} + +#define TIMEOUT 600 +#include