From patchwork Fri Oct 21 15:55:09 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Wakely X-Patchwork-Id: 78687 Delivered-To: patch@linaro.org Received: by 10.140.97.247 with SMTP id m110csp1371021qge; Fri, 21 Oct 2016 08:55:46 -0700 (PDT) X-Received: by 10.98.138.79 with SMTP id y76mr2891001pfd.158.1477065346866; Fri, 21 Oct 2016 08:55:46 -0700 (PDT) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id h7si2702710pad.79.2016.10.21.08.55.46 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 21 Oct 2016 08:55:46 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-439258-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-439258-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-439258-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; q=dns; s=default; b=C3+Fh8tW7xpQT9LRi P2UY4+d+hgF4/9dqPOuV/OMDlpxb0LhCQ2Rxf/UMo4ePPNZeTsYy43h/msDVJcvJ Ahwl9N4fnPxFp9SCXl8sMlanKFowMhjQWaLyz+8mOpYz2grpcsYxGpMRbp4GHJ34 TXU7/fJcjHiRz2b5HRHOIqIbRk= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=default; bh=8+FgwJ41zk/mcw9D4lpXyTg +m/Y=; b=nfj9OJj4nixbYeB1XjTUVFsRBcU7mauKO79Md0klQboWnGzdEeg37cm zXhqWnW8xSjjbhOc2hlOycqnkzLjJcWJ11lP0KnUesBm4OVAl6ptwKnhxbVugp31 Z03ANhofjmMY1p2zuX0t1OPjj51LFccFpcso+a+sDxP/13+V1BCU= Received: (qmail 92248 invoked by alias); 21 Oct 2016 15:55:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 92226 invoked by uid 89); 21 Oct 2016 15:55:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.2 required=5.0 tests=BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=no version=3.3.2 spammy=qualification, make_pair, uniformly, common_type X-Spam-User: qpsmtpd, 2 recipients X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 21 Oct 2016 15:55:12 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 38A27C04B931; Fri, 21 Oct 2016 15:55:11 +0000 (UTC) Received: from localhost (ovpn-116-70.ams2.redhat.com [10.36.116.70]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u9LFtADG032386; Fri, 21 Oct 2016 11:55:10 -0400 Date: Fri, 21 Oct 2016 16:55:09 +0100 From: Jonathan Wakely To: Eelis van der Weegen Cc: libstdc++@gcc.gnu.org, GCC Patches Subject: Re: [patch, libstdc++] Optimize selection sampling by using generator to get two values at once Message-ID: <20161021155509.GU2922@redhat.com> References: <58074F89.4050403@eelis.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <58074F89.4050403@eelis.net> X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.7.0 (2016-08-17) On 19/10/16 12:48 +0200, Eelis van der Weegen wrote: >This is the same optimization as was recently applied to std::shuffle. > >It reduces the runtime of the following program by 20% on my machine: > > int main() > { > std::vector a(10000), b(1000); > std::mt19937 gen; > > uint64_t c = 0; > > for (int i = 0; i != 1000; ++i) > { > std::sample(a.begin(), a.end(), b.begin(), b.size(), gen); > c += uint64_t(b[32]); > } > > std::cout << c; > } Thanks, I've committd this slightly revised version to trunk (tweaking some whitespace, removing some redundant std:: qualification, and using foo_t aliases instead of typename foo::type). Tested powerpc64le-linux. Committed to trunk. commit 01535578bc44d810e7cf4c2bfbc3836d7977e229 Author: Jonathan Wakely Date: Fri Oct 21 16:37:21 2016 +0100 Optimize RNG use in std::sample selection sampling 2016-10-21 Eelis van der Weegen * include/bits/stl_algo.h (__gen_two_uniform_ints): Move logic out of shuffle into new function. (shuffle): Call __gen_two_uniform_ints. (__sample): Use __gen_two_uniform_ints and perform two samples at a time. diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h index 6c771bb..3ecb33b 100644 --- a/libstdc++-v3/include/bits/stl_algo.h +++ b/libstdc++-v3/include/bits/stl_algo.h @@ -3741,6 +3741,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef _GLIBCXX_USE_C99_STDINT_TR1 /** + * @brief Generate two uniformly distributed integers using a + * single distribution invocation. + * @param __b0 The upper bound for the first integer. + * @param __b1 The upper bound for the second integer. + * @param __g A UniformRandomBitGenerator. + * @return A pair (i, j) with i and j uniformly distributed + * over [0, __b0) and [0, __b1), respectively. + * + * Requires: __b0 * __b1 <= __g.max() - __g.min(). + * + * Using uniform_int_distribution with a range that is very + * small relative to the range of the generator ends up wasting + * potentially expensively generated randomness, since + * uniform_int_distribution does not store leftover randomness + * between invocations. + * + * If we know we want two integers in ranges that are sufficiently + * small, we can compose the ranges, use a single distribution + * invocation, and significantly reduce the waste. + */ + template + pair<_IntType, _IntType> + __gen_two_uniform_ints(_IntType __b0, _IntType __b1, + _UniformRandomBitGenerator&& __g) + { + _IntType __x + = uniform_int_distribution<_IntType>{0, (__b0 * __b1) - 1}(__g); + return std::make_pair(__x / __b1, __x % __b1); + } + + /** * @brief Shuffle the elements of a sequence using a uniform random * number generator. * @ingroup mutating_algorithms @@ -3773,8 +3804,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; - typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; - typedef typename std::common_type::type __uc_type; + typedef typename remove_reference<_UniformRandomNumberGenerator>::type + _Gen; + typedef typename common_type::type + __uc_type; const __uc_type __urngrange = __g.max() - __g.min(); const __uc_type __urange = __uc_type(__last - __first); @@ -3801,13 +3834,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION while (__i != __last) { const __uc_type __swap_range = __uc_type(__i - __first) + 1; - const __uc_type __comp_range = __swap_range * (__swap_range + 1); - std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1}; - const __uc_type __pospos = __d(__g); + const pair<__uc_type, __uc_type> __pospos = + __gen_two_uniform_ints(__swap_range, __swap_range + 1, __g); - std::iter_swap(__i++, __first + (__pospos % __swap_range)); - std::iter_swap(__i++, __first + (__pospos / __swap_range)); + std::iter_swap(__i++, __first + __pospos.first); + std::iter_swap(__i++, __first + __pospos.second); } return; @@ -5695,9 +5727,52 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO { using __distrib_type = uniform_int_distribution<_Size>; using __param_type = typename __distrib_type::param_type; + using _USize = make_unsigned_t<_Size>; + using _Gen = remove_reference_t<_UniformRandomBitGenerator>; + using __uc_type = common_type_t; + __distrib_type __d{}; _Size __unsampled_sz = std::distance(__first, __last); - for (__n = std::min(__n, __unsampled_sz); __n != 0; ++__first) + __n = std::min(__n, __unsampled_sz); + + // If possible, we use __gen_two_uniform_ints to efficiently produce + // two random numbers using a single distribution invocation: + + const __uc_type __urngrange = __g.max() - __g.min(); + if (__urngrange / __uc_type(__unsampled_sz) >= __uc_type(__unsampled_sz)) + // I.e. (__urngrange >= __unsampled_sz * __unsampled_sz) but without + // wrapping issues. + { + while (__n != 0 && __unsampled_sz >= 2) + { + const pair<_Size, _Size> p = + __gen_two_uniform_ints(__unsampled_sz, __unsampled_sz - 1, __g); + + --__unsampled_sz; + if (p.first < __n) + { + *__out++ = *__first; + --__n; + } + + ++__first; + + if (__n == 0) break; + + --__unsampled_sz; + if (p.second < __n) + { + *__out++ = *__first; + --__n; + } + + ++__first; + } + } + + // The loop above is otherwise equivalent to this one-at-a-time version: + + for (; __n != 0; ++__first) if (__d(__g, __param_type{0, --__unsampled_sz}) < __n) { *__out++ = *__first;