From patchwork Thu Nov 17 23:03:08 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 82806 Delivered-To: patch@linaro.org Received: by 10.140.97.165 with SMTP id m34csp1037217qge; Thu, 17 Nov 2016 15:07:28 -0800 (PST) X-Received: by 10.28.154.86 with SMTP id c83mr18194973wme.23.1479424048171; Thu, 17 Nov 2016 15:07:28 -0800 (PST) Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id p21si133588wma.116.2016.11.17.15.07.27 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 17 Nov 2016 15:07:28 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org Received: from localhost ([::1]:33694 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c7VmA-0003Ux-Mb for patch@linaro.org; Thu, 17 Nov 2016 18:07:26 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49173) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c7ViF-0001I9-L7 for qemu-devel@nongnu.org; Thu, 17 Nov 2016 18:03:25 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c7Vi6-0005kF-TL for qemu-devel@nongnu.org; Thu, 17 Nov 2016 18:03:23 -0500 Received: from mail-wm0-x242.google.com ([2a00:1450:400c:c09::242]:36529) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1c7Vi6-0005jz-Id for qemu-devel@nongnu.org; Thu, 17 Nov 2016 18:03:14 -0500 Received: by mail-wm0-x242.google.com with SMTP id m203so419147wma.3 for ; Thu, 17 Nov 2016 15:03:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:to:references:from:message-id:date:user-agent :mime-version:in-reply-to; bh=BNeeEhlZT6JrbpWLlKUw01+sNuz6RsWiIzSSkUtwo8k=; b=KphxHmjrYLPCbjyn1cH/vZ3UO+Gp3oU8vx9oKYIgkh+U7mw+hEH2/Ks0Lo2U3imtNz 9PMr1h/ojLs8igKN05A0Zq0kcs4ZtFOWbLOv8AXXRTU++2FHQUEKseb/w+4z+M8jp6Zi w7amrGuMb4nVNNRmfUTxwDn/mSXsPfdP0XPnjB4m6QPB4kqv28jeoglGrAglXVUPG/Qn CWK5niogtScTM0GHYqV5wcYk+SUhS5RTPOb9jsZu7y4Z2Slpmg3c4jq6n4x/kgJl2WPy bezNWdVd6u+KjpasmeVLWLrM50YcNW1xJOwUkZ4S1P+1UehigKh82qkZ2WfN0XzRQC6b V9nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:to:references:from:message-id :date:user-agent:mime-version:in-reply-to; bh=BNeeEhlZT6JrbpWLlKUw01+sNuz6RsWiIzSSkUtwo8k=; b=fDhqO8146SPLAduyfvQx58jMgWDYLCmRtPKomNA16/Jc2I6zXLNHcD//KBcDDGGX0t AXSvhlBB2yN3We3MuuJaDJTPa31cmTMvLh6euC+bBhfa3y4Nn7EjON80rBCCCHR5vjON hD48cIe+mxlWxK+Qq7PLXOyKdHGL1JiN4L/pa016zCx3FIPxnK3YHsUuKrMY/4MXZeBV mwywJw4q5ADC8ALsBV2gUwFtK+fsTBNxh71HIFE3C3d1Mq265BazzxK9ZAjVZ5VClwRf E2CRNvqv0fG6G6ClF36tg35ra/shhoJevvK5lIVcTXc6S/D4df8+aRIrnT4UegSTbemC yu0w== X-Gm-Message-State: ABUngveK9/xKEy3h43hTr4/tFMbP0eTSg6UizuJUq8g/WokA+SX5FpuOLtXwLiMAlRRqpA== X-Received: by 10.28.37.2 with SMTP id l2mr5590923wml.86.1479423793234; Thu, 17 Nov 2016 15:03:13 -0800 (PST) Received: from bigtime.twiddle.net ([87.111.149.139]) by smtp.googlemail.com with ESMTPSA id 138sm161065wms.20.2016.11.17.15.03.11 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Nov 2016 15:03:12 -0800 (PST) To: Bastian Koppelmann , qemu-devel@nongnu.org References: <1479324335-2074-1-git-send-email-rth@twiddle.net> <1479324335-2074-17-git-send-email-rth@twiddle.net> <1e2e6eab-921a-f30b-93d4-66bd62f68dd1@mail.uni-paderborn.de> <0e841b99-5994-120d-bd4f-051b694d1ca4@mail.uni-paderborn.de> From: Richard Henderson Message-ID: Date: Fri, 18 Nov 2016 00:03:08 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <0e841b99-5994-120d-bd4f-051b694d1ca4@mail.uni-paderborn.de> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2a00:1450:400c:c09::242 Subject: Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" On 11/17/2016 11:09 PM, Bastian Koppelmann wrote: > On 11/17/2016 08:59 PM, Richard Henderson wrote: >> On 11/17/2016 08:53 PM, Richard Henderson wrote: >>> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote: >>>> On 11/16/2016 08:25 PM, Richard Henderson wrote: >>>>> + >>>>> + OP_32_64(clz): >>>>> + if (const_args[2]) { >>>>> + tcg_debug_assert(have_bmi1); >>>>> + tcg_debug_assert(args[2] == (rexw ? 64 : 32)); >>>>> + tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]); >>>>> + } else { >>>>> + /* ??? See above. */ >>>>> + tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]); >>>> >>>> The Intel ISA manual states that it find the bit index of the most >>>> significant bit, where the least significant bit is index 0. So for the >>>> input 0x2 this should return 1. However this is not the number of >>>> leading zeros. >>> >>> Oh, of course you're right. I thought I was testing this, but while >>> alpha does >>> have this operation, it turns out it isn't used much. >> >> Alternately, what I tested was on a haswell machine, which takes the >> LZCNT path, which *does* produce the intended results. Just the BSR >> path doesn't. > > Luckily my old laptop is a Core 2 Duo without LZCNT :) Heh. Well, I've given it another few tests with LZCNT hacked off, and with i686 32-bit. Here's an incremental update. Wherein I also note that lzcnt isn't in the same cpuid flag as tzcnt. Double whoops. r~ diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index 3eeb58f..c3f7adc 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -139,6 +139,11 @@ static bool have_bmi2; #else # define have_bmi2 0 #endif +#if defined(CONFIG_CPUID_H) && defined(bit_LZCNT) +static bool have_lzcnt; +#else +# define have_lzcnt 0 +#endif static tcg_insn_unit *tb_ret_addr; @@ -1148,6 +1153,76 @@ static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest, } #endif +static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1, + TCGArg arg2, bool a2const) +{ + if (a2const) { + tcg_debug_assert(have_bmi1); + tcg_debug_assert(arg2 == (rexw ? 64 : 32)); + tcg_out_modrm(s, OPC_TZCNT + rexw, dest, arg1); + } else { + /* ??? The manual says that the output is undefined when the + input is zero, but real hardware leaves it unchanged. As + noted in target-i386/translate.c, real programs depend on + this -- now we are one more of those. */ + /* ??? We could avoid this if TCG had an early clobber marking + for the output. */ + tcg_out_modrm(s, OPC_BSF + rexw, dest, arg1); + if (dest != arg2) { + tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2); + } + } +} + +static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1, + TCGArg arg2, bool a2const) +{ + TCGLabel *over; + TCGType type; + unsigned rev; + + /* ??? All this would be easier (and would avoid the semi-undefined + behaviour) if TCG had an early clobber marking for the output. */ + + if (have_lzcnt) { + if (a2const && arg2 == (rexw ? 64 : 32)) { + tcg_out_modrm(s, OPC_LZCNT + rexw, dest, arg1); + return; + } + if (!a2const && dest != arg2) { + tcg_out_modrm(s, OPC_LZCNT + rexw, dest, arg1); + tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2); + return; + } + } + + over = gen_new_label(); + type = rexw ? TCG_TYPE_I64: TCG_TYPE_I32; + rev = rexw ? 63 : 31; + + tcg_out_modrm(s, OPC_BSR + rexw, dest, arg1); + + /* Recall that the output of BSR is the index not the count. + Therefore we must adjust the result by ^ (SIZE-1). In some + cases below, we prefer an extra XOR to an extra JMP. */ + if (!a2const && dest == arg2) { + /* ??? See the comment in tcg_out_ctz re BSF. */ + tcg_out_jxx(s, tcg_cond_to_jcc[TCG_COND_EQ], over, 1); + tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0); + tcg_out_label(s, over, s->code_ptr); + } else { + tcg_out_jxx(s, tcg_cond_to_jcc[TCG_COND_NE], over, 1); + if (a2const) { + tcg_out_movi(s, type, dest, arg2 ^ rev); + } else { + tcg_out_mov(s, type, dest, arg2); + tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0); + } + tcg_out_label(s, over, s->code_ptr); + tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0); + } +} + static void tcg_out_branch(TCGContext *s, int call, tcg_insn_unit *dest) { intptr_t disp = tcg_pcrel_diff(s, dest) - 5; @@ -2024,34 +2099,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, break; OP_32_64(ctz): - if (const_args[2]) { - tcg_debug_assert(have_bmi1); - tcg_debug_assert(args[2] == (rexw ? 64 : 32)); - tcg_out_modrm(s, OPC_TZCNT + rexw, args[0], args[1]); - } else { - /* ??? The manual says that the output is undefined when the - input is zero, but real hardware leaves it unchanged. As - noted in target-i386/translate.c, real programs depend on - this -- now we are one more of those. */ - tcg_out_modrm(s, OPC_BSF + rexw, args[0], args[1]); - if (args[0] != args[2]) { - tcg_out_cmov(s, TCG_COND_EQ, rexw, args[0], args[2]); - } - } + tcg_out_ctz(s, rexw, args[0], args[1], args[2], const_args[2]); break; - OP_32_64(clz): - if (const_args[2]) { - tcg_debug_assert(have_bmi1); - tcg_debug_assert(args[2] == (rexw ? 64 : 32)); - tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]); - } else { - /* ??? See above. */ - tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]); - if (args[0] != args[2]) { - tcg_out_cmov(s, TCG_COND_EQ, rexw, args[0], args[2]); - } - } + tcg_out_clz(s, rexw, args[0], args[1], args[2], const_args[2]); break; case INDEX_op_brcond_i32: @@ -2281,7 +2332,7 @@ static const TCGTargetOpDef x86_op_defs[] = { { INDEX_op_sar_i32, { "r", "0", "Ci" } }, { INDEX_op_rotl_i32, { "r", "0", "ci" } }, { INDEX_op_rotr_i32, { "r", "0", "ci" } }, - { INDEX_op_clz_i32, { "r", "r", "rW" } }, + { INDEX_op_clz_i32, { "r", "r", "ri" } }, { INDEX_op_ctz_i32, { "r", "r", "rW" } }, { INDEX_op_brcond_i32, { "r", "ri" } }, @@ -2344,7 +2395,7 @@ static const TCGTargetOpDef x86_op_defs[] = { { INDEX_op_sar_i64, { "r", "0", "Ci" } }, { INDEX_op_rotl_i64, { "r", "0", "ci" } }, { INDEX_op_rotr_i64, { "r", "0", "ci" } }, - { INDEX_op_clz_i64, { "r", "r", "rW" } }, + { INDEX_op_clz_i64, { "r", "r", "re" } }, { INDEX_op_ctz_i64, { "r", "r", "rW" } }, { INDEX_op_brcond_i64, { "r", "re" } }, @@ -2498,6 +2549,10 @@ static void tcg_target_init(TCGContext *s) need to probe for it. */ have_movbe = (c & bit_MOVBE) != 0; #endif +#ifndef have_lzcnt + /* LZCNT was introduced with AMD Barcelona and Intel Haswell CPUs. */ + have_lzcnt = (c & bit_LZCNT) != 0; +#endif } if (max >= 7) {