From patchwork Mon Dec 19 15:57:07 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Aaron Sawdey X-Patchwork-Id: 88493 Delivered-To: patch@linaro.org Received: by 10.140.20.101 with SMTP id 92csp1231683qgi; Mon, 19 Dec 2016 07:57:42 -0800 (PST) X-Received: by 10.84.217.199 with SMTP id d7mr35922785plj.165.1482163062205; Mon, 19 Dec 2016 07:57:42 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id 3si18874471pls.10.2016.12.19.07.57.41 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 19 Dec 2016 07:57:42 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-444744-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-444744-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-444744-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:from:to:cc:date:in-reply-to:references:content-type :mime-version:message-id; q=dns; s=default; b=H86X9qd1ZxgyJ1I1+9 1v4A3Q2tQjE7+96Pl54xnSHXVPedAKwFyee6/hwoy9JpUILyBWLPPSJRY+Y3FmeL 0uOE6wBXl6nFtq2zde+SfYqQiN0HVPIFlVVvVELrKj2cFR03p0PR+7qD3wd1LlZU HziWBiRJKXbedw2e6d7rr1nGM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:from:to:cc:date:in-reply-to:references:content-type :mime-version:message-id; s=default; bh=/ZfSQzuqcISE78Uo/uodGOuK 0pw=; b=GnzGj4wN3F+NSgm2v1CE4nvQzGdyyvMuA8PSWz66XCvbuhSihemixuog BgWunTbR40k3A/0VshfFYDnA5uPsLP3fstQGyKZDZNcYEsI8Wj12guyviqRl9o5O kvS4RwB8p6gOv2ckMctEt6FDWP3DWfTLXqoEW7xlJfZ1Fvw8mqo= Received: (qmail 50495 invoked by alias); 19 Dec 2016 15:57:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 50483 invoked by uid 89); 19 Dec 2016 15:57:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=BAYES_00, KAM_ASCII_DIVIDERS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=aaron, phd, Aaron, 78, 6 X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 19 Dec 2016 15:57:14 +0000 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id uBJFsO3d143858 for ; Mon, 19 Dec 2016 10:57:13 -0500 Received: from e17.ny.us.ibm.com (e17.ny.us.ibm.com [129.33.205.207]) by mx0b-001b2d01.pphosted.com with ESMTP id 27eec3gadt-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 19 Dec 2016 10:57:13 -0500 Received: from localhost by e17.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 19 Dec 2016 10:57:12 -0500 Received: from d01dlp02.pok.ibm.com (9.56.250.167) by e17.ny.us.ibm.com (146.89.104.204) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 19 Dec 2016 10:57:09 -0500 Received: from b01cxnp23032.gho.pok.ibm.com (b01cxnp23032.gho.pok.ibm.com [9.57.198.27]) by d01dlp02.pok.ibm.com (Postfix) with ESMTP id 397B06E8040; Mon, 19 Dec 2016 10:56:42 -0500 (EST) Received: from b01ledav001.gho.pok.ibm.com (b01ledav001.gho.pok.ibm.com [9.57.199.106]) by b01cxnp23032.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id uBJFv9Zk16712080; Mon, 19 Dec 2016 15:57:09 GMT Received: from b01ledav001.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0DBF72803D; Mon, 19 Dec 2016 10:57:09 -0500 (EST) Received: from ragesh3a (unknown [9.85.129.212]) by b01ledav001.gho.pok.ibm.com (Postfix) with ESMTP id 8EA2E2803E; Mon, 19 Dec 2016 10:57:08 -0500 (EST) Subject: Re: [PATCH] builtin expansion of strncmp for rs6000 From: Aaron Sawdey To: Segher Boessenkool Cc: gcc-patches@gcc.gnu.org, David Edelsohn , Bill Schmidt Date: Mon, 19 Dec 2016 09:57:07 -0600 In-Reply-To: <20161216231436.GD30845@gate.crashing.org> References: <1481819465.5501.5.camel@linux.vnet.ibm.com> <20161216231436.GD30845@gate.crashing.org> Mime-Version: 1.0 X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16121915-0040-0000-0000-00000229E414 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006278; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000198; SDB=6.00796056; UDB=6.00386305; IPR=6.00573869; BA=6.00004986; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013658; XFM=3.00000011; UTC=2016-12-19 15:57:11 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16121915-0041-0000-0000-0000061CE896 Message-Id: <1482163027.5501.8.camel@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-12-19_13:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1612050000 definitions=main-1612190201 X-IsSubscribed: yes On Fri, 2016-12-16 at 17:14 -0600, Segher Boessenkool wrote: > Please repost.  Thanks, Hi Segher, Thanks for the review. Attached is an updated patch that should address the issues you noted. Bootstrap/regtest in progress on ppc64le -mcpu=power8, ok for trunk if results are clean? Thanks, Aaron -- Aaron Sawdey, Ph.D. acsawdey@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 243799) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -78,6 +78,7 @@ extern int expand_block_clear (rtx[]); extern int expand_block_move (rtx[]); extern bool expand_block_compare (rtx[]); +extern bool expand_strn_compare (rtx[]); extern const char * rs6000_output_load_multiple (rtx[]); extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode); extern bool rs6000_is_valid_and_mask (rtx, machine_mode); Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 243799) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -19382,7 +19382,388 @@ return true; } +/* Generate alignment check and branch code to set up for + strncmp when we don't have DI alignment. + STRNCMP_LABEL is the label to branch if there is a page crossing. + SRC is the string pointer to be examined. + BYTES is the max number of bytes to compare. */ +static void +expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes) +{ + rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label); + rtx src_check = copy_addr_to_reg (XEXP (src, 0)); + if (GET_MODE (src_check) == SImode) + emit_insn (gen_andsi3 (src_check, src_check, GEN_INT (0xfff))); + else + emit_insn (gen_anddi3 (src_check, src_check, GEN_INT (0xfff))); + rtx cond = gen_reg_rtx (CCmode); + emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check, + GEN_INT (4096 - bytes))); + rtx cmp_rtx = gen_rtx_LT (VOIDmode, cond, const0_rtx); + + rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx, + pc_rtx, lab_ref); + rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse)); + JUMP_LABEL (j) = strncmp_label; + LABEL_NUSES (strncmp_label) += 1; +} + +/* Expand a string compare operation with length, and return + true if successful. Return false if we should let the + compiler generate normal code, probably a strncmp call. + + OPERANDS[0] is the target (result). + OPERANDS[1] is the first source. + OPERANDS[2] is the second source. + OPERANDS[3] is the length. + OPERANDS[4] is the alignment in bytes. */ +bool +expand_strn_compare (rtx operands[]) +{ + rtx target = operands[0]; + rtx orig_src1 = operands[1]; + rtx orig_src2 = operands[2]; + rtx bytes_rtx = operands[3]; + rtx align_rtx = operands[4]; + HOST_WIDE_INT cmp_bytes = 0; + rtx src1 = orig_src1; + rtx src2 = orig_src2; + + /* If this is not a fixed size compare, just call strncmp. */ + if (!CONST_INT_P (bytes_rtx)) + return false; + + /* This must be a fixed size alignment. */ + if (!CONST_INT_P (align_rtx)) + return false; + + int base_align = INTVAL (align_rtx); + int align1 = MEM_ALIGN (orig_src1) / BITS_PER_UNIT; + int align2 = MEM_ALIGN (orig_src2) / BITS_PER_UNIT; + + /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff. */ + if (SLOW_UNALIGNED_ACCESS (word_mode, align1) + || SLOW_UNALIGNED_ACCESS (word_mode, align2)) + return false; + + gcc_assert (GET_MODE (target) == SImode); + + HOST_WIDE_INT bytes = INTVAL (bytes_rtx); + + /* If we have an LE target without ldbrx and word_mode is DImode, + then we must avoid using word_mode. */ + int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX + && word_mode == DImode); + + int word_mode_size = GET_MODE_SIZE (word_mode); + + int offset = 0; + machine_mode load_mode = + select_block_compare_mode (offset, bytes, base_align, word_mode_ok); + int load_mode_size = GET_MODE_SIZE (load_mode); + + /* We don't want to generate too much code. Also if bytes is + 4096 or larger we always want the library strncmp anyway. */ + int groups = ROUND_UP (bytes, load_mode_size) / load_mode_size; + if (bytes >= 4096 || groups > rs6000_string_compare_inline_limit) + return false; + + rtx result_reg = gen_reg_rtx (word_mode); + rtx final_move_label = gen_label_rtx (); + rtx final_label = gen_label_rtx (); + rtx begin_compare_label = NULL; + + if (base_align < 8) + { + /* Generate code that checks distance to 4k boundary for this case. */ + begin_compare_label = gen_label_rtx (); + rtx strncmp_label = gen_label_rtx (); + rtx jmp; + + /* Strncmp for power8 in glibc does this: + rldicl r8,r3,0,52 + cmpldi cr7,r8,4096-16 + bgt cr7,L(pagecross) */ + + if (align1 < 8) + expand_strncmp_align_check (strncmp_label, src1, bytes); + if (align2 < 8) + expand_strncmp_align_check (strncmp_label, src2, bytes); + + /* Now generate the following sequence: + - branch to begin_compare + - strncmp_label + - call to strncmp + - branch to final_label + - begin_compare_label */ + + rtx cmp_ref = gen_rtx_LABEL_REF (VOIDmode, begin_compare_label); + jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, cmp_ref)); + JUMP_LABEL(jmp) = begin_compare_label; + LABEL_NUSES (begin_compare_label) += 1; + emit_barrier (); + + emit_label (strncmp_label); + + if (!REG_P (XEXP (src1, 0))) + { + rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0)); + src1 = replace_equiv_address (src1, src1_reg); + } + + if (!REG_P (XEXP (src2, 0))) + { + rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0)); + src2 = replace_equiv_address (src2, src2_reg); + } + + /* -m32 -mpowerpc64 results in word_mode being DImode even + though otherwise it is 32-bit. The length arg to strncmp + is a size_t which will be the same size as pointers. */ + rtx len_rtx; + if (TARGET_64BIT) + len_rtx = gen_reg_rtx(DImode); + else + len_rtx = gen_reg_rtx(SImode); + + emit_move_insn (len_rtx, bytes_rtx); + + emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"), + target, LCT_NORMAL, GET_MODE (target), 3, + force_reg (Pmode, XEXP (src1, 0)), Pmode, + force_reg (Pmode, XEXP (src2, 0)), Pmode, + len_rtx, GET_MODE (len_rtx)); + + rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label); + jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref)); + JUMP_LABEL (jmp) = final_label; + LABEL_NUSES (final_label) += 1; + emit_barrier (); + emit_label (begin_compare_label); + } + + rtx cleanup_label = NULL; + rtx tmp_reg_src1 = gen_reg_rtx (word_mode); + rtx tmp_reg_src2 = gen_reg_rtx (word_mode); + + /* Generate sequence of ld/ldbrx, cmpb to compare out + to the length specified. */ + while (bytes > 0) + { + /* Compare sequence: + check each 8B with: ld/ld cmpd bne + cleanup code at end: + cmpb get byte that differs + cmpb look for zero byte + orc combine + cntlzd get bit of first zero/diff byte + subfic convert for rldcl use + rldcl rldcl extract diff/zero byte + subf subtract for final result + + The last compare can branch around the cleanup code if the + result is zero because the strings are exactly equal. */ + int align = compute_current_alignment (base_align, offset); + if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + load_mode = select_block_compare_mode (offset, bytes, align, + word_mode_ok); + else + load_mode = select_block_compare_mode (0, bytes, align, word_mode_ok); + load_mode_size = GET_MODE_SIZE (load_mode); + if (bytes >= load_mode_size) + cmp_bytes = load_mode_size; + else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + { + /* Move this load back so it doesn't go past the end. + P8/P9 can do this efficiently. */ + int extra_bytes = load_mode_size - bytes; + cmp_bytes = bytes; + if (extra_bytes < offset) + { + offset -= extra_bytes; + cmp_bytes = load_mode_size; + bytes = cmp_bytes; + } + } + else + /* P7 and earlier can't do the overlapping load trick fast, + so this forces a non-overlapping load and a shift to get + rid of the extra bytes. */ + cmp_bytes = bytes; + + src1 = adjust_address (orig_src1, load_mode, offset); + src2 = adjust_address (orig_src2, load_mode, offset); + + if (!REG_P (XEXP (src1, 0))) + { + rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0)); + src1 = replace_equiv_address (src1, src1_reg); + } + set_mem_size (src1, cmp_bytes); + + if (!REG_P (XEXP (src2, 0))) + { + rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0)); + src2 = replace_equiv_address (src2, src2_reg); + } + set_mem_size (src2, cmp_bytes); + + do_load_for_compare (tmp_reg_src1, src1, load_mode); + do_load_for_compare (tmp_reg_src2, src2, load_mode); + + /* We must always left-align the data we read, and + clear any bytes to the right that are beyond the string. + Otherwise the cmpb sequence won't produce the correct + results. The beginning of the compare will be done + with word_mode so will not have any extra shifts or + clear rights. */ + + if (load_mode_size < word_mode_size) + { + /* Rotate left first. */ + rtx sh = GEN_INT (BITS_PER_UNIT * (word_mode_size - load_mode_size)); + if (word_mode == DImode) + { + emit_insn (gen_rotldi3 (tmp_reg_src1, tmp_reg_src1, sh)); + emit_insn (gen_rotldi3 (tmp_reg_src2, tmp_reg_src2, sh)); + } + else + { + emit_insn (gen_rotlsi3 (tmp_reg_src1, tmp_reg_src1, sh)); + emit_insn (gen_rotlsi3 (tmp_reg_src2, tmp_reg_src2, sh)); + } + } + + if (cmp_bytes < word_mode_size) + { + /* Now clear right. This plus the rotate can be + turned into a rldicr instruction. */ + HOST_WIDE_INT mb = BITS_PER_UNIT * (word_mode_size - cmp_bytes); + rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb); + if (word_mode == DImode) + { + emit_insn (gen_anddi3_mask (tmp_reg_src1, tmp_reg_src1, mask)); + emit_insn (gen_anddi3_mask (tmp_reg_src2, tmp_reg_src2, mask)); + } + else + { + emit_insn (gen_andsi3_mask (tmp_reg_src1, tmp_reg_src1, mask)); + emit_insn (gen_andsi3_mask (tmp_reg_src2, tmp_reg_src2, mask)); + } + } + + int remain = bytes - cmp_bytes; + + rtx dst_label; + if (remain > 0) + { + if (!cleanup_label) + cleanup_label = gen_label_rtx (); + dst_label = cleanup_label; + } + else + dst_label = final_move_label; + + rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, dst_label); + rtx cond = gen_reg_rtx (CCmode); + + if (remain == 0) + { + /* For the last chunk, subf. also + generates the zero result we need. */ + rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2); + rs6000_emit_dot_insn (result_reg, tmp, 1, cond); + } + else + emit_move_insn (cond, gen_rtx_COMPARE (CCmode, + tmp_reg_src1, tmp_reg_src2)); + + rtx cmp_rtx; + if (remain > 0) + cmp_rtx = gen_rtx_NE (VOIDmode, cond, const0_rtx); + else + cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx); + + rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx, + lab_ref, pc_rtx); + rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse)); + JUMP_LABEL (j) = dst_label; + LABEL_NUSES (dst_label) += 1; + + offset += cmp_bytes; + bytes -= cmp_bytes; + } + + if (cleanup_label) + emit_label (cleanup_label); + + /* Generate the final sequence that identifies the differing + byte and generates the final result, taking into account + zero bytes: + + cmpb cmpb_result1, src1, src2 + cmpb cmpb_result2, src1, zero + orc cmpb_result1, cmp_result1, cmpb_result2 + cntlzd get bit of first zero/diff byte + addi convert for rldcl use + rldcl rldcl extract diff/zero byte + subf subtract for final result + */ + + rtx cmpb_diff = gen_reg_rtx (word_mode); + rtx cmpb_zero = gen_reg_rtx (word_mode); + rtx rot_amt = gen_reg_rtx (word_mode); + rtx zero_reg = gen_reg_rtx (word_mode); + + rtx rot1_1 = gen_reg_rtx(word_mode); + rtx rot1_2 = gen_reg_rtx(word_mode); + rtx rot2_1 = gen_reg_rtx(word_mode); + rtx rot2_2 = gen_reg_rtx(word_mode); + + if (word_mode == SImode) + { + emit_insn (gen_cmpbsi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2)); + emit_insn (gen_movsi (zero_reg, GEN_INT(0))); + emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg)); + emit_insn (gen_one_cmplsi2 (cmpb_diff,cmpb_diff)); + emit_insn (gen_iorsi3 (cmpb_diff, cmpb_diff, cmpb_zero)); + emit_insn (gen_clzsi2 (rot_amt, cmpb_diff)); + emit_insn (gen_addsi3 (rot_amt, rot_amt, GEN_INT (8))); + emit_insn (gen_rotlsi3 (rot1_1, tmp_reg_src1, + gen_lowpart (SImode, rot_amt))); + emit_insn (gen_andsi3_mask (rot1_2, rot1_1, GEN_INT(0xff))); + emit_insn (gen_rotlsi3 (rot2_1, tmp_reg_src2, + gen_lowpart (SImode, rot_amt))); + emit_insn (gen_andsi3_mask (rot2_2, rot2_1, GEN_INT(0xff))); + emit_insn (gen_subsi3 (result_reg, rot1_2, rot2_2)); + } + else + { + emit_insn (gen_cmpbdi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2)); + emit_insn (gen_movdi (zero_reg, GEN_INT(0))); + emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg)); + emit_insn (gen_one_cmpldi2 (cmpb_diff,cmpb_diff)); + emit_insn (gen_iordi3 (cmpb_diff, cmpb_diff, cmpb_zero)); + emit_insn (gen_clzdi2 (rot_amt, cmpb_diff)); + emit_insn (gen_adddi3 (rot_amt, rot_amt, GEN_INT (8))); + emit_insn (gen_rotldi3 (rot1_1, tmp_reg_src1, + gen_lowpart (SImode, rot_amt))); + emit_insn (gen_anddi3_mask (rot1_2, rot1_1, GEN_INT(0xff))); + emit_insn (gen_rotldi3 (rot2_1, tmp_reg_src2, + gen_lowpart (SImode, rot_amt))); + emit_insn (gen_anddi3_mask (rot2_2, rot2_1, GEN_INT(0xff))); + emit_insn (gen_subdi3 (result_reg, rot1_2, rot2_2)); + } + + emit_label (final_move_label); + emit_insn (gen_movsi (target, + gen_lowpart (SImode, result_reg))); + emit_label (final_label); + return true; +} + + /* Expand a block move operation, and return 1 if successful. Return 0 if we should let the compiler generate normal code. Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 243799) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -117,6 +117,7 @@ UNSPEC_BPERM UNSPEC_COPYSIGN UNSPEC_PARITY + UNSPEC_CMPB UNSPEC_FCTIW UNSPEC_FCTID UNSPEC_LFIWAX @@ -2316,6 +2317,13 @@ "prty %0,%1" [(set_attr "type" "popcnt")]) +(define_insn "cmpb3" + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") + (unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r") + (match_operand:GPR 2 "gpc_reg_operand" "r")] UNSPEC_CMPB))] + "TARGET_CMPB" + "cmpb %0,%1,%2" + [(set_attr "type" "cmp")]) ;; Since the hardware zeros the upper part of the register, save generating the ;; AND immediate if we are converting to unsigned @@ -8844,7 +8852,7 @@ FAIL; }") -;; String/block compare insn. +;; String compare N insn. ;; Argument 0 is the target (result) ;; Argument 1 is the destination ;; Argument 2 is the source @@ -8851,6 +8859,27 @@ ;; Argument 3 is the length ;; Argument 4 is the alignment +(define_expand "cmpstrnsi" + [(parallel [(set (match_operand:SI 0) + (compare:SI (match_operand:BLK 1) + (match_operand:BLK 2))) + (use (match_operand:SI 3)) + (use (match_operand:SI 4))])] + "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)" +{ + if (expand_strn_compare (operands)) + DONE; + else + FAIL; +}) + +;; Block compare insn. +;; Argument 0 is the target (result) +;; Argument 1 is the destination +;; Argument 2 is the source +;; Argument 3 is the length +;; Argument 4 is the alignment + (define_expand "cmpmemsi" [(parallel [(set (match_operand:SI 0) (compare:SI (match_operand:BLK 1) Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 243799) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -337,6 +337,10 @@ Target Report Var(rs6000_block_compare_inline_limit) Init(5) RejectNegative Joined UInteger Save Specify the maximum number pairs of load instructions that should be generated inline for the compare. If the number needed exceeds the limit, a call to memcmp will be generated instead. +mstring-compare-inline-limit= +Target Report Var(rs6000_string_compare_inline_limit) Init(8) RejectNegative Joined UInteger Save +Specify the maximum number pairs of load instructions that should be generated inline for the compare. If the number needed exceeds the limit, a call to strncmp will be generated instead. + misel Target Report Mask(ISEL) Var(rs6000_isa_flags) Generate isel instructions.