[bugfix] builtin expansion of strcmp for rs6000

Message ID 1484155600.10559.2.camel@linux.vnet.ibm.com
State New

Commit Message

Aaron Sawdey Jan. 11, 2017, 5:26 p.m.
This expands on the previous patch. For strcmp, and for strncmp with N
larger than 64, the comparison of the first 64 bytes is expanded inline,
followed by a call to strcmp or strncmp that compares the remainder if
the strings are still equal at that point.
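The semantics of the new expansion can be modeled in plain C as an inline prefix comparison followed by a tail call into the library. This is a minimal sketch, not the RTL the patch emits: the function name `strcmp_prefix_then_call` is hypothetical, and the byte-at-a-time prefix loop is a simplification (the real expansion compares whole chunks and uses cmpb to find terminators).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the expanded strcmp: compare at most PREFIX
   bytes inline, stopping at the first difference or NUL, and call the
   library strcmp for the remainder only if the whole prefix was equal
   and contained no terminator.  */
static int
strcmp_prefix_then_call (const char *s1, const char *s2, size_t prefix)
{
  assert (s1 != NULL && s2 != NULL);
  const unsigned char *a = (const unsigned char *) s1;
  const unsigned char *b = (const unsigned char *) s2;

  for (size_t i = 0; i < prefix; i++)
    {
      if (a[i] != b[i])
        return a[i] < b[i] ? -1 : 1;   /* Difference found inline.  */
      if (a[i] == '\0')
        return 0;                      /* Both strings ended inline.  */
    }

  /* Equal so far with no terminator seen: let the library finish.  */
  return strcmp (s1 + prefix, s2 + prefix);
}
```

With a 100-character string, a difference at index 80 is only found by the trailing `strcmp` call, which is exactly the transition the length-100 tests exercise.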

-mstring-compare-inline-limit=N determines how many block comparisons
are emitted. With the default of 8 and 64-bit code, that is 8
doubleword comparisons, or 64 bytes.
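How the limit becomes an inline length, and when a library call for the remainder is needed, follows the `compare_length` / `equality_compare_rest` logic in the `expand_strn_compare` hunk of the patch. The sketch below assumes every chunk uses the same load size; `struct compare_plan` and `plan_compare` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what gets emitted for one call site.  */
struct compare_plan
{
  uint64_t inline_bytes;   /* Bytes compared by the inline sequence.  */
  bool call_rest;          /* Emit strcmp/strncmp for the remainder?  */
  uint64_t rest_length;    /* N for the trailing strncmp (0 for strcmp).  */
};

/* HAS_LENGTH is false for strcmp and true for strncmp with a constant N.
   LIMIT stands in for -mstring-compare-inline-limit and LOAD_SIZE for
   the chunk size in bytes (8 for 64-bit code).  */
static struct compare_plan
plan_compare (bool has_length, uint64_t n, unsigned limit, unsigned load_size)
{
  assert (limit > 0 && load_size > 0);
  struct compare_plan p;
  uint64_t compare_length = (uint64_t) limit * load_size;

  if (!has_length)
    {
      /* strcmp: always compare the prefix, then call strcmp.  */
      p.inline_bytes = compare_length;
      p.call_rest = true;
      p.rest_length = 0;
    }
  else if (n <= compare_length)
    {
      /* Short strncmp: fully inline, no library call.  */
      p.inline_bytes = n;
      p.call_rest = false;
      p.rest_length = 0;
    }
  else
    {
      /* Long strncmp: inline prefix, then strncmp on the rest.  */
      p.inline_bytes = compare_length;
      p.call_rest = true;
      p.rest_length = n - compare_length;
    }
  return p;
}
```

For example, strncmp with N = 100 under the defaults compares 64 bytes inline and then calls strncmp with a length of 36.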

Performance testing on a POWER8 system shows that the code is 2 to 8
times faster than the RHEL7.2 glibc strcmp/strncmp, depending on
alignment and length.

In the process of adding this, I discovered that the expansion
generated for strncmp had a bug: it did not test for a zero byte to
terminate the comparison. As a result, inputs like
strncmp("AB\0CDEFGX", "AB\0CDEFGY", 9) compared as not equal. The
first doubleword comparison found the chunks equal, so a second one was
fetched and compared, ignoring the zero byte that should have
terminated the comparison. The fix is to use cmpb to check for zero
bytes in the equality case before comparing the next chunk. I updated
the strncmp-1.c test case to check for this, and also added a new
strcmp-1.c test case to check the strcmp expansion. Both now also have
length-100 tests to check the transition from the inline comparison to
the library call for the remainder.
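The bug and its fix can be reproduced with a small software model. This is a sketch under stated assumptions, not the emitted code: `cmpb64`, `load_be` and `chunked_strncmp` are hypothetical helpers, cmpb is emulated byte by byte (each result byte is 0xFF where the input bytes are equal), and chunks are loaded big-endian so that unsigned order matches string order. Passing `check_zero = 0` reproduces the old behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of cmpb on a doubleword: each result byte is 0xFF where
   the corresponding bytes of A and B are equal, 0x00 where they differ.  */
static uint64_t
cmpb64 (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint64_t byte_mask = (uint64_t) 0xff << (8 * i);
      if ((a & byte_mask) == (b & byte_mask))
        r |= byte_mask;
    }
  return r;
}

/* Load LEN (1..8) bytes big-endian into the high end of a doubleword;
   the low bytes are zero padding.  */
static uint64_t
load_be (const unsigned char *p, size_t len)
{
  assert (len >= 1 && len <= 8);
  uint64_t v = 0;
  for (size_t i = 0; i < 8; i++)
    v = (v << 8) | (i < len ? p[i] : 0);
  return v;
}

/* Chunked strncmp model.  With CHECK_ZERO false, equal chunks are never
   tested for a NUL terminator, which is the bug described above.  */
static int
chunked_strncmp (const char *s1, const char *s2, size_t n, int check_zero)
{
  const unsigned char *a = (const unsigned char *) s1;
  const unsigned char *b = (const unsigned char *) s2;

  for (size_t off = 0; off < n; )
    {
      size_t len = n - off < 8 ? n - off : 8;
      uint64_t x = load_be (a + off, len);
      uint64_t y = load_be (b + off, len);

      if (x != y)
        {
          /* Cleanup: the first differing or NUL byte decides the
             result.  A difference is guaranteed inside this chunk.  */
          for (size_t i = off; ; i++)
            if (a[i] != b[i] || a[i] == '\0')
              return a[i] - b[i];
        }

      if (check_zero)
        {
          /* Chunks are equal: cmpb against zero flags NUL bytes.  Mask
             off the padding so it is not mistaken for a terminator (the
             patch guards against the same thing with an AND mask).  */
          uint64_t zeros = cmpb64 (x, 0);
          if (len < 8)
            zeros &= ~(uint64_t) 0 << (8 * (8 - len));
          if (zeros != 0)
            return 0;
        }
      off += len;
    }
  return 0;
}
```

With the example strings, the first 8-byte chunk is equal and contains the NUL at index 2, so the fixed model stops there and returns 0, while the buggy model goes on to compare 'X' with 'Y'.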

ChangeLog
2017-01-11  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>
	* config/rs6000/rs6000-protos.h (expand_strn_compare): Add arg.
	* config/rs6000/rs6000.c (expand_strn_compare): Add ability to expand
	strcmp. Fix bug where comparison didn't stop with zero byte.
	* config/rs6000/rs6000.md (cmpstrnsi): Args to expand_strn_compare.
	(cmpstrsi): Add pattern.

gcc.dg/ChangeLog
2017-01-11  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>
	* gcc.dg/strcmp-1.c: New.
	* gcc.dg/strncmp-1.c: Add test for a bug that escaped.




-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Comments

Aaron Sawdey Jan. 16, 2017, 9:09 p.m.
Here is an updated version of this patch. 

Tulio noted that glibc's strncmp test was failing. This turned out to
be caused by using a signed HOST_WIDE_INT for the strncmp length. The
glibc test calls strncmp with length 2^64-1, presumably to provoke
exactly this type of bug. Fixing the issue required changing
select_block_compare_mode() and expand_block_compare() as well.
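Why a signed length goes wrong can be shown with the old inline-limit check from `expand_strn_compare`. In this sketch `int64_t`/`uint64_t` stand in for signed and unsigned HOST_WIDE_INT, the `ROUND_UP` macro is modeled on GCC's power-of-two rounding macro, and `old_check_rejects` / `new_rest_length` are hypothetical names. On a two's-complement 64-bit host, 2^64-1 read as a signed value is -1, which rounds up to zero groups and slips past the old `bytes >= 4096 || groups > limit` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modeled on GCC's ROUND_UP macro (Y must be a power of two).  */
#define ROUND_UP(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Old signed check: true if the expansion is rejected in favor of the
   library strncmp.  */
static bool
old_check_rejects (int64_t bytes, int64_t load_mode_size, int64_t limit)
{
  assert (load_mode_size > 0);
  int64_t groups = ROUND_UP (bytes, load_mode_size) / load_mode_size;
  return bytes >= 4096 || groups > limit;
}

/* With an unsigned length, an oversized N simply leaves a large
   remainder for the trailing strncmp call.  */
static uint64_t
new_rest_length (uint64_t bytes, uint64_t compare_length)
{
  return bytes > compare_length ? bytes - compare_length : 0;
}
```

Here `old_check_rejects (-1, 8, 8)` returns false, so the old code would have tried to expand a "negative" length inline instead of deferring to the library.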

The other change: if we must emit a runtime check for 4k page
crossing, then we might as well set base_align to 8 and emit the best
possible code, since after the check we know the inline loads cannot
cross a 4k boundary.
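The condition behind the runtime check can be sketched in C. The diff only shows that the expansion branches to `L(pagecross)` and calls `expand_strncmp_align_check` with `compare_length`; the exact instruction sequence is not shown, so this models the condition only, assuming 4096-byte pages. `may_cross_page` and `PAGE_SIZE_ASSUMED` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u

/* Reading LEN bytes starting at ADDR stays inside one page iff the
   offset within the page plus LEN does not exceed the page size.  When
   this returns true, the expansion would take the library-call path
   instead of the inline compare.  */
static bool
may_cross_page (uintptr_t addr, unsigned len)
{
  assert (len > 0 && len <= PAGE_SIZE_ASSUMED);
  return (addr & (PAGE_SIZE_ASSUMED - 1)) > PAGE_SIZE_ASSUMED - len;
}
```

Once both addresses pass this test for the full inline length, every load of the inline sequence lands in a page that is already known to be readable, so the alignment of the individual loads no longer matters for safety.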

OK for trunk if bootstrap/regtest passes on ppc64/ppc64le?

Thanks,
  Aaron

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 244503)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,7 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
-extern bool expand_strn_compare (rtx[]);
+extern bool expand_strn_compare (rtx[], int);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 244503)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19310,7 +19310,8 @@
    WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
    the largest allowable mode.  */
 static machine_mode
-select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes,
+select_block_compare_mode (unsigned HOST_WIDE_INT offset,
+			   unsigned HOST_WIDE_INT bytes,
 			   HOST_WIDE_INT align, bool word_mode_ok)
 {
   /* First see if we can do a whole load unit
@@ -19322,7 +19323,7 @@
      Do largest chunk possible without violating alignment rules.  */
 
   /* The most we can read without potential page crossing.  */
-  HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
+  unsigned HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
 
   if (word_mode_ok && bytes >= UNITS_PER_WORD)
     return word_mode;
@@ -19410,8 +19411,8 @@
   gcc_assert (GET_MODE (target) == SImode);
 
   /* Anything to move? */
-  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
-  if (bytes <= 0)
+  unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
+  if (bytes == 0)
     return true;
 
   /* The code generated for p7 and older is not faster than glibc
@@ -19432,14 +19433,14 @@
 
   /* Strategy phase.  How many ops will this take and should we expand it?  */
 
-  int offset = 0;
+  unsigned HOST_WIDE_INT offset = 0;
   machine_mode load_mode =
     select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
-  int load_mode_size = GET_MODE_SIZE (load_mode);
+  unsigned int load_mode_size = GET_MODE_SIZE (load_mode);
 
   /* We don't want to generate too much code.  */
   if (ROUND_UP (bytes, load_mode_size) / load_mode_size
-      > rs6000_block_compare_inline_limit)
+      > (unsigned HOST_WIDE_INT)rs6000_block_compare_inline_limit)
     return false;
 
   bool generate_6432_conversion = false;
@@ -19483,7 +19484,7 @@
 	{
 	  /* Move this load back so it doesn't go past the end.
 	     P8/P9 can do this efficiently.  */
-	  int extra_bytes = load_mode_size - bytes;
+	  unsigned int extra_bytes = load_mode_size - bytes;
 	  cmp_bytes = bytes;
 	  if (extra_bytes < offset)
 	    {
@@ -19497,7 +19498,7 @@
 	   so this forces a non-overlapping load and a shift to get
 	   rid of the extra bytes.  */
 	cmp_bytes = bytes;
-      
+
       src1 = adjust_address (orig_src1, load_mode, offset);
       src2 = adjust_address (orig_src2, load_mode, offset);
 
@@ -19682,22 +19683,36 @@
    OPERANDS[0] is the target (result).
    OPERANDS[1] is the first source.
    OPERANDS[2] is the second source.
+   If NO_LENGTH is zero, then:
    OPERANDS[3] is the length.
-   OPERANDS[4] is the alignment in bytes.  */
+   OPERANDS[4] is the alignment in bytes.
+   If NO_LENGTH is nonzero, then:
+   OPERANDS[3] is the alignment in bytes.  */
 bool
-expand_strn_compare (rtx operands[])
+expand_strn_compare (rtx operands[], int no_length)
 {
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
-  HOST_WIDE_INT cmp_bytes = 0;
+  rtx bytes_rtx, align_rtx;
+  if (no_length)
+    {
+      bytes_rtx = NULL;
+      align_rtx = operands[3];
+    }
+  else
+    {
+      bytes_rtx = operands[3];
+      align_rtx = operands[4];
+    }
+  unsigned HOST_WIDE_INT cmp_bytes = 0;
   rtx src1 = orig_src1;
   rtx src2 = orig_src2;
 
-  /* If this is not a fixed size compare, just call strncmp.  */
-  if (!CONST_INT_P (bytes_rtx))
+  /* If we have a length, it must be constant. This simplifies things
+     a bit as we don't have to generate code to check if we've exceeded
+     the length. Later this could be expanded to handle this case.  */
+  if (!no_length && !CONST_INT_P (bytes_rtx))
     return false;
 
   /* This must be a fixed size alignment.  */
@@ -19715,26 +19730,45 @@
 
   gcc_assert (GET_MODE (target) == SImode);
 
-  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
-
   /* If we have an LE target without ldbrx and word_mode is DImode,
      then we must avoid using word_mode.  */
   int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
 		       && word_mode == DImode);
 
-  int word_mode_size = GET_MODE_SIZE (word_mode);
+  unsigned int word_mode_size = GET_MODE_SIZE (word_mode);
 
-  int offset = 0;
+  unsigned HOST_WIDE_INT offset = 0;
+  unsigned HOST_WIDE_INT bytes; /* N from the strncmp args if available.  */
+  unsigned HOST_WIDE_INT compare_length; /* How much to compare inline.  */
+  if (no_length)
+    /* Use this as a standin to determine the mode to use.  */
+    bytes = rs6000_string_compare_inline_limit * word_mode_size;
+  else
+    bytes = INTVAL (bytes_rtx);
+
   machine_mode load_mode =
     select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
-  int load_mode_size = GET_MODE_SIZE (load_mode);
+  unsigned int load_mode_size = GET_MODE_SIZE (load_mode);
+  compare_length = rs6000_string_compare_inline_limit * load_mode_size;
 
-  /* We don't want to generate too much code.  Also if bytes is
-     4096 or larger we always want the library strncmp anyway.  */
-  int groups = ROUND_UP (bytes, load_mode_size) / load_mode_size;
-  if (bytes >= 4096 || groups > rs6000_string_compare_inline_limit)
-    return false;
+  /* If we have equality at the end of the last compare and we have not
+     found the end of the string, we need to call strcmp/strncmp to
+     compare the remainder.  */
+  bool equality_compare_rest = false;
 
+  if (no_length)
+    {
+      bytes = compare_length;
+      equality_compare_rest = true;
+    }
+  else
+    {
+      if (bytes <= compare_length)
+	compare_length = bytes;
+      else
+	equality_compare_rest = true;
+    }
+
   rtx result_reg = gen_reg_rtx (word_mode);
   rtx final_move_label = gen_label_rtx ();
   rtx final_label = gen_label_rtx ();
@@ -19753,10 +19787,14 @@
 	 bgt	cr7,L(pagecross) */
 
       if (align1 < 8)
-	expand_strncmp_align_check (strncmp_label, src1, bytes);
+	expand_strncmp_align_check (strncmp_label, src1, compare_length);
       if (align2 < 8)
-	expand_strncmp_align_check (strncmp_label, src2, bytes);
+	expand_strncmp_align_check (strncmp_label, src2, compare_length);
 
+      /* After the runtime alignment checks, we can use any alignment we
+	 like as we know there is no 4k boundary crossing.  */
+      base_align = 8;
+
       /* Now generate the following sequence:
 	 - branch to begin_compare
 	 - strncmp_label
@@ -19784,22 +19822,30 @@
 	  src2 = replace_equiv_address (src2, src2_reg);
 	}
 
-      /* -m32 -mpowerpc64 results in word_mode being DImode even
-	 though otherwise it is 32-bit. The length arg to strncmp
-	 is a size_t which will be the same size as pointers.  */
-      rtx len_rtx;
-      if (TARGET_64BIT)
-	len_rtx = gen_reg_rtx(DImode);
+      if (no_length)
+	emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strcmp"),
+				 target, LCT_NORMAL, GET_MODE (target), 2,
+				 force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				 force_reg (Pmode, XEXP (src2, 0)), Pmode);
       else
-	len_rtx = gen_reg_rtx(SImode);
+	{
+	  /* -m32 -mpowerpc64 results in word_mode being DImode even
+	     though otherwise it is 32-bit. The length arg to strncmp
+	     is a size_t which will be the same size as pointers.  */
+	  rtx len_rtx;
+	  if (TARGET_64BIT)
+	    len_rtx = gen_reg_rtx(DImode);
+	  else
+	    len_rtx = gen_reg_rtx(SImode);
 
-      emit_move_insn (len_rtx, bytes_rtx);
+	  emit_move_insn (len_rtx, bytes_rtx);
 
-      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
-			       target, LCT_NORMAL, GET_MODE (target), 3,
-			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
-			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
-			       len_rtx, GET_MODE (len_rtx));
+	  emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+				   target, LCT_NORMAL, GET_MODE (target), 3,
+				   force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				   force_reg (Pmode, XEXP (src2, 0)), Pmode,
+				   len_rtx, GET_MODE (len_rtx));
+	}
 
       rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label);
       jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref));
@@ -19815,10 +19861,12 @@
 
   /* Generate sequence of ld/ldbrx, cmpb to compare out
      to the length specified.  */
-  while (bytes > 0)
+  unsigned HOST_WIDE_INT bytes_to_compare = compare_length;
+  while (bytes_to_compare > 0)
     {
       /* Compare sequence:
          check each 8B with: ld/ld cmpd bne
+	 If equal, use rldicr/cmpb to check for zero byte.
          cleanup code at end:
          cmpb          get byte that differs
          cmpb          look for zero byte
@@ -19832,24 +19880,25 @@
          result is zero because the strings are exactly equal.  */
       int align = compute_current_alignment (base_align, offset);
       if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
-	load_mode = select_block_compare_mode (offset, bytes, align,
+	load_mode = select_block_compare_mode (offset, bytes_to_compare, align,
 					       word_mode_ok);
       else
-	load_mode = select_block_compare_mode (0, bytes, align, word_mode_ok);
+	load_mode = select_block_compare_mode (0, bytes_to_compare, align,
+					       word_mode_ok);
       load_mode_size = GET_MODE_SIZE (load_mode);
-      if (bytes >= load_mode_size)
+      if (bytes_to_compare >= load_mode_size)
 	cmp_bytes = load_mode_size;
       else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
 	{
 	  /* Move this load back so it doesn't go past the end.
 	     P8/P9 can do this efficiently.  */
-	  int extra_bytes = load_mode_size - bytes;
-	  cmp_bytes = bytes;
+	  unsigned int extra_bytes = load_mode_size - bytes_to_compare;
+	  cmp_bytes = bytes_to_compare;
 	  if (extra_bytes < offset)
 	    {
 	      offset -= extra_bytes;
 	      cmp_bytes = load_mode_size;
-	      bytes = cmp_bytes;
+	      bytes_to_compare = cmp_bytes;
 	    }
 	}
       else
@@ -19856,7 +19905,7 @@
 	/* P7 and earlier can't do the overlapping load trick fast,
 	   so this forces a non-overlapping load and a shift to get
 	   rid of the extra bytes.  */
-	cmp_bytes = bytes;
+	cmp_bytes = bytes_to_compare;
 
       src1 = adjust_address (orig_src1, load_mode, offset);
       src2 = adjust_address (orig_src2, load_mode, offset);
@@ -19919,37 +19968,45 @@
 	    }
 	}
 
-      int remain = bytes - cmp_bytes;
+      /* Cases to handle.  A and B are chunks of the two strings.
+         1: Not end of comparison:
+	   A != B: branch to cleanup code to compute result.
+           A == B: check for 0 byte, next block if not found.
+         2: End of the inline comparison:
+	   A != B: branch to cleanup code to compute result.
+           A == B: check for 0 byte, call strcmp/strncmp
+	 3: compared requested N bytes:
+	   A == B: branch to result 0.
+           A != B: cleanup code to compute result.  */
 
+      unsigned HOST_WIDE_INT remain = bytes_to_compare - cmp_bytes;
+
       rtx dst_label;
-      if (remain > 0)
+      if (remain > 0 || equality_compare_rest)
 	{
+	  /* Branch to cleanup code, otherwise fall through to do
+	     more compares.  */
 	  if (!cleanup_label)
 	    cleanup_label = gen_label_rtx ();
 	  dst_label = cleanup_label;
 	}
       else
+	/* Branch to end and produce result of 0.  */
 	dst_label = final_move_label;
 
       rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, dst_label);
       rtx cond = gen_reg_rtx (CCmode);
 
-      if (remain == 0)
-	{
-	  /* For the last chunk, subf. also
-	     generates the zero result we need.  */
-	  rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2);
-	  rs6000_emit_dot_insn (result_reg, tmp, 1, cond);
-	}
-      else
-	emit_move_insn (cond, gen_rtx_COMPARE (CCmode,
-					       tmp_reg_src1, tmp_reg_src2));
+      /* Always produce the 0 result, it is needed if
+	 cmpb finds a 0 byte in this chunk.  */
+      rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2);
+      rs6000_emit_dot_insn (result_reg, tmp, 1, cond);
 
       rtx cmp_rtx;
-      if (remain > 0)
+      if (remain == 0 && !equality_compare_rest)
+	cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx);
+      else
 	cmp_rtx = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-      else
-	cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx);
 
       rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
 					 lab_ref, pc_rtx);
@@ -19957,10 +20014,102 @@
       JUMP_LABEL (j) = dst_label;
       LABEL_NUSES (dst_label) += 1;
 
+      if (remain > 0 || equality_compare_rest)
+	{
+	  /* Generate a cmpb to test for a 0 byte and branch
+	     to final result if found.  */
+	  rtx cmpb_zero = gen_reg_rtx (word_mode);
+	  rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label);
+	  rtx condz = gen_reg_rtx (CCmode);
+	  rtx zero_reg = gen_reg_rtx (word_mode);
+	  if (word_mode == SImode)
+	    {
+	      emit_insn (gen_movsi (zero_reg, GEN_INT (0)));
+	      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+	      if (cmp_bytes < word_mode_size)
+		{ /* Don't want to look at zero bytes past end.  */
+		  HOST_WIDE_INT mb =
+		    BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+		  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
+		  emit_insn (gen_andsi3_mask (cmpb_zero, cmpb_zero, mask));
+		}
+	    }
+	  else
+	    {
+	      emit_insn (gen_movdi (zero_reg, GEN_INT (0)));
+	      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+	      if (cmp_bytes < word_mode_size)
+		{ /* Don't want to look at zero bytes past end.  */
+		  HOST_WIDE_INT mb =
+		    BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+		  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
+		  emit_insn (gen_anddi3_mask (cmpb_zero, cmpb_zero, mask));
+		}
+	    }
+
+	  emit_move_insn (condz, gen_rtx_COMPARE (CCmode, cmpb_zero, zero_reg));
+	  rtx cmpnz_rtx = gen_rtx_NE (VOIDmode, condz, const0_rtx);
+	  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmpnz_rtx,
+					     lab_ref_fin, pc_rtx);
+	  rtx j2 = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+	  JUMP_LABEL (j2) = final_move_label;
+	  LABEL_NUSES (final_move_label) += 1;
+
+	}
+
       offset += cmp_bytes;
-      bytes -= cmp_bytes;
+      bytes_to_compare -= cmp_bytes;
     }
 
+  if (equality_compare_rest)
+    {
+      /* Update pointers past what has been compared already.  */
+      src1 = adjust_address (orig_src1, load_mode, offset);
+      src2 = adjust_address (orig_src2, load_mode, offset);
+
+      if (!REG_P (XEXP (src1, 0)))
+	{
+	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
+	  src1 = replace_equiv_address (src1, src1_reg);
+	}
+      set_mem_size (src1, cmp_bytes);
+
+      if (!REG_P (XEXP (src2, 0)))
+	{
+	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
+	  src2 = replace_equiv_address (src2, src2_reg);
+	}
+      set_mem_size (src2, cmp_bytes);
+
+      /* Construct call to strcmp/strncmp to compare the rest of the string.  */
+      if (no_length)
+	emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strcmp"),
+				 target, LCT_NORMAL, GET_MODE (target), 2,
+				 force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				 force_reg (Pmode, XEXP (src2, 0)), Pmode);
+      else
+	{
+	  rtx len_rtx;
+	  if (TARGET_64BIT)
+	    len_rtx = gen_reg_rtx(DImode);
+	  else
+	    len_rtx = gen_reg_rtx(SImode);
+
+	  emit_move_insn (len_rtx, GEN_INT (bytes - compare_length));
+	  emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+				   target, LCT_NORMAL, GET_MODE (target), 3,
+				   force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				   force_reg (Pmode, XEXP (src2, 0)), Pmode,
+				   len_rtx, GET_MODE (len_rtx));
+	}
+
+      rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label);
+      rtx jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref));
+      JUMP_LABEL (jmp) = final_label;
+      LABEL_NUSES (final_label) += 1;
+      emit_barrier ();
+    }
+
   if (cleanup_label)
     emit_label (cleanup_label);
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 244503)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -9102,12 +9102,31 @@
 	      (use (match_operand:SI 4))])]
   "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
 {
-  if (expand_strn_compare (operands))
+  if (expand_strn_compare (operands, 0))
     DONE;
   else	
     FAIL;
 })
 
+;; String compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the first string
+;; Argument 2 is the second string
+;; Argument 3 is the alignment
+
+(define_expand "cmpstrsi"
+  [(parallel [(set (match_operand:SI 0)
+               (compare:SI (match_operand:BLK 1)
+                           (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))])]
+  "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
+{
+  if (expand_strn_compare (operands, 1))
+    DONE;
+  else	
+    FAIL;
+})
+
 ;; Block compare insn.
 ;; Argument 0 is the target (result)
 ;; Argument 1 is the destination
Index: gcc/testsuite/gcc.dg/strcmp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/strcmp-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/strcmp-1.c	(working copy)
@@ -0,0 +1,635 @@
+/* Test strcmp builtin expansion for compilation and proper execution.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target ptr32plus } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define RUN_TEST(SZ, ALIGN) test_strcmp_ ## SZ ## _ ## ALIGN ()
+
+#define DEF_TEST(SZ, ALIGN)                                                 \
+static void test_strcmp_ ## SZ ## _ ## ALIGN (void) {     		    \
+  char one[3 * (SZ > 10 ? SZ : 10)];					    \
+  char two[3 * (SZ > 10 ? SZ : 10)];					    \
+  char three[8192] __attribute__ ((aligned (4096)));       		    \
+  char four[8192] __attribute__ ((aligned (4096)));        		    \
+  int i,j;                                                                  \
+  memset(one,0,sizeof(one));				   		    \
+  memset(two,0,sizeof(two));				   		    \
+  memset(three,0,sizeof(three));			   		    \
+  memset(four,0,sizeof(four));				   		    \
+  for (i = 0 ; i < SZ ; i++)			   		            \
+    {							   		    \
+      int r1;					           		    \
+      char *a = one + (i & 1) * ALIGN;			   		    \
+      char *b = two + (i & 1) * ALIGN;			   		    \
+      memset(a, '-', SZ);					   	    \
+      memset(b, '-', SZ);					   	    \
+      a[i] = '1';					   		    \
+      b[i] = '2';					   		    \
+      a[SZ] = 0;							    \
+      b[SZ] = 0;					   		    \
+      if (!((r1 = strcmp (b, a)) > 0))   		   		    \
+	abort ();							    \
+      if (!((r1 = strcmp (a, b)) < 0))			   	            \
+	abort ();							    \
+      b[i] = '1';					   		    \
+      if (!((r1 = strcmp (a, b)) == 0))		   		            \
+	abort ();							    \
+      for(j = i; j < SZ ; j++)			   		            \
+	{						   		    \
+	  a[j] = '1';            			   		    \
+	  b[j] = '2';			                   		    \
+	}						   		    \
+      if (!((r1 = strcmp (b, a)) > 0))		   		            \
+	abort ();							    \
+      if (!((r1 = strcmp (a, b)) < 0))		   		            \
+	abort ();							    \
+      for(j = 0; j < i ; j++)						    \
+        {								    \
+	  memset(a, '-', SZ);						    \
+	  memset(b, '-', SZ);						    \
+	  a[j] = '\0';							    \
+	  a[j+1] = '1';							    \
+	  b[j] = '\0';							    \
+	  b[j+1] = '2';							    \
+	  if ((r1 = strcmp (b, a)) != 0)				    \
+	    abort ();							    \
+	}                                                                   \
+      a = three + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
+      b = four + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
+      memset(a, '-', SZ);					   	    \
+      memset(b, '-', SZ);					   	    \
+      a[i] = '1';					   		    \
+      b[i] = '2';					   		    \
+      a[SZ] = 0;							    \
+      b[SZ] = 0;					   		    \
+      if (!((r1 = strcmp(b, a)) > 0))   		   		    \
+	abort ();							    \
+      if (!((r1 = strcmp(a, b)) < 0))			   	            \
+	abort ();							    \
+      b[i] = '1';					   		    \
+      if (!((r1 = strcmp(a, b)) == 0))		   		            \
+	abort ();							    \
+    }							                    \
+}                                                                
+
+#ifdef TEST_ALL
+DEF_TEST(1,1)
+DEF_TEST(1,2)
+DEF_TEST(1,4)
+DEF_TEST(1,8)
+DEF_TEST(1,16)
+DEF_TEST(2,1)
+DEF_TEST(2,2)
+DEF_TEST(2,4)
+DEF_TEST(2,8)
+DEF_TEST(2,16)
+DEF_TEST(3,1)
+DEF_TEST(3,2)
+DEF_TEST(3,4)
+DEF_TEST(3,8)
+DEF_TEST(3,16)
+DEF_TEST(4,1)
+DEF_TEST(4,2)
+DEF_TEST(4,4)
+DEF_TEST(4,8)
+DEF_TEST(4,16)
+DEF_TEST(5,1)
+DEF_TEST(5,2)
+DEF_TEST(5,4)
+DEF_TEST(5,8)
+DEF_TEST(5,16)
+DEF_TEST(6,1)
+DEF_TEST(6,2)
+DEF_TEST(6,4)
+DEF_TEST(6,8)
+DEF_TEST(6,16)
+DEF_TEST(7,1)
+DEF_TEST(7,2)
+DEF_TEST(7,4)
+DEF_TEST(7,8)
+DEF_TEST(7,16)
+DEF_TEST(8,1)
+DEF_TEST(8,2)
+DEF_TEST(8,4)
+DEF_TEST(8,8)
+DEF_TEST(8,16)
+DEF_TEST(9,1)
+DEF_TEST(9,2)
+DEF_TEST(9,4)
+DEF_TEST(9,8)
+DEF_TEST(9,16)
+DEF_TEST(10,1)
+DEF_TEST(10,2)
+DEF_TEST(10,4)
+DEF_TEST(10,8)
+DEF_TEST(10,16)
+DEF_TEST(11,1)
+DEF_TEST(11,2)
+DEF_TEST(11,4)
+DEF_TEST(11,8)
+DEF_TEST(11,16)
+DEF_TEST(12,1)
+DEF_TEST(12,2)
+DEF_TEST(12,4)
+DEF_TEST(12,8)
+DEF_TEST(12,16)
+DEF_TEST(13,1)
+DEF_TEST(13,2)
+DEF_TEST(13,4)
+DEF_TEST(13,8)
+DEF_TEST(13,16)
+DEF_TEST(14,1)
+DEF_TEST(14,2)
+DEF_TEST(14,4)
+DEF_TEST(14,8)
+DEF_TEST(14,16)
+DEF_TEST(15,1)
+DEF_TEST(15,2)
+DEF_TEST(15,4)
+DEF_TEST(15,8)
+DEF_TEST(15,16)
+DEF_TEST(16,1)
+DEF_TEST(16,2)
+DEF_TEST(16,4)
+DEF_TEST(16,8)
+DEF_TEST(16,16)
+DEF_TEST(17,1)
+DEF_TEST(17,2)
+DEF_TEST(17,4)
+DEF_TEST(17,8)
+DEF_TEST(17,16)
+DEF_TEST(18,1)
+DEF_TEST(18,2)
+DEF_TEST(18,4)
+DEF_TEST(18,8)
+DEF_TEST(18,16)
+DEF_TEST(19,1)
+DEF_TEST(19,2)
+DEF_TEST(19,4)
+DEF_TEST(19,8)
+DEF_TEST(19,16)
+DEF_TEST(20,1)
+DEF_TEST(20,2)
+DEF_TEST(20,4)
+DEF_TEST(20,8)
+DEF_TEST(20,16)
+DEF_TEST(21,1)
+DEF_TEST(21,2)
+DEF_TEST(21,4)
+DEF_TEST(21,8)
+DEF_TEST(21,16)
+DEF_TEST(22,1)
+DEF_TEST(22,2)
+DEF_TEST(22,4)
+DEF_TEST(22,8)
+DEF_TEST(22,16)
+DEF_TEST(23,1)
+DEF_TEST(23,2)
+DEF_TEST(23,4)
+DEF_TEST(23,8)
+DEF_TEST(23,16)
+DEF_TEST(24,1)
+DEF_TEST(24,2)
+DEF_TEST(24,4)
+DEF_TEST(24,8)
+DEF_TEST(24,16)
+DEF_TEST(25,1)
+DEF_TEST(25,2)
+DEF_TEST(25,4)
+DEF_TEST(25,8)
+DEF_TEST(25,16)
+DEF_TEST(26,1)
+DEF_TEST(26,2)
+DEF_TEST(26,4)
+DEF_TEST(26,8)
+DEF_TEST(26,16)
+DEF_TEST(27,1)
+DEF_TEST(27,2)
+DEF_TEST(27,4)
+DEF_TEST(27,8)
+DEF_TEST(27,16)
+DEF_TEST(28,1)
+DEF_TEST(28,2)
+DEF_TEST(28,4)
+DEF_TEST(28,8)
+DEF_TEST(28,16)
+DEF_TEST(29,1)
+DEF_TEST(29,2)
+DEF_TEST(29,4)
+DEF_TEST(29,8)
+DEF_TEST(29,16)
+DEF_TEST(30,1)
+DEF_TEST(30,2)
+DEF_TEST(30,4)
+DEF_TEST(30,8)
+DEF_TEST(30,16)
+DEF_TEST(31,1)
+DEF_TEST(31,2)
+DEF_TEST(31,4)
+DEF_TEST(31,8)
+DEF_TEST(31,16)
+DEF_TEST(32,1)
+DEF_TEST(32,2)
+DEF_TEST(32,4)
+DEF_TEST(32,8)
+DEF_TEST(32,16)
+DEF_TEST(33,1)
+DEF_TEST(33,2)
+DEF_TEST(33,4)
+DEF_TEST(33,8)
+DEF_TEST(33,16)
+DEF_TEST(34,1)
+DEF_TEST(34,2)
+DEF_TEST(34,4)
+DEF_TEST(34,8)
+DEF_TEST(34,16)
+DEF_TEST(35,1)
+DEF_TEST(35,2)
+DEF_TEST(35,4)
+DEF_TEST(35,8)
+DEF_TEST(35,16)
+DEF_TEST(36,1)
+DEF_TEST(36,2)
+DEF_TEST(36,4)
+DEF_TEST(36,8)
+DEF_TEST(36,16)
+DEF_TEST(37,1)
+DEF_TEST(37,2)
+DEF_TEST(37,4)
+DEF_TEST(37,8)
+DEF_TEST(37,16)
+DEF_TEST(38,1)
+DEF_TEST(38,2)
+DEF_TEST(38,4)
+DEF_TEST(38,8)
+DEF_TEST(38,16)
+DEF_TEST(39,1)
+DEF_TEST(39,2)
+DEF_TEST(39,4)
+DEF_TEST(39,8)
+DEF_TEST(39,16)
+DEF_TEST(40,1)
+DEF_TEST(40,2)
+DEF_TEST(40,4)
+DEF_TEST(40,8)
+DEF_TEST(40,16)
+DEF_TEST(41,1)
+DEF_TEST(41,2)
+DEF_TEST(41,4)
+DEF_TEST(41,8)
+DEF_TEST(41,16)
+DEF_TEST(42,1)
+DEF_TEST(42,2)
+DEF_TEST(42,4)
+DEF_TEST(42,8)
+DEF_TEST(42,16)
+DEF_TEST(43,1)
+DEF_TEST(43,2)
+DEF_TEST(43,4)
+DEF_TEST(43,8)
+DEF_TEST(43,16)
+DEF_TEST(44,1)
+DEF_TEST(44,2)
+DEF_TEST(44,4)
+DEF_TEST(44,8)
+DEF_TEST(44,16)
+DEF_TEST(45,1)
+DEF_TEST(45,2)
+DEF_TEST(45,4)
+DEF_TEST(45,8)
+DEF_TEST(45,16)
+DEF_TEST(46,1)
+DEF_TEST(46,2)
+DEF_TEST(46,4)
+DEF_TEST(46,8)
+DEF_TEST(46,16)
+DEF_TEST(47,1)
+DEF_TEST(47,2)
+DEF_TEST(47,4)
+DEF_TEST(47,8)
+DEF_TEST(47,16)
+DEF_TEST(48,1)
+DEF_TEST(48,2)
+DEF_TEST(48,4)
+DEF_TEST(48,8)
+DEF_TEST(48,16)
+DEF_TEST(49,1)
+DEF_TEST(49,2)
+DEF_TEST(49,4)
+DEF_TEST(49,8)
+DEF_TEST(49,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
+#else
+DEF_TEST(3,1)
+DEF_TEST(4,1)
+DEF_TEST(4,2)
+DEF_TEST(4,4)
+DEF_TEST(5,1)
+DEF_TEST(6,1)
+DEF_TEST(7,1)
+DEF_TEST(8,1)
+DEF_TEST(8,2)
+DEF_TEST(8,4)
+DEF_TEST(8,8)
+DEF_TEST(9,1)
+DEF_TEST(16,1)
+DEF_TEST(16,2)
+DEF_TEST(16,4)
+DEF_TEST(16,8)
+DEF_TEST(16,16)
+DEF_TEST(32,1)
+DEF_TEST(32,2)
+DEF_TEST(32,4)
+DEF_TEST(32,8)
+DEF_TEST(32,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
+#endif
+
+int
+main(int argc, char **argv)
+{
+
+#ifdef TEST_ALL
+  RUN_TEST(1,1);
+  RUN_TEST(1,2);
+  RUN_TEST(1,4);
+  RUN_TEST(1,8);
+  RUN_TEST(1,16);
+  RUN_TEST(2,1);
+  RUN_TEST(2,2);
+  RUN_TEST(2,4);
+  RUN_TEST(2,8);
+  RUN_TEST(2,16);
+  RUN_TEST(3,1);
+  RUN_TEST(3,2);
+  RUN_TEST(3,4);
+  RUN_TEST(3,8);
+  RUN_TEST(3,16);
+  RUN_TEST(4,1);
+  RUN_TEST(4,2);
+  RUN_TEST(4,4);
+  RUN_TEST(4,8);
+  RUN_TEST(4,16);
+  RUN_TEST(5,1);
+  RUN_TEST(5,2);
+  RUN_TEST(5,4);
+  RUN_TEST(5,8);
+  RUN_TEST(5,16);
+  RUN_TEST(6,1);
+  RUN_TEST(6,2);
+  RUN_TEST(6,4);
+  RUN_TEST(6,8);
+  RUN_TEST(6,16);
+  RUN_TEST(7,1);
+  RUN_TEST(7,2);
+  RUN_TEST(7,4);
+  RUN_TEST(7,8);
+  RUN_TEST(7,16);
+  RUN_TEST(8,1);
+  RUN_TEST(8,2);
+  RUN_TEST(8,4);
+  RUN_TEST(8,8);
+  RUN_TEST(8,16);
+  RUN_TEST(9,1);
+  RUN_TEST(9,2);
+  RUN_TEST(9,4);
+  RUN_TEST(9,8);
+  RUN_TEST(9,16);
+  RUN_TEST(10,1);
+  RUN_TEST(10,2);
+  RUN_TEST(10,4);
+  RUN_TEST(10,8);
+  RUN_TEST(10,16);
+  RUN_TEST(11,1);
+  RUN_TEST(11,2);
+  RUN_TEST(11,4);
+  RUN_TEST(11,8);
+  RUN_TEST(11,16);
+  RUN_TEST(12,1);
+  RUN_TEST(12,2);
+  RUN_TEST(12,4);
+  RUN_TEST(12,8);
+  RUN_TEST(12,16);
+  RUN_TEST(13,1);
+  RUN_TEST(13,2);
+  RUN_TEST(13,4);
+  RUN_TEST(13,8);
+  RUN_TEST(13,16);
+  RUN_TEST(14,1);
+  RUN_TEST(14,2);
+  RUN_TEST(14,4);
+  RUN_TEST(14,8);
+  RUN_TEST(14,16);
+  RUN_TEST(15,1);
+  RUN_TEST(15,2);
+  RUN_TEST(15,4);
+  RUN_TEST(15,8);
+  RUN_TEST(15,16);
+  RUN_TEST(16,1);
+  RUN_TEST(16,2);
+  RUN_TEST(16,4);
+  RUN_TEST(16,8);
+  RUN_TEST(16,16);
+  RUN_TEST(17,1);
+  RUN_TEST(17,2);
+  RUN_TEST(17,4);
+  RUN_TEST(17,8);
+  RUN_TEST(17,16);
+  RUN_TEST(18,1);
+  RUN_TEST(18,2);
+  RUN_TEST(18,4);
+  RUN_TEST(18,8);
+  RUN_TEST(18,16);
+  RUN_TEST(19,1);
+  RUN_TEST(19,2);
+  RUN_TEST(19,4);
+  RUN_TEST(19,8);
+  RUN_TEST(19,16);
+  RUN_TEST(20,1);
+  RUN_TEST(20,2);
+  RUN_TEST(20,4);
+  RUN_TEST(20,8);
+  RUN_TEST(20,16);
+  RUN_TEST(21,1);
+  RUN_TEST(21,2);
+  RUN_TEST(21,4);
+  RUN_TEST(21,8);
+  RUN_TEST(21,16);
+  RUN_TEST(22,1);
+  RUN_TEST(22,2);
+  RUN_TEST(22,4);
+  RUN_TEST(22,8);
+  RUN_TEST(22,16);
+  RUN_TEST(23,1);
+  RUN_TEST(23,2);
+  RUN_TEST(23,4);
+  RUN_TEST(23,8);
+  RUN_TEST(23,16);
+  RUN_TEST(24,1);
+  RUN_TEST(24,2);
+  RUN_TEST(24,4);
+  RUN_TEST(24,8);
+  RUN_TEST(24,16);
+  RUN_TEST(25,1);
+  RUN_TEST(25,2);
+  RUN_TEST(25,4);
+  RUN_TEST(25,8);
+  RUN_TEST(25,16);
+  RUN_TEST(26,1);
+  RUN_TEST(26,2);
+  RUN_TEST(26,4);
+  RUN_TEST(26,8);
+  RUN_TEST(26,16);
+  RUN_TEST(27,1);
+  RUN_TEST(27,2);
+  RUN_TEST(27,4);
+  RUN_TEST(27,8);
+  RUN_TEST(27,16);
+  RUN_TEST(28,1);
+  RUN_TEST(28,2);
+  RUN_TEST(28,4);
+  RUN_TEST(28,8);
+  RUN_TEST(28,16);
+  RUN_TEST(29,1);
+  RUN_TEST(29,2);
+  RUN_TEST(29,4);
+  RUN_TEST(29,8);
+  RUN_TEST(29,16);
+  RUN_TEST(30,1);
+  RUN_TEST(30,2);
+  RUN_TEST(30,4);
+  RUN_TEST(30,8);
+  RUN_TEST(30,16);
+  RUN_TEST(31,1);
+  RUN_TEST(31,2);
+  RUN_TEST(31,4);
+  RUN_TEST(31,8);
+  RUN_TEST(31,16);
+  RUN_TEST(32,1);
+  RUN_TEST(32,2);
+  RUN_TEST(32,4);
+  RUN_TEST(32,8);
+  RUN_TEST(32,16);
+  RUN_TEST(33,1);
+  RUN_TEST(33,2);
+  RUN_TEST(33,4);
+  RUN_TEST(33,8);
+  RUN_TEST(33,16);
+  RUN_TEST(34,1);
+  RUN_TEST(34,2);
+  RUN_TEST(34,4);
+  RUN_TEST(34,8);
+  RUN_TEST(34,16);
+  RUN_TEST(35,1);
+  RUN_TEST(35,2);
+  RUN_TEST(35,4);
+  RUN_TEST(35,8);
+  RUN_TEST(35,16);
+  RUN_TEST(36,1);
+  RUN_TEST(36,2);
+  RUN_TEST(36,4);
+  RUN_TEST(36,8);
+  RUN_TEST(36,16);
+  RUN_TEST(37,1);
+  RUN_TEST(37,2);
+  RUN_TEST(37,4);
+  RUN_TEST(37,8);
+  RUN_TEST(37,16);
+  RUN_TEST(38,1);
+  RUN_TEST(38,2);
+  RUN_TEST(38,4);
+  RUN_TEST(38,8);
+  RUN_TEST(38,16);
+  RUN_TEST(39,1);
+  RUN_TEST(39,2);
+  RUN_TEST(39,4);
+  RUN_TEST(39,8);
+  RUN_TEST(39,16);
+  RUN_TEST(40,1);
+  RUN_TEST(40,2);
+  RUN_TEST(40,4);
+  RUN_TEST(40,8);
+  RUN_TEST(40,16);
+  RUN_TEST(41,1);
+  RUN_TEST(41,2);
+  RUN_TEST(41,4);
+  RUN_TEST(41,8);
+  RUN_TEST(41,16);
+  RUN_TEST(42,1);
+  RUN_TEST(42,2);
+  RUN_TEST(42,4);
+  RUN_TEST(42,8);
+  RUN_TEST(42,16);
+  RUN_TEST(43,1);
+  RUN_TEST(43,2);
+  RUN_TEST(43,4);
+  RUN_TEST(43,8);
+  RUN_TEST(43,16);
+  RUN_TEST(44,1);
+  RUN_TEST(44,2);
+  RUN_TEST(44,4);
+  RUN_TEST(44,8);
+  RUN_TEST(44,16);
+  RUN_TEST(45,1);
+  RUN_TEST(45,2);
+  RUN_TEST(45,4);
+  RUN_TEST(45,8);
+  RUN_TEST(45,16);
+  RUN_TEST(46,1);
+  RUN_TEST(46,2);
+  RUN_TEST(46,4);
+  RUN_TEST(46,8);
+  RUN_TEST(46,16);
+  RUN_TEST(47,1);
+  RUN_TEST(47,2);
+  RUN_TEST(47,4);
+  RUN_TEST(47,8);
+  RUN_TEST(47,16);
+  RUN_TEST(48,1);
+  RUN_TEST(48,2);
+  RUN_TEST(48,4);
+  RUN_TEST(48,8);
+  RUN_TEST(48,16);
+  RUN_TEST(49,1);
+  RUN_TEST(49,2);
+  RUN_TEST(49,4);
+  RUN_TEST(49,8);
+  RUN_TEST(49,16);
+#else
+  RUN_TEST(3,1);
+  RUN_TEST(4,1);
+  RUN_TEST(4,2);
+  RUN_TEST(4,4);
+  RUN_TEST(5,1);
+  RUN_TEST(6,1);
+  RUN_TEST(7,1);
+  RUN_TEST(8,1);
+  RUN_TEST(8,2);
+  RUN_TEST(8,4);
+  RUN_TEST(8,8);
+  RUN_TEST(9,1);
+  RUN_TEST(16,1);
+  RUN_TEST(16,2);
+  RUN_TEST(16,4);
+  RUN_TEST(16,8);
+  RUN_TEST(16,16);
+  RUN_TEST(32,1);
+  RUN_TEST(32,2);
+  RUN_TEST(32,4);
+  RUN_TEST(32,8);
+  RUN_TEST(32,16);
+#endif
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/strncmp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/strncmp-1.c	(revision 244503)
+++ gcc/testsuite/gcc.dg/strncmp-1.c	(working copy)
@@ -1,4 +1,4 @@
-/* Test memcmp builtin expansion for compilation and proper execution.  */
+/* Test strncmp builtin expansion for compilation and proper execution.  */
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-require-effective-target ptr32plus } */
@@ -32,31 +32,32 @@
       a[SZ] = 0;							    \
       b[SZ] = 0;					   		    \
       if (!((r1 = strncmp (b, a, SZ)) > 0))   		   		    \
-        {								    \
-	  abort ();							    \
-	}								    \
+	abort ();							    \
       if (!((r1 = strncmp (a, b, SZ)) < 0))			   	    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
       b[i] = '1';					   		    \
       if (!((r1 = strncmp (a, b, SZ)) == 0))		   		    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
       for(j = i; j < SZ ; j++)			   		            \
 	{						   		    \
 	  a[j] = '1';            			   		    \
 	  b[j] = '2';			                   		    \
 	}						   		    \
-      if (!((r1 = strncmp(b, a, SZ)) > 0))		   		    \
+      if (!((r1 = strncmp (b, a, SZ)) > 0))		   		    \
+	abort ();							    \
+      if (!((r1 = strncmp (a, b, SZ)) < 0))		   		    \
+	abort ();							    \
+      for(j = 0; j < i ; j++)						    \
         {								    \
-	  abort ();							    \
-	}             							    \
-      if (!((r1 = strncmp(a, b, SZ)) < 0))		   		    \
-        {								    \
-	  abort ();							    \
-	}		   						    \
+	  memset(a, '-', SZ);						    \
+	  memset(b, '-', SZ);						    \
+	  a[j] = '\0';							    \
+	  a[j+1] = '1';							    \
+	  b[j] = '\0';							    \
+	  b[j+1] = '2';							    \
+	  if ((r1 = strncmp (b, a, SZ)) != 0)				    \
+	    abort ();							    \
+	}                                                                   \
       a = three + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
       b = four + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
       memset(a, '-', SZ);					   	    \
@@ -66,18 +67,12 @@
       a[SZ] = 0;							    \
       b[SZ] = 0;					   		    \
       if (!((r1 = strncmp(b, a, SZ)) > 0))   		   		    \
-        {								    \
-	  abort ();							    \
-	}								    \
+	abort ();							    \
       if (!((r1 = strncmp(a, b, SZ)) < 0))			   	    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
       b[i] = '1';					   		    \
       if (!((r1 = strncmp(a, b, SZ)) == 0))		   		    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
     }							                    \
 }                                                                
 
@@ -327,6 +322,11 @@
 DEF_TEST(49,4)
 DEF_TEST(49,8)
 DEF_TEST(49,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
 #else
 DEF_TEST(3,1)
 DEF_TEST(4,1)
@@ -350,6 +350,11 @@
 DEF_TEST(32,4)
 DEF_TEST(32,8)
 DEF_TEST(32,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
 #endif
 
 int

Segher Boessenkool Jan. 16, 2017, 11 p.m. | #2
On Mon, Jan 16, 2017 at 03:09:35PM -0600, Aaron Sawdey wrote:
> Tulio noted that glibc's strncmp test was failing. This turned out to
> be the use of signed HOST_WIDE_INT for handling strncmp length. The
> glibc test calls strncmp with length 2^64-1, presumably to provoke
> exactly this type of bug. Fixing the issue required changing
> select_block_compare_mode() and expand_block_compare() as well.
>
> The other change is if we must emit a runtime check for the 4k
> crossing, then we might as well set base_align to 8 and emit the best
> possible code.
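Aaron's second change concerns the runtime check for a 4k crossing. When the alignment of a source string is unknown, full-width inline loads could read past a short string's NUL into an unmapped page, so the expansion first tests whether the bytes it will read stay inside the current page and otherwise branches to the library call (the `bgt cr7,L(pagecross)` step in the patch). The condition is roughly the following C sketch, assuming 4 KiB pages; `may_cross_page` is a hypothetical name, and the RTL that `expand_strncmp_align_check` emits may phrase the comparison differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page size the check is written for (assumed 4 KiB, per the
   "4k crossing" in the message).  */
#define CHECK_PAGE_SIZE ((uintptr_t) 4096)

/* Return true if reading LEN bytes starting at ADDR could run into the
   next page.  Hypothetical helper: when it returns true, the expansion
   would take the library strcmp/strncmp path instead of doing loads
   that might fault beyond the string's terminating NUL.  */
static bool
may_cross_page (uintptr_t addr, size_t len)
{
  uintptr_t page_offset = addr & (CHECK_PAGE_SIZE - 1);
  return page_offset + len > CHECK_PAGE_SIZE;
}
```

Once such a check guards the inline loads, their safety no longer depends on the pointers' static alignment, which is presumably why the patch can then treat `base_align` as 8 and pick the widest loads.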


Some nits...

> --- gcc/config/rs6000/rs6000.c	(revision 244503)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -19310,7 +19310,8 @@
>     WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
>     the largest allowable mode.  */
>  static machine_mode
> -select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes,
> +select_block_compare_mode (unsigned HOST_WIDE_INT offset,
> +			   unsigned HOST_WIDE_INT bytes,
>  			   HOST_WIDE_INT align, bool word_mode_ok)

"align" should probably be unsigned as well?

> @@ -19410,8 +19411,8 @@
>    gcc_assert (GET_MODE (target) == SImode);
>  
>    /* Anything to move? */
> -  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
> -  if (bytes <= 0)
> +  unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
> +  if (bytes == 0)
>      return true;

UINTVAL?  Please check the rest of the patch for this, too.
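The glibc failure comes down to storing a size_t length of 2^64-1 in a signed 64-bit HOST_WIDE_INT: the value becomes -1, so a check such as the removed `if (bytes <= 0)` in the hunk above treats the call as having nothing to compare. The standalone C sketch below mimics the old and new checks with `int64_t`/`uint64_t` standing in for HOST_WIDE_INT; the function names are hypothetical, and the out-of-range signed conversion is implementation-defined in ISO C (GCC wraps modulo 2^64).

```c
#include <stdbool.h>
#include <stdint.h>

/* Old behavior: the length is read into a signed variable, as with
   "HOST_WIDE_INT bytes = INTVAL (bytes_rtx); if (bytes <= 0) ...".  */
static bool
signed_len_is_empty (uint64_t n)
{
  int64_t bytes = (int64_t) n;   /* 2^64-1 becomes -1 with GCC.  */
  return bytes <= 0;             /* Wrongly "nothing to compare".  */
}

/* New behavior: the length stays unsigned, so only 0 means empty.  */
static bool
unsigned_len_is_empty (uint64_t n)
{
  uint64_t bytes = n;
  return bytes == 0;
}
```

The UINTVAL suggestion fits the same point: in GCC's rtl.h it reads the CONST_INT as `unsigned HOST_WIDE_INT`, making the unsigned interpretation explicit instead of going through the signed INTVAL.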

>    /* We don't want to generate too much code.  */
>    if (ROUND_UP (bytes, load_mode_size) / load_mode_size
> -      > rs6000_block_compare_inline_limit)
> +      > (unsigned HOST_WIDE_INT)rs6000_block_compare_inline_limit)
>      return false;

Space after cast operator.  Why do you need a cast at all?  It already
is unsigned.

> +	  /* -m32 -mpowerpc64 results in word_mode being DImode even
> +	     though otherwise it is 32-bit. The length arg to strncmp
> +	     is a size_t which will be the same size as pointers.  */
> +	  rtx len_rtx;
> +	  if (TARGET_64BIT)
> +	    len_rtx = gen_reg_rtx(DImode);
> +	  else
> +	    len_rtx = gen_reg_rtx(SImode);

Space before opening paren in function calls.

> +  while (bytes_to_compare > 0)
>      {
>        /* Compare sequence:
>           check each 8B with: ld/ld cmpd bne
> +	 If equal, use rldicr/cmpb to check for zero byte.
>           cleanup code at end:
>           cmpb          get byte that differs
>           cmpb          look for zero byte

Mixed spaces/tabs indent.

> @@ -19919,37 +19968,45 @@
>  	    }
>  	}
>  
> -      int remain = bytes - cmp_bytes;
> +      /* Cases to handle.  A and B are chunks of the two strings.
> +         1: Not end of comparison:
> +	   A != B: branch to cleanup code to compute result.
> +           A == B: check for 0 byte, next block if not found.
> +         2: End of the inline comparison:
> +	   A != B: branch to cleanup code to compute result.
> +           A == B: check for 0 byte, call strcmp/strncmp
> +	 3: compared requested N bytes:
> +	   A == B: branch to result 0.
> +           A != B: cleanup code to compute result.  */

And here.
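The cases listed in the quoted comment can be modeled in ordinary C. The sketch below covers the strcmp path (cases 1 and 2; case 3, where a constant strncmp length is fully consumed inline, is omitted). It assumes 8-byte chunks, big-endian byte order for the word comparison, and buffers padded so that the word loads never run off the end; all names are hypothetical and this is not the RTL the patch emits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assemble 8 bytes in big-endian order so unsigned word comparison
   matches byte-by-byte order (what ld on BE or ldbrx on LE gives).  */
static uint64_t
load_be64 (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Nonzero if any byte of W is zero (the role of cmpb against 0).  */
static int
has_zero_byte (uint64_t w)
{
  for (int i = 0; i < 8; i++)
    if (((w >> (8 * i)) & 0xff) == 0)
      return 1;
  return 0;
}

/* "Cleanup code" for a chunk where A != B: the result comes from the
   first differing byte, unless a shared NUL ends both strings first.  */
static int
chunk_result (const unsigned char *a, const unsigned char *b)
{
  for (int i = 0; i < 8; i++)
    {
      if (a[i] != b[i])
        return a[i] - b[i];
      if (a[i] == 0)
        return 0;
    }
  return 0;
}

/* strcmp modeled on the inline expansion: LIMIT 8-byte chunks inline,
   then a library call for the remainder.  Both buffers must be
   readable for LIMIT * 8 bytes.  */
static int
sketch_strcmp (const char *s1, const char *s2, int limit)
{
  const unsigned char *a = (const unsigned char *) s1;
  const unsigned char *b = (const unsigned char *) s2;
  size_t offset = 0;

  for (int chunk = 0; chunk < limit; chunk++, offset += 8)
    {
      uint64_t wa = load_be64 (a + offset);
      uint64_t wb = load_be64 (b + offset);
      if (wa != wb)
        return chunk_result (a + offset, b + offset);  /* A != B.  */
      if (has_zero_byte (wa))
        return 0;   /* A == B and both strings end in this chunk.  */
      /* A == B with no NUL: continue with the next chunk.  */
    }

  /* Inline limit reached, A == B, no NUL seen: compare the rest.  */
  return strcmp (s1 + offset, s2 + offset);
}
```

The `has_zero_byte` test after an equal chunk corresponds to the bug fix: without it, the `AB\0CDEFGX` / `AB\0CDEFGY` pair from the commit message would go on to a second chunk and compare unequal.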

> @@ -19957,10 +20014,102 @@
>        JUMP_LABEL (j) = dst_label;
>        LABEL_NUSES (dst_label) += 1;
>  
> +      if (remain > 0 || equality_compare_rest)
> +	{
> +	  /* Generate a cmpb to test for a 0 byte and branch
> +	     to final result if found.  */
> +	  rtx cmpb_zero = gen_reg_rtx (word_mode);
> +	  rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label);
> +	  rtx condz = gen_reg_rtx (CCmode);
> +	  rtx zero_reg = gen_reg_rtx (word_mode);
> +	  if (word_mode == SImode)
> +	    {
> +	      emit_insn (gen_movsi (zero_reg, GEN_INT(0)));

Space before (0).

> +	      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
> +	      if ( cmp_bytes < word_mode_size )

No spaces inside the parens.

> +		{ /* Don't want to look at zero bytes past end.  */

Put the comment on a separate line please.

> +	      emit_insn (gen_movdi (zero_reg, GEN_INT(0)));
> +	      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
> +	      if ( cmp_bytes < word_mode_size )
> +		{ /* Don't want to look at zero bytes past end.  */

Again and again and again :-)

Looks good otherwise.


Segher
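The "Don't want to look at zero bytes past end" blocks quoted above mask the `cmpb` result so that zero bytes outside the `cmp_bytes` being compared are not reported as string terminators. A small C emulation follows. It assumes 64-bit words whose valid bytes sit at the most-significant end, which is the layout the patch's `HOST_WIDE_INT_M1U << mb` mask implies; `cmpb64` and `chunk_has_nul` are hypothetical names, and `cmpb64` models only the byte-equality behavior of the POWER `cmpb` instruction.

```c
#include <stdint.h>

/* Emulate cmpb on 64-bit operands: each result byte is 0xff where the
   corresponding bytes of A and B are equal, and 0x00 otherwise.  */
static uint64_t
cmpb64 (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint64_t byte_mask = (uint64_t) 0xff << (8 * i);
      if ((a & byte_mask) == (b & byte_mask))
        r |= byte_mask;
    }
  return r;
}

/* Nonzero if one of the first CMP_BYTES (1..8) bytes of CHUNK is zero.
   Those bytes are assumed to occupy the most-significant end.  */
static int
chunk_has_nul (uint64_t chunk, int cmp_bytes)
{
  uint64_t zeros = cmpb64 (chunk, 0);   /* 0xff where a byte is zero.  */
  if (cmp_bytes < 8)
    {
      int mb = 8 * (8 - cmp_bytes);     /* Low-order bits to discard.  */
      zeros &= UINT64_MAX << mb;
    }
  return zeros != 0;
}
```

Without the mask, a 4-byte chunk such as `0x4142434400000000` would appear to contain a NUL simply because of the unused low bytes.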
Peter Bergner Jan. 17, 2017, 2:30 p.m. | #3
On 1/16/17 3:09 PM, Aaron Sawdey wrote:
> Here is an updated version of this patch.
>
> Tulio noted that glibc's strncmp test was failing. This turned out to
> be the use of signed HOST_WIDE_INT for handling strncmp length. The
> glibc test calls strncmp with length 2^64-1, presumably to provoke
> exactly this type of bug. Fixing the issue required changing
> select_block_compare_mode() and expand_block_compare() as well.

If glibc's testsuite exposed a bug, then we should also add a similar
bug to our testsuite.  I scanned the patch and I'm not sure I see
that exact test scenario.  Is it there and I'm just not seeing it?

Peter
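A regression check along the lines Peter asks for could pass the largest possible length to strncmp. The following is a hypothetical sketch, not the test that was eventually committed. It assumes that at -O2 on a POWER target the constant bound reaches the inline strncmp expansion, which is where the signed-length bug lived; on other targets it only checks library behavior. The embedded-NUL pair reuses the example from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* strncmp with the maximal size_t bound must behave like strcmp,
   because comparison stops at the first NUL.  noinline keeps the
   string arguments opaque so the call is less likely to be folded.  */
static int __attribute__ ((noinline))
cmp_max_len (const char *a, const char *b)
{
  return strncmp (a, b, SIZE_MAX);
}

/* Return 1 if every check passes, 0 otherwise.  */
static int
check_strncmp_max_len (void)
{
  static char a[64] = "AB\0CDEFGX";
  static char b[64] = "AB\0CDEFGY";
  static char c[64] = "hello world";
  static char d[64] = "hello there";

  if (cmp_max_len (a, b) != 0)      /* Equal up to the shared NUL.  */
    return 0;
  if (!(cmp_max_len (c, d) > 0))    /* 'w' > 't'.  */
    return 0;
  if (!(cmp_max_len (d, c) < 0))
    return 0;
  if (cmp_max_len (c, c) != 0)
    return 0;
  return 1;
}
```

In a `dg-do run` test, `main` would call `check_strncmp_max_len ()` and `abort ()` on failure, following the pattern of the existing tests. Newer GCC releases may warn that the bound exceeds the maximum object size; the call is still valid because strncmp compares no characters after a NUL.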
Aaron Sawdey Jan. 17, 2017, 3:34 p.m. | #4
On Tue, 2017-01-17 at 08:30 -0600, Peter Bergner wrote:
> On 1/16/17 3:09 PM, Aaron Sawdey wrote:
> > Here is an updated version of this patch.
> >
> > Tulio noted that glibc's strncmp test was failing. This turned out to
> > be the use of signed HOST_WIDE_INT for handling strncmp length. The
> > glibc test calls strncmp with length 2^64-1, presumably to provoke
> > exactly this type of bug. Fixing the issue required changing
> > select_block_compare_mode() and expand_block_compare() as well.
>
> If glibc's testsuite exposed a bug, then we should also add a similar
> bug to our testsuite.  I scanned the patch and I'm not sure I see
> that exact test scenario.  Is it there and I'm just not seeing it?
>
> Peter

Nope, you didn't miss it, Peter. I will add such a test as a separate
patch, this one has dragged on for a long time. I have another more
comprehensive test case for strcmp/strncmp I want to add anyway.

Aaron

-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 244322)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,7 +78,7 @@ 
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
-extern bool expand_strn_compare (rtx[]);
+extern bool expand_strn_compare (rtx[], int);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 244322)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19635,22 +19635,36 @@ 
    OPERANDS[0] is the target (result).
    OPERANDS[1] is the first source.
    OPERANDS[2] is the second source.
+   If NO_LENGTH is zero, then:
    OPERANDS[3] is the length.
-   OPERANDS[4] is the alignment in bytes.  */
+   OPERANDS[4] is the alignment in bytes.
+   If NO_LENGTH is nonzero, then:
+   OPERANDS[3] is the alignment in bytes.  */
 bool
-expand_strn_compare (rtx operands[])
+expand_strn_compare (rtx operands[], int no_length)
 {
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  rtx bytes_rtx, align_rtx;
+  if (no_length)
+    {
+      bytes_rtx = NULL;
+      align_rtx = operands[3];
+    }
+  else
+    {
+      bytes_rtx = operands[3];
+      align_rtx = operands[4];
+    }
   HOST_WIDE_INT cmp_bytes = 0;
   rtx src1 = orig_src1;
   rtx src2 = orig_src2;
 
-  /* If this is not a fixed size compare, just call strncmp.  */
-  if (!CONST_INT_P (bytes_rtx))
+  /* If we have a length, it must be constant. This simplifies things
+     a bit as we don't have to generate code to check if we've exceeded
+     the length. Later this could be expanded to handle this case.  */
+  if (!no_length && !CONST_INT_P (bytes_rtx))
     return false;
 
   /* This must be a fixed size alignment.  */
@@ -19668,8 +19682,6 @@ 
 
   gcc_assert (GET_MODE (target) == SImode);
 
-  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
-
   /* If we have an LE target without ldbrx and word_mode is DImode,
      then we must avoid using word_mode.  */
   int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
@@ -19678,16 +19690,37 @@ 
   int word_mode_size = GET_MODE_SIZE (word_mode);
 
   int offset = 0;
+  HOST_WIDE_INT bytes; /* N from the strncmp args if available.  */
+  HOST_WIDE_INT compare_length; /* How much we are going to compare inline.  */
+  if (no_length)
+    /* Use this as a standin to determine the mode to use.  */
+    bytes = rs6000_string_compare_inline_limit * word_mode_size;
+  else
+    bytes = INTVAL (bytes_rtx);
+
   machine_mode load_mode =
     select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
   int load_mode_size = GET_MODE_SIZE (load_mode);
+  compare_length = rs6000_string_compare_inline_limit * load_mode_size;
 
-  /* We don't want to generate too much code.  Also if bytes is
-     4096 or larger we always want the library strncmp anyway.  */
-  int groups = ROUND_UP (bytes, load_mode_size) / load_mode_size;
-  if (bytes >= 4096 || groups > rs6000_string_compare_inline_limit)
-    return false;
+  /* If we have equality at the end of the last compare and we have not
+     found the end of the string, we need to call strcmp/strncmp to
+     compare the remainder.  */
+  bool equality_compare_rest = false;
 
+  if (no_length)
+    {
+      bytes = compare_length;
+      equality_compare_rest = true;
+    }
+  else
+    {
+      if (bytes <= compare_length)
+	compare_length = bytes;
+      else
+	equality_compare_rest = true;
+    }
+
   rtx result_reg = gen_reg_rtx (word_mode);
   rtx final_move_label = gen_label_rtx ();
   rtx final_label = gen_label_rtx ();
@@ -19706,9 +19739,9 @@ 
 	 bgt	cr7,L(pagecross) */
 
       if (align1 < 8)
-	expand_strncmp_align_check (strncmp_label, src1, bytes);
+	expand_strncmp_align_check (strncmp_label, src1, compare_length);
       if (align2 < 8)
-	expand_strncmp_align_check (strncmp_label, src2, bytes);
+	expand_strncmp_align_check (strncmp_label, src2, compare_length);
 
       /* Now generate the following sequence:
 	 - branch to begin_compare
@@ -19737,22 +19770,30 @@ 
 	  src2 = replace_equiv_address (src2, src2_reg);
 	}
 
-      /* -m32 -mpowerpc64 results in word_mode being DImode even
-	 though otherwise it is 32-bit. The length arg to strncmp
-	 is a size_t which will be the same size as pointers.  */
-      rtx len_rtx;
-      if (TARGET_64BIT)
-	len_rtx = gen_reg_rtx(DImode);
+      if (no_length)
+	emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strcmp"),
+				 target, LCT_NORMAL, GET_MODE (target), 2,
+				 force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				 force_reg (Pmode, XEXP (src2, 0)), Pmode);
       else
-	len_rtx = gen_reg_rtx(SImode);
+	{
+	  /* -m32 -mpowerpc64 results in word_mode being DImode even
+	     though otherwise it is 32-bit. The length arg to strncmp
+	     is a size_t which will be the same size as pointers.  */
+	  rtx len_rtx;
+	  if (TARGET_64BIT)
+	    len_rtx = gen_reg_rtx(DImode);
+	  else
+	    len_rtx = gen_reg_rtx(SImode);
 
-      emit_move_insn (len_rtx, bytes_rtx);
+	  emit_move_insn (len_rtx, bytes_rtx);
 
-      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
-			       target, LCT_NORMAL, GET_MODE (target), 3,
-			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
-			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
-			       len_rtx, GET_MODE (len_rtx));
+	  emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+				   target, LCT_NORMAL, GET_MODE (target), 3,
+				   force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				   force_reg (Pmode, XEXP (src2, 0)), Pmode,
+				   len_rtx, GET_MODE (len_rtx));
+	}
 
       rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label);
       jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref));
@@ -19768,10 +19809,12 @@ 
 
   /* Generate sequence of ld/ldbrx, cmpb to compare out
      to the length specified.  */
-  while (bytes > 0)
+  HOST_WIDE_INT bytes_to_compare = compare_length;
+  while (bytes_to_compare > 0)
     {
       /* Compare sequence:
          check each 8B with: ld/ld cmpd bne
+	 If equal, use rldicr/cmpb to check for zero byte.
          cleanup code at end:
          cmpb          get byte that differs
          cmpb          look for zero byte
@@ -19785,24 +19828,25 @@ 
          result is zero because the strings are exactly equal.  */
       int align = compute_current_alignment (base_align, offset);
       if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
-	load_mode = select_block_compare_mode (offset, bytes, align,
+	load_mode = select_block_compare_mode (offset, bytes_to_compare, align,
 					       word_mode_ok);
       else
-	load_mode = select_block_compare_mode (0, bytes, align, word_mode_ok);
+	load_mode = select_block_compare_mode (0, bytes_to_compare, align,
+					       word_mode_ok);
       load_mode_size = GET_MODE_SIZE (load_mode);
-      if (bytes >= load_mode_size)
+      if (bytes_to_compare >= load_mode_size)
 	cmp_bytes = load_mode_size;
       else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
 	{
 	  /* Move this load back so it doesn't go past the end.
 	     P8/P9 can do this efficiently.  */
-	  int extra_bytes = load_mode_size - bytes;
-	  cmp_bytes = bytes;
+	  int extra_bytes = load_mode_size - bytes_to_compare;
+	  cmp_bytes = bytes_to_compare;
 	  if (extra_bytes < offset)
 	    {
 	      offset -= extra_bytes;
 	      cmp_bytes = load_mode_size;
-	      bytes = cmp_bytes;
+	      bytes_to_compare = cmp_bytes;
 	    }
 	}
       else
@@ -19809,7 +19853,7 @@ 
 	/* P7 and earlier can't do the overlapping load trick fast,
 	   so this forces a non-overlapping load and a shift to get
 	   rid of the extra bytes.  */
-	cmp_bytes = bytes;
+	cmp_bytes = bytes_to_compare;
 
       src1 = adjust_address (orig_src1, load_mode, offset);
       src2 = adjust_address (orig_src2, load_mode, offset);
@@ -19872,37 +19916,45 @@ 
 	    }
 	}
 
-      int remain = bytes - cmp_bytes;
+      /* Cases to handle.  A and B are chunks of the two strings.
+         1: Not end of comparison:
+	   A != B: branch to cleanup code to compute result.
+           A == B: check for 0 byte, next block if not found.
+         2: End of the inline comparison:
+	   A != B: branch to cleanup code to compute result.
+           A == B: check for 0 byte, call strcmp/strncmp
+	 3: compared requested N bytes:
+	   A == B: branch to result 0.
+           A != B: cleanup code to compute result.  */
 
+      int remain = bytes_to_compare - cmp_bytes;
+
       rtx dst_label;
-      if (remain > 0)
+      if (remain > 0 || equality_compare_rest)
 	{
+	  /* Branch to cleanup code, otherwise fall through to do
+	     more compares.  */
 	  if (!cleanup_label)
 	    cleanup_label = gen_label_rtx ();
 	  dst_label = cleanup_label;
 	}
       else
+	/* Branch to end and produce result of 0.  */
 	dst_label = final_move_label;
 
       rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, dst_label);
       rtx cond = gen_reg_rtx (CCmode);
 
-      if (remain == 0)
-	{
-	  /* For the last chunk, subf. also
-	     generates the zero result we need.  */
-	  rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2);
-	  rs6000_emit_dot_insn (result_reg, tmp, 1, cond);
-	}
-      else
-	emit_move_insn (cond, gen_rtx_COMPARE (CCmode,
-					       tmp_reg_src1, tmp_reg_src2));
+      /* Always produce the 0 result, it is needed if
+	 cmpb finds a 0 byte in this chunk.  */
+      rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2);
+      rs6000_emit_dot_insn (result_reg, tmp, 1, cond);
 
       rtx cmp_rtx;
-      if (remain > 0)
+      if (remain == 0 && !equality_compare_rest)
+	cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx);
+      else
 	cmp_rtx = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-      else
-	cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx);
 
       rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
 					 lab_ref, pc_rtx);
@@ -19910,10 +19962,104 @@ 
       JUMP_LABEL (j) = dst_label;
       LABEL_NUSES (dst_label) += 1;
 
+      if (remain > 0 || equality_compare_rest)
+	{
+	  /* Generate a cmpb to test for a 0 byte and branch
+	     to final result if found.  */
+	  rtx cmpb_zero = gen_reg_rtx (word_mode);
+	  rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label);
+	  rtx condz = gen_reg_rtx (CCmode);
+	  rtx zero_reg = gen_reg_rtx (word_mode);
+	  if (word_mode == SImode)
+	    {
+	      emit_insn (gen_movsi (zero_reg, GEN_INT(0)));
+	      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+	      if ( cmp_bytes < word_mode_size )
+		{ /* Don't want to look at zero bytes past end.  */
+		  HOST_WIDE_INT mb =
+		    BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+		  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
+		  emit_insn (gen_andsi3_mask (cmpb_zero, cmpb_zero, mask));
+		}
+	    }
+	  else
+	    {
+	      emit_insn (gen_movdi (zero_reg, GEN_INT(0)));
+	      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+	      if ( cmp_bytes < word_mode_size )
+		{ /* Don't want to look at zero bytes past end.  */
+		  HOST_WIDE_INT mb =
+		    BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+		  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
+		  emit_insn (gen_anddi3_mask (cmpb_zero, cmpb_zero, mask));
+		}
+	    }
+
+	  emit_move_insn (condz, gen_rtx_COMPARE (CCmode, cmpb_zero, zero_reg));
+	  rtx cmpnz_rtx = gen_rtx_NE (VOIDmode, condz, const0_rtx);
+	  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmpnz_rtx,
+					     lab_ref_fin, pc_rtx);
+	  rtx j2 = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+	  JUMP_LABEL (j2) = final_move_label;
+	  LABEL_NUSES (final_move_label) += 1;
+
+	}
+
       offset += cmp_bytes;
-      bytes -= cmp_bytes;
+      bytes_to_compare -= cmp_bytes;
     }
 
+  if (equality_compare_rest)
+    {
+      /* Update pointers past what has been compared already.  */
+      src1 = adjust_address (orig_src1, load_mode, offset);
+      src2 = adjust_address (orig_src2, load_mode, offset);
+
+      if (!REG_P (XEXP (src1, 0)))
+	{
+	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
+	  src1 = replace_equiv_address (src1, src1_reg);
+	}
+      set_mem_size (src1, cmp_bytes);
+
+      if (!REG_P (XEXP (src2, 0)))
+	{
+	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
+	  src2 = replace_equiv_address (src2, src2_reg);
+	}
+      set_mem_size (src2, cmp_bytes);
+
+      /* Construct call to strcmp/strncmp to compare the rest of the string.  */
+      if (no_length)
+	emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strcmp"),
+				 target, LCT_NORMAL, GET_MODE (target), 2,
+				 force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				 force_reg (Pmode, XEXP (src2, 0)), Pmode);
+      else
+	{
+	  rtx len_rtx;
+	  if (TARGET_64BIT)
+	    len_rtx = gen_reg_rtx(DImode);
+	  else
+	    len_rtx = gen_reg_rtx(SImode);
+
+	  rtx len_remain = gen_rtx_MINUS (GET_MODE (len_rtx),
+					  bytes_rtx, GEN_INT (bytes_to_compare));
+	  emit_move_insn (len_rtx, len_remain);
+	  emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+				   target, LCT_NORMAL, GET_MODE (target), 3,
+				   force_reg (Pmode, XEXP (src1, 0)), Pmode,
+				   force_reg (Pmode, XEXP (src2, 0)), Pmode,
+				   len_rtx, GET_MODE (len_rtx));
+	}
+
+      rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label);
+      rtx jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref));
+      JUMP_LABEL (jmp) = final_label;
+      LABEL_NUSES (final_label) += 1;
+      emit_barrier ();
+    }
+
   if (cleanup_label)
     emit_label (cleanup_label);
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 244322)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -9106,12 +9106,31 @@ 
 	      (use (match_operand:SI 4))])]
   "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
 {
-  if (expand_strn_compare (operands))
+  if (expand_strn_compare (operands, 0))
     DONE;
   else	
     FAIL;
 })
 
+;; String compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the destination
+;; Argument 2 is the source
+;; Argument 3 is the alignment
+
+(define_expand "cmpstrsi"
+  [(parallel [(set (match_operand:SI 0)
+               (compare:SI (match_operand:BLK 1)
+                           (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))])]
+  "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
+{
+  if (expand_strn_compare (operands, 1))
+    DONE;
+  else	
+    FAIL;
+})
+
 ;; Block compare insn.
 ;; Argument 0 is the target (result)
 ;; Argument 1 is the destination
Index: gcc/testsuite/gcc.dg/strcmp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/strcmp-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/strcmp-1.c	(working copy)
@@ -0,0 +1,635 @@ 
+/* Test strcmp builtin expansion for compilation and proper execution.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target ptr32plus } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define RUN_TEST(SZ, ALIGN) test_strcmp_ ## SZ ## _ ## ALIGN ()
+
+#define DEF_TEST(SZ, ALIGN)                                                 \
+static void test_strcmp_ ## SZ ## _ ## ALIGN (void) {     		    \
+  char one[3 * (SZ > 10 ? SZ : 10)];					    \
+  char two[3 * (SZ > 10 ? SZ : 10)];					    \
+  char three[8192] __attribute__ ((aligned (4096)));       		    \
+  char four[8192] __attribute__ ((aligned (4096)));        		    \
+  int i,j;                                                                  \
+  memset(one,0,sizeof(one));				   		    \
+  memset(two,0,sizeof(two));				   		    \
+  memset(three,0,sizeof(three));			   		    \
+  memset(four,0,sizeof(four));				   		    \
+  for (i = 0 ; i < SZ ; i++)			   		            \
+    {							   		    \
+      int r1;					           		    \
+      char *a = one + (i & 1) * ALIGN;			   		    \
+      char *b = two + (i & 1) * ALIGN;			   		    \
+      memset(a, '-', SZ);					   	    \
+      memset(b, '-', SZ);					   	    \
+      a[i] = '1';					   		    \
+      b[i] = '2';					   		    \
+      a[SZ] = 0;							    \
+      b[SZ] = 0;					   		    \
+      if (!((r1 = strcmp (b, a)) > 0))   		   		    \
+	abort ();							    \
+      if (!((r1 = strcmp (a, b)) < 0))			   	            \
+	abort ();							    \
+      b[i] = '1';					   		    \
+      if (!((r1 = strcmp (a, b)) == 0))		   		            \
+	abort ();							    \
+      for(j = i; j < SZ ; j++)			   		            \
+	{						   		    \
+	  a[j] = '1';            			   		    \
+	  b[j] = '2';			                   		    \
+	}						   		    \
+      if (!((r1 = strcmp (b, a)) > 0))		   		            \
+	abort ();							    \
+      if (!((r1 = strcmp (a, b)) < 0))		   		            \
+	abort ();							    \
+      for(j = 0; j < i ; j++)						    \
+        {								    \
+	  memset(a, '-', SZ);						    \
+	  memset(b, '-', SZ);						    \
+	  a[j] = '\0';							    \
+	  a[j+1] = '1';							    \
+	  b[j] = '\0';							    \
+	  b[j+1] = '2';							    \
+	  if ((r1 = strcmp (b, a)) != 0)				    \
+	    abort ();							    \
+	}                                                                   \
+      a = three + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
+      b = four + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
+      memset(a, '-', SZ);					   	    \
+      memset(b, '-', SZ);					   	    \
+      a[i] = '1';					   		    \
+      b[i] = '2';					   		    \
+      a[SZ] = 0;							    \
+      b[SZ] = 0;					   		    \
+      if (!((r1 = strcmp(b, a)) > 0))   		   		    \
+	abort ();							    \
+      if (!((r1 = strcmp(a, b)) < 0))			   	            \
+	abort ();							    \
+      b[i] = '1';					   		    \
+      if (!((r1 = strcmp(a, b)) == 0))		   		            \
+	abort ();							    \
+    }							                    \
+}                                                                
+
+#ifdef TEST_ALL
+DEF_TEST(1,1)
+DEF_TEST(1,2)
+DEF_TEST(1,4)
+DEF_TEST(1,8)
+DEF_TEST(1,16)
+DEF_TEST(2,1)
+DEF_TEST(2,2)
+DEF_TEST(2,4)
+DEF_TEST(2,8)
+DEF_TEST(2,16)
+DEF_TEST(3,1)
+DEF_TEST(3,2)
+DEF_TEST(3,4)
+DEF_TEST(3,8)
+DEF_TEST(3,16)
+DEF_TEST(4,1)
+DEF_TEST(4,2)
+DEF_TEST(4,4)
+DEF_TEST(4,8)
+DEF_TEST(4,16)
+DEF_TEST(5,1)
+DEF_TEST(5,2)
+DEF_TEST(5,4)
+DEF_TEST(5,8)
+DEF_TEST(5,16)
+DEF_TEST(6,1)
+DEF_TEST(6,2)
+DEF_TEST(6,4)
+DEF_TEST(6,8)
+DEF_TEST(6,16)
+DEF_TEST(7,1)
+DEF_TEST(7,2)
+DEF_TEST(7,4)
+DEF_TEST(7,8)
+DEF_TEST(7,16)
+DEF_TEST(8,1)
+DEF_TEST(8,2)
+DEF_TEST(8,4)
+DEF_TEST(8,8)
+DEF_TEST(8,16)
+DEF_TEST(9,1)
+DEF_TEST(9,2)
+DEF_TEST(9,4)
+DEF_TEST(9,8)
+DEF_TEST(9,16)
+DEF_TEST(10,1)
+DEF_TEST(10,2)
+DEF_TEST(10,4)
+DEF_TEST(10,8)
+DEF_TEST(10,16)
+DEF_TEST(11,1)
+DEF_TEST(11,2)
+DEF_TEST(11,4)
+DEF_TEST(11,8)
+DEF_TEST(11,16)
+DEF_TEST(12,1)
+DEF_TEST(12,2)
+DEF_TEST(12,4)
+DEF_TEST(12,8)
+DEF_TEST(12,16)
+DEF_TEST(13,1)
+DEF_TEST(13,2)
+DEF_TEST(13,4)
+DEF_TEST(13,8)
+DEF_TEST(13,16)
+DEF_TEST(14,1)
+DEF_TEST(14,2)
+DEF_TEST(14,4)
+DEF_TEST(14,8)
+DEF_TEST(14,16)
+DEF_TEST(15,1)
+DEF_TEST(15,2)
+DEF_TEST(15,4)
+DEF_TEST(15,8)
+DEF_TEST(15,16)
+DEF_TEST(16,1)
+DEF_TEST(16,2)
+DEF_TEST(16,4)
+DEF_TEST(16,8)
+DEF_TEST(16,16)
+DEF_TEST(17,1)
+DEF_TEST(17,2)
+DEF_TEST(17,4)
+DEF_TEST(17,8)
+DEF_TEST(17,16)
+DEF_TEST(18,1)
+DEF_TEST(18,2)
+DEF_TEST(18,4)
+DEF_TEST(18,8)
+DEF_TEST(18,16)
+DEF_TEST(19,1)
+DEF_TEST(19,2)
+DEF_TEST(19,4)
+DEF_TEST(19,8)
+DEF_TEST(19,16)
+DEF_TEST(20,1)
+DEF_TEST(20,2)
+DEF_TEST(20,4)
+DEF_TEST(20,8)
+DEF_TEST(20,16)
+DEF_TEST(21,1)
+DEF_TEST(21,2)
+DEF_TEST(21,4)
+DEF_TEST(21,8)
+DEF_TEST(21,16)
+DEF_TEST(22,1)
+DEF_TEST(22,2)
+DEF_TEST(22,4)
+DEF_TEST(22,8)
+DEF_TEST(22,16)
+DEF_TEST(23,1)
+DEF_TEST(23,2)
+DEF_TEST(23,4)
+DEF_TEST(23,8)
+DEF_TEST(23,16)
+DEF_TEST(24,1)
+DEF_TEST(24,2)
+DEF_TEST(24,4)
+DEF_TEST(24,8)
+DEF_TEST(24,16)
+DEF_TEST(25,1)
+DEF_TEST(25,2)
+DEF_TEST(25,4)
+DEF_TEST(25,8)
+DEF_TEST(25,16)
+DEF_TEST(26,1)
+DEF_TEST(26,2)
+DEF_TEST(26,4)
+DEF_TEST(26,8)
+DEF_TEST(26,16)
+DEF_TEST(27,1)
+DEF_TEST(27,2)
+DEF_TEST(27,4)
+DEF_TEST(27,8)
+DEF_TEST(27,16)
+DEF_TEST(28,1)
+DEF_TEST(28,2)
+DEF_TEST(28,4)
+DEF_TEST(28,8)
+DEF_TEST(28,16)
+DEF_TEST(29,1)
+DEF_TEST(29,2)
+DEF_TEST(29,4)
+DEF_TEST(29,8)
+DEF_TEST(29,16)
+DEF_TEST(30,1)
+DEF_TEST(30,2)
+DEF_TEST(30,4)
+DEF_TEST(30,8)
+DEF_TEST(30,16)
+DEF_TEST(31,1)
+DEF_TEST(31,2)
+DEF_TEST(31,4)
+DEF_TEST(31,8)
+DEF_TEST(31,16)
+DEF_TEST(32,1)
+DEF_TEST(32,2)
+DEF_TEST(32,4)
+DEF_TEST(32,8)
+DEF_TEST(32,16)
+DEF_TEST(33,1)
+DEF_TEST(33,2)
+DEF_TEST(33,4)
+DEF_TEST(33,8)
+DEF_TEST(33,16)
+DEF_TEST(34,1)
+DEF_TEST(34,2)
+DEF_TEST(34,4)
+DEF_TEST(34,8)
+DEF_TEST(34,16)
+DEF_TEST(35,1)
+DEF_TEST(35,2)
+DEF_TEST(35,4)
+DEF_TEST(35,8)
+DEF_TEST(35,16)
+DEF_TEST(36,1)
+DEF_TEST(36,2)
+DEF_TEST(36,4)
+DEF_TEST(36,8)
+DEF_TEST(36,16)
+DEF_TEST(37,1)
+DEF_TEST(37,2)
+DEF_TEST(37,4)
+DEF_TEST(37,8)
+DEF_TEST(37,16)
+DEF_TEST(38,1)
+DEF_TEST(38,2)
+DEF_TEST(38,4)
+DEF_TEST(38,8)
+DEF_TEST(38,16)
+DEF_TEST(39,1)
+DEF_TEST(39,2)
+DEF_TEST(39,4)
+DEF_TEST(39,8)
+DEF_TEST(39,16)
+DEF_TEST(40,1)
+DEF_TEST(40,2)
+DEF_TEST(40,4)
+DEF_TEST(40,8)
+DEF_TEST(40,16)
+DEF_TEST(41,1)
+DEF_TEST(41,2)
+DEF_TEST(41,4)
+DEF_TEST(41,8)
+DEF_TEST(41,16)
+DEF_TEST(42,1)
+DEF_TEST(42,2)
+DEF_TEST(42,4)
+DEF_TEST(42,8)
+DEF_TEST(42,16)
+DEF_TEST(43,1)
+DEF_TEST(43,2)
+DEF_TEST(43,4)
+DEF_TEST(43,8)
+DEF_TEST(43,16)
+DEF_TEST(44,1)
+DEF_TEST(44,2)
+DEF_TEST(44,4)
+DEF_TEST(44,8)
+DEF_TEST(44,16)
+DEF_TEST(45,1)
+DEF_TEST(45,2)
+DEF_TEST(45,4)
+DEF_TEST(45,8)
+DEF_TEST(45,16)
+DEF_TEST(46,1)
+DEF_TEST(46,2)
+DEF_TEST(46,4)
+DEF_TEST(46,8)
+DEF_TEST(46,16)
+DEF_TEST(47,1)
+DEF_TEST(47,2)
+DEF_TEST(47,4)
+DEF_TEST(47,8)
+DEF_TEST(47,16)
+DEF_TEST(48,1)
+DEF_TEST(48,2)
+DEF_TEST(48,4)
+DEF_TEST(48,8)
+DEF_TEST(48,16)
+DEF_TEST(49,1)
+DEF_TEST(49,2)
+DEF_TEST(49,4)
+DEF_TEST(49,8)
+DEF_TEST(49,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
+#else
+DEF_TEST(3,1)
+DEF_TEST(4,1)
+DEF_TEST(4,2)
+DEF_TEST(4,4)
+DEF_TEST(5,1)
+DEF_TEST(6,1)
+DEF_TEST(7,1)
+DEF_TEST(8,1)
+DEF_TEST(8,2)
+DEF_TEST(8,4)
+DEF_TEST(8,8)
+DEF_TEST(9,1)
+DEF_TEST(16,1)
+DEF_TEST(16,2)
+DEF_TEST(16,4)
+DEF_TEST(16,8)
+DEF_TEST(16,16)
+DEF_TEST(32,1)
+DEF_TEST(32,2)
+DEF_TEST(32,4)
+DEF_TEST(32,8)
+DEF_TEST(32,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
+#endif
+
+int
+main(int argc, char **argv)
+{
+
+#ifdef TEST_ALL
+  RUN_TEST(1,1);
+  RUN_TEST(1,2);
+  RUN_TEST(1,4);
+  RUN_TEST(1,8);
+  RUN_TEST(1,16);
+  RUN_TEST(2,1);
+  RUN_TEST(2,2);
+  RUN_TEST(2,4);
+  RUN_TEST(2,8);
+  RUN_TEST(2,16);
+  RUN_TEST(3,1);
+  RUN_TEST(3,2);
+  RUN_TEST(3,4);
+  RUN_TEST(3,8);
+  RUN_TEST(3,16);
+  RUN_TEST(4,1);
+  RUN_TEST(4,2);
+  RUN_TEST(4,4);
+  RUN_TEST(4,8);
+  RUN_TEST(4,16);
+  RUN_TEST(5,1);
+  RUN_TEST(5,2);
+  RUN_TEST(5,4);
+  RUN_TEST(5,8);
+  RUN_TEST(5,16);
+  RUN_TEST(6,1);
+  RUN_TEST(6,2);
+  RUN_TEST(6,4);
+  RUN_TEST(6,8);
+  RUN_TEST(6,16);
+  RUN_TEST(7,1);
+  RUN_TEST(7,2);
+  RUN_TEST(7,4);
+  RUN_TEST(7,8);
+  RUN_TEST(7,16);
+  RUN_TEST(8,1);
+  RUN_TEST(8,2);
+  RUN_TEST(8,4);
+  RUN_TEST(8,8);
+  RUN_TEST(8,16);
+  RUN_TEST(9,1);
+  RUN_TEST(9,2);
+  RUN_TEST(9,4);
+  RUN_TEST(9,8);
+  RUN_TEST(9,16);
+  RUN_TEST(10,1);
+  RUN_TEST(10,2);
+  RUN_TEST(10,4);
+  RUN_TEST(10,8);
+  RUN_TEST(10,16);
+  RUN_TEST(11,1);
+  RUN_TEST(11,2);
+  RUN_TEST(11,4);
+  RUN_TEST(11,8);
+  RUN_TEST(11,16);
+  RUN_TEST(12,1);
+  RUN_TEST(12,2);
+  RUN_TEST(12,4);
+  RUN_TEST(12,8);
+  RUN_TEST(12,16);
+  RUN_TEST(13,1);
+  RUN_TEST(13,2);
+  RUN_TEST(13,4);
+  RUN_TEST(13,8);
+  RUN_TEST(13,16);
+  RUN_TEST(14,1);
+  RUN_TEST(14,2);
+  RUN_TEST(14,4);
+  RUN_TEST(14,8);
+  RUN_TEST(14,16);
+  RUN_TEST(15,1);
+  RUN_TEST(15,2);
+  RUN_TEST(15,4);
+  RUN_TEST(15,8);
+  RUN_TEST(15,16);
+  RUN_TEST(16,1);
+  RUN_TEST(16,2);
+  RUN_TEST(16,4);
+  RUN_TEST(16,8);
+  RUN_TEST(16,16);
+  RUN_TEST(17,1);
+  RUN_TEST(17,2);
+  RUN_TEST(17,4);
+  RUN_TEST(17,8);
+  RUN_TEST(17,16);
+  RUN_TEST(18,1);
+  RUN_TEST(18,2);
+  RUN_TEST(18,4);
+  RUN_TEST(18,8);
+  RUN_TEST(18,16);
+  RUN_TEST(19,1);
+  RUN_TEST(19,2);
+  RUN_TEST(19,4);
+  RUN_TEST(19,8);
+  RUN_TEST(19,16);
+  RUN_TEST(20,1);
+  RUN_TEST(20,2);
+  RUN_TEST(20,4);
+  RUN_TEST(20,8);
+  RUN_TEST(20,16);
+  RUN_TEST(21,1);
+  RUN_TEST(21,2);
+  RUN_TEST(21,4);
+  RUN_TEST(21,8);
+  RUN_TEST(21,16);
+  RUN_TEST(22,1);
+  RUN_TEST(22,2);
+  RUN_TEST(22,4);
+  RUN_TEST(22,8);
+  RUN_TEST(22,16);
+  RUN_TEST(23,1);
+  RUN_TEST(23,2);
+  RUN_TEST(23,4);
+  RUN_TEST(23,8);
+  RUN_TEST(23,16);
+  RUN_TEST(24,1);
+  RUN_TEST(24,2);
+  RUN_TEST(24,4);
+  RUN_TEST(24,8);
+  RUN_TEST(24,16);
+  RUN_TEST(25,1);
+  RUN_TEST(25,2);
+  RUN_TEST(25,4);
+  RUN_TEST(25,8);
+  RUN_TEST(25,16);
+  RUN_TEST(26,1);
+  RUN_TEST(26,2);
+  RUN_TEST(26,4);
+  RUN_TEST(26,8);
+  RUN_TEST(26,16);
+  RUN_TEST(27,1);
+  RUN_TEST(27,2);
+  RUN_TEST(27,4);
+  RUN_TEST(27,8);
+  RUN_TEST(27,16);
+  RUN_TEST(28,1);
+  RUN_TEST(28,2);
+  RUN_TEST(28,4);
+  RUN_TEST(28,8);
+  RUN_TEST(28,16);
+  RUN_TEST(29,1);
+  RUN_TEST(29,2);
+  RUN_TEST(29,4);
+  RUN_TEST(29,8);
+  RUN_TEST(29,16);
+  RUN_TEST(30,1);
+  RUN_TEST(30,2);
+  RUN_TEST(30,4);
+  RUN_TEST(30,8);
+  RUN_TEST(30,16);
+  RUN_TEST(31,1);
+  RUN_TEST(31,2);
+  RUN_TEST(31,4);
+  RUN_TEST(31,8);
+  RUN_TEST(31,16);
+  RUN_TEST(32,1);
+  RUN_TEST(32,2);
+  RUN_TEST(32,4);
+  RUN_TEST(32,8);
+  RUN_TEST(32,16);
+  RUN_TEST(33,1);
+  RUN_TEST(33,2);
+  RUN_TEST(33,4);
+  RUN_TEST(33,8);
+  RUN_TEST(33,16);
+  RUN_TEST(34,1);
+  RUN_TEST(34,2);
+  RUN_TEST(34,4);
+  RUN_TEST(34,8);
+  RUN_TEST(34,16);
+  RUN_TEST(35,1);
+  RUN_TEST(35,2);
+  RUN_TEST(35,4);
+  RUN_TEST(35,8);
+  RUN_TEST(35,16);
+  RUN_TEST(36,1);
+  RUN_TEST(36,2);
+  RUN_TEST(36,4);
+  RUN_TEST(36,8);
+  RUN_TEST(36,16);
+  RUN_TEST(37,1);
+  RUN_TEST(37,2);
+  RUN_TEST(37,4);
+  RUN_TEST(37,8);
+  RUN_TEST(37,16);
+  RUN_TEST(38,1);
+  RUN_TEST(38,2);
+  RUN_TEST(38,4);
+  RUN_TEST(38,8);
+  RUN_TEST(38,16);
+  RUN_TEST(39,1);
+  RUN_TEST(39,2);
+  RUN_TEST(39,4);
+  RUN_TEST(39,8);
+  RUN_TEST(39,16);
+  RUN_TEST(40,1);
+  RUN_TEST(40,2);
+  RUN_TEST(40,4);
+  RUN_TEST(40,8);
+  RUN_TEST(40,16);
+  RUN_TEST(41,1);
+  RUN_TEST(41,2);
+  RUN_TEST(41,4);
+  RUN_TEST(41,8);
+  RUN_TEST(41,16);
+  RUN_TEST(42,1);
+  RUN_TEST(42,2);
+  RUN_TEST(42,4);
+  RUN_TEST(42,8);
+  RUN_TEST(42,16);
+  RUN_TEST(43,1);
+  RUN_TEST(43,2);
+  RUN_TEST(43,4);
+  RUN_TEST(43,8);
+  RUN_TEST(43,16);
+  RUN_TEST(44,1);
+  RUN_TEST(44,2);
+  RUN_TEST(44,4);
+  RUN_TEST(44,8);
+  RUN_TEST(44,16);
+  RUN_TEST(45,1);
+  RUN_TEST(45,2);
+  RUN_TEST(45,4);
+  RUN_TEST(45,8);
+  RUN_TEST(45,16);
+  RUN_TEST(46,1);
+  RUN_TEST(46,2);
+  RUN_TEST(46,4);
+  RUN_TEST(46,8);
+  RUN_TEST(46,16);
+  RUN_TEST(47,1);
+  RUN_TEST(47,2);
+  RUN_TEST(47,4);
+  RUN_TEST(47,8);
+  RUN_TEST(47,16);
+  RUN_TEST(48,1);
+  RUN_TEST(48,2);
+  RUN_TEST(48,4);
+  RUN_TEST(48,8);
+  RUN_TEST(48,16);
+  RUN_TEST(49,1);
+  RUN_TEST(49,2);
+  RUN_TEST(49,4);
+  RUN_TEST(49,8);
+  RUN_TEST(49,16);
+#else
+  RUN_TEST(3,1);
+  RUN_TEST(4,1);
+  RUN_TEST(4,2);
+  RUN_TEST(4,4);
+  RUN_TEST(5,1);
+  RUN_TEST(6,1);
+  RUN_TEST(7,1);
+  RUN_TEST(8,1);
+  RUN_TEST(8,2);
+  RUN_TEST(8,4);
+  RUN_TEST(8,8);
+  RUN_TEST(9,1);
+  RUN_TEST(16,1);
+  RUN_TEST(16,2);
+  RUN_TEST(16,4);
+  RUN_TEST(16,8);
+  RUN_TEST(16,16);
+  RUN_TEST(32,1);
+  RUN_TEST(32,2);
+  RUN_TEST(32,4);
+  RUN_TEST(32,8);
+  RUN_TEST(32,16);
+#endif
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/strncmp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/strncmp-1.c	(revision 244322)
+++ gcc/testsuite/gcc.dg/strncmp-1.c	(working copy)
@@ -1,4 +1,4 @@ 
-/* Test memcmp builtin expansion for compilation and proper execution.  */
+/* Test strncmp builtin expansion for compilation and proper execution.  */
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-require-effective-target ptr32plus } */
@@ -32,31 +32,32 @@ 
       a[SZ] = 0;							    \
       b[SZ] = 0;					   		    \
       if (!((r1 = strncmp (b, a, SZ)) > 0))   		   		    \
-        {								    \
-	  abort ();							    \
-	}								    \
+	abort ();							    \
       if (!((r1 = strncmp (a, b, SZ)) < 0))			   	    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
       b[i] = '1';					   		    \
       if (!((r1 = strncmp (a, b, SZ)) == 0))		   		    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
       for(j = i; j < SZ ; j++)			   		            \
 	{						   		    \
 	  a[j] = '1';            			   		    \
 	  b[j] = '2';			                   		    \
 	}						   		    \
-      if (!((r1 = strncmp(b, a, SZ)) > 0))		   		    \
+      if (!((r1 = strncmp (b, a, SZ)) > 0))		   		    \
+	abort ();							    \
+      if (!((r1 = strncmp (a, b, SZ)) < 0))		   		    \
+	abort ();							    \
+      for(j = 0; j < i ; j++)						    \
         {								    \
-	  abort ();							    \
-	}             							    \
-      if (!((r1 = strncmp(a, b, SZ)) < 0))		   		    \
-        {								    \
-	  abort ();							    \
-	}		   						    \
+	  memset(a, '-', SZ);						    \
+	  memset(b, '-', SZ);						    \
+	  a[j] = '\0';							    \
+	  a[j+1] = '1';							    \
+	  b[j] = '\0';							    \
+	  b[j+1] = '2';							    \
+	  if ((r1 = strncmp (b, a, SZ)) != 0)				    \
+	    abort ();							    \
+	}                                                                   \
       a = three + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
       b = four + 4096 - (SZ / 2 + (i & 1) * ALIGN);		   	    \
       memset(a, '-', SZ);					   	    \
@@ -66,18 +67,12 @@ 
       a[SZ] = 0;							    \
       b[SZ] = 0;					   		    \
       if (!((r1 = strncmp(b, a, SZ)) > 0))   		   		    \
-        {								    \
-	  abort ();							    \
-	}								    \
+	abort ();							    \
       if (!((r1 = strncmp(a, b, SZ)) < 0))			   	    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
       b[i] = '1';					   		    \
       if (!((r1 = strncmp(a, b, SZ)) == 0))		   		    \
-        {								    \
-	  abort ();							    \
-	}                            					    \
+	abort ();							    \
     }							                    \
 }                                                                
 
@@ -327,6 +322,11 @@ 
 DEF_TEST(49,4)
 DEF_TEST(49,8)
 DEF_TEST(49,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
 #else
 DEF_TEST(3,1)
 DEF_TEST(4,1)
@@ -350,6 +350,11 @@ 
 DEF_TEST(32,4)
 DEF_TEST(32,8)
 DEF_TEST(32,16)
+DEF_TEST(100,1)
+DEF_TEST(100,2)
+DEF_TEST(100,4)
+DEF_TEST(100,8)
+DEF_TEST(100,16)
 #endif
 
 int