diff mbox

handle integer overflow/wrapping in printf directives (PR 78622)

Message ID dfa942c1-cdc2-fea5-fe49-543d01b19983@gmail.com
State Superseded
Headers show

Commit Message

Martin Sebor Dec. 1, 2016, 3:26 a.m. UTC
In the gimple-ssa-sprintf pass I made the incorrect assumption that
a wider integer argument in some range [X, Y] to a narrower directive
(such as int to %hhi) results in the number of bytes corresponding to
the range bounded by the number of bytes resulting from formatting X
and Y converted to the directive's type (how's that for a clear
description?)

Basically, given an int A in [X, Y], and sprintf("%hhi", A) the logic
was to transform the range to [X', Y'] where X' = (signed char)X and
Y' = (signed char)Y, compute a range of bytes [B, C] that X' and Y'
would format to, and use that as the range for A.  That's wrong when
X or Y are outside the range of the directive's type because of
overflow or wrapping.  It's possible to find B in [X, Y] such that
(signed char)B is outside the range of [X', Y'].

Bug 78622 - [7 Regression] -Wformat-length/-fprintf-return-value
incorrect with overflow/wrapping, derived from a comment on bug
78586 has a test case.

The attached patch fixes this problem.  A happy consequence of
the fix is that it also resolves bug 77721 - -Wformat-length not
uses arg range for converted vars (or at least makes the test
case in the bug pass; there are outstanding limitations due
to poor range info in the pass).

While there, I also fixed a couple of minor cosmetic issues with
the phrasing and formatting of diagnostics (unrelated to the main
problem).

Tested on x86-64.

Thanks
Martin

Comments

Jakub Jelinek Dec. 1, 2016, 7:26 a.m. UTC | #1
On Wed, Nov 30, 2016 at 08:26:04PM -0700, Martin Sebor wrote:
> @@ -795,6 +795,43 @@ get_width_and_precision (const conversion_spec &spec,

>    *pprec = prec;

>  }

>  

> +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual

> +   argument, due to the conversion from either *ARGMIN or *ARGMAX to

> +   the type of the directive's formal argument it's possible for both

> +   to result in the same number of bytes or a range of bytes that's

> +   less than the number of bytes that would result from formatting

> +   some other value in the range [*ARGMIN, *ARGMAX].  This can be

> +   determined by checking for the actual argument being in the range

> +   of the type of the directive.  If it isn't it must be assumed to

> +   take on the full range of the directive's type.

> +   Return true when the range has been adjusted, false otherwise.  */

> +

> +static bool

> +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)

> +{

> +  tree dirmin = TYPE_MIN_VALUE (dirtype);

> +  tree dirmax = TYPE_MAX_VALUE (dirtype);

> +

> +  if (tree_int_cst_lt (*argmin, dirmin)

> +      || tree_int_cst_lt (dirmax, *argmin)

> +      || tree_int_cst_lt (*argmax, dirmin)

> +      || tree_int_cst_lt (dirmax, *argmax))

> +    {

> +      if (TYPE_UNSIGNED (dirtype))

> +	{

> +	  *argmin = dirmin;

> +	  *argmax = dirmax;

> +	}

> +      else

> +	{

> +	  *argmin = integer_zero_node;

> +	  *argmax = dirmin;

> +	}

> +      return true;

> +    }


Isn't this too simplistic?  I mean, if you have say dirtype of signed char
and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
has range 32, 64, while I think your routine will yield -128, 127 (well,
0 as min and -128 as max as that is what you return for signed type).

Can't you subtract argmax - argmin (best just in wide_int, no need to build
trees), and use what you have just for the case where that number doesn't
fit into the narrower precision, otherwise if argmin - (dirtype) argmin
== argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax
as the range, and in case that it crosses a boundary figure if you can do
anything than the above?  Guess all cases of signed/unsigned dirtype and/or
argtype need to be considered.

Also, is argmin and argmax in this case the actual range (what should go
into res.arg{min,max}), or the values with shortest/longest representation?
Wouldn't it be better to always compute the range of values that can be
printed and only later on (after all VR_RANGE and VR_VARYING handling)
transform that into the number with shortest/longest representation in that
range?  Perhaps even using different variable names for the latter would
make things clearer (argshortest, arglongest or whatever).

	Jakub
Jakub Jelinek Dec. 1, 2016, 9:15 a.m. UTC | #2
On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote:
> Isn't this too simplistic?  I mean, if you have say dirtype of signed char

> and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg

> has range 32, 64, while I think your routine will yield -128, 127 (well,

> 0 as min and -128 as max as that is what you return for signed type).

> 

> Can't you subtract argmax - argmin (best just in wide_int, no need to build

> trees), and use what you have just for the case where that number doesn't

> fit into the narrower precision, otherwise if argmin - (dirtype) argmin

> == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax

> as the range, and in case that it crosses a boundary figure if you can do

> anything than the above?  Guess all cases of signed/unsigned dirtype and/or

> argtype need to be considered.


Richard noted that you could have a look at CONVERT_EXPR_CODE_P
handling in extract_range_from_unary_expr.  I think it is the
              || (vr0.type == VR_RANGE
                  && integer_zerop (int_const_binop (RSHIFT_EXPR,
                       int_const_binop (MINUS_EXPR, vr0.max, vr0.min),
                         size_int (TYPE_PRECISION (outer_type)))))))
part that is important here for the narrowing conversion.

	Jakub
diff mbox

Patch

PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping

gcc/ChangeLog:

	PR middle-end/78622
	* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
	rather than res.bounded.
	(adjust_range_for_overflow): New function.
	(format_integer): Always set res.bounded to true unless either
	precision or width is specified and unknown.
	Call adjust_range_for_overflow.
	(format_directive): Remove vestigial quoting.
	(add_bytes): Correct the computation of boundrange used to
	decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

	PR middle-end/78622
	* gcc.c-torture/execute/pr78622.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
	behavior inadvertently introduced in a previous commit.  Tighten
	up final checking.
	* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
	add a final optimization check.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 99a635a..eceed3e 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -637,7 +637,7 @@  min_bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res)
   if (HOST_WIDE_INT_MAX <= navail)
     return navail;
 
-  if (1 < warn_format_length || res.bounded)
+  if (1 < warn_format_length || res.knownrange)
     {
       /* At level 2, or when all directives output an exact number
 	 of bytes or when their arguments were bounded by known
@@ -795,6 +795,43 @@  get_width_and_precision (const conversion_spec &spec,
   *pprec = prec;
 }
 
+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted, false otherwise.  */
+
+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  tree dirmin = TYPE_MIN_VALUE (dirtype);
+  tree dirmax = TYPE_MAX_VALUE (dirtype);
+
+  if (tree_int_cst_lt (*argmin, dirmin)
+      || tree_int_cst_lt (dirmax, *argmin)
+      || tree_int_cst_lt (*argmax, dirmin)
+      || tree_int_cst_lt (dirmax, *argmax))
+    {
+      if (TYPE_UNSIGNED (dirtype))
+	{
+	  *argmin = dirmin;
+	  *argmax = dirmax;
+	}
+      else
+	{
+	  *argmin = integer_zero_node;
+	  *argmax = dirmin;
+	}
+      return true;
+    }
+  return false;
+}
+
 /* Return a range representing the minimum and maximum number of bytes
    that the conversion specification SPEC will write on output for the
    integer argument ARG when non-null.  ARG may be null (for vararg
@@ -989,6 +1026,10 @@  format_integer (const conversion_spec &spec, tree arg)
 
   fmtresult res;
 
+  /* The result is bounded unless width or precision has been specified
+     whose value is unknown.  */
+  res.bounded = width != HOST_WIDE_INT_MIN && prec != HOST_WIDE_INT_MIN;
+
   /* Using either the range the non-constant argument is in, or its
      type (either "formal" or actual), create a range of values that
      constrain the length of output given the warning level.  */
@@ -1010,6 +1051,12 @@  format_integer (const conversion_spec &spec, tree arg)
 	  res.argmax = build_int_cst (argtype, wi::fits_uhwi_p (max)
 				      ? max.to_uhwi () : max.to_shwi ());
 
+	  /* Set KNOWNRANGE if the argument is in a known subrange
+	     of the directive's type (KNOWNRANGE may be reset below).  */
+	  res.knownrange
+	    = (!tree_int_cst_equal (TYPE_MIN_VALUE (dirtype), res.argmin)
+	       && !tree_int_cst_equal (TYPE_MAX_VALUE (dirtype), res.argmax));
+
 	  /* For a range with a negative lower bound and a non-negative
 	     upper bound, use one to determine the minimum number of bytes
 	     on output and whichever of the two bounds that results in
@@ -1020,6 +1067,10 @@  format_integer (const conversion_spec &spec, tree arg)
 	    {
 	      argmin = res.argmin;
 	      argmax = res.argmax;
+
+	      res.knownrange
+		&= !adjust_range_for_overflow (dirtype, &argmin, &argmax);
+
 	      int minbytes = format_integer (spec, res.argmin).range.min;
 	      int maxbytes = format_integer (spec, res.argmax).range.max;
 	      if (maxbytes < minbytes)
@@ -1032,11 +1083,6 @@  format_integer (const conversion_spec &spec, tree arg)
 	      argmin = res.argmin;
 	      argmax = res.argmax;
 	    }
-
-	  /* The argument is bounded by the known range of values
-	     determined by Value Range Propagation.  */
-	  res.bounded = true;
-	  res.knownrange = true;
 	}
       else if (range_type == VR_ANTI_RANGE)
 	{
@@ -1101,6 +1147,10 @@  format_integer (const conversion_spec &spec, tree arg)
       res.argmax = argmax;
     }
 
+  /* Clear KNOWNRANGE if the range has been adjusted to the maximum
+     of the directive.  */
+  res.knownrange &= !adjust_range_for_overflow (dirtype, &argmin, &argmax);
+
   /* Recursively compute the minimum and maximum from the known range,
      taking care to swap them if the lower bound results in longer
      output than the upper bound (e.g., in the range [-1, 0].  */
@@ -1879,7 +1929,7 @@  format_directive (const pass_sprintf_length::call_info &info,
 			fmtres.argmin, fmtres.argmax);
 	      else
 		inform (info.fmtloc,
-			"using the range [%qE, %qE] for directive argument",
+			"using the range [%E, %E] for directive argument",
 			fmtres.argmin, fmtres.argmax);
 	    }
 	}
@@ -2057,7 +2107,7 @@  add_bytes (const pass_sprintf_length::call_info &info,
 	 indicate that the overflow/truncation may (but need not) happen.  */
       bool boundrange
 	= (res->number_chars_min < res->number_chars_max
-	   && res->number_chars_min < info.objsize);
+	   && res->number_chars_min + nbytes <= info.objsize);
 
       if (!end && ((nbytes - navail) == 1 || boundrange))
 	{
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78622.c b/gcc/testsuite/gcc.c-torture/execute/pr78622.c
new file mode 100644
index 0000000..44c9b0b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78622.c
@@ -0,0 +1,39 @@ 
+/* PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value
+   incorrect with overflow/wrapping */
+/* { dg-additional-options "-Wformat-length=2" } */
+
+__attribute__((noinline, noclone)) int
+foo (int x)
+{
+  if (x < 4096 + 8 || x >= 4096 + 256 + 8)
+    return -1;
+  char buf[5];
+  return __builtin_sprintf (buf, "%hhd", x + 1);
+}
+
+int
+bar (unsigned int x)
+{
+  if (x < 64 || x > 2U * __INT_MAX__ - 10)
+    return -1;
+  char buf[1];
+  return __builtin_sprintf (buf, "%d", x + 1);   /* { dg-warn "directive writing between 1 and 11 bytes into a region of size 1" "int32plus" { target { int32plus } } */
+}
+
+int
+main ()
+{
+  if (__SCHAR_MAX__ != 127 || __CHAR_BIT__ != 8 || __SIZEOF_INT__ != 4)
+    return 0;
+  if (foo (4095 + 9) != 1
+      || foo (4095 + 32) != 2
+      || foo (4095 + 127) != 3
+      || foo (4095 + 128) != 4
+      || foo (4095 + 240) != 3
+      || foo (4095 + 248) != 2
+      || foo (4095 + 255) != 2
+      || foo (4095 + 256) != 1)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c
index 4dddccdf..077c33b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c
@@ -75,6 +75,8 @@  EQL (0, 0, "%-s", "");
 EQL (1, 1, "%c",  'x');
 EQL (1, 1, "%-s", "x");
 
+EQL (1, 2, "%c",  'x');
+
 EQL (4, 4, "%4c", 'x');
 
 /* Verify that exceeding the environmental limit of 4095 bytes for
@@ -179,7 +181,7 @@  RNG (4,  4, 32, "%zu", sz)
 
 /* Exercise bug 78586.  */
 RNG (4,  4, 32, "%lu", (unsigned long)i)
-RNG (4,  4, 32, "%lu", (unsigned)u)
+RNG (4,  4, 32, "%lu", (unsigned long)u)
 RNG (4,  4, 32, "%lu", (unsigned long)li)
 RNG (4,  4, 32, "%lu", (unsigned long)lu)
 RNG (4,  4, 32, "%lu", (unsigned long)sz)
@@ -266,14 +268,17 @@  RNG (0,  6,   8, "%s%ls", "1", L"2");
    <bb 2>:
    result_3 = __builtin_sprintf (&MEM[(void *)&buf8k + 8192B], "%c", 32);
    if (result_3 != 0)
-     goto <bb 3>;
+     goto <bb 3>; [50.0%]
    else
-     goto <bb 4>;
+     goto <bb 4>; [50.0%]
 
-   <bb 3>:
+   <bb 3>[50.0%]:
    __builtin_abort ();
 
 */
 
-/* { dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n *__builtin_abort" 124 "optimized" { target { ilp32 || lp64 } } } } */
-/* { dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n *__builtin_abort" 93 "optimized" { target { { ! ilp32 } && { ! lp64 } } } } } */
+/*  Only conditional calls to abort should be made (with any probability:
+    { dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n *__builtin_abort" 124 "optimized" { target { ilp32 || lp64 } } } }
+    { dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n *__builtin_abort" 93 "optimized" { target { { ! ilp32 } && { ! lp64 } } } } }
+    No unconditional calls to abort should be made:
+    { dg-final { scan-tree-dump-not ";\n *__builtin_abort" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-6.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-6.c
index 4c41234..d1013e5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-6.c
@@ -50,6 +50,14 @@  void test_arg_int (int width, int prec, int i, int n)
 
   T ("%i", R (1, 10));
 
+  /* Each of the bounds of the ranges below results in just one byte
+     on output because they convert to the value 9 in type char, yet
+     other values in those ranges can result in up to four bytes.
+     For example, 4240 converts to -112.  Verify that the output
+     isn't folded into a constant.  */
+  T ("%hhi", R (4104, 4360) + 1);
+  T ("%hhu", R (4104, 4360) + 1);
+
   T ("%'i", 1234567);
 
   for (i = -n; i != n; ++i)
@@ -106,4 +114,4 @@  void test_invalid_directive (void)
 
 /* Use 'grep "^ *T (" builtin-sprintf-6.c  | wc -l' to determine
    the count for the directive below.
-   { dg-final { scan-tree-dump-times "snprintf" 42 "optimized"} } */
+   { dg-final { scan-tree-dump-times "snprintf" 44 "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
index fae584e..f49422a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
@@ -1039,8 +1039,8 @@  void test_sprintf_chk_s_nonconst (int w, int p, const char *s)
      is not.  */
   T ( 1, "%*s",  w, "");
   T ( 1, "%*s",  w, "1");       /* { dg-warning "nul past the end" } */
-  T ( 1, "%.*s", w, "");
-  T ( 1, "%.*s", w, "1");       /* { dg-warning "may write a terminating nul" } */
+  T ( 1, "%.*s", p, "");
+  T ( 1, "%.*s", p, "1");       /* { dg-warning "may write a terminating nul" } */
   T ( 1, "%.*s", w, "123");     /* { dg-warning "writing between 0 and 3 bytes into a region of size 1" } */
 
   T ( 1, "%*s", w, "123");      /* { dg-warning "writing 3 or more bytes into a region of size 1" } */
@@ -1294,6 +1294,12 @@  void test_sprintf_chk_int_nonconst (int w, int p, int a)
   T (3, "%2x",          a);
 
   T (1, "%.*d",      p, a);
+
+  T (4, "%i %i",        a, a);
+  /* The following will definitely be "writing a terminating nul past the end"
+     (i.e., not "may write".)  */
+  T (4, "%i %i ",       a, a);      /* { dg-warning "writing a terminating nul past the end" } */
+  T (4, "%i %i %i",     a, a, a);   /* { dg-warning "into a region" }*/
 }
 
 void test_sprintf_chk_e_nonconst (int w, int p, double d)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-6.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-6.c
index 0cb02b7..121ed4e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-6.c
@@ -2,15 +2,18 @@ 
    Test to verify that the correct range information is made available to the
    -Wformat-lenght check to prevent warnings.  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wformat -Wformat-length" } */
+/* { dg-options "-O2 -Wformat -Wformat-length -fdump-tree-optimized" } */
 
+void abort (void);
 int snprintf (char*, __SIZE_TYPE__, const char*, ...);
 
 void fuchar (unsigned char j, char *p)
 {
   if (j > 99)
     return;
-  snprintf (p, 4, "%3hu", j);
+
+  if (3 != snprintf (p, 4, "%3hu", j))
+    abort ();
 }
 
 void fschar (signed char j, char *p)
@@ -20,14 +23,17 @@  void fschar (signed char j, char *p)
   if (k > 99)
     return;
 
-  snprintf (p, 3, "%3hhu", k);   /* { dg-bogus "" "unsigned char" { xfail *-*-* } } */
+  if (3 != snprintf (p, 4, "%3hhu", k))
+    abort ();
 }
 
 void fushrt (unsigned short j, char *p)
 {
   if (j > 999)
     return;
-  snprintf (p, 4, "%3hu", j);
+
+  if (3 != snprintf (p, 4, "%3hu", j))
+    abort ();
 }
 
 void fshrt (short j, char *p)
@@ -37,7 +43,8 @@  void fshrt (short j, char *p)
   if (k > 999)
     return;
 
-  snprintf (p, 4, "%3hu", k);
+  if (3 != snprintf (p, 4, "%3hu", k))
+    abort ();
 }
 
 void fuint (unsigned j, char *p)
@@ -54,13 +61,16 @@  void fint (int j, char *p)
   if (k > 999)
     return;
 
-  snprintf (p, 4, "%3u", k);   /* { dg-bogus "" "unsigned int" { xfail *-*-* } } */
+  /* Range info isn't available here.  */
+  snprintf (p, 4, "%3u", k);
 }
 
 void fulong (unsigned long j, char *p)
 {
   if (j > 999)
     return;
+
+  /* Range info isn't available here.  */
   snprintf (p, 4, "%3lu", j);
 }
 
@@ -71,13 +81,16 @@  void flong (long j, char *p)
   if (k > 999)
     return;
 
-  snprintf (p, 4, "%3lu", k);   /* { dg-bogus "" "unsigned long" { xfail *-*-* } } */
+  /* Range info isn't available here.  */
+  snprintf (p, 4, "%3lu", k);
 }
 
 void fullong (unsigned long long j, char *p)
 {
   if (j > 999)
     return;
+
+  /* Range info isn't available here.  */
   snprintf (p, 4, "%3llu", j);
 }
 
@@ -88,5 +101,7 @@  void fllong (long j, char *p)
   if (k > 999)
     return;
 
-  snprintf (p, 4, "%3llu", k);   /* { dg-bogus "" "unsigned long long" { xfail lp64 } } */
+  snprintf (p, 4, "%3llu", k);
 }
+
+/* { dg-final { scan-tree-dump-not "abort" "optimized" } } */