diff mbox series

[v4,2/4] lib: vsprintf: Fix handling of number field widths in vsscanf

Message ID 20210203165009.6299-2-rf@opensource.cirrus.com
State Superseded
Headers show
Series None | expand

Commit Message

Richard Fitzgerald Feb. 3, 2021, 4:50 p.m. UTC
The existing code attempted to handle numbers by doing a strto[u]l(),
ignoring the field width, and then repeatedly dividing to extract the
field out of the full converted value. If the string contains a run of
valid digits longer than will fit in a long or long long, this would
overflow and no amount of dividing can recover the correct value.

This patch fixes vsscanf to obey number field widths when parsing
the number.

A new _parse_integer_limit() is added that takes a limit for the number
of characters to parse. The number field conversion in vsscanf is changed
to use this new function.

If a number starts with a radix prefix, the field width  must be long
enough for at last one digit after the prefix. If not, it will be handled
like this:

 sscanf("0x4", "%1i", &i): i=0, scanning continues with the 'x'
 sscanf("0x4", "%2i", &i): i=0, scanning continues with the '4'

This is consistent with the observed behaviour of userland sscanf.

Note that this patch does NOT fix the problem of a single field value
overflowing the target type. So for example:

  sscanf("123456789abcdef", "%x", &i);

Will not produce the correct result because the value obviously overflows
INT_MAX. But sscanf will report a successful conversion.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 lib/kstrtox.c  | 13 ++++++--
 lib/kstrtox.h  |  2 ++
 lib/vsprintf.c | 88 +++++++++++++++++++++++++++++---------------------
 3 files changed, 64 insertions(+), 39 deletions(-)

Comments

Petr Mladek Feb. 4, 2021, 4:35 p.m. UTC | #1
On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:
> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:

> > The existing code attempted to handle numbers by doing a strto[u]l(),

> > ignoring the field width, and then repeatedly dividing to extract the

> > field out of the full converted value. If the string contains a run of

> > valid digits longer than will fit in a long or long long, this would

> > overflow and no amount of dividing can recover the correct value.


> ...

> 

> > +	for (; max_chars > 0; max_chars--) {

> 

> Less fragile is to write

> 

> 	while (max_chars--)


Except that the original was more obvious at least for me.
I always prefer more readable code when the compiler might do
the optimization easily. But this is my personal taste.
I am fine with both variants.

> 

> This allows max_char to be an unsigned type.

> 

> Moreover...

> 

> > +	return _parse_integer_limit(s, base, p, INT_MAX);

> 

> You have inconsistency with INT_MAX vs, size_t above.


Ah, this was on my request. INT_MAX is already used on many other
locations in vsnprintf() for this purpose.

An alternative is to fix all the other locations. We would need to
check if it is really safe. Well, I do not want to force Richard
to fix this historical mess. He already put a lot lot of effort
into fixing this long term issue.

...

> > -	unsigned long long result;

> > +	const char *cp;

> > +	unsigned long long result = 0ULL;

> >  	unsigned int rv;

> >  

> > -	cp = _parse_integer_fixup_radix(cp, &base);

> > -	rv = _parse_integer(cp, base, &result);

> 

> > +	if (max_chars == 0) {

> > +		cp = startp;

> > +		goto out;

> > +	}

> 

> It's redundant if I'm not mistaken.


Also this is more obvious and less error prone from my POV.
But I agree that it is redundant. I am not sure if this
function is used in some fast paths.

Again I am fine with both variants.

> > +	cp = _parse_integer_fixup_radix(startp, &base);

> > +	if ((cp - startp) >= max_chars) {

> > +		cp = startp + max_chars;

> > +		goto out;

> > +	}

> 

> This will be exactly the same, no?


Best Regards,
Petr
Richard Fitzgerald Feb. 5, 2021, 11:28 a.m. UTC | #2
On 04/02/2021 16:35, Petr Mladek wrote:
> On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:

>> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:

>>> The existing code attempted to handle numbers by doing a strto[u]l(),

>>> ignoring the field width, and then repeatedly dividing to extract the

>>> field out of the full converted value. If the string contains a run of

>>> valid digits longer than will fit in a long or long long, this would

>>> overflow and no amount of dividing can recover the correct value.

> 

>> ...

>>

>>> +	for (; max_chars > 0; max_chars--) {

>>

>> Less fragile is to write

>>

>> 	while (max_chars--)

> 

> Except that the original was more obvious at least for me.

> I always prefer more readable code when the compiler might do

> the optimization easily. But this is my personal taste.

> I am fine with both variants.

> 

>>

>> This allows max_char to be an unsigned type.

>>

>> Moreover...

>>

>>> +	return _parse_integer_limit(s, base, p, INT_MAX);

>>

>> You have inconsistency with INT_MAX vs, size_t above.

> 

> Ah, this was on my request. INT_MAX is already used on many other

> locations in vsnprintf() for this purpose.

> 


I originally had UINT_MAX and changed on Petr's request to be
consistent with other code. (Sorry Andy - my mistake not including
you on the earlier review versions).

But 0 < INT_MAX < UINT_MAX, so ok to pass to an unsigned. And as Petr
said on his original review, INT_MAX is "big enough".

I don't mind either way.

> An alternative is to fix all the other locations. We would need to

> check if it is really safe. Well, I do not want to force Richard

> to fix this historical mess. He already put a lot lot of effort

> into fixing this long term issue.

> 

> ...

> 

>>> -	unsigned long long result;

>>> +	const char *cp;

>>> +	unsigned long long result = 0ULL;

>>>   	unsigned int rv;

>>>   

>>> -	cp = _parse_integer_fixup_radix(cp, &base);

>>> -	rv = _parse_integer(cp, base, &result);

>>

>>> +	if (max_chars == 0) {

>>> +		cp = startp;

>>> +		goto out;

>>> +	}

>>

>> It's redundant if I'm not mistaken.

> 

> Also this is more obvious and less error prone from my POV.

> But I agree that it is redundant. I am not sure if this

> function is used in some fast paths.

> 

> Again I am fine with both variants.

> 

>>> +	cp = _parse_integer_fixup_radix(startp, &base);

>>> +	if ((cp - startp) >= max_chars) {

>>> +		cp = startp + max_chars;

>>> +		goto out;

>>> +	}

>>

>> This will be exactly the same, no?

> 

> Best Regards,

> Petr

>
Andy Shevchenko Feb. 5, 2021, 12:50 p.m. UTC | #3
On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald
<rf@opensource.cirrus.com> wrote:
> On 04/02/2021 16:35, Petr Mladek wrote:

> > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:

> >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:


...

> >>> +   for (; max_chars > 0; max_chars--) {

> >>

> >> Less fragile is to write

> >>

> >>      while (max_chars--)

> >

> > Except that the original was more obvious at least for me.

> > I always prefer more readable code when the compiler might do

> > the optimization easily. But this is my personal taste.

> > I am fine with both variants.


I *slightly* prefer while-loop *in this case* due to less characters
to parse to understand the logic.

> >> This allows max_char to be an unsigned type.

> >>

> >> Moreover...

> >>

> >>> +   return _parse_integer_limit(s, base, p, INT_MAX);

> >>

> >> You have inconsistency with INT_MAX vs, size_t above.

> >

> > Ah, this was on my request. INT_MAX is already used on many other

> > locations in vsnprintf() for this purpose.

>

> I originally had UINT_MAX and changed on Petr's request to be

> consistent with other code. (Sorry Andy - my mistake not including

> you on the earlier review versions).

>

> But 0 < INT_MAX < UINT_MAX, so ok to pass to an unsigned. And as Petr

> said on his original review, INT_MAX is "big enough".


Some code has INT_MAX, some has UINT_MAX, while the parameter is size_t.
I think all of these inconsistencies should have a comment either in
the code, or in the commit message, or in the cover letter (depending
on the importance).
Or being fixed to be more consistent with existing code. Whichever you
consider better.

> I don't mind either way.

>

> > An alternative is to fix all the other locations. We would need to

> > check if it is really safe. Well, I do not want to force Richard

> > to fix this historical mess. He already put a lot lot of effort

> > into fixing this long term issue.


...

> >>> -   unsigned long long result;

> >>> +   const char *cp;

> >>> +   unsigned long long result = 0ULL;

> >>>     unsigned int rv;

> >>>

> >>> -   cp = _parse_integer_fixup_radix(cp, &base);

> >>> -   rv = _parse_integer(cp, base, &result);

> >>

> >>> +   if (max_chars == 0) {

> >>> +           cp = startp;

> >>> +           goto out;

> >>> +   }

> >>

> >> It's redundant if I'm not mistaken.

> >

> > Also this is more obvious and less error prone from my POV.

> > But I agree that it is redundant. I am not sure if this

> > function is used in some fast paths.

> >

> > Again I am fine with both variants.


I think we don't need a (redundant) code, but rather a comment.

> >>> +   cp = _parse_integer_fixup_radix(startp, &base);

> >>> +   if ((cp - startp) >= max_chars) {

> >>> +           cp = startp + max_chars;

> >>> +           goto out;

> >>> +   }

> >>

> >> This will be exactly the same, no?


-- 
With Best Regards,
Andy Shevchenko
David Laight Feb. 5, 2021, 3:23 p.m. UTC | #4
From: Andy Shevchenko

> Sent: 05 February 2021 12:51

> 

> On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald

> <rf@opensource.cirrus.com> wrote:

> > On 04/02/2021 16:35, Petr Mladek wrote:

> > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:

> > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:

> 

> ...

> 

> > >>> +   for (; max_chars > 0; max_chars--) {

> > >>

> > >> Less fragile is to write

> > >>

> > >>      while (max_chars--)

> > >

> > > Except that the original was more obvious at least for me.

> > > I always prefer more readable code when the compiler might do

> > > the optimization easily. But this is my personal taste.

> > > I am fine with both variants.

> 

> I *slightly* prefer while-loop *in this case* due to less characters

> to parse to understand the logic.


The two loops are also have different values for 'max_chars'
inside the loop body.

If 'max_chars' is known to be non-zero the do ... while (--max_chars);
loop will probable generate better code.
But there is no accounting for just how odd some decisions gcc
makes are.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Andy Shevchenko Feb. 5, 2021, 3:53 p.m. UTC | #5
On Fri, Feb 5, 2021 at 5:23 PM David Laight <David.Laight@aculab.com> wrote:
> From: Andy Shevchenko

> > Sent: 05 February 2021 12:51

> > On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald

> > <rf@opensource.cirrus.com> wrote:

> > > On 04/02/2021 16:35, Petr Mladek wrote:

> > > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:

> > > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:

> >

> > ...

> >

> > > >>> +   for (; max_chars > 0; max_chars--) {

> > > >>

> > > >> Less fragile is to write

> > > >>

> > > >>      while (max_chars--)

> > > >

> > > > Except that the original was more obvious at least for me.

> > > > I always prefer more readable code when the compiler might do

> > > > the optimization easily. But this is my personal taste.

> > > > I am fine with both variants.

> >

> > I *slightly* prefer while-loop *in this case* due to less characters

> > to parse to understand the logic.

>

> The two loops are also have different values for 'max_chars'

> inside the loop body.


off-by-one to be precise.

> If 'max_chars' is known to be non-zero the do ... while (--max_chars);

> loop will probable generate better code.


What?! while (--x)  and while(x--) are equivalent.

> But there is no accounting for just how odd some decisions gcc

> makes are.


Why should we care about the compiler bugs here?

-- 
With Best Regards,
Andy Shevchenko
Richard Fitzgerald Feb. 8, 2021, 11:47 a.m. UTC | #6
On 04/02/2021 16:35, Petr Mladek wrote:
> On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:

>> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:

>>> The existing code attempted to handle numbers by doing a strto[u]l(),

>>> ignoring the field width, and then repeatedly dividing to extract the

>>> field out of the full converted value. If the string contains a run of

>>> valid digits longer than will fit in a long or long long, this would

>>> overflow and no amount of dividing can recover the correct value.

> 

>> ...

>>

>>> +	for (; max_chars > 0; max_chars--) {

>>

>> Less fragile is to write

>>

>> 	while (max_chars--)

> 

> Except that the original was more obvious at least for me.

> I always prefer more readable code when the compiler might do

> the optimization easily. But this is my personal taste.

> I am fine with both variants.

> 

>>

>> This allows max_char to be an unsigned type.

>>

>> Moreover...

>>

>>> +	return _parse_integer_limit(s, base, p, INT_MAX);

>>

>> You have inconsistency with INT_MAX vs, size_t above.

> 

> Ah, this was on my request. INT_MAX is already used on many other

> locations in vsnprintf() for this purpose.

> 


Strictly speaking this should be SIZE_MAX because the argument is a
size_t.
Petr Mladek Feb. 8, 2021, 5:56 p.m. UTC | #7
On Fri 2021-02-05 14:50:56, Andy Shevchenko wrote:
> On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald

> <rf@opensource.cirrus.com> wrote:

> > On 04/02/2021 16:35, Petr Mladek wrote:

> > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote:

> > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote:

> > >> This allows max_char to be an unsigned type.

> > >>

> > >> Moreover...

> > >>

> > >>> +   return _parse_integer_limit(s, base, p, INT_MAX);

> > >>

> > >> You have inconsistency with INT_MAX vs, size_t above.

> > >

> > > Ah, this was on my request. INT_MAX is already used on many other

> > > locations in vsnprintf() for this purpose.

> >

> > I originally had UINT_MAX and changed on Petr's request to be

> > consistent with other code. (Sorry Andy - my mistake not including

> > you on the earlier review versions).

> >

> > But 0 < INT_MAX < UINT_MAX, so ok to pass to an unsigned. And as Petr

> > said on his original review, INT_MAX is "big enough".

> 

> Some code has INT_MAX, some has UINT_MAX, while the parameter is size_t.


Yeah, if I remember correctly I wanted to have INT_MAX everywhere but
I did not want to nitpick about it in the later versions. It looked
like an arbitrary number anyway.

> I think all of these inconsistencies should have a comment either in

> the code, or in the commit message, or in the cover letter (depending

> on the importance).

> Or being fixed to be more consistent with existing code. Whichever you

> consider better.


OK, you made me to do some archaeology. The INT_MAX limit has
been added into vsnprintf() in 2.6.2 by the commit:

Author: Linus Torvalds <torvalds@home.osdl.org>
Date:   Mon Feb 2 21:17:29 2004 -0800

    Warn loudly if somebody passes a negative value as
    the size to "vsnprintf()".

    That's a pretty clear case of overflow.


It might catch problems. And the limit seems to have worked all the time.

IMHO, it would make sense to have INT_MAX limit also in
_parse_integer_limit() and WARN() when a larger value is passed.

By other words, it would mean to add this check and use INT_MAX
everywhere in this patch.

Best Regards,
Petr
diff mbox series

Patch

diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index a118b0b1e9b2..e6d14945df4f 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -39,20 +39,22 @@  const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
 
 /*
  * Convert non-negative integer string representation in explicitly given radix
- * to an integer.
+ * to an integer. A maximum of max_chars characters will be converted.
+ *
  * Return number of characters consumed maybe or-ed with overflow bit.
  * If overflow occurs, result integer (incorrect) is still returned.
  *
  * Don't you dare use this function.
  */
-unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+unsigned int _parse_integer_limit(const char *s, unsigned int base,
+				  unsigned long long *p, size_t max_chars)
 {
 	unsigned long long res;
 	unsigned int rv;
 
 	res = 0;
 	rv = 0;
-	while (1) {
+	for (; max_chars > 0; max_chars--) {
 		unsigned int c = *s;
 		unsigned int lc = c | 0x20; /* don't tolower() this line */
 		unsigned int val;
@@ -82,6 +84,11 @@  unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
 	return rv;
 }
 
+unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+{
+	return _parse_integer_limit(s, base, p, INT_MAX);
+}
+
 static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
 {
 	unsigned long long _res;
diff --git a/lib/kstrtox.h b/lib/kstrtox.h
index 3b4637bcd254..4c6536f85cac 100644
--- a/lib/kstrtox.h
+++ b/lib/kstrtox.h
@@ -4,6 +4,8 @@ 
 
 #define KSTRTOX_OVERFLOW	(1U << 31)
 const char *_parse_integer_fixup_radix(const char *s, unsigned int *base);
+unsigned int _parse_integer_limit(const char *s, unsigned int base,
+				  unsigned long long *res, size_t max_chars);
 unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *res);
 
 #endif
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 28bb26cd1f67..6afc47575331 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -53,29 +53,47 @@ 
 #include <linux/string_helpers.h>
 #include "kstrtox.h"
 
-/**
- * simple_strtoull - convert a string to an unsigned long long
- * @cp: The start of the string
- * @endp: A pointer to the end of the parsed string will be placed here
- * @base: The number base to use
- *
- * This function has caveats. Please use kstrtoull instead.
- */
-unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
+static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
+					   char **endp, unsigned int base)
 {
-	unsigned long long result;
+	const char *cp;
+	unsigned long long result = 0ULL;
 	unsigned int rv;
 
-	cp = _parse_integer_fixup_radix(cp, &base);
-	rv = _parse_integer(cp, base, &result);
+	if (max_chars == 0) {
+		cp = startp;
+		goto out;
+	}
+
+	cp = _parse_integer_fixup_radix(startp, &base);
+	if ((cp - startp) >= max_chars) {
+		cp = startp + max_chars;
+		goto out;
+	}
+
+	max_chars -= (cp - startp);
+	rv = _parse_integer_limit(cp, base, &result, max_chars);
 	/* FIXME */
 	cp += (rv & ~KSTRTOX_OVERFLOW);
-
+out:
 	if (endp)
 		*endp = (char *)cp;
 
 	return result;
 }
+
+/**
+ * simple_strtoull - convert a string to an unsigned long long
+ * @cp: The start of the string
+ * @endp: A pointer to the end of the parsed string will be placed here
+ * @base: The number base to use
+ *
+ * This function has caveats. Please use kstrtoull instead.
+ */
+unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
+{
+	return simple_strntoull(cp, UINT_MAX, endp, base);
+}
 EXPORT_SYMBOL(simple_strtoull);
 
 /**
@@ -88,7 +106,7 @@  EXPORT_SYMBOL(simple_strtoull);
  */
 unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
 {
-	return simple_strtoull(cp, endp, base);
+	return simple_strntoull(cp, UINT_MAX, endp, base);
 }
 EXPORT_SYMBOL(simple_strtoul);
 
@@ -109,6 +127,19 @@  long simple_strtol(const char *cp, char **endp, unsigned int base)
 }
 EXPORT_SYMBOL(simple_strtol);
 
+static long long simple_strntoll(const char *cp, size_t max_chars, char **endp,
+				 unsigned int base)
+{
+	/*
+	 * simple_strntoull safely handles receiving max_chars==0 in the
+	 * case we start with max_chars==1 and find a '-' prefix.
+	 */
+	if (*cp == '-' && max_chars > 0)
+		return -simple_strntoull(cp + 1, max_chars - 1, endp, base);
+
+	return simple_strntoull(cp, max_chars, endp, base);
+}
+
 /**
  * simple_strtoll - convert a string to a signed long long
  * @cp: The start of the string
@@ -119,10 +150,7 @@  EXPORT_SYMBOL(simple_strtol);
  */
 long long simple_strtoll(const char *cp, char **endp, unsigned int base)
 {
-	if (*cp == '-')
-		return -simple_strtoull(cp + 1, endp, base);
-
-	return simple_strtoull(cp, endp, base);
+	return simple_strntoll(cp, UINT_MAX, endp, base);
 }
 EXPORT_SYMBOL(simple_strtoll);
 
@@ -3449,25 +3477,13 @@  int vsscanf(const char *buf, const char *fmt, va_list args)
 			break;
 
 		if (is_sign)
-			val.s = qualifier != 'L' ?
-				simple_strtol(str, &next, base) :
-				simple_strtoll(str, &next, base);
+			val.s = simple_strntoll(str,
+						field_width > 0 ? field_width : UINT_MAX,
+						&next, base);
 		else
-			val.u = qualifier != 'L' ?
-				simple_strtoul(str, &next, base) :
-				simple_strtoull(str, &next, base);
-
-		if (field_width > 0 && next - str > field_width) {
-			if (base == 0)
-				_parse_integer_fixup_radix(str, &base);
-			while (next - str > field_width) {
-				if (is_sign)
-					val.s = div_s64(val.s, base);
-				else
-					val.u = div_u64(val.u, base);
-				--next;
-			}
-		}
+			val.u = simple_strntoull(str,
+						 field_width > 0 ? field_width : UINT_MAX,
+						 &next, base);
 
 		switch (qualifier) {
 		case 'H':	/* that's 'hh' in format */