diff mbox series

[v2] lib/string.c: implement stpcpy

Message ID 20200815020946.1538085-1-ndesaulniers@google.com
State New
Headers show
Series [v2] lib/string.c: implement stpcpy | expand

Commit Message

Nick Desaulniers Aug. 15, 2020, 2:09 a.m. UTC
LLVM implemented a recent "libcall optimization" that lowers calls to
`sprintf(dest, "%s", str)` where the return value is used to
`stpcpy(dest, str) - dest`. This generally avoids the machinery involved
in parsing format strings.  Calling `sprintf` with overlapping arguments
was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.

`stpcpy` is just like `strcpy` except it returns the pointer to the new
tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
one statement.

`stpcpy` was first standardized in POSIX.1-2008.

Implement this so that we don't observe linkage failures due to missing
symbol definitions for `stpcpy`.

Similar to last year's fire drill with:
commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

This optimization was introduced into clang-12.

Cc: stable@vger.kernel.org
Link: https://bugs.llvm.org/show_bug.cgi?id=47162
Link: https://github.com/ClangBuiltLinux/linux/issues/1126
Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html
Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html
Link: https://reviews.llvm.org/D85963
Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Joe Perches <joe@perches.com>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

---
Changes V2:
* Added Sami's Tested by; though the patch changed implementation, the
  missing symbol at link time was the problem Sami was observing.
* Fix __restrict -> __restrict__ typo as per Joe.
* Drop note about restrict from commit message as per Arvind.
* Fix NULL -> NUL as per Arvind; NUL is ASCII '\0'. TIL
* Fix off by one error as per Arvind; I had another off by one error in
  my test program that was masking this.

 include/linux/string.h |  3 +++
 lib/string.c           | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

-- 
2.28.0.220.ged08abb693-goog

Comments

Arvind Sankar Aug. 15, 2020, 2:58 a.m. UTC | #1
On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> LLVM implemented a recent "libcall optimization" that lowers calls to

> `sprintf(dest, "%s", str)` where the return value is used to

> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved

> in parsing format strings.  Calling `sprintf` with overlapping arguments

> was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.

> 

> `stpcpy` is just like `strcpy` except it returns the pointer to the new

> tail of `dest`. This allows you to chain multiple calls to `stpcpy` in

> one statement.

> 

> `stpcpy` was first standardized in POSIX.1-2008.

> 

> Implement this so that we don't observe linkage failures due to missing

> symbol definitions for `stpcpy`.

> 

> Similar to last year's fire drill with:

> commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

> 

> This optimization was introduced into clang-12.

> 

> Cc: stable@vger.kernel.org

> Link: https://bugs.llvm.org/show_bug.cgi?id=47162

> Link: https://github.com/ClangBuiltLinux/linux/issues/1126

> Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html

> Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html

> Link: https://reviews.llvm.org/D85963

> Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>

> Suggested-by: Joe Perches <joe@perches.com>

> Reported-by: Sami Tolvanen <samitolvanen@google.com>

> Tested-by: Sami Tolvanen <samitolvanen@google.com>

> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

> ---

> Changes V2:

> * Added Sami's Tested by; though the patch changed implementation, the

>   missing symbol at link time was the problem Sami was observing.

> * Fix __restrict -> __restrict__ typo as per Joe.

> * Drop note about restrict from commit message as per Arvind.

> * Fix NULL -> NUL as per Arvind; NUL is ASCII '\0'. TIL

> * Fix off by one error as per Arvind; I had another off by one error in

>   my test program that was masking this.

> 

>  include/linux/string.h |  3 +++

>  lib/string.c           | 23 +++++++++++++++++++++++

>  2 files changed, 26 insertions(+)

> 

> diff --git a/include/linux/string.h b/include/linux/string.h

> index b1f3894a0a3e..7686dbca8582 100644

> --- a/include/linux/string.h

> +++ b/include/linux/string.h

> @@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);

>  #ifndef __HAVE_ARCH_STRSCPY

>  ssize_t strscpy(char *, const char *, size_t);

>  #endif

> +#ifndef __HAVE_ARCH_STPCPY

> +extern char *stpcpy(char *__restrict__, const char *__restrict__);

> +#endif

>  

>  /* Wraps calls to strscpy()/memset(), no arch specific code required */

>  ssize_t strscpy_pad(char *dest, const char *src, size_t count);

> diff --git a/lib/string.c b/lib/string.c

> index 6012c385fb31..68ddbffbbd58 100644

> --- a/lib/string.c

> +++ b/lib/string.c

> @@ -272,6 +272,29 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)

>  }

>  EXPORT_SYMBOL(strscpy_pad);

>  

> +#ifndef __HAVE_ARCH_STPCPY

> +/**

> + * stpcpy - copy a string from src to dest returning a pointer to the new end

> + *          of dest, including src's NUL terminator. May overrun dest.

> + * @dest: pointer to end of string being copied into. Must be large enough

> + *        to receive copy.

> + * @src: pointer to the beginning of string being copied from. Must not overlap

> + *       dest.

> + *

> + * stpcpy differs from strcpy in two key ways:

> + * 1. inputs must not overlap.


Looks like you missed my second email: strcpy also does not allow inputs
to overlap. Couple typos below.

> + * 2. return value is the new NULL terminated character. (for strcpy, the

				 ^^ NUL terminator.
> + *    return value is a pointer to src.

				      ^^ dest.)
> + */

> +#undef stpcpy

> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)

> +{

> +	while ((*dest++ = *src++) != '\0')

> +		/* nothing */;

> +	return --dest;

> +}

> +#endif

> +

>  #ifndef __HAVE_ARCH_STRCAT

>  /**

>   * strcat - Append one %NUL-terminated string to another

> -- 

> 2.28.0.220.ged08abb693-goog

> 


The kernel-doc comments in string.c currently have a mix of %NUL and
NUL, but the former seems to be more common. %NUL-terminator appears to
be the preferred wording.
Joe Perches Aug. 15, 2020, 3:42 a.m. UTC | #2
On Fri, 2020-08-14 at 19:09 -0700, Nick Desaulniers wrote:
> LLVM implemented a recent "libcall optimization" that lowers calls to

> `sprintf(dest, "%s", str)` where the return value is used to

> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved

> in parsing format strings.  Calling `sprintf` with overlapping arguments

> was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.

> 

> `stpcpy` is just like `strcpy` except it returns the pointer to the new

> tail of `dest`. This allows you to chain multiple calls to `stpcpy` in

> one statement.

[]
> diff --git a/include/linux/string.h b/include/linux/string.h

[]
> @@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);

>  #ifndef __HAVE_ARCH_STRSCPY

>  ssize_t strscpy(char *, const char *, size_t);

>  #endif

> +#ifndef __HAVE_ARCH_STPCPY

> +extern char *stpcpy(char *__restrict__, const char *__restrict__);


If __restrict__ is used, perhaps it should follow the
kernel style used by attributes like __iomem and __user

extern char *stpcpy(char __restrict *dest, const char __restrict *src);

(though I would lose the extern too)

char *stpcpy(char __restrict *dest, const char __restrict *src);
Kees Cook Aug. 15, 2020, 4:34 p.m. UTC | #3
On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> LLVM implemented a recent "libcall optimization" that lowers calls to
> `sprintf(dest, "%s", str)` where the return value is used to
> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> in parsing format strings.  Calling `sprintf` with overlapping arguments
> was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
> 
> `stpcpy` is just like `strcpy` except it returns the pointer to the new
> tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> one statement.

O_O What?

No; this is a _terrible_ API: there is no bounds checking, there are no
buffer sizes. Anything using the example sprintf() pattern is _already_
wrong and must be removed from the kernel. (Yes, I realize that the
kernel is *filled* with this bad assumption that "I'll never write more
than PAGE_SIZE bytes to this buffer", but that's both theoretically
wrong ("640k is enough for anybody") and has been known to be wrong in
practice too (e.g. when suddenly your writing routine is reachable by
splice(2) and you may not have a PAGE_SIZE buffer).

But we cannot _add_ another dangerous string API. We're already in a
terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
needs to be addressed up by removing the unbounded sprintf() uses. (And
to do so without introducing bugs related to using snprintf() when
scnprintf() is expected[4].)

-Kees

[1] https://github.com/KSPP/linux/issues/88
[2] https://github.com/KSPP/linux/issues/89
[3] https://github.com/KSPP/linux/issues/90
[4] https://lore.kernel.org/lkml/20200810092100.GA42813@2f5448a72a42/
kernel test robot Aug. 16, 2020, 7:44 a.m. UTC | #4
Hi Nick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-linux-mm/master]
[also build test ERROR on kees/for-next/pstore linus/master v5.8 next-20200814]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nick-Desaulniers/lib-string-c-implement-stpcpy/20200816-090542
base:   https://github.com/hnaz/linux-mm master
config: ia64-defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=ia64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "stpcpy" [sound/pci/ac97/snd-ac97-codec.ko] undefined!


---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff mbox series

Patch

diff --git a/include/linux/string.h b/include/linux/string.h
index b1f3894a0a3e..7686dbca8582 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -31,6 +31,9 @@  size_t strlcpy(char *, const char *, size_t);
 #ifndef __HAVE_ARCH_STRSCPY
 ssize_t strscpy(char *, const char *, size_t);
 #endif
+#ifndef __HAVE_ARCH_STPCPY
+extern char *stpcpy(char *__restrict__, const char *__restrict__);
+#endif
 
 /* Wraps calls to strscpy()/memset(), no arch specific code required */
 ssize_t strscpy_pad(char *dest, const char *src, size_t count);
diff --git a/lib/string.c b/lib/string.c
index 6012c385fb31..68ddbffbbd58 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -272,6 +272,29 @@  ssize_t strscpy_pad(char *dest, const char *src, size_t count)
 }
 EXPORT_SYMBOL(strscpy_pad);
 
+#ifndef __HAVE_ARCH_STPCPY
+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ *          of dest, including src's NUL terminator. May overrun dest.
+ * @dest: pointer to end of string being copied into. Must be large enough
+ *        to receive copy.
+ * @src: pointer to the beginning of string being copied from. Must not overlap
+ *       dest.
+ *
+ * stpcpy differs from strcpy in two key ways:
+ * 1. inputs must not overlap.
+ * 2. return value is the new NULL terminated character. (for strcpy, the
+ *    return value is a pointer to src.
+ */
+#undef stpcpy
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
+{
+	while ((*dest++ = *src++) != '\0')
+		/* nothing */;
+	return --dest;
+}
+#endif
+
 #ifndef __HAVE_ARCH_STRCAT
 /**
  * strcat - Append one %NUL-terminated string to another