diff mbox

[1/2] posix: execvpe cleanup

Message ID 1453731845-32255-1-git-send-email-adhemerval.zanella@linaro.org
State New
Headers show

Commit Message

Adhemerval Zanella Netto Jan. 25, 2016, 2:24 p.m. UTC
This patch removes all the dynamic allocation on execvpe code and
instead use direct stack allocation.  This is QoI approach to make
it possible use in scenarios where memory is shared with parent
(vfork or clone with CLONE_VM).

For default process spawn (script file without a shebang), stack
allocation is bounded by NAME_MAX plus PATH_MAX plus 1.  Large
file arguments returns an error (ENAMETOOLONG).  This differs than
current GLIBC pratice in general, but it used to limit stack
allocation for large inputs.  Also, path in PATH environment variable
larger than PATH_MAX are ignored.

The shell direct execution exeception, where execve returns ENOEXEC,
might requires a large stack allocation due large input argument list.

Tested on i686, x86_64, powerpc64le, and aarch64.

	* posix/execvpe.c (__execvpe): Remove dynamic allocation.
---
 posix/execvpe.c | 233 +++++++++++++++++++++-----------------------------------
 2 files changed, 92 insertions(+), 145 deletions(-)

-- 
1.9.1

Comments

Adhemerval Zanella Netto Jan. 25, 2016, 4:51 p.m. UTC | #1
On 25-01-2016 12:37, Andreas Schwab wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> 

>> For default process spawn (script file without a shebang), stack

>> allocation is bounded by NAME_MAX plus PATH_MAX plus 1.

> 

> Both NAME_MAX and PATH_MAX depend on the directory and may not be

> defined if there is no system wide limit.

> 

> Andreas.

> 


It is a compromise of allowing multiple different limits for each FS
and limiting potentially issues (stack overflow).

The first option would be just impose no limit on such variables 
(the PATH environment variable and the file argument size) and do
stack/alloca allocations.  This is what uclibc does for MMU targets
(for non-mmu it uses a mmap/unmap).  A side note is uclib does 
enforce a maximum file size returning ENAMETOOLONG for such case.

The second option would be to enforce some minimum limits based on default 
PATH_MAX and NAME_MAX.  This is what musl and BSDs (open, free, net) 
do. This have the downside of not really work on every different
FS we have out there, as you said.

Finally, the third options would is just to drop this patch and rework
the  posix_spawnp path to mimic this new execvpe behaviour with some
memory allocation limits by stack/alloca. I do not see it is as a good 
approach since it is basically a code duplication for an internal
function that can be provided for both implementations and for a QoI
execvpe should not use malloc (to avoid memory leak in vfork/clone
case).

Due GLIBC behaviour of allowing the script call in case of a script
without shebang, a possible stack overflow can happen (since exevpe
must re-construct the arguments for the shell script call). So I am
not sure if first or second option should be the best.
Adhemerval Zanella Netto Jan. 25, 2016, 5:19 p.m. UTC | #2
On 25-01-2016 14:52, Paul Eggert wrote:
> On 01/25/2016 06:24 AM, Adhemerval Zanella wrote:

>> -scripts_argv (const char *file, char *const argv[], int argc, char **new_argv)

>> +maybe_script_execute (const char *path, char *const argv[], char *const envp[])

> 

> Why change the arg name from "file" to "path"? The GNU tradition is to use names like "path" for PATH and the like, not for file names that may contain slashes.

> 


I will change  that.

>> +  int argc = 0;

>> +  while (argv[argc++]);

> 

> Please don't format 'while' loops that way. Use 'continue;'. Also, what happens if argc exceeds INT_MAX?


I will change the loop form.  One option to handle the INT_MAX is to limit argc
counting to ARG_MAX and return E2BIG if it is the case.

> 

>>     /* Construct an argument list for the shell.  */

>> +  char *new_argv[argc];

> 

> Why can't this overflow the stack when ARGC is large? The original code tried to check for this overflow and do the right thing; why remove the check, flawed as it was?

> 

> 


It will if the stack is not large enough and as the main idea is to allow posix_spawnp
use execvpe internally instead of reimplement it itself.  As my previous answer to 
Andreas I see some options on how to handle it:

1. Limit total stack allocation to a predefined value. Linux practically does not
   imposes limits to argument number (although ARG_MAX is defined to 131072, 
   fs/exec.c:_do_execve code itself checks against a value of INT_MAX, 2^32-1)
   so I think practically ARG_MAX is a feasible value which gives us a potentially
   stack allocation of 512 for 32-bit and 1MB for 64-bit.  It could be less for
   this maybe_script_run.

2. Just drop this execve path and document that by using along either vfork/clone
   can lead to memory leak.  I can work on posix_spawn{p} to use another version
   of execvpe with memory limit.

I really would like to no go on 2 for some reason: 1. from a QoI I see that the
dynamic allocation limits the exevpe utilization for only fork model (although
I do see that practically this would be most common use case), and 2. it will
required some code duplication on posix_spawnp.  So I am open to suggestions.
diff mbox

Patch

diff --git a/posix/execvpe.c b/posix/execvpe.c
index 61697a7..625f744 100644
--- a/posix/execvpe.c
+++ b/posix/execvpe.c
@@ -15,7 +15,6 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <alloca.h>
 #include <unistd.h>
 #include <stdarg.h>
 #include <stdbool.h>
@@ -23,22 +22,29 @@ 
 #include <string.h>
 #include <errno.h>
 #include <paths.h>
-
+#include <confstr.h>
 
 /* The file is accessible but it is not an executable file.  Invoke
    the shell to interpret it as a script.  */
 static void
-internal_function
-scripts_argv (const char *file, char *const argv[], int argc, char **new_argv)
+maybe_script_execute (const char *path, char *const argv[], char *const envp[])
 {
+  /* Count the arguments.  */
+  int argc = 0;
+  while (argv[argc++]);
+
   /* Construct an argument list for the shell.  */
+  char *new_argv[argc];
   new_argv[0] = (char *) _PATH_BSHELL;
-  new_argv[1] = (char *) file;
+  new_argv[1] = (char *) path;
   while (argc > 1)
     {
       new_argv[argc] = argv[argc - 1];
       --argc;
     }
+
+  /* Execute the shell.  */
+  __execve (new_argv[0], new_argv, envp);
 }
 
 
@@ -47,170 +53,107 @@  scripts_argv (const char *file, char *const argv[], int argc, char **new_argv)
 int
 __execvpe (const char *file, char *const argv[], char *const envp[])
 {
+  /* We check the simple case first. */
   if (*file == '\0')
     {
-      /* We check the simple case first. */
       __set_errno (ENOENT);
       return -1;
     }
 
-  if (strchr (file, '/') != NULL)
+  /* Don't search when it contains a slash.  */
+  if (strchr (file, '/'))
     {
-      /* Don't search when it contains a slash.  */
       __execve (file, argv, envp);
 
       if (errno == ENOEXEC)
-	{
-	  /* Count the arguments.  */
-	  int argc = 0;
-	  while (argv[argc++])
-	    ;
-	  size_t len = (argc + 1) * sizeof (char *);
-	  char **script_argv;
-	  void *ptr = NULL;
-	  if (__libc_use_alloca (len))
-	    script_argv = alloca (len);
-	  else
-	    script_argv = ptr = malloc (len);
-
-	  if (script_argv != NULL)
-	    {
-	      scripts_argv (file, argv, argc, script_argv);
-	      __execve (script_argv[0], script_argv, envp);
-
-	      free (ptr);
-	    }
-	}
+        maybe_script_execute (file, argv, envp);
+
+      return -1;
     }
-  else
+
+  const char *path = getenv ("PATH");
+  if (!path)
+    path = CS_PATH;
+  /* Although GLIBC does not enforce NAME_MAX, we set it as the maximum
+     size to avoid unbounded stack allocation.  Same applies for
+     PATH_MAX.  */
+  size_t file_len = __strnlen (file, NAME_MAX + 1);
+  if (file_len > NAME_MAX)
     {
-      size_t pathlen;
-      size_t alloclen = 0;
-      char *path = getenv ("PATH");
-      if (path == NULL)
-	{
-	  pathlen = confstr (_CS_PATH, (char *) NULL, 0);
-	  alloclen = pathlen + 1;
-	}
-      else
-	pathlen = strlen (path);
+      errno = ENAMETOOLONG;
+      return -1;
+    }
+  size_t path_len = __strnlen (path, PATH_MAX - 1) + 1;
 
-      size_t len = strlen (file) + 1;
-      alloclen += pathlen + len + 1;
+  const char *subp;
+  bool got_eacces = false;
+  for (const char *p = path; ; p = subp)
+    {
+      char buffer[path_len + file_len + 1];
 
-      char *name;
-      char *path_malloc = NULL;
-      if (__libc_use_alloca (alloclen))
-	name = alloca (alloclen);
-      else
-	{
-	  path_malloc = name = malloc (alloclen);
-	  if (name == NULL)
-	    return -1;
-	}
+      subp = __strchrnul (p, ':');
 
-      if (path == NULL)
+      /* PATH is larger than PATH_MAX and thus potentially larger than
+	 the stack allocation.  */
+      if (subp - p >= path_len)
 	{
-	  /* There is no `PATH' in the environment.
-	     The default search path is the current directory
-	     followed by the path `confstr' returns for `_CS_PATH'.  */
-	  path = name + pathlen + len + 1;
-	  path[0] = ':';
-	  (void) confstr (_CS_PATH, path + 1, pathlen);
+          /* If there is only one path, bail out.  */
+	  if (!*subp)
+	    break;
+	  /* Otherwise skip to next one.  */
+	  continue;
 	}
 
-      /* Copy the file name at the top.  */
-      name = (char *) memcpy (name + pathlen + 1, file, len);
-      /* And add the slash.  */
-      *--name = '/';
+      /* Set current path considered plus a '/'.  */
+      memcpy (buffer, p, subp - p);
+      buffer[subp - p] = '/';
+      /* And the file to execute.  */
+      memcpy (buffer + (subp - p) + (subp > p), file, file_len + 1);
+
+      __execve (buffer, argv, envp);
+
+      if (errno == ENOEXEC)
+        maybe_script_execute (buffer, argv, envp);
 
-      char **script_argv = NULL;
-      void *script_argv_malloc = NULL;
-      bool got_eacces = false;
-      char *p = path;
-      do
+      switch (errno)
 	{
-	  char *startp;
-
-	  path = p;
-	  p = __strchrnul (path, ':');
-
-	  if (p == path)
-	    /* Two adjacent colons, or a colon at the beginning or the end
-	       of `PATH' means to search the current directory.  */
-	    startp = name + 1;
-	  else
-	    startp = (char *) memcpy (name - (p - path), path, p - path);
-
-	  /* Try to execute this name.  If it works, execve will not return. */
-	  __execve (startp, argv, envp);
-
-	  if (errno == ENOEXEC)
-	    {
-	      if (script_argv == NULL)
-		{
-		  /* Count the arguments.  */
-		  int argc = 0;
-		  while (argv[argc++])
-		    ;
-		  size_t arglen = (argc + 1) * sizeof (char *);
-		  if (__libc_use_alloca (alloclen + arglen))
-		    script_argv = alloca (arglen);
-		  else
-		    script_argv = script_argv_malloc = malloc (arglen);
-		  if (script_argv == NULL)
-		    {
-		      /* A possible EACCES error is not as important as
-			 the ENOMEM.  */
-		      got_eacces = false;
-		      break;
-		    }
-		  scripts_argv (startp, argv, argc, script_argv);
-		}
-
-	      __execve (script_argv[0], script_argv, envp);
-	    }
-
-	  switch (errno)
-	    {
-	    case EACCES:
-	      /* Record the we got a `Permission denied' error.  If we end
-		 up finding no executable we can use, we want to diagnose
-		 that we did find one but were denied access.  */
-	      got_eacces = true;
-	    case ENOENT:
-	    case ESTALE:
-	    case ENOTDIR:
-	      /* Those errors indicate the file is missing or not executable
-		 by us, in which case we want to just try the next path
-		 directory.  */
-	    case ENODEV:
-	    case ETIMEDOUT:
-	      /* Some strange filesystems like AFS return even
-		 stranger error numbers.  They cannot reasonably mean
-		 anything else so ignore those, too.  */
-	      break;
-
-	    default:
-	      /* Some other error means we found an executable file, but
-		 something went wrong executing it; return the error to our
-		 caller.  */
-	      return -1;
-	    }
+	  case EACCES:
+	  /* Record the we got a 'Permission denied' error.  If we end
+             up finding no executable we can use, we want to diagnose
+             that we did find one but were denied access.  */
+	    got_eacces = true;
+	  case ENOENT:
+	  case ESTALE:
+	  case ENOTDIR:
+	  /* Those errors indicate the file is missing or not executable
+	     by us, in which case we want to just try the next path
+	     directory.  */
+	  case ENODEV:
+	  case ETIMEDOUT:
+          /* Some strange filesystems like AFS return even
+             stranger error numbers.  They cannot reasonably mean
+             anything else so ignore those, too.  */
+	    break;
+
+          default:
+	  /* Some other error means we found an executable file, but
+	     something went wrong executing it; return the error to our
+	     caller.  */
+	    return -1;
 	}
-      while (*p++ != '\0');
-
-      /* We tried every element and none of them worked.  */
-      if (got_eacces)
-	/* At least one failure was due to permissions, so report that
-	   error.  */
-	__set_errno (EACCES);
 
-      free (script_argv_malloc);
-      free (path_malloc);
+      if (*subp++ == '\0')
+	break;
     }
 
-  /* Return the error from the last attempt (probably ENOENT).  */
+  /* We tried every element and none of them worked.  */
+  if (got_eacces)
+    /* At least one failure was due to permissions, so report that
+       error.  */
+    __set_errno (EACCES);
+
   return -1;
+
 }
+
 weak_alias (__execvpe, execvpe)