[v2] posix: Fix glob with GLOB_NOCHECK returning modified patterns (BZ#10246)

Message ID 1506107512-5013-1-git-send-email-adhemerval.zanella@linaro.org
State New
Headers show
Series
  • [v2] posix: Fix glob with GLOB_NOCHECK returning modified patterns (BZ#10246)
Related show

Commit Message

Adhemerval Zanella Sept. 22, 2017, 7:11 p.m.
This is a follow up of my previous fix of BZ#10246 [1] which addresses
the issues raised by Paul Eggert.  The issue is for glob calls which
internally enabled GLOB_ONLYDIR (essentinally pattern ending with '/')
required for dangling symlinks to actually check if the directory is
accessible.  An extra testcase is added on tst-glob_symlinks.

---

According to POSIX glob with GLOB_NOCHECK should return a list consisting
of only of the input pattern in case of no match.  However GLIBC does not
honor in case of '//<something'.  This is due internally this is handled
and special case and prefix_array (responsable to prepend the directory
name) does not know if the input already contains a slash or not since
either '/<something>' or '//<something>' will be handle in same way.

This patch fix it by using a empty directory name for the latter (since
prefix_array already adds a slash as default for each entry).

Checked on x86_64-linux-gnu and on a build using build-many-glibcs.py
for all major architectures.

	* posix/glob (glob_lstat, glob_opendir, glob_readdir,
	glob_closedir): New function: wrapper for the lstat, opendir,
	readdir, and closedir respectively.
	(convert_dirent, convert_dirent64): Remove function.
	(glob, glob_in_dir):  Handle pattern that do not match and
	start with '/' correctly.
	* posix/globtest.sh: New tests for NOCHECK.
	* posix/tst-glob_symlinks.c: Add tests for dangling directory.

[1] https://sourceware.org/ml/libc-alpha/2017-09/msg00315.html

---
 ChangeLog                 |  11 ++++
 posix/glob.c              | 124 +++++++++++++++++++++++++---------------------
 posix/globtest.sh         |  37 ++++++++++++++
 posix/tst-glob_symlinks.c |   6 +++
 4 files changed, 121 insertions(+), 57 deletions(-)

-- 
2.7.4

Comments

Paul Eggert Sept. 23, 2017, 6:46 a.m. | #1
Adhemerval Zanella wrote:

> +  return (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)


Use __glibc_unlikely instead of __builtin_expect with 0, here and elsewhere.

> +	 ? (*pglob->gl_lstat) (fname, &ust->st)


No need for the "*" or for the initial pair of parens, both here and elsewhere.

> +	 : __lstat64 (fname, &ust->st64));


The ? and : should be indented one more column than the (.

> +}

> +

> +static void *

> +glob_opendir (glob_t *pglob, int flags, const char *directory)

> +{

> +  return (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)

> +	  ? (*pglob->gl_opendir) (directory)

> +	: opendir (directory));

>   }


The indenting is not right there: ? and : should have the same indentation.

> @@ -1332,6 +1334,17 @@ glob_in_dir (const char *pattern, const char *directory, int flags,

>   

>   	      if (fnmatch (pattern, d.name, fnm_flags) == 0)

>   		{

> +		  /* We need to certify that the link is a directory.  */

> +		  if (flags & GLOB_ONLYDIR && readdir_result_type (d) == DT_LNK > +		    {

> +		      void *ss = glob_opendir (pglob, flags, d.name);


I'm afraid there are several problems here. GLOB_ONLYDIR does not mean, "return 
only directories". It means, "the caller is interested only in directories, so 
if you can easily determine that the result is not a directory, don't bother to 
return it. However, if it is expensive to check then it is OK to return the 
result anyway, as it is the caller's responsibility to check that returned 
strings are directories." So the comment is wrong here, and the code looks wrong 
too: calling glob_opendir is expensive, so it should not be done here.

Also, why use glob_opendir to test whether d.name is a directory, when the rest 
of the code uses is_dir to do that?

Also, if readdir_result_type (d) is DT_UNKNOWN, d.name might be a directory, so 
the code should check in that case too.

Also, d.name is just a file name component: it should be absolute, or relative 
to the working directory.

Also, the code mistakenly treats leading "//" as if it were "/"; this is not 
correct, and does not work on some platforms. POSIX allows implementations where 
"//" is distinct from "/".

Also, it's not clear to me that the code treats strings of one or more "/"s 
correctly, in the pattern.
Adhemerval Zanella Sept. 23, 2017, 4 p.m. | #2
On 23/09/2017 03:46, Paul Eggert wrote:
> Adhemerval Zanella wrote:

> 

>> +  return (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)

> 

> Use __glibc_unlikely instead of __builtin_expect with 0, here and elsewhere.


Ack.

> 

>> +     ? (*pglob->gl_lstat) (fname, &ust->st)

> 

> No need for the "*" or for the initial pair of parens, both here and elsewhere.


Ack.

> 

>> +     : __lstat64 (fname, &ust->st64));

> 

> The ? and : should be indented one more column than the (.


Ack.

> 

>> +}

>> +

>> +static void *

>> +glob_opendir (glob_t *pglob, int flags, const char *directory)

>> +{

>> +  return (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)

>> +      ? (*pglob->gl_opendir) (directory)

>> +    : opendir (directory));

>>   }

> 

> The indenting is not right there: ? and : should have the same indentation.


Ack.

> 

>> @@ -1332,6 +1334,17 @@ glob_in_dir (const char *pattern, const char *directory, int flags,

>>               if (fnmatch (pattern, d.name, fnm_flags) == 0)

>>           {

>> +          /* We need to certify that the link is a directory.  */

>> +          if (flags & GLOB_ONLYDIR && readdir_result_type (d) == DT_LNK > +            {

>> +              void *ss = glob_opendir (pglob, flags, d.name);

> 

> I'm afraid there are several problems here. GLOB_ONLYDIR does not mean, "return only directories". It means, "the caller is interested only in directories, so if you can easily determine that the result is not a directory, don't bother to return it. However, if it is expensive to check then it is OK to return the result anyway, as it is the caller's responsibility to check that returned strings are directories." So the comment is wrong here, and the code looks wrong too: calling glob_opendir is expensive, so it should not be done here.


I tried to defer the call that check if the link is actually pointing to a directory
as late as possible.  Without actually following the possible directory the obtained 
matched name the example you provided:

  ln -s /no-such-file globlink1
  ln -s . globlink2 

Will be matched only for globlink1.

> 

> Also, why use glob_opendir to test whether d.name is a directory, when the rest of the code uses is_dir to do that?


Good point, it can be simplified to use is_dir indeed.

> 

> Also, if readdir_result_type (d) is DT_UNKNOWN, d.name might be a directory, so the code should check in that case too.


I excluded DT_UNKNOWN mainly because posix/tst-gnuglob.c uses a for the synthetic
directory to tests some entries as DT_UNKNOWN and it does not expect them to 
be matches.  If we change to also match DT_UNKNOWN for possible directories
entries it would meant change slight current glob semantic.

> 

> Also, d.name is just a file name component: it should be absolute, or relative to the working directory.


Not really sure what this may lead to issue in current patch, care to
elaborate?

> 

> Also, the code mistakenly treats leading "//" as if it were "/"; this is not correct, and does not work on some platforms. POSIX allows implementations where "//" is distinct from "/".


Indeed, although for *glibc* supported platforms this is not an issue. I don't
consider this a block for glibc inclusion, but I am open to suggestion on how 
to handle correctly on platforms with handle '//' distinct from '/'.

> 

> Also, it's not clear to me that the code treats strings of one or more "/"s correctly, in the pattern.


Same as previous remark, using the same testcase:

  glob_t g;
  int res = glob ("globlink[12]/", 0, NULL, &g);
  assert (res == 0 && g.gl_pathc == 1);
  assert (strcmp (g.gl_pathv[0], "globlink2/") == 0);

Using either 'globlink[12]//' or 'globlink[12]///' will handle the same
results as 'globlink[12]/'.  This still might be not the correct solution for
platform that  handle '//' distinct from '/', but again I do not think it is
block for *glibc*.
Paul Eggert Sept. 23, 2017, 4:31 p.m. | #3
Adhemerval Zanella wrote:

> Without actually following the possible directory the obtained

> matched name the example you provided:

> 

>    ln -s /no-such-file globlink1

>    ln -s . globlink2

> 

> Will be matched only for globlink1.


You're correct that there's a bug here, that needs to be fixed. But the proposed 
patch changes the semantics of GLOB_ONLYDIR contrary to its intent. GLOB_ONLYDIR 
is supposed to be an efficiency flag, to let callers run faster. By making 
GLOB_ONLYDIR run slower, the patch is contrary to this expectation.

The bug needs to be fixed, but not this way. It's the caller's responsibility to 
check that the returned value is indeed a directory, and that's how the fix 
should be written: by fixing the caller to check that the result is indeed a 
directory, not by slowing down the callee.

>> Also, if readdir_result_type (d) is DT_UNKNOWN, d.name might be a directory, so the code should check in that case too.

> 

> I excluded DT_UNKNOWN mainly because posix/tst-gnuglob.c uses a for the synthetic

> directory to tests some entries as DT_UNKNOWN and it does not expect them to

> be matches.


That sounds like a bug in the test case, then. DT_UNKNOWN is returned for file 
systems that do not have the file type in the inode. A DT_UNKNOWN entry might be 
a directory, or a symlink, or anything else.

> If we change to also match DT_UNKNOWN for possible directories

> entries it would meant change slight current glob semantic.


How so? DT_UNKNOWN merely means that glob can't do some optimizations. This 
should not change the semantics, only performance.

>> Also, d.name is just a file name component: it should be absolute, or relative to the working directory.

> 

> Not really sure what this may lead to issue in current patch, care to

> elaborate?


As I recall, d.name is relative to the parent directory: it has no slashes. The 
argument to is_dir is supposed to be the full filename, either absolute or 
relative to the current working directory.

>> Also, the code mistakenly treats leading "//" as if it were "/"; this is not correct, and does not work on some platforms. POSIX allows implementations where "//" is distinct from "/".

> 

> Indeed, although for *glibc* supported platforms this is not an issue.


Fair enough, we can fix that separately.

> Using either 'globlink[12]//' or 'globlink[12]///' will handle the same

> results as 'globlink[12]/'.


Every result returned by globlink[12]/// should end in three slashes. The number 
of  slashes in the result should equal the number of slashes in the pattern. 
This general issue differs from the issue about // vs / at the start of the file 
name, as the trailing slashes are portable to all filesystems.

Patch

diff --git a/posix/glob.c b/posix/glob.c
index c699177..391aa9c 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -128,36 +128,58 @@  readdir_result_type (struct readdir_result d)
 # define GL_READDIR(pglob, stream) ((pglob)->gl_readdir (stream))
 #endif
 
-/* Extract name and type from directory entry.  No copy of the name is
-   made.  If SOURCE is NULL, result name is NULL.  Keep in sync with
-   convert_dirent64 below.  */
-static struct readdir_result
-convert_dirent (const struct dirent *source)
+
+union glob_stat
 {
-  if (source == NULL)
-    {
-      struct readdir_result result = { NULL, };
-      return result;
-    }
-  struct readdir_result result = READDIR_RESULT_INITIALIZER (source);
-  return result;
+  struct stat st;
+  struct_stat64 st64;
+};
+
+static int
+glob_lstat (glob_t *pglob, int flags, const char *fname,
+	    union glob_stat *ust)
+{
+  return (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
+	 ? (*pglob->gl_lstat) (fname, &ust->st)
+	 : __lstat64 (fname, &ust->st64));
+}
+
+static void *
+glob_opendir (glob_t *pglob, int flags, const char *directory)
+{
+  return (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
+	  ? (*pglob->gl_opendir) (directory)
+	: opendir (directory));
 }
 
-#ifndef COMPILE_GLOB64
-/* Like convert_dirent, but works on struct dirent64 instead.  Keep in
-   sync with convert_dirent above.  */
 static struct readdir_result
-convert_dirent64 (const struct dirent64 *source)
+glob_readdir (glob_t *pglob, int flags, void *stream)
 {
-  if (source == NULL)
+  if (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0))
     {
-      struct readdir_result result = { NULL, };
-      return result;
+      const struct dirent *src = GL_READDIR (pglob, stream);
+      if (src == NULL)
+	return (struct readdir_result) { NULL, };
+      return (struct readdir_result) READDIR_RESULT_INITIALIZER (src);
     }
-  struct readdir_result result = READDIR_RESULT_INITIALIZER (source);
-  return result;
-}
+#ifdef COMPILE_GLOB64
+  const struct dirent *source = __readdir (stream);
+#else
+  const struct dirent64 *source = __readdir64 (stream);
 #endif
+  if (source == NULL)
+    return (struct readdir_result) { NULL, };
+  return (struct readdir_result) READDIR_RESULT_INITIALIZER (source);
+}
+
+static void
+glob_closedir (glob_t *pglob, int flags, void *stream)
+{
+  if (__glibc_unlikely (flags & GLOB_ALTDIRFUNC))
+    (*pglob->gl_closedir) (stream);
+  else
+    closedir (stream);
+}
 
 #ifndef _LIBC
 /* The results of opendir() in this file are not used with dirfd and fchdir,
@@ -271,6 +293,8 @@  glob (const char *pattern, int flags, int (*errfunc) (const char *, int),
   size_t oldcount;
   int meta;
   int dirname_modified;
+  /* Indicate if the directory should be prepended on return values.  */
+  bool dirname_prefix = true;
   int malloc_dirname = 0;
   glob_t dirs;
   int retval = 0;
@@ -494,6 +518,10 @@  glob (const char *pattern, int flags, int (*errfunc) (const char *, int),
       dirname = (char *) "/";
       dirlen = 1;
       ++filename;
+      /* prefix_array adds a separator for each result and DIRNAME is
+	 already '/'.  So we indicate later that we should not prepend
+	 anything for this specific case.  */
+      dirname_prefix = false;
     }
   else
     {
@@ -1085,7 +1113,7 @@  glob (const char *pattern, int flags, int (*errfunc) (const char *, int),
       if (dirlen > 0)
 	{
 	  /* Stick the directory on the front of each name.  */
-	  if (prefix_array (dirname,
+	  if (prefix_array (dirname_prefix ? dirname : "",
 			    &pglob->gl_pathv[old_pathc + pglob->gl_offs],
 			    pglob->gl_pathc - old_pathc))
 	    {
@@ -1166,11 +1194,6 @@  prefix_array (const char *dirname, char **array, size_t n)
   size_t dirlen = strlen (dirname);
   char dirsep_char = '/';
 
-  if (dirlen == 1 && dirname[0] == '/')
-    /* DIRNAME is just "/", so normal prepending would get us "//foo".
-       We want "/foo" instead, so don't prepend any chars from DIRNAME.  */
-    dirlen = 0;
-
 #if defined __MSDOS__ || defined WINDOWS32
   if (dirlen > 1)
     {
@@ -1250,11 +1273,7 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
     }
   else if (meta == GLOBPAT_NONE)
     {
-      union
-      {
-	struct stat st;
-	struct_stat64 st64;
-      } ust;
+      union glob_stat ust;
       size_t patlen = strlen (pattern);
       size_t fullsize;
       bool alloca_fullname
@@ -1273,10 +1292,7 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
       mempcpy (mempcpy (mempcpy (fullname, directory, dirlen),
 			"/", 1),
 	       pattern, patlen + 1);
-      if (((__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
-	   ? (*pglob->gl_lstat) (fullname, &ust.st)
-	    : __lstat64 (fullname, &ust.st64))
-	   == 0)
+      if (glob_lstat (pglob, flags, fullname, &ust) == 0
 	  || errno == EOVERFLOW)
 	/* We found this file to be existing.  Now tell the rest
 	   of the function to copy this name into the result.  */
@@ -1287,9 +1303,7 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
     }
   else
     {
-      stream = (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0)
-		? (*pglob->gl_opendir) (directory)
-		: opendir (directory));
+      stream = glob_opendir (pglob, flags, directory);
       if (stream == NULL)
 	{
 	  if (errno != ENOTDIR
@@ -1305,19 +1319,7 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
 
 	  while (1)
 	    {
-	      struct readdir_result d;
-	      {
-		if (__builtin_expect (flags & GLOB_ALTDIRFUNC, 0))
-		  d = convert_dirent (GL_READDIR (pglob, stream));
-		else
-		  {
-#ifdef COMPILE_GLOB64
-		    d = convert_dirent (__readdir (stream));
-#else
-		    d = convert_dirent64 (__readdir64 (stream));
-#endif
-		  }
-	      }
+	      struct readdir_result d = glob_readdir (pglob, flags, stream);
 	      if (d.name == NULL)
 		break;
 
@@ -1332,6 +1334,17 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
 
 	      if (fnmatch (pattern, d.name, fnm_flags) == 0)
 		{
+		  /* We need to certify that the link is a directory.  */
+		  if (flags & GLOB_ONLYDIR && readdir_result_type (d) == DT_LNK)
+		    {
+		      void *ss = glob_opendir (pglob, flags, d.name);
+		      if (ss == NULL)
+			{
+			  glob_closedir (pglob, flags, ss);
+			  continue;
+			}
+		    }
+
 		  if (cur == names->count)
 		    {
 		      struct globnames *newnames;
@@ -1452,10 +1465,7 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
   if (stream != NULL)
     {
       save = errno;
-      if (__glibc_unlikely (flags & GLOB_ALTDIRFUNC))
-	(*pglob->gl_closedir) (stream);
-      else
-	closedir (stream);
+      glob_closedir (pglob, flags, stream);
       __set_errno (save);
     }
 
diff --git a/posix/globtest.sh b/posix/globtest.sh
index 73f7ae3..92a8e37 100755
--- a/posix/globtest.sh
+++ b/posix/globtest.sh
@@ -242,6 +242,43 @@  if test $failed -ne 0; then
   result=1
 fi
 
+# Test NOCHECK for specific cases where the pattern used starts
+# with '/' (BZ#10246).
+failed=0
+${test_program_prefix} \
+${common_objpfx}posix/globtest -c "$testdir" "/%" |
+sort > $testout
+cat <<"EOF" | $CMP - $testout >> $logfile || failed=1
+`/%'
+EOF
+if test $failed -ne 0; then
+  echo "No check test failed" >> $logfile
+  result=1
+fi
+
+${test_program_prefix} \
+${common_objpfx}posix/globtest -c "$testdir" "//%" |
+sort > $testout
+cat <<"EOF" | $CMP - $testout >> $logfile || failed=1
+`//%'
+EOF
+if test $failed -ne 0; then
+  echo "No check test failed" >> $logfile
+  result=1
+fi
+
+${test_program_prefix} \
+${common_objpfx}posix/globtest -c "$testdir" "///%" |
+sort > $testout
+cat <<"EOF" | $CMP - $testout >> $logfile || failed=1
+`///%'
+EOF
+if test $failed -ne 0; then
+  echo "No check test failed" >> $logfile
+  result=1
+fi
+
+
 # Test NOMAGIC without magic characters
 failed=0
 ${test_program_prefix} \
diff --git a/posix/tst-glob_symlinks.c b/posix/tst-glob_symlinks.c
index 5c4b4ec..c953204 100644
--- a/posix/tst-glob_symlinks.c
+++ b/posix/tst-glob_symlinks.c
@@ -131,5 +131,11 @@  do_test (void)
   TEST_VERIFY_EXIT (strcmp (gl.gl_pathv[0], dangling_link) == 0);
   globfree (&gl);
 
+  /* glob should not try to match link the dangling directories.  */
+  snprintf (buf, sizeof buf, "/tmp/dangling-symlink-dir-tst-glob*/");
+  fprintf (stderr, "%s\n", buf);
+  TEST_VERIFY_EXIT (glob (buf, 0, NULL, &gl) == GLOB_NOMATCH);
+  globfree (&gl);
+
   return 0;
 }