closedir(3) handling NULL

Thu Jan 30 00:52:12 UTC 2014

On Fri, Jan 24, 2014 at 11:39:18PM +0100, Jilles Tjoelker wrote:
> On Fri, Jan 24, 2014 at 03:34:04PM -0600, Bryan Drewery wrote:
> > On Sat, Jan 25, 2014 at 06:00:08AM +1100, Bruce Evans wrote:
> > > On Fri, 24 Jan 2014, Bryan Drewery wrote:
> > > > I do think that improving portability is important. Even against
> > > > sloppy coding. Applications developed for Linux are fine passing
> > > > NULL to closedir(3), which leads to a style of coding that does
> > > > not reveal itself to be a problem on FreeBSD until an edge case
> > > > comes up.
> 
> > > This unimproves portability.  FreeBSD intentionally does the
> > > opposite for fclose(): from fclose(3):
> 
> > IMHO we should handle NULL gracefully in all places instead of having
> > hidden surprises.
> 
> In many cases (but possibly not this one), handling NULL "gracefully"
> makes it harder to debug the inevitable failure. For example, if
> readdir() did not crash when passed a null pointer, a program that fails
> to check opendir()'s return value would plough on for longer and it
> would be harder to find the problem.

So I've thought about this more and I don't think closedir(NULL)
or fclose(NULL) actually hide any issue except subjective sloppy
code. I still point out that free(NULL) is fine, so it seems
consistent to do the same on other deallocation functions. When I
say we should handle NULL gracefully I really only mean in
deallocations. Thus realpath(3) and time(3) are not in my suggestion.
(It is odd that it accepts NULL though). I do realize that free(NULL)
is a different standard, but that is not relevant from the perspective
of the developer writing the code, or one who is used to Linux
behavior and porting to FreeBSD.  If they are writing it to the
standards they will assume nothing and not call closedir(NULL). But
if they come from Linux they may have people around them reviewing
their code asking why they are checking for NULL before closedir(3)
and fclose(3). I'm not sure what the common knowledge is on it. I
did find that glibc has accepted closedir(NULL) since 1995, and
fclose(NULL) since 1997*.

POSIX states that [EBADF] "may" return on a dirp not referring to
an open stream.  NULL doesn't refer to an open stream, which fits.
However, due to not being able to easily implement closedir(random)
returning [EBADF] it makes sense to me to not return [EBADF] on
NULL as then we are essentially claiming to support that clause.

I noticed that Linux manpage claims to support [EBADF] for invalid
directory stream. I dug into glibc and found that it is just returning
[EBADF] from close(2). Ours does the same as Jilles pointed out.
So technically it can be returned already and we should probably
document that. Incidentally fclose.3 used to document [EBADF] and
no longer does since r40178, despite being a possible error.  And
note that fpurge(3) uses the same logic for returning [EBADF] as
fclose(3) and still has it documented in fpurge.3.  We do keep this
documented in fflush.3 (a different check is done to determine as
fclose(3) does). Undocumented features/flags are doc bugs.

As closedir(NULL) is undefined, then I see it as up to us to decide
how to deal with it. The consensus seems to be that it should crash,
but I am surprised by that and confused why that position is defended.

The purpose of POSIX is to improve portability and lessen the amount
of differences in systems right? We try to be a POSIX compliant
system.  Since POSIX does not define how NULL should be handled,
any choice we pick is still compliant. Right?

If this is not a standards issue, should I take this to arch@?

I would make this argument more about portability with Linux, and
being consistent with free(NULL). glibc has returned [EINVAL] for
closedir(NULL) since at least 1995. glibc's fclose(NULL) would close
all streams up until 1997 when it was changed to return [EINVAL]
on NULL when fcloseall(3) was added.

As surprising as it sounds, I was under the impression free(NULL)
would crash until recently, and I've been writing C for 15 years
or so. Somehow I picked up an application and they had it as their
style and I must have done a free(ptr->obj) at some point and had
a NULL ptr and learned the wrong lesson. So now that I realize
free(NULL) is OK, it is strange to me that we are not OK with other
functions working the same.

Let me lay out some examples here to try to understand why fclose(NULL)
or closedir(NULL) would hinder debugging.

Case 1:

  DIR *dirp;
  struct dirent *dp;

  dirp = opendir("/nonexistent");
  dp = dirent(dirp); /* SIGSEGV, closedir(NULL) never reached */
  closedir(dirp);
  /* Sloppy code gets expected poor results on failures. */

Case 2:

  DIR *dirp;
  struct dirent *dp;

  dirp = opendir("/etc/passwd");
  dp = dirent(dirp);
  closedir(dirp);
  /* Sloppy, but all fine. Similar to case 1, it would crash on line 5
     if file not found. */

Case 3:

  DIR *dirp;
  struct dirent *dp;

  if ((dirp = opendir("/nonexistent")) != NULL) {
    dp = dirent(dirp);
    /* ... */
  }
  closedir(dirp);

  /* All is fine, code does not and should not crash. If we kept
     closedir(NULL) as crashing, the "fix" would be to move closedir()
     up 1 line. It's a hidden surprise. (POLA?) */

Case 4:

  DIR *dirp;
  struct dirent *dp;

  if ((dirp = opendir("/etc/passwd")) != NULL) {
    dp = dirent(dirp);
    /* ... */
  }
  closedir(dirp);

  /* All is fine. */

Case 5:

  DIR *dirp;
  struct dirent *dp;

  if ((dirp = opendir("/etc/passwd")) != NULL) {
    dp = dirent(dirp);
    /* ... */
    closedir(dirp);
  }

  /* All is fine; unaffected by proposed change */

Did I miss a case? For closedir(2) it seems irrelevant for debugging
to crash.

I've reviewed all consumers of DIR* and FILE* in lib/libc for object
dereference. Except for the exceptions below, all do dereference
the object passed to them. It would be strange if they did not.
Thus the program would still crash, and much earlier than closedir(NULL)
or fclose(NULL) would.

Exceptions:

fflush(3) accepts NULL and will flush all buffers if used. I don't
see this as a big deal if somehow it is accidentally passed a NULL.
An earlier call with fwrite(3) etc. should have already crashed
anyhow.

freopen(NULL) will fclose(NULL) and return an error if passed
improper modes. It returns NULL and EINVAL. A subsequent call to
use the FILE* will fail.

setvbuf(NULL) can silently ignore NULL and return EOF if passed
invalid arguments. Invalid mode or size<0. Otherwise it properly
dereferences fp.

The only one I see as having the "keep going" problem is freopen(3),
but even that is when passing in invalid modes and would be pretty
rare.  It's very likely that the next line of code would be trying
to use that object somehow and would then crash anyway.

So, I believe the debugging and "purposeful crash" argument is bogus
and should be removed either way from fclose.3. Perhaps it's *really*
a leftover behavior from fclose(NULL)->fcloseall(). I am not sure
if our fclose(3) behaved like that ever. The BSD4.4 import did not,
but Bruce mentions it and glibc used to behave like that. Regardless,
I can see how that perhaps would cause an invalid behavior, closing
all buffers when just 1 failed to open. But that is very historical.

> 
> Whether a function that frees something silently does nothing when
> passed a null pointer is indeed inconsistent.
> 
> -- 
> Jilles Tjoelker
> _______________________________________________
> freebsd-standards at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-standards
> To unsubscribe, send any mail to "freebsd-standards-unsubscribe at freebsd.org"