svn commit: r237286 - head/lib/libc/gen

Thu Jun 21 02:38:56 UTC 2012

On Wed, 20 Jun 2012, Xin Li wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 06/20/12 14:23, Colin Percival wrote:
>> On 06/20/12 14:15, Warner Losh wrote:
>>> On Jun 20, 2012, at 10:36 AM, Colin Percival wrote:
>>>> On 06/20/12 09:27, Bruce Evans wrote:
>>>>> On Wed, 20 Jun 2012, Eitan Adler wrote:
>>>>>> Log: Don't close an uninitialized descriptor. [1] Add a
>>>>>> sanity check for the validity of the passed fd.
>>>>>
>>>>> Library functions shouldn't use assert() or abort().
>>>>
>>>> Why not?

Because correctly designed APIs handle errors by returning an error
status.  Less correctly designed APIs might handle errors by dumping
core, but then they have to document this.

>>> We've tried to avoid things that make the library dump core...  >>
>> You mean, we avoid it except in the places where we don't?  It
>> seems to me that dumping core is exactly the right way to handle a
>> "can't ever happen" situation inside libc -- just like the ~250
>> instances of assert() in jemalloc.

Doesn't inspire confidence.  I missed this because jemalloc doesn't
include assert.h.  It rolls its own assert() "to reduce the chance
of deadlock during assertion failure".  The library assert is indeed
unusable in malloc() since it calls stdio, which may call malloc().
(It is a standard bug that assert() must write on stderr, since
write() and STDERR_FILENO are not available in plain STDC.  stderr
is under more direct control of the application than is STDERR_FILENO.
Although it is normally unbuffered, the application may set it into
some mode that is buffered with the buffer not yet allocated, or
otherwise uses malloc().  jemalloc's assert() avoids this problem by
writing to STDERR_FILENO).  But even in jemalloc, the assert()s are
in debugging code and are turned off in production.  It is apparently
not ready for production, since MALLOC_PRODUCTION is apparently never
configured.  MALLOC_PRODUCTION also turns off the default malloc options
of AJ.  The default A option gives much larger standards breakage and
core dumps, by aborting for all warnings.  Not configuring
MALLOC_PRODUCTION also results in configuring MALLOC_DEBUG and
MALLOC_STATS.  These result in malloc() using stdio in other ways,
although it has mounds of code to avoid this in the production version.

>> If you mean "passing an invalid parameter to a library function
>> shouldn't result in a core dump", I agree -- but that's not the
>> case here.

Any undefined behaviour can cause a core dump, even when it is an
invalid parameter passed to a library function (examples are null
pointers passed to string and memory functions).  However, the library
should be more carefully written than applications, so it shouldn't
have any undefined behaviours, and if it actually checks for undefined
behaviours then it should try not to abort when it finds one.

> But malloc() is a rare place that we typically consider as "low level"
> enough where, no better remedies are provided from API prospective --
> there is nothing better than crashing the program immediately, since
> that would likely to lead us to where the smoking gun is.  Library
> procedures normally detect and report errors, but don't handle them
> like this.
>
> Also, as Bruce pointed out, it's a case that can never happen and thus
> the explicit assert is just a waste of space.

The only good think about the assert() in closelog() is that it detects
a problem (that can't happen) in some cases and aborts before destroying
the evidence.

Here are all uses of LogFile in syslog.c.  This shows that the assert()
is even more bogus than I thought (since there are several closes of
LogFile, but only an assert() for the recently changed one).  Of course
there are no assert()s for critical uses of LogFile.

% static int	LogFile = -1;		/* fd for log */
% ...
% 	if (send(LogFile, tbuf, cnt, 0) < 0) {
% ...
% 			if (send(LogFile, tbuf, cnt, 0) >= 0) {
% ...

These use LogFile without checking that it is not -1.  This error can
happen -- see below.

% static void
% disconnectlog(void)
% {
% 	/*
% 	 * If the user closed the FD and opened another in the same slot,
% 	 * that's their problem.  They should close it before calling on
% 	 * system services.
% 	 */
% 	if (LogFile != -1) {
% 		_close(LogFile);
% 		LogFile = -1;
% 	}
% 	status = NOCONN;			/* retry connect */
% }

disconnect() is very similar to closelog().  It already had the check to
avoid the unnecessary close when LogFile has not been allocated, but doesn't
have an assert().

% ...
% static void
% connectlog(void)
% {
% 	struct sockaddr_un SyslogAddr;	/* AF_UNIX address of local logger */
% 
% 	if (LogFile == -1) {
% 		if ((LogFile = _socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
% 			return;
% 		(void)_fcntl(LogFile, F_SETFD, FD_CLOEXEC);
% 	}
% 	if (LogFile != -1 && status == NOCONN) {

Bogus check that LogFile != -1.  If LogFile was -1, then we just called
_socket() succeessfully to make it different from -1, or returned if
_socket() failed.

% ...
% 		if (_connect(LogFile, (struct sockaddr *)&SyslogAddr,
% ...
% 			if (_connect(LogFile, (struct sockaddr *)&SyslogAddr,
% ...
% 			if (_connect(LogFile, (struct sockaddr *)&SyslogAddr,
% ...
% 		if (status == NOCONN) {
% 			(void)_close(LogFile);
% 			LogFile = -1;

connectlog() also returns without allocating LogFile here.

% 		}
% ...
% 	}
% }

Note that we don't bogusly assert() that LogFile >= 0 before we use it.
We don't even check that it is not -1 before using it.  And this seems
to be an error that can actually happen.  The 2 _send()s in the above
are done soon after calling connectlog().  But connectlog() returns
void, and doesn't abort() when it fails to allocate LogFile.  It just
leaves or resets LogFile to -1 and lets the _send()s fail.

% ...
% void
% closelog(void)
% {
% 	THREAD_LOCK();
% 	assert(LogFile >= -1);

The bogus assert().  We check for an error that can't happen (_socket()
returning a negative value that is not -1, or something clobbering
LogFile to a negative value that is not -1).  We don't and can't check
for another error that can't happen (something clobbering LogFile to
a _non-negative_ but still invalid or conflicting value).

% 	if (LogFile != -1) {
% 		(void)_close(LogFile);
% 		LogFile = -1;
% 	}
% 	LogTag = NULL;
% 	status = NOCONN;
% 	THREAD_UNLOCK();
% }

Since closelog() doesn't really use LogFile, handling the case that
can't happen (LogFile < -1) by trying the close and resetting state
was as harmless as possible.  It mainly destroys the evidence that
LogFIle (and perhaps other state) got clobbered.  But it doesn't
detect the problem as early as possible.  If you want that, then
you need to check LogFile every time before it is used.  Don't forget
to check for parity errors in the CPU before every instruction too.

Bruce