BUG: possible NULL pointer dereference in nfs server

Bruce Evans brde at optusnet.com.au
Tue Jan 28 01:02:04 UTC 2014


On Mon, 27 Jan 2014, Rick Macklem wrote:

> Roman Divacky wrote:
>> On Sat, Jan 25, 2014 at 09:05:54PM -0500, Rick Macklem wrote:
>>> Bruce Evans wrote:
>>>> On Sat, 25 Jan 2014, Dimitry Andric wrote:
>>>>
>>>>> On 25 Jan 2014, at 01:38, Rick Macklem <rmacklem at uoguelph.ca>
>>>>> wrote:
>>>>> ...
>>>>> If it inlines this, the result looks approximately like:
>>>>>
>>>>>     1	{
>>>>>     2		fhandle_t *fhp = NULL;
>>>>>     3		struct nfslockfile *new_lfp;
>>>>>     4		int error;
>>>>>     5
>>>>>     6		if (new_stp->ls_flags & NFSLCK_OPEN) {
>>>>>     7			new_lfp = *NULL;
>>>>>     8			fhp = &new_lfp->lf_fh;
>>>>>     9		} else if (&nfh) {
>>>>>    10			fhp = &nfh;
>>>>>    11		} else {
>>>>>    12			panic("nfsrv_getlockfh");
>>>>>    13		}
>>>>>    14		error = nfsvno_getfh(vp, fhp, p);
>>>>>    15		NFSEXITCODE(error);
>>>>>    16		getlckret = error;
>>>>>    17	}
>>>>>
>>>>> The code in line 7 is the problematic part.  Since this is
>>>>> undefined,
>>>>> the compiler inserts a trap instruction here.  I think the
>>>>> problem
>>>>> Roman
>>>>> encountered is that on sparc64, there is no equivalent to x86's
>>>>> ud2
>>>>> instruction, so it inserts a call to abort() instead, and that
>>>>> function
>>>>> is not available in kernel-land.
>>>>
>>>> Compiler bug.  abort() is not available in freestanding
>>>> implementations.
>>>> The behaviour is only undefined if the null pointer is
>>>> dereferenced
>>>> at
>>>> runtime, so it doesn't include failing to link to abort() at
>>>> compile
>>>> time.
>>>>
>>>>> ...
>>>>>> Sorry, I'm not a compiler guy, so I don't know why a compiler
>>>>>> would
>>>>>> generate a trap instruction, but since new_lfpp is never NULL
>>>>>> when
>>>>>> this is executed, I don't see a problem.
>>>>>>
>>>>>> If others feel that this needs to be re-coded, please let me
>>>>>> know
>>>>>> what
>>>>>> you think the code should look like? (A test for non-NULL with
>>>>>> a
>>>>>> panic()
>>>>>> before it is used?)
>>>>>>
>>>>>> Is a trap instruction that never gets executed a problem?
>>>>>
>>>>> It's better to avoid undefined behavior in any case.  Just add
>>>>> a
>>>>> NULL
>>>>> check, that should be sufficient.
>>>>
>>>> That might only add bloat and unimprove debugging.  Since the
>>>> null
>>>> pointer
>>>> case cannot happen, it cannot be handed properly.  It can be
>>>> mishandled in
>>>> the following ways:
>>>> - return an error, so that all callers have to handle the null
>>>> pointer case
>>>>    that can't happen.  If the compiler is too smart, it will
>>>>    notice
>>>>    more
>>>>    undefined behaviour (that can't happen) in callers and "force"
>>>>    you
>>>>    to
>>>>    handle it there too
>>>> - KASSERT() that the pointer cannot be null.  Then:
>>>>    - on production systems where KASSERT() is null, this won't
>>>>    work
>>>>    around
>>>>      the compiler bug.  Use a panic() instead.  To maximize
>>>>      source
>>>>      code
>>>>      bloat, ifdef all of this.
>>>>    - when KASSERT() is not null, it will work around the compiler
>>>>    bug.
>>>>      If the case that can't happen actually happens, then this
>>>>      unimproves
>>>>      the debugging by messing up stack traces and turning
>>>>      restartable
>>>>      null pointer or SIGILL traps to non-restartable panics.
>>>>      Optimizations that replace a large block of code ending with
>>>>      a
>>>>      null pointer trap by a single unimplemented instruction
>>>>      would
>>>>      probably break restarting anyway.
>>>>
>>>> Bruce
>>>>
>>> So Roman, all I can suggest is to try adding something like:
>>>    if (new_lfpp == NULL)
>>>         panic("new_lfpp NULL");
>>> after line#6. If that makes the compiler happy, I can commit it in
>>> April. (Can't do commits before then.)
>>
>> The compiler already inserts "trap" instruction when such a condition
>> happens so this seem superfluous.
>>
> Ok, now I'm confused. I thought the problem was an "abort()" call for
> sparc64. I certainly run the code with the trap instruction in it and
> since it never gets executed, it doesn't bother me on i386.
>
>>> I agree with Bruce, but the check might be a good idea, in case a
>>> future code change introduces a bug where the function is called
>>> with
>>> new_lfpp NULL and NFSLCK_OPEN set.
>>>
>>> If this doesn't make the compiler happy, all I can suggest is to
>>> play around until you come up with something that works.
>>
>> KASSERT() doesnt communicate that it's an assert, because it can
>> just log into console a carry on. Would you be ok with this patch?
>> 
>> Index: fs/nfsserver/nfs_nfsdstate.c
>> ===================================================================
>> --- fs/nfsserver/nfs_nfsdstate.c        (revision 261037)
>> +++ fs/nfsserver/nfs_nfsdstate.c        (working copy)
>> @@ -1384,7 +1384,8 @@
>>          * If we are doing Lock/LockU and local locking is enabled,
>>          sleep
>>          * lock the nfslockfile structure.
>>          */
>> -       getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags, NULL,
>> &nfh, p);
>> +       KASSERT((new_stp->ls_flags & NFSLCK_OPEN) == 0,
>> ("nfsrv_lockctrl: calling nfsrv_getlockfh with NFSLCK_OPEN"));
>> +       getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags &
>> ~NFSLCK_OPEN, NULL, &nfh, p);
>>         NFSLOCKSTATE();
>>         if (getlckret == 0) {
>>                 if ((new_stp->ls_flags & (NFSLCK_LOCK |
>>                 NFSLCK_UNLOCK)) != 0 &&
>>
>>> Have fun with it, rick
>>> ps: I haven't seen this reported by tinderbox. Is the problem
>>>     specific to your setup?
>>
> Yea, so long as it includes a comment that states this is done to
> work around a stupid compiler bug.

Ugh.

The above has the following bugs:
- gross style bugs (lines longer than 80 characters)
- large code to do nothing
- would be even larger with a comment
- cannot actually work around any compiler bug involving abort(), since it
   has no effect on production kernels where KASSERT() is null.

>> It is present even in your setup :) Just "objdump -d kernel | grep
>> ud2" on kernel compiled
>> by clang.
>>
> I actually use gcc, but I believe you. I'll admit I still don't understand
> why having a trap instruction that never gets executed is a bad thing?

It isn't but trying to link to the noexistent function abort() on
arches that don't have any trap instruction is a bad thing.  According
the above, this occurs on sparc64.  It must be a gcc bug, since sparc64
doesn't have clang.  However, I couldn't get gcc on sparc64 to generate
an abort() for *NULL.  Similarly on x86 (gcc is hard to test on x86,
since it is broken (not installed) on FreeBSD cluster machines for
newer versions of FreeBSD).  clang on x86 warns that it removes *NULL,
removes *NULL, and then sometimes but not always generares ud2 for
reachable code following *NULL.  For the following test code:

% extern int flags;
% 
% int
% main(void)
% {
% 	if (flags & 1)
% 		*(char *)1 = 0;
% 	else {
% 		*(char *)0 = 0;
% 		*(char *)1 = 0;
% 	}
% }

clang generates ud2 for the second *1, but if the second *1 is removed
so that the code falls through to return, it doesn't generate ud2.
-Werror should make the warning fatal, so it is surprising that everyone
doesn't see this compiler bug.

> I can commit the above in April. If for some reason the fix is needed sooner,
> we'll need to find someone else willing to do the commit.

Bruce


More information about the freebsd-fs mailing list