bin/125185: csh(1) exit on signal 11
Nate Eldredge
neldredge at math.ucsd.edu
Mon Jul 28 08:40:05 UTC 2008
The following reply was made to PR bin/125185; it has been noted by GNATS.
From: Nate Eldredge <neldredge at math.ucsd.edu>
To: bug-followup at FreeBSD.org, 666.root at gmail.com
Cc:
Subject: Re: bin/125185: csh(1) exit on signal 11
Date: Mon, 28 Jul 2008 01:18:38 -0700 (PDT)
I tracked this down. Here is the explanation as I understand it.
The traceback from the segfault is as follows, for the record:
#0 0x000000080096cd1e in malloc () from /lib/libc.so.7
#1 0x000000080096cfee in free () from /lib/libc.so.7
#2 0x0000000000448066 in sfree (p=0x427e46)
    at /usr/src/bin/csh/../../contrib/tcsh/tc.alloc.c:562
#3 0x0000000000450e79 in bb_cleanup (xbb=0x7fffffffdf70)
    at /usr/src/bin/csh/../../contrib/tcsh/tc.str.c:521
#4 0x000000000040d450 in cleanup_until (last_var=0x57b730)
    at /usr/src/bin/csh/../../contrib/tcsh/sh.err.c:444
#5 0x0000000000406423 in process (catch=1)
    at /usr/src/bin/csh/../../contrib/tcsh/sh.c:2027
#6 0x000000000040d4f5f in main (argc=0, argv=0x7fffffffe7d8)
    at /usr/src/bin/csh/../../contrib/tcsh/sh.c:1304
However, the source of the bug is actually in the function `dobackp',
sh.glob.c:646. tcsh has a "cleanup stack": a function can push cleanup
actions onto it and have them run later. `dobackp' pushes some entries
onto the cleanup stack, then detects the parse error and exits by calling
stderror() without popping them. The problem is that the whole thing was
being run in a subshell started with vfork(), and a vfork() child shares
its parent's memory, so the entries end up on the parent's cleanup stack
even though they hold pointers to objects that only existed for the
child. (More specifically, they point into a piece of the (regular) stack
that is below the parent's current stack pointer, so it can get
overwritten.) When the parent eventually runs its cleanup stack, bad
things happen.
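To make the failure mode concrete, here is a minimal sketch of a cleanup
stack. It is not tcsh's code: every name in it (demo_push,
demo_cleanup_until, demo_worker and so on) is hypothetical, and the
vanished child stack frame is only simulated with a flag so that the
example stays well defined. demo_worker stands in for `dobackp': it
registers a cleanup and, on a parse error, leaves without popping it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cleanup stack; not tcsh's actual code. */
struct cleanup_entry {
    void *var;             /* object the cleanup applies to */
    void (*fn)(void *);    /* cleanup routine */
};

#define STACK_MAX 16
static struct cleanup_entry cstack[STACK_MAX];
static size_t cstack_len;

static int frame_live;     /* 1 while the simulated child frame exists */
static int frame_buf;      /* stands in for a buffer in that frame */
static int stale_cleanups; /* cleanups run after the frame was gone */

static void demo_push(void *var, void (*fn)(void *))
{
    assert(cstack_len < STACK_MAX);
    cstack[cstack_len].var = var;
    cstack[cstack_len].fn = fn;
    cstack_len++;
}

/* Run and pop entries until the one registered for last_var has run. */
static void demo_cleanup_until(void *last_var)
{
    while (cstack_len > 0) {
        struct cleanup_entry e = cstack[--cstack_len];
        e.fn(e.var);
        if (e.var == last_var)
            break;
    }
}

static void buf_cleanup(void *var)
{
    (void)var;
    if (!frame_live)
        stale_cleanups++;  /* real code would be freeing garbage here */
}

/* Stand-in for dobackp: registers a cleanup, may bail out early. */
static void demo_worker(int parse_error)
{
    frame_live = 1;
    demo_push(&frame_buf, buf_cleanup);
    if (parse_error) {
        frame_live = 0;    /* frame is gone, entry is still on the stack */
        return;
    }
    demo_cleanup_until(&frame_buf);
    frame_live = 0;
}

/* Returns how many cleanups ran against an object that no longer exists. */
int demo_stale_cleanups(int parse_error)
{
    cstack_len = 0;
    stale_cleanups = 0;
    demo_worker(parse_error);
    /* Later, the "parent" unwinds whatever is left on the shared stack. */
    while (cstack_len > 0) {
        struct cleanup_entry e = cstack[--cstack_len];
        e.fn(e.var);
    }
    return stale_cleanups;
}
```

In this model demo_stale_cleanups(0) reports no stale cleanups, while
demo_stale_cleanups(1) reports one: the entry outlived the object it
refers to, which is the situation the parent csh ends up in.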
If you run csh with the -F option, to use fork() instead of vfork(), it
does not crash.
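The reason fork() avoids the problem can be shown with a small sketch.
The counter pending_cleanups and the function demo_fork_isolation are
hypothetical stand-ins, not anything from tcsh; the counter plays the
role of the cleanup stack depth. With fork() the child works on its own
copy of memory, so whatever it leaves behind never reaches the parent.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for the depth of a cleanup stack (hypothetical). */
static int pending_cleanups;

/* Returns the parent's view of the counter after the child has exited,
 * or -1 if fork() or waitpid() failed. */
int demo_fork_isolation(void)
{
    pid_t pid;
    int status;

    pending_cleanups = 0;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        pending_cleanups++;   /* child "pushes" and exits without popping */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return pending_cleanups;  /* parent's copy was never modified */
}
```

The parent still sees zero after the child exits. With vfork() the child
would be writing to the parent's memory instead, which is why the same
experiment is not safe to write with vfork().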
It would be easy to fix this specific instance of the bug, by calling
cleanup_until() in `dobackp' before calling stderror(). Unfortunately, it
looks like there are lots of places where the code tries to exit without
cleaning up first, and it is not clear when they might be run in a vforked
subshell. Here are some possibilities:
1. Audit the whole source to find and fix all places where a function may
exit without popping the cleanup stack.
2. Set a mark on the stack as soon as vfork() returns in the child, and
add code to xexit() or something to have it pop to that mark before
exiting. I have not thought this through completely and am not sure if it
is safe.
3. Stop using vfork() altogether. tcsh should really not be using it when
there is non-trivial work for the child to do. How significant is the
extra overhead of fork() in this day and age, when we have copy-on-write?
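As a rough illustration of option 2, the sketch below records the stack
depth at the point where the child would start and pops back to that
depth from an exit hook. All names here (push, pop_to_mark,
child_exit_hook, demo_mark_and_pop) are hypothetical, the vfork() itself
is only simulated, and the sketch says nothing about whether the approach
is safe in tcsh itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cleanup stack; not tcsh's actual code. */
struct entry {
    void *var;
    void (*fn)(void *);
};

#define STACK_MAX 16
static struct entry stack[STACK_MAX];
static size_t stack_len;
static size_t child_mark;   /* depth recorded right after the simulated vfork() */

int parent_cleaned;         /* times the parent's entry was cleaned up */
int child_cleaned;          /* times the child's entry was cleaned up */

static void push(void *var, void (*fn)(void *))
{
    assert(stack_len < STACK_MAX);
    stack[stack_len].var = var;
    stack[stack_len].fn = fn;
    stack_len++;
}

/* Run and pop every entry pushed after the mark was taken. */
static void pop_to_mark(size_t mark)
{
    while (stack_len > mark) {
        struct entry e = stack[--stack_len];
        e.fn(e.var);
    }
}

static void bump(void *var)
{
    ++*(int *)var;
}

/* Hypothetical exit hook, standing in for a patched xexit(). */
static void child_exit_hook(void)
{
    pop_to_mark(child_mark);
}

/* Returns the number of entries left on the stack after the child exits. */
size_t demo_mark_and_pop(void)
{
    stack_len = 0;
    parent_cleaned = child_cleaned = 0;

    push(&parent_cleaned, bump);  /* parent state that predates the vfork() */
    child_mark = stack_len;       /* child: first thing after vfork() returns 0 */
    push(&child_cleaned, bump);   /* child work that registers a cleanup */
    child_exit_hook();            /* error path: exit without explicit cleanup */

    return stack_len;
}
```

After the simulated child exits, only the parent's entry remains on the
stack and only the child's cleanup has run, so the parent never sees an
entry that refers to the child's objects.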
The upstream tcsh people might also have some ideas, but a bit of Googling
did not reveal who they are.
--
Nate Eldredge
neldredge at math.ucsd.edu