apachectl graceful causes Signal 11 crash after 6.3 to 7.0 upgrade [SOLVED]

Thu Jun 12 07:28:12 UTC 2008

On Thu, Jun 12, 2008 at 04:35:26PM +0930, Daniel O'Connor wrote:
> On Thu, 12 Jun 2008, Jeremy Chadwick wrote:
> > > I don't understand why gethostbyname() would call puts() - and why
> > > that would then crash!
> >
> > I can't explain why it's calling puts() directly either.  Bad RAM
> > could cause something bizarre like this, or a corrupt/broken binary.
> 
> Yeah.. I have rebuilt lots of stuff, although not libc.

Huh?

> This machine has build world, kernel, KDE, etc.. I am pretty sure the hardware is OK as none of the builds had an issue.

libc is part of world, so rebuilding world rebuilds libc too.  *Every*
program relies on (is linked against) libc.

> > If you go the truss route, be sure to use -a -s 4096.  You'd be able
> > to see what actual string is being output via puts(), assuming it
> > gets as far as to start writing data to the fd.
> 
> Hmm I had a go with gdb but it doesn't work properly.. I got this..
> [midget 16:33] /tmp/work/usr/ports/www/apache13-modssl/work/apache_1.3.41 >sudo gdb src/httpd
> Password:
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
> (gdb) run -X
> Starting program: /data/tmp/work/usr/ports/www/apache13-modssl/work/apache_1.3.41/src/httpd -X
> [New LWP 100212]
> [New Thread 0x819d300 (LWP 100212)]
> [New LWP 100212]
> suspend error: generic error
> [Switching to LWP 100212]
> Stopped due to shared library event
> (gdb) info thread
> Cannot find new threads: generic error
> (gdb) bt
> #0  0x2807fda0 in r_debug_state () from /libexec/ld-elf.so.1
> #1  0x2808367d in dlclose () from /libexec/ld-elf.so.1
> #2  0x28706164 in zend_hash_apply_deleter ()
>    from /usr/local/libexec/apache/libphp5.so
> #3  0x287063a8 in zend_hash_graceful_reverse_destroy ()
>    from /usr/local/libexec/apache/libphp5.so
> #4  0x286fc89e in zend_shutdown () from /usr/local/libexec/apache/libphp5.so
> #5  0x286bb5bf in php_module_shutdown () from /usr/local/libexec/apache/libphp5.so
> #6  0x286bb66b in php_module_shutdown_wrapper ()
>    from /usr/local/libexec/apache/libphp5.so
> #7  0x28776aaa in apache_php_module_shutdown_wrapper ()
>    from /usr/local/libexec/apache/libphp5.so
> #8  0x080524d9 in ap_clear_pool (a=0x8106010) at alloc.c:1937
> #9  0x0805f0f6 in standalone_main (argc=Variable "argc" is not available.
> ) at http_main.c:5480
> #10 0x08060c1f in main (argc=-716130182, argv=0x1) at http_main.c:5883

I can't say much about this backtrace, but I'm willing to bet it's the
result of some Apache + PHP weirdness.  I've never known gdb on FreeBSD
to be as reliable/useful as, say, on Linux or Solaris; odd things always
seem to happen with it.

> I tried truss and it seemed to be taking a long time (5-10 minutes) and
> generating a lot of seemingly identical logging :(

Okay, let's backtrack here.

The OP states that he can induce a segfault of httpd when doing
"apachectl graceful".  Is that the exact problem you're seeing, or are
you seeing problems where PHP/Apache segfaults during operation?  I just
want to be clear.

If the latter, then truss "generating lots of seemingly identical
logging" is probably expected.  I'm guessing it's select() or poll()
or something related to kqueue/kevent, as it'd be waiting for I/O on the
HTTP socket.  You'd have to submit the HTTP request to the PHP script to
get it to crash.
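
If the repeated lines make the truss output hard to read, one option is
to write it to a file and collapse runs of identical lines before
looking at the tail end.  The following is only a rough sketch: the
function name collapse_repeats and the path /tmp/httpd.truss are made up
for illustration, and it only merges lines that are byte-for-byte
identical.

```shell
#!/bin/sh
# Sketch: collapse consecutive identical lines in a trace log so that a
# long run of the same select()/poll() call shows up as one counted line.
# collapse_repeats is a hypothetical helper name, not part of truss.
collapse_repeats() {
    awk '
        NR > 1 && $0 == prev { count++; next }   # same as previous line
        NR > 1 { print count "x " prev }         # run ended, emit it
        { prev = $0; count = 1 }                 # start a new run
        END { if (NR > 0) print count "x " prev }
    '
}

# Possible usage (the output path is an assumption):
#   truss -a -s 4096 -o /tmp/httpd.truss httpd -X
#   collapse_repeats < /tmp/httpd.truss | tail -n 50
```

The last few lines before the crash are usually the interesting part,
hence the tail.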

In either case, you may have to resort to using ktrace + kdump, which
may or may not help narrow this down.

Use "ktrace -i -t+ httpd -X" (per ktrace(1), the command form accepts
arguments, so this should work), which will start populating a file
called ktrace.out in the current directory.  You should then do the
"apachectl graceful" in another window (or, if the latter, submit the
HTTP request).  ktrace may exit when the segfault happens (I'm not sure
about this; it may sit there indefinitely).

In the case it doesn't exit, and you've confirmed the core dump happened
(check "dmesg"), you should ^C the ktrace and then do "ktrace -C", which
turns off tracing on all of your processes, just to be sure nothing got
wedged.

You'll then have to use kdump to decode the contents of ktrace.out.
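
Put together, the sequence would look roughly like the sketch below.
The function names and the DRY_RUN switch are my own additions for
illustration; the ktrace and kdump invocations are the ones described
above, with -f naming the trace file explicitly (ktrace.out is the
default anyway).  Treat it as a starting point, not a tested recipe for
this particular crash.

```shell
#!/bin/sh
# Sketch of the ktrace + kdump workflow on FreeBSD.
# TRACE_FILE, run(), and the function names are illustrative assumptions.
TRACE_FILE="ktrace.out"

# With DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Window 1: start httpd in single-process mode under ktrace.
# -i passes tracing on to child processes; -t+ selects the default
# set of trace points.
start_trace() {
    run ktrace -i -f "$TRACE_FILE" -t+ httpd -X
}

# After the crash: disable tracing on all of your processes.
stop_trace() {
    run ktrace -C
}

# Decode the binary trace file into readable text.
decode_trace() {
    run kdump -f "$TRACE_FILE"
}
```

Run start_trace in one window, do the "apachectl graceful" in another,
then stop_trace and decode_trace once dmesg shows the signal 11.  Piping
decode_trace through tail or a pager is probably wise, since the output
can be large.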

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |