threads/95127: MySQL 4.1 SIGBUS failure with FreeBSD 6.0

Glenn Nielsen glenn at more.net
Thu Mar 30 21:20:24 UTC 2006


>Number:         95127
>Category:       threads
>Synopsis:       MySQL 4.1 SIGBUS failure with FreeBSD 6.0
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-threads
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 30 21:20:22 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Glenn Nielsen
>Release:        6.0-RELEASE-p5
>Organization:
MOREnet
>Environment:
FreeBSD roadrash.spg.more.net 6.0-RELEASE-p5 FreeBSD 6.0-RELEASE-p5 #1: Wed Mar 22 16:42:04 CST 2006     root at roadrash.spg.more.net:/usr/obj/usr/src/sys/QUOTAS_SMP  i386
>Description:
I have found two different ways to trigger a signal 10 (SIGBUS) error
when running MySQL 4.1 with native threads.

The first is described here. The second, which I found while trying
to reproduce the first, is documented further down.

MySQL 4.1 SIGBUS failure with FreeBSD 6.0

We recently put four FreeBSD 6.0 servers with MySQL 4.1 into
production. MySQL on three of these servers is moderately busy,
each requiring between 60 and 200 threads to handle anywhere from
6 to 40 queries per second during business hours. One is a master DB;
the other two are read-only (RO) slaves. The fourth, where we haven't
seen any problems, is a hot-backup slave.

Since going into production three days ago the Master DB has
failed twice. The RO Slaves are failing 1-4 times per hour.
In all cases the failure is a signal 10 for a SIGBUS error.
The mysqld_safe script successfully restarts mysqld after each
failure.

I enabled core files with the "--core-file" argument to the
start script.

I have examined six core files now. All look very similar.

Core was generated by `mysqld'.
Program terminated with signal 10, Bus error.

(gdb) bt
#0  0x285292b7 in pthread_testcancel () from /usr/lib/libpthread.so.2
#1  0x285190a2 in sigaction () from /usr/lib/libpthread.so.2
#2  0x2851318d in pthread_kill () from /usr/lib/libpthread.so.2
#3  0x0815a1bf in ?? ()
#4  0x0841b000 in ?? ()
#5  0x0000000a in ?? ()
#6  0xbfbfdee8 in ?? ()
#7  0x0000000a in ?? ()
#8  0xbfbfe230 in ?? ()
#9  0x0000000a in ?? ()
#10 0xbfbfdef8 in ?? ()
#11 0x080aaf95 in ?? ()
#12 0x0000000a in ?? ()
#13 0x0835b8da in ?? ()
#14 0x000403fb in ?? ()
#15 0x00000000 in ?? ()
#16 0xbfbfdf30 in ?? ()
#17 0x2852c4b4 in ?? () from /usr/lib/libpthread.so.2
#18 0xbfbfdf28 in ?? ()
#19 0x28517252 in sigaction () from /usr/lib/libpthread.so.2

(gdb) info threads
* 166 Thread 0x841b000 (LWP 100330)  0x285292b7 in pthread_testcancel () from /usr/lib/libpthread.so.2
  165 Thread 0x841bc00 (LWP 100188)  0x28529277 in pthread_testcancel () from /usr/lib/libpthread.so.2
  164 Thread 0x841be00 (LWP 100206)  0x28529337 in pthread_testcancel () from /usr/lib/libpthread.so.2
  163 Thread 0xb22a400 (LWP 100205)  0x28573833 in read () from /lib/libc.so.6
  162 Thread 0xb22a600 (LWP 100175)  0x28529277 in pthread_testcancel () from /usr/lib/libpthread.so.2
  161 Thread 0xb22a800 (LWP 100211)  0x28529277 in pthread_testcancel () from /usr/lib/libpthread.so.2
  160 Thread 0xb22aa00 (LWP 100204)  0x28573833 in read () from /lib/libc.so.6
  159 Thread 0xb22ac00 (LWP 100257)  0x28573833 in read () from /lib/libc.so.6
  158 Thread 0xb22ae00 (LWP 100258)  0x28573833 in read () from /lib/libc.so.6
  157 Thread 0xb2eb000 (LWP 100259)  0x28573833 in read () from /lib/libc.so.6
  ...

In every case the SIGBUS error is occurring when mysql is killing off
some threads.

MySQL has a variable, thread_cache_size, that controls how many idle
threads it keeps for reuse. By default it is 0, meaning MySQL creates
threads as needed and kills off threads which are no longer needed
almost immediately.

We set thread_cache_size=50 so that MySQL would maintain a pool of
threads and kill threads less often. This has reduced the frequency
of failures to less than once an hour.
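
For reference, here is a minimal sketch of how this setting might appear
in my.cnf. It assumes the same set-variable syntax as the configuration
listed under How-To-Repeat; the value 50 is simply the one we chose, not
a recommendation.

```ini
[mysqld]
# Keep up to 50 idle threads for reuse instead of destroying them
# when a client disconnects (the default of 0 disables the cache).
set-variable = thread_cache_size=50
```

Comparing the Threads_created counter from SHOW STATUS before and after
the change should show whether fewer threads are being created. This
only reduces thread churn; it does not address the underlying SIGBUS.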

We are using mysql-server-4.1.18_2 from ports with native threads.

Here is some information on the system hardware and kernel config.

vai and blackmore:

Dell PowerEdge 1850
2 x 2.8 GHz P4
1024 MB
2 x 36 GB disks (RAID1)

FreeBSD vai.kinetic.more.net 6.0-RELEASE FreeBSD 6.0-RELEASE #0: Mon Jan 30 15:40:24 CST 2006
FreeBSD blackmore.kinetic.more.net 6.0-RELEASE-p5 FreeBSD 6.0-RELEASE-p5 #1: Tue Mar 21 13:44:15 CST 2006

hendrix and satriani:

Dell PowerEdge 2650
2 x 2.8 GHz P4
2048 MB
2 x 36 GB disks (RAID1)

FreeBSD satriani.kinetic.more.net 6.0-RELEASE-p4 FreeBSD 6.0-RELEASE-p4 #0: Tue Mar  7 14:32:43 CST 2006

FreeBSD hendrix.kinetic.more.net 6.0-RELEASE-p5 FreeBSD 6.0-RELEASE-p5 #0: Wed Mar 15 10:54:10 CST 2006

The kernel on both platforms is the same (we include GENERIC and add
a couple of options):

# QUOTAS_SMP

include GENERIC

ident           MOREnet-SMP

maxusers        0

options        IPFILTER                #ipfilter support
options        IPFILTER_LOG            #ipfilter logging

# To make an SMP kernel, the next line is needed
options         SMP                     # Symmetric MultiProcessor Kernel

options        QUOTA

Hyper Threading is disabled in the BIOS on all four servers.

On a test box I was able to reproduce the above. I also found a second
way to generate the signal 10 (SIGBUS) error. Here is the backtrace:

#0  0x28524619 in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
#1  0x28524985 in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
#2  0x2851443c in pthread_create () from /usr/lib/libpthread.so.2
#3  0x080af118 in ?? ()
#4  0xbf8fdf3c in ?? ()
#5  0x0840762c in __isthreaded ()
#6  0x080aa810 in ?? ()
#7  0x0000000f in ?? ()
#8  0x00000000 in ?? ()
#9  0x00000000 in ?? ()
#10 0x00000000 in ?? ()
#11 0x0000000f in ?? ()
#12 0x33343231 in ?? ()
#13 0x00000a39 in ?? ()
#14 0x00000000 in ?? ()
#15 0x00000000 in ?? ()
#16 0x00000000 in ?? ()
#17 0x00000000 in ?? ()
#18 0x00000000 in ?? ()
#19 0x00000000 in ?? ()
#20 0x00026005 in ?? ()
#21 0x00000000 in ?? ()
#22 0x00000000 in ?? ()
#23 0x00000000 in ?? ()
#24 0x0000000a in ?? ()
#25 0xbf8fdf94 in ?? ()
#26 0x00000000 in ?? ()
#27 0x2852c4b4 in ?? () from /usr/lib/libpthread.so.2
#28 0x00000002 in ?? ()
#29 0x00000000 in ?? ()
#30 0xbf8fdfec in ?? ()
#31 0x28521dac in pthread_mutexattr_init () from /usr/lib/libpthread.so.2

(gdb) info threads
  4 Thread 0x841b000 (runnable)  0x285730b3 in select () from /lib/libc.so.6
  3 Thread 0x841b800 (LWP 100193)  0x28529277 in pthread_testcancel ()
   from /usr/lib/libpthread.so.2
* 2 Thread 0x841ba00 (LWP 100266)  0x28524619 in pthread_mutexattr_init ()
   from /usr/lib/libpthread.so.2
  1 Thread 0x9a24200 (LWP 100269)  0x28529277 in pthread_testcancel ()
   from /usr/lib/libpthread.so.2




>How-To-Repeat:
Here is a perl script named "testbug.pl" which I used to reproduce each of the bugs.

Start the script as follows to reproduce the first bug:
testbug.pl 1
Start the script as follows to reproduce the second bug:
testbug.pl 0

#!/usr/local/bin/perl

use DBI;

$sleep = $ARGV[0];
$sleep = 0 unless $sleep =~ /^\d+$/;

$forknum = 0;
$ppid = $$;

$account = "";
$passwd = "";
$dbname = "test";
$hostname = "localhost";
$port = 3306;

$dsn = "DBI:mysql:database=$dbname;host=$hostname;port=$port";
$dbh = DBI->connect($dsn, $account, $passwd, {'RaiseError' => 1});
$sth = $dbh->prepare("DROP TABLE IF EXISTS testbug");
$sth->execute;
$sth->finish;
$sth = $dbh->prepare("CREATE TABLE testbug ( id int(10), stuff varchar(255) )");
$sth->execute;
$sth->finish;

while ($forknum < 150) {
  $forknum++;
  fork();
  if ($$ != $ppid) {
    longtest();
  }
  sleep($sleep);
}
while ($forknum < 300) {
  $forknum++;
  fork();
  if ($$ != $ppid) {
    shorttest();
  }
  sleep($sleep);
}

exit;

sub longtest {
  my $dbh = DBI->connect($dsn, $account, $passwd, {'RaiseError' => 1});
  my $sth1 = $dbh->prepare("INSERT INTO testbug values(?,?)");
  my $sth2 = $dbh->prepare("SELECT * FROM testbug where id=?");

  $sth1->execute($$,$$ . "-" . $forknum);
  $sth1->finish;
  for (0..30) {
    sleep(10);
    $sth2->execute($$);    # $$ is this process's PID; $PID is undefined without "use English"
    ($pid,$stuff) = $sth2->fetchrow_array;
    $sth2->finish;
  }
  $dbh->disconnect;
  exit;
}

sub shorttest {
  for(0..200) {
    sleep(1);
    doshorttest();
  }
  exit;
}

sub doshorttest {
  my $dbh = DBI->connect($dsn, $account, $passwd, {'RaiseError' => 1});
  my $sth = $dbh->prepare("SELECT * FROM testbug where id=?");
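  $sth->execute($$ - 100);    # $$ is this process's PID; $PID is undefined without "use English"
  if ($sth->rows > 0) {
    ($pid,$stuff) = $sth->fetchrow_array;    # fetch from $sth; $sth2 does not exist in this sub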
  }
  $sth->finish;
  $dbh->disconnect;
}

Here is the my.cnf configuration:
[mysqld]
user=mysql
port=3306
socket=/tmp/mysql.sock
tmpdir=/var/db/mysql/tmp
#log=/var/db/mysql/roadrash_debug.log
set-variable = key_buffer=8m
set-variable = tmp_table_size=8m
set-variable = sort_buffer=8m
set-variable = record_buffer=1m
set-variable = max_connect_errors=1000
set-variable = max_connections=600
set-variable = max_allowed_packet=2M
skip-innodb
warnings

Here is the /etc/rc.conf
mysql_enable="YES"
mysql_args="--tmpdir=/var/db/mysql/tmp --core-file"

>Fix:
Building MySQL with linuxthreads fixes the problem.
The mysql41-server port uses FreeBSD native threads by default.
>Release-Note:
>Audit-Trail:
>Unformatted:

