More on MySQL -- Fatal trap 12

Tue Feb 17 14:52:29 PST 2004

Hey Everyone,

I've been trying to create a simple program to simulate
the load my production environment puts on MySQL.

I'm not sure if I'm creating exactly the same problems
as I was seeing in production (and that I've described
on this list), but I have found some pretty interesting
things.

What I seem to be seeing is a bogging down of MySQL
when new threads are being created in bursts.  This
causes MySQL to temporarily become unresponsive,
and will sometimes crash the whole system.

Here's what my test program is doing:

- Fork X number of child processes, each opening
Y number of connections to the database.

- Each child process loops through the Y connections it
has open, executing one select statement for each, then
starting over from the first.

- If a particular database connection drops, the child
enters a loop that retries the connect until it succeeds.
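
This is not the Perl script itself, just a minimal Python
sketch of the three steps above. Every name in it
(ConnectionLost, reconnect_forever, child_loop, run_children)
is hypothetical, and it assumes a connect() callable that
returns an object with an execute() method and raises
ConnectionLost on failure. The rounds argument is my own
addition so the loop can be bounded for a dry run.

```python
import itertools
import os
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when a connect or query fails."""


def reconnect_forever(connect, delay=0.1):
    """Retry connect() until it succeeds (the 'loop forever' step)."""
    while True:
        try:
            return connect()
        except ConnectionLost:
            time.sleep(delay)


def child_loop(connect, num_connections, rounds=None, delay=0.1):
    """One child: open Y connections, then run one SELECT on each in turn.

    rounds=None loops forever; a number bounds the run.
    Returns the count of queries that succeeded.
    """
    conns = [reconnect_forever(connect, delay) for _ in range(num_connections)]
    executed = 0
    passes = itertools.count() if rounds is None else range(rounds)
    for _ in passes:
        for i in range(len(conns)):
            try:
                conns[i].execute("SELECT 1")
                executed += 1
            except ConnectionLost:
                # The connection dropped: replace it, retrying until it works.
                conns[i] = reconnect_forever(connect, delay)
    return executed


def run_children(num_children, child_fn):
    """Fork X children, run child_fn in each, and wait for all of them."""
    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            try:
                child_fn()
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
```

With a real driver, connect would wrap the driver's own
connect call and translate its errors into ConnectionLost.
Something like run_children(90, lambda: child_loop(connect, 20))
would then correspond to the X=90, Y=20 case.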

When using 45 child processes and 20 connections for
each, everything is fine.  (900 threads)

If I bump it up to 90 children and 20 connections, I
start to see problems.  The database is unable to
serve the incoming connections fast enough, and
existing connections become slow or entirely
unresponsive.  However, if I leave it alone, eventually
things "catch up."*  That is, as the database server
slowly manages to create new threads, all of the
incoming connect requests eventually succeed
(remember, they're looping).  Once everything is
reconnected, I see 1800 threads in MySQL, and the
same query/second rate that I saw with 900 threads.

* Okay, not always.  About half of the time, once
MySQL falls behind the incoming connections and
connect attempts start to fail, the system will crash
with a "fatal trap 12: page fault while in kernel mode".

In the X=90, Y=20 scenario (1800 threads), if the
test is allowed to continue until everything catches
up (about 5-10 minutes with KSE), I can stop and
restart the test, triggering a fresh burst of connection
attempts, and see only a handful of connect errors.
However, if I stop and start MySQL itself, I'll see
the 10 minutes of connect errors again.

This seems to imply that somehow these threads
are being cached, or something is happening that
allows us to skip whatever bottleneck was causing
things to bog down.
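
One way to test the caching guess on the MySQL side is to
compare the server's status counters before and after a
burst. The sketch below assumes you have captured the output
of SHOW STATUS LIKE 'Connections' and SHOW STATUS LIKE
'Threads_created' into two dicts. The function name
thread_reuse_ratio is hypothetical. Note that Connections
counts connection attempts whether or not they succeed, so
the result is only a rough indicator, and it says nothing
about caching inside the threading library or the kernel.

```python
def thread_reuse_ratio(before, after):
    """Rough fraction of new connection attempts that did not
    require the server to create a new thread.

    before/after map status variable names to their values, as
    returned by SHOW STATUS (values may be strings or ints).
    Returns None if no new connection attempts were recorded.
    """
    new_conns = int(after["Connections"]) - int(before["Connections"])
    new_threads = int(after["Threads_created"]) - int(before["Threads_created"])
    if new_conns <= 0:
        return None
    return 1.0 - new_threads / new_conns
```

A ratio near 0 during the first burst and near 1 during a
later burst would be consistent with MySQL reusing cached
threads (see the thread_cache_size variable). A ratio near 0
both times would point at something below MySQL instead.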

Does this look like a fixable problem with KSE
to anyone on this list?

Let me know if you'd like a copy of the Perl script
I've written to try out all of these things.

Kris Gale