Memory Error using Mailman on FreeBSD. How to debug?
xfb52 at dial.pipex.com
Fri Feb 8 12:41:51 UTC 2008
Lachlan Michael wrote:
>>Real puzzler. I'm surprised not to have at least one process growing,
>>though. Maybe it's not using much CPU and you're not spotting it.
>Following your advice, as far as I can tell, the mailman qrunner process
> /usr/local/bin/python2.5 /usr/local/mailman/bin/qrunner
>is the one that crashes: all other mailman processes are unaffected. I
>couldn't see it increase much in size (maybe it went from 8.5M to 12.5M),
>then it just bombed and a new process was spawned (easy to tell by the
>large increase in PID).
All I can think is that qrunner asks for such a large amount of memory
in one go that it bombs out without ever growing. That fits with the
ktrace output as well. Regrettably, I don't think you can tell *how*
much memory was asked for. (The normal pattern with out-of-memory
errors is for the process to grow and grow and grow and die, but it's
not the only one.)
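
The "fails in one go" pattern can be reproduced in isolation. This is a minimal sketch, assuming a current Python 3 on a system that enforces RLIMIT_AS. The 512 MiB cap, the 2 GiB request and the name one_shot_request are made up for illustration. It uses an anonymous mmap rather than Python's own allocator, so it only shows the kernel refusing a single request from a small process, not what qrunner actually does:

```python
import subprocess
import sys

# Code for a throwaway child process, so the limit never touches the parent.
CHILD = r"""
import mmap, resource
cap = 512 * 1024 * 1024                       # illustrative address-space cap
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    cap = min(cap, hard)                      # never ask for more than the hard limit
resource.setrlimit(resource.RLIMIT_AS, (cap, hard))
try:
    mmap.mmap(-1, 2 * 1024 ** 3)              # one 2 GiB request, in one go
    print("mapped")
except OSError as exc:
    print("errno", exc.errno)                 # ENOMEM where the cap is enforced
"""

def one_shot_request():
    """Run the child and return what it printed."""
    done = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()

if __name__ == "__main__":
    print(one_shot_request())
```

Where the cap is enforced, the child should take the errno branch while still being only a few MB in size.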
>>Other things to try: Up the stack size
>> ulimit -s 262144
>>inside the mailman startup. Again, I've had processes in the past which
>Ok, I am going to gradually try different limits. It seems as though setting
>and so on in /boot/loader.conf will allow me to increase the limits.
>Having to reboot is a pain, though. How far can I go? 512M? (Physical
>memory is 1GB)
Certainly not more than physical memory :-) To be honest, if 256M
doesn't do it then this probably isn't the problem. I'm not
particularly hopeful that this will do it, but in your circumstance I
would try it.
At the same time, you could also increase the data size (maxdsiz?) to
1Gb (yours looks like 0.5Gb, half your physical memory).
My limit settings (also 1Gb) look like:
datasize 1048576 kbytes
stacksize 262144 kbytes
which come from trying to set 256Mb and 1024Mb in the kernel config (old
FreeBSD - no sysctls).
Keep the ulimit -a in the mailman startup script so you can confirm that
you really get these numbers.
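
If it is easier to check from Python itself, the same two numbers can be read with the standard resource module. This is a sketch only, written for a current Python 3; the function names are made up, and the labels simply mirror the ones shown above:

```python
import resource

def limit_kbytes(which):
    """Return the soft limit for `which` in kbytes, or 'unlimited'."""
    soft, _hard = resource.getrlimit(which)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return soft // 1024          # getrlimit reports bytes

def report_limits():
    """Collect the data and stack limits of the current process."""
    return {
        "datasize": limit_kbytes(resource.RLIMIT_DATA),
        "stacksize": limit_kbytes(resource.RLIMIT_STACK),
    }

if __name__ == "__main__":
    for name, value in report_limits().items():
        unit = "" if value == "unlimited" else " kbytes"
        print(f"{name} {value}{unit}")
```

It has to run in the same environment as qrunner (same user, same startup script) for the numbers to mean anything.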
>>Can you email a file of the size you are
>>trying without going through mailman? Maybe your MTA (sendmail/postfix etc) has a
>>limit that somehow causes mailman to get this error.
>This is definitely not the case. Users can receive (and send) similar
>sized large attachments individually, so the MTA (sendmail in this case)
>is not the cause.
OK, that rules it out. The ktrace showing qrunner failing a break call
pretty much does that too.
>>The final suggestion is to try to trace (ktrace, strace from ports) the
>>process that is dying,
>I'll admit it is my first time to try a ktrace, but after noting which
>process it was that crashed I could identify the newly spawned PID, and
>obtained a ktrace.out (binary) and a kdump (called
>mailman_process_log.txt) when the problem occurs by sending another large
>mail attachment. I'll leave the files up for a couple of days. (Both
>files are about 2MB in size)
>Not that I can properly interpret the results, but it seems the mail file
>is completely read, but whatever happens next causes the memory error.
> 52506 python2.5 RET read 354/0x162
> 52506 python2.5 CALL break(0x8add000)
> 52506 python2.5 RET break 0
> 52506 python2.5 CALL break(0x8cc3000)
> 52506 python2.5 RET break -1 errno 12 Cannot allocate memory
The kdump output is the only useful bit, really. Your analysis seems
correct to me.
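
The break arguments are addresses (the new end of the data segment), not sizes, so they don't say how much Python wanted overall. The gap between the failed address and the last accepted one does give the size of that one step. Here is a rough sketch for pulling that out of kdump text, assuming a single process in the dump and lines shaped like the ones quoted above (the function name is made up):

```python
import re

BREAK_CALL = re.compile(r"CALL\s+break\((0x[0-9a-fA-F]+)\)")
BREAK_RET = re.compile(r"RET\s+break\s+(-?\d+)")

def failed_break_steps(kdump_lines):
    """Return a list of (last_ok_addr, failed_addr, delta_bytes) per failed break."""
    last_ok = None     # last address the kernel accepted
    pending = None     # address from a CALL still waiting for its RET
    results = []
    for line in kdump_lines:
        call = BREAK_CALL.search(line)
        if call:
            pending = int(call.group(1), 16)
            continue
        ret = BREAK_RET.search(line)
        if ret and pending is not None:
            if ret.group(1) == "0":
                last_ok = pending
            elif last_ok is not None:
                results.append((last_ok, pending, pending - last_ok))
            pending = None
    return results

SAMPLE = """\
52506 python2.5 RET read 354/0x162
52506 python2.5 CALL break(0x8add000)
52506 python2.5 RET break 0
52506 python2.5 CALL break(0x8cc3000)
52506 python2.5 RET break -1 errno 12 Cannot allocate memory
""".splitlines()

if __name__ == "__main__":
    for ok, bad, delta in failed_break_steps(SAMPLE):
        print(f"{ok:#x} -> {bad:#x}: {delta} bytes ({delta // 1024} KiB)")
    # prints: 0x8add000 -> 0x8cc3000: 1990656 bytes (1944 KiB)
```

Run over the whole mailman_process_log.txt, it would also show whether earlier break calls were growing steadily or whether this was the first big jump.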
You are also getting a stack trace from python when it exits with the
"out of memory" error. ktrace is just showing python printing the stuff
- it may be that the error also ends up in a log file somewhere - don't
know where mailman logs, sorry. From that stack trace it should be
possible to figure out which line of the python is actually causing that
memory request. My bet is on one of the cPickle lines, but it would be
nice to see the stack trace "raw" so to speak. Maybe that stack trace
would help someone on the mailman list suggest something else.
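
If the trace can't be found in a log, one way to get it "raw" is to wrap the suspect call and write the traceback out yourself. This is only a sketch for a current Python 3 (Python 2.5 would need small changes); run_and_capture and fake_unpickle are made-up names, and nothing here is Mailman's own code:

```python
import traceback

def run_and_capture(step, log_path):
    """Call step(); on MemoryError append the full traceback to log_path, then re-raise."""
    try:
        return step()
    except MemoryError:
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
        raise

def fake_unpickle():
    # Stand-in for whatever call is suspected; raises a simulated error.
    raise MemoryError("simulated")

if __name__ == "__main__":
    try:
        run_and_capture(fake_unpickle, "memory_error_trace.txt")
    except MemoryError:
        print("traceback written to memory_error_trace.txt")
```

The file then names the exact source line that made the request, which is the bit worth posting to the mailman list.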
Have you already tried sending a different kind of attachment that's
about the same size (a bit bigger would be better)? Maybe it's something
about the attachment itself that's causing the issue?
As a final resort, if none of the above resolves or leads to clues, I
would try uninstalling python2.5 and installing python2.4 *just in
case*. I'm assuming that you only have python for mailman. (If you
have real python users then it's trickier. You can install multiple
versions of python but possibly not from ports. But python always
compiled cleanly from tarball on FreeBSD for me. I can offer some help
with that process if you really need it).
I can't help thinking that 500Kb is a very small attachment and I can't
really see why it would legitimately cause a request for so much memory
that your settings aren't handling it.
A quick look at the mailman web site shows that you can run qrunner from
the command line - couldn't immediately find the man page though. If
you could somehow queue up the email with Mailman switched off, you
could run qrunner by hand and then you'd definitely get the python
backtrace. Maybe the mailman list, or a mailman admin here, can help
with that, if you need it.
Running out of ideas.