Memory Error using Mailman on FreeBSD. How to debug?

Alex Zbyslaw xfb52 at dial.pipex.com
Fri Feb 8 12:41:51 UTC 2008


Lachlan Michael wrote:

>>Real puzzler.  I'm surprised not to have at least one process growing,
>>though.  Maybe it's not using much CPU and you're not spotting it.
>>    
>>
>Following your advice, as far as I can tell, the mailman qrunner process
>
> /usr/local/bin/python2.5 /usr/local/mailman/bin/qrunner
>--runner=IncomingRunner:0:1 -s
>
>is the one that crashes: all other mailman processes are unaffected. I
>couldn't see it increase much in size (maybe it went from 8.5M to 12.5M),
>then it just bombed and a new process was spawned (easy to tell by the
>large increase in PID).
>  
>
All I can think is that qrunner asks for such a large amount of memory 
in one go that it bombs out without ever growing.  That fits with the 
ktrace output as well.  Regrettably, I don't think you can tell *how* 
much memory was asked for.  (The normal pattern with out-of-memory 
errors is for the process to grow and grow and grow and die, but it's 
not the only one.)

>>Other things to try:  Up the stack size
>>    ulimit -s 262144
>>
>>inside the mailman startup.  Again, I've had processes in the past which
>>needed this.
>>    
>>
>Ok, I am going to gradually try different limits. It seems as though setting
>kern.maxssiz="256M"
>and so on in /boot/loader.conf will allow me to increase the limits.
>Having to reboot is a pain, though. How far can I go? 512M? (Physical
>memory is 1GB)
>  
>
Certainly not more than physical memory :-)  To be honest, if 256M 
doesn't do it then this probably isn't the problem.  I'm not 
particularly hopeful that this will do it, but in your circumstance I 
would try it. 

At the same time, you could also increase the data size (maxdsiz?) to 
1GB (yours looks like 0.5GB, half your physical memory).

My limit settings (also 1GB) look like:

datasize     1048576 kbytes
stacksize    262144 kbytes

which come from trying to set 256MB and 1024MB in the kernel config (old 
FreeBSD - no sysctls).

Keep the ulimit -a in the mailman startup script so you can confirm that 
you really get these numbers.
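
Since qrunner is itself a Python process, the same numbers can also be read from inside Python.  A minimal sketch, assuming the standard `resource` module (present on Unix builds of Python); the function name `report_limits` is made up for illustration and is not part of Mailman:

```python
import resource

def report_limits():
    """Return the (soft, hard) data and stack limits of this process in kbytes."""
    limits = {}
    for name, which in (("datasize", resource.RLIMIT_DATA),
                        ("stacksize", resource.RLIMIT_STACK)):
        soft, hard = resource.getrlimit(which)
        # getrlimit reports bytes; RLIM_INFINITY means "unlimited".
        limits[name] = tuple(
            "unlimited" if v == resource.RLIM_INFINITY else v // 1024
            for v in (soft, hard)
        )
    return limits

if __name__ == "__main__":
    for name, (soft, hard) in report_limits().items():
        print("%-10s soft=%s hard=%s (kbytes)" % (name, soft, hard))
```

Run with the same interpreter and from the same startup environment as qrunner, this should agree with what ulimit -a prints there.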

>>Can you email a file of the size you are
>>trying not through mailman?  Maybe your MTA (sendmail/postfix etc) has a
>>limit that somehow causes mailman to get this error.
>>    
>>
>
>This is definitely not the case. Users can receive (and send) similar
>sized large attachments individually, so the MTA (sendmail in this case)
>is not the cause.
>  
>
OK - rule that out.  The ktrace showing qrunner failing a break pretty 
much does that too.

>>The final suggestion is to try to trace (ktrace, strace from ports) the
>>process that is dying, 
>>
>I'll admit it is my first time to try a ktrace, but after noting which
>process it was that crashed I could identify the newly spawned PID, and
>obtained a ktrace.out (binary) and a kdump  (called
>mailman_process_log.txt) when the problem occurs by sending another large
>mail attachment.  I'll leave the files up for a couple of days. (Both
>files are about 2MB in size)
>
>http://lachlan.lkla.org/tmp/mailman_memory_error/
>
>Not that I can properly interpret the results, but it seems the mail file
>is completely read, but whatever happens next causes the memory error.
>
> 52506 python2.5 RET   read 354/0x162
> 52506 python2.5 CALL  break(0x8add000)
> 52506 python2.5 RET   break 0
> 52506 python2.5 CALL  break(0x8cc3000)
> 52506 python2.5 RET   break -1 errno 12 Cannot allocate memory
>  
>
The kdump output is the only useful bit, really.  Your analysis seems 
correct to me.
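
For a 2MB kdump file, a small script can pick out the failing break calls rather than reading the log by eye.  This is only a rough sketch, assuming the kdump text has the same line layout as the excerpt quoted above; the function name `failed_breaks` is invented here.  The step it reports is how far the end of the data segment was being moved in that one call, which is not necessarily the amount the Python code itself asked for.

```python
import re

# Matches lines like " 52506 python2.5 CALL  break(0x8add000)"
CALL_RE = re.compile(r"^\s*(\d+)\s+\S+\s+CALL\s+break\((0x[0-9a-fA-F]+)\)")
# Matches lines like " 52506 python2.5 RET   break -1 errno 12 ..."
RET_RE = re.compile(r"^\s*(\d+)\s+\S+\s+RET\s+break\s+(-?\d+)")

def failed_breaks(lines):
    """Yield (pid, last_successful_addr, failed_addr) for each failing break."""
    pending = {}   # pid -> address of the break call awaiting its RET
    last_ok = {}   # pid -> address of the most recent successful break
    for line in lines:
        line = line.lstrip("> ")   # tolerate mail quoting prefixes
        m = CALL_RE.match(line)
        if m:
            pending[m.group(1)] = int(m.group(2), 16)
            continue
        m = RET_RE.match(line)
        if m and m.group(1) in pending:
            pid = m.group(1)
            addr = pending.pop(pid)
            if m.group(2) == "0":
                last_ok[pid] = addr
            else:
                yield pid, last_ok.get(pid), addr

sample = """\
 52506 python2.5 RET   read 354/0x162
 52506 python2.5 CALL  break(0x8add000)
 52506 python2.5 RET   break 0
 52506 python2.5 CALL  break(0x8cc3000)
 52506 python2.5 RET   break -1 errno 12 Cannot allocate memory
"""

for pid, prev, addr in failed_breaks(sample.splitlines()):
    step = None if prev is None else addr - prev
    print(pid, hex(addr), step)   # prints: 52506 0x8cc3000 1990656
```

To run it over the real log, pass an open file object for mailman_process_log.txt instead of the sample lines.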

You are also getting a stack trace from python when it exits with the 
"out of memory" error.  ktrace is just showing python printing the stuff 
- it may be that the error also ends up in a log file somewhere - don't 
know where mailman logs, sorry.  From that stack trace it should be 
possible to figure out which line of the python is actually causing that 
memory request.  My bet is on one of the cPickle lines, but it would be 
nice to see the stack trace "raw" so to speak.  Maybe that stack trace 
would help someone on the mailman list suggest something else.
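
To illustrate what the "raw" trace gives you: Python reports the file, line number and function of every frame leading up to the failure.  The sketch below simulates the error rather than touching Mailman; `run_and_capture` and `hungry` are hypothetical names, and the real failing call would be whatever qrunner was executing.

```python
import traceback

def run_and_capture(func, *args):
    """Call func; on MemoryError return (None, traceback text) instead of dying."""
    try:
        return func(*args), None
    except MemoryError:
        return None, traceback.format_exc()

def hungry():
    # Stand-in for whatever code makes the oversized memory request.
    raise MemoryError("simulated allocation failure")

result, tb_text = run_and_capture(hungry)
print(tb_text)
```

The last frame listed in the printed text names the function and line that raised the error, which is the detail worth passing on to the mailman list.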


Did you already try sending a different kind of attachment that's about 
the same size (a bit bigger would be better)?  Maybe it's something 
about the attachment itself that's causing the issue.
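
One way to get a test attachment of a known size and neutral content is to generate it.  A minimal sketch using the standard `email` package of a current Python 3 (so it would run on a separate machine or interpreter, not the python2.5 from this thread); the addresses and the name `build_test_message` are placeholders, not anything from your setup:

```python
import os
from email.message import EmailMessage

def build_test_message(size_bytes, maintype="application",
                       subtype="octet-stream"):
    """Build a message carrying one random binary attachment of size_bytes."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"     # placeholder address
    msg["To"] = "testlist@example.org"     # placeholder list address
    msg["Subject"] = "attachment test, %d bytes" % size_bytes
    msg.set_content("Test message with a generated attachment.")
    # Random bytes, so the content shares nothing with the failing file.
    msg.add_attachment(os.urandom(size_bytes), maintype=maintype,
                       subtype=subtype, filename="test.bin")
    return msg

msg = build_test_message(600 * 1024)
# The encoded message is larger than the attachment (base64 overhead).
print(len(msg.as_bytes()))
```

Writing `msg.as_bytes()` to a file gives something that can be handed to the MTA; varying `size_bytes` and the type arguments lets you separate "size" from "kind of attachment".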


As a last resort, if none of the above resolves it or leads to clues, I 
would try uninstalling python2.5 and installing python2.4 *just in 
case*.  I'm assuming that you only have python for mailman.  (If you 
have real python users then it's trickier.  You can install multiple 
versions of python, but possibly not from ports.  But python has always 
compiled cleanly from tarball on FreeBSD for me.  I can offer some help 
with that process if you really need it.)


I can't help thinking that 500KB is a very small attachment and I can't 
really see why it would legitimately cause a request for so much memory 
that your settings aren't handling it.


A quick look at the mailman web site shows that you can run qrunner from 
the command line - couldn't immediately find the man page though.  If 
you could somehow queue up the email with Mailman switched off, you 
could run qrunner by hand and then you'd definitely get the python 
backtrace.  Maybe the mailman list, or a mailman admin here, can help 
with that, if you need it.


Running out of ideas.

--Alex


