PostgreSQL stats collector eats all CPU time

Krzysztof Jędruczyk kjedruczyk at ramfasto.com
Tue Nov 4 04:24:33 PST 2008


Recently PostgreSQL on our database server started showing a problem:
after running for some time, the stats collector process eats 100% of
CPU time, exactly as someone reported here:
http://groups.google.com/group/pgsql.general/browse_thread/thread/6dfea591d243e987

No solution is provided there, though; a kernel/libc bug is suggested.

I'm not sure how relevant it is, but the problem first appeared about a
day or two after the server was upgraded with an additional processor: it
is now a 2x dual-core Opteron with 8GB of RAM. For some reason we didn't
see this problem back when it was a single dual-core Opteron with 4GB of
RAM. It is the amd64 version of FreeBSD, of course.

As the person who previously reported the problem on the PostgreSQL
mailing list showed, the stats collector busy-loops in an interrupted
poll() call; kdump contains output like this:
    878 postgres 0.009643 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009671 RET   poll -1 errno 4 Interrupted system call
    878 postgres 0.009675 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009687 RET   poll -1 errno 4 Interrupted system call
    878 postgres 0.009691 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009700 RET   poll -1 errno 4 Interrupted system call
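For illustration only, here is a minimal C sketch (not PostgreSQL's actual code) of the loop pattern the trace suggests. In the kdump lines, poll() is called on one descriptor (0x1) with a 2000 ms timeout (0x7d0) and fails with EINTR each time. The helper names below (wait_readable, now_ms, start_signal_storm, stop_signal_storm, on_alarm) are hypothetical, and the 1 ms SIGALRM timer is an assumed stand-in for whatever signals the collector is actually receiving. The sketch retries on EINTR using only the time left until a fixed deadline. A loop that instead passed the full 2000 ms timeout on every retry would never time out while signals keep arriving, and with a fast enough stream of signals it would spin much like the trace above.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: current time in milliseconds on a monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical wait loop (not PostgreSQL's code): poll one fd for input
 * and retry when poll() fails with EINTR. Each retry uses only the time
 * left until the original deadline, so the total wait stays bounded even
 * if signals keep interrupting the call.
 * Returns poll()'s final result (1 = readable, 0 = timed out, -1 = error);
 * *eintr_count receives the number of interrupted calls.
 */
static int wait_readable(int fd, int timeout_ms, int *eintr_count)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    long long deadline = now_ms() + timeout_ms;
    *eintr_count = 0;

    for (;;) {
        long long left = deadline - now_ms();
        if (left < 0)
            left = 0;
        int rc = poll(&pfd, 1, (int)left);
        if (rc >= 0 || errno != EINTR)
            return rc;        /* ready, timed out, or a real error */
        (*eintr_count)++;     /* interrupted by a signal: go around again */
    }
}

/* No-op handler: its only effect is to interrupt a blocking poll(). */
static void on_alarm(int sig) { (void)sig; }

/* Assumption for the demo: deliver SIGALRM every 1 ms, without SA_RESTART. */
static void start_signal_storm(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_interval.tv_usec = 1000;
    it.it_value.tv_usec = 1000;
    setitimer(ITIMER_REAL, &it, NULL);
}

/* Disarm the interval timer again. */
static void stop_signal_storm(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
}
```

Recomputing the remaining time bounds the total wait, but the loop still wakes up once per signal, so a steady flood of signals costs CPU either way.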

I also grabbed a core dump of the postmaster process, and the backtrace
looks a little weird to me:

#0  0x00000008012186cc in poll () from /lib/libc.so.7
[New Thread 0x801601120 (LWP 100209)]
[New LWP 54785]
(gdb) bt
#0  0x00000008012186cc in poll () from /lib/libc.so.7
#1  0x000000080107c85e in poll () from /lib/libthr.so.3
#2  0x0000000000578bd0 in pgstat_start ()
#3  0x000000000057d2b5 in PostmasterMain ()
#4  <signal handler called>
#5  0x0000000801268cdc in select () from /lib/libc.so.7
#6  0x000000080107c574 in select () from /lib/libthr.so.3
#7  0x000000000057aaa3 in ClosePostmasterPorts ()
#8  0x000000000057be9e in PostmasterMain ()
#9  0x00000000005358fe in main ()

If I'm reading it right, is the constantly interrupted poll() being
called from the signal handler?

Any suggestions on what else I can do to identify the problem? The
situation seems to be reproducible: after a server restart it happened
again within one day.

-- 
Best regards,
   Krzysztof Jędruczyk


More information about the freebsd-stable mailing list