Tracking down a problem with php on FreeBSD

Ruslan Mahmatkhanov cvs-src at yandex.ru
Sat Feb 5 18:44:41 UTC 2011


Hi, Ivan!

Thank you much for response and sorry for late answer. We was able to 
collect some data about the issue to make discussion more objective. See 
below.

24.01.2011 16:54, Ivan Voras пишет:
> On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote:
>>
>> Good day!
>>
>> We are using custom php application on FreeBSD 8.1R amd64. It is started
>> with php-fpm 5.3.3 from ports as backend and nginx 0.8.54 as frontend.
>> Several times per day this app is making self unavailable.
>
> I think it would be more appropriate to ask this on the stable@ list.

I believe that this issue is not related to stable only and problem 
resolution will involve some debugging and stuff, so this was the reason 
why i post this to hackers at .

>
>> Simple php-fpm restart solves the problem, but i need to track it down
>> to the cause of this situation and ask for your assistance and
>> instructions on how to debug it. Some facts about this:
>
> On one hand, FPM is said to be very experimental...
>
> Personally, I've been using apache22-worker or apache22-event +
> mod_fcgid for years without trouble.

We prefer to avoid using apache at all, because in this it's just adds 
yet another unneeded link and complexity.

> It looks very application-specific, possibly not really an OS problem
> (or maybe a problem of different expectations from the OS when porting
> from Linux).
>
>> - `top -mio` shows very high (80000-90000 for VCSW) VCSW/IVCSW values
>> for php-fpm processes and LA is more than 120
>
> How many "real" user request are in these 120? Do any users at the time
> of problem (this doesn't look like a "crash") receive valid responses?

This problem is seen on development server too with < 5 users online. 
And nobody receives valid response, they all get this message:

Maximum execution time of 30 seconds exceeded in Unknown on line 0.

>
>> - user seeing http 502 error code in browser
>> - php-fpm log has many of this lines in time of crash:
>> Jan 23 17:56:58.176425 [WARNING] [pool world] server reached
>> max_children setting (100), consider raising it
>
> Did you try raising it? Does the error happen ONLY when this limit is
> reached?

Yes, we did. Nothing changes.
I'm not sure about second question, i believe it's not related.

> These are some very varied errors, not especially consistent with each
> other.
>
> Did you try some generic socket & TCP tuning like described in
> http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel
> ?

Yes we did some on initial stage when server was built. And recently 
tried to raise kern.ipc.somaxconn, but without success.

There is full current sysctl.conf:

security.bsd.see_other_uids = 0
security.bsd.see_other_gids = 0
security.bsd.unprivileged_proc_debug=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.conservative_signals=1
security.bsd.hardlink_check_uid=1
security.bsd.hardlink_check_gid=1
vfs.usermount=0
net.inet.tcp.drop_synfin=1
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
net.inet.icmp.icmplim=100
net.link.ether.inet.max_age=800
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
kern.ps_showallprocs=0
net.inet.ip.random_id=1
kern.corefile=/tmp/%U.%N.%P.core

> Other than that, you will probably have to debug the php-fpm processes.
> Start by observing in which state they are (top without "-mio"). If the
> processes are blocking, try "procstat -k <pid>" on them.

Thay are not in blocking state:
   PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
39405 www           1 101    0   108M 14832K RUN    1   0:54 25.39% php-fpm
39544 www           1 101    0   108M 14832K CPU1   1   0:42 25.39% php-fpm
39395 www           1 101    0   108M 14828K CPU1   1   0:57 23.39% php-fpm
39401 www           1 101    0   108M 14832K CPU3   3   0:55 23.19% php-fpm
39481 www           1 101    0   108M 14832K RUN    3   0:55 23.10% php-fpm
13024 www           1 101    0   112M 47156K RUN    0   1:30 23.00% php-fpm
39398 www           1   4    0   108M 14832K accept 0   1:00 13.48% php-fpm
39373 www           1   4    0   108M 14832K accept 0   1:00 10.60% php-fpm
39385 www           1   4    0   108M 14832K accept 1   1:00  9.28% php-fpm
22130 www           1   4    0   111M 40200K accept 1   1:32  7.57% php-fpm

There is procstat -k/-t for couple of processes from this list:

PID      TID COMM             TDNAME     KSTACK
39405 100110 php-fpm             -       dmapbase
   PID    TID COMM             TDNAME     CPU  PRI STATE   WCHAN
39405 100110 php-fpm             -        1   185 run      -

   PID    TID COMM             TDNAME     KSTACK
39398 100154 php-fpm          -          tdq_cpu
   PID    TID COMM             TDNAME     CPU  PRI STATE   WCHAN
39398 100154 php-fpm          -           0   88 sleep   accept

When attaching to any hanging php-fpm proccess with truss, than i see a 
lot of this calls:
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) 
= 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) 
= 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) 
= 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) 
= 0 (0x0)

Thank you in advance for any tips.

-- 
Regards,
Ruslan


More information about the freebsd-hackers mailing list