System performance 4.x vs. 5.x and 6.x
Brett Bump
bbump at rsts.org
Thu Feb 14 22:27:32 UTC 2008
On Thu, 14 Feb 2008, Kris Kennaway wrote:
> We are going to need more information about your system. What do you
> mean by "peak activity"? What is running on the system when it performs
> badly (check top -S, ps, gstat, vmstat -w, vmstat -i). What is your
> kernel configuration, dmesg and relevant aspects of the system
> configuration?
>
> Kris
>
I would call 120 processes with a load average of 0.03 and 99.9% idle,
with 10-20 sendmail processes and 30 apache jobs, nothing to write home
about. But when that jumps to 250 processes and a load average of 30 with
50% idle (5-10 second waits on single-character ssh echo), I'd call that
a bit busy. That usually means my heavy pop3 users are checking in at the
same time someone (or 2 or 3) has sent email to the large-volume listservs.
Proc stat doesn't show as much as gstat and iostat. Gstat always shows my
drive with /var/mail being 97-100% busy and iostat will always show high
tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).
Kernel is generic with ipfirewall, quota and smp (no ipfw rules yet).
On Thu, 14 Feb 2008, Bill Moran wrote:
> What _is_ the hardware?
Dell PowerEdge 1750 1U, 146Gig U320s. The Broadcoms seem to be a change
from the earlier 1550s with Intel Pro/100s (I prefer the Intels).
On Thu, 14 Feb 2008, Kris Kennaway wrote:
> All it takes is a single bug (e.g. in a driver) to affect performance on
> a certain specific configuration. However, bugs tend to get fixed over
> time. Maybe that is the case for you. It is well worth verifying
> whether the problem persists on the most up-to-date sources, so that
> everyone's time is not wasted in tracking down a problem that is already
> fixed. You can just do a source upgrade from 6.2, which will be quite
> straightforward.
Agreed. I have a 2nd machine that is identical to this one I could put
6.3 on to test this.
> It is pretty unusual for applications to be aborting, but usually they
> do it because they fail an application-specific run-time check. What
> diagnostics are logged by the applications? You may need to increase
> their respective verbosity/debug levels.
>
> Kris
>
I was suspicious that maybe we needed more memory but swap has barely even
been touched (232k used...with 1400meg inactive).
On Thu, 14 Feb 2008, Mike Tancsa wrote:
> No, but you havent given the list much to go on as to what the
> problems are or what hardware you are using, or really quantified the
> issue. By "slow" is the disk blocking on IO ? or are processes
> blocking on network IO etc etc. 6.2 was not a "bad" release, but 6.3
> is better than 6.2. By starting with a more contemporary release,
> less effort by developers and other users need to be exerted in
> figuring out if the problem(s) you are running into have already been
> fixed.
It appears to me that disk access is extremely slow. I can transfer
large files between the machines faster than making a duplicate copy
on disk.
> Because the drivers have changed since 4.10. "improvements" could
> have introduced regressions... Change in the driver to support newer
> versions of a chipset might break older chipsets.
Any known issues with the Dell PERC RAID driver that anyone is aware
of? I can start there.
> bge is a good example of a driver that has had a lot of changes and
> hasnt worked all that well at times.... hence the suggestion to try
> 6.3 as there have been many bug fixes. Whether or not it fixes your
> problem its hard to say, but start there to see if things are faster
> and stable for you etc.
> e.g.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c
>
> You should also post a full dmesg of the box as well as kernel config
> etc...
The kernel is generic with ipfirewall, quota and smp.
Feb 14 02:53:37 mail sm-mta[33143]: m1E9qKLZ033143: SYSERR(root): collect: I/O error on connection from astro.pryor.com, from=<CUSTOMERSERVICE at EM.PRYOR.COM>
pid 31611 (milter-greylist), uid 25: exited on signal 3
Feb 14 03:17:08 mail sshd[34844]: warning: /etc/hosts.allow, line 45: can't verify hostname: getaddrinfo(host-200-6-102-230.iia.cl, AF_INET) failed
Feb 14 03:17:08 mail sshd[34844]: refused connect from 200.6.102.230 (200.6.102.230)
Feb 14 03:36:30 mail sshd[35944]: refused connect from 202.129.44.218 (202.129.44.218)
Feb 14 03:45:21 mail sshd[36667]: refused connect from 202.129.44.218 (202.129.44.218)
Feb 14 03:52:01 mail sm-mta[33092]: m1E9peX3033092: SYSERR(root): collect: read timeout on connection from astro.pryor.com, from=<CUSTOMERSERVICE at EM.PRYOR.COM>
Feb 14 07:24:01 mail sshd[52723]: warning: /etc/hosts.allow, line 45: can't verify hostname: getaddrinfo(42.215.6.200.intelnet.net.gt, AF_INET) failed
Feb 14 07:24:01 mail sshd[52723]: refused connect from 200.6.215.42 (200.6.215.42)
Feb 14 07:28:56 mail sm-mta[52866]: m1EEPPLC052866: SYSERR(root): collect: I/O error on connection from astro.pryor.com, from=<CUSTOMERSERVICE at EM.PRYOR.COM>
Feb 14 07:29:15 mail sshd[53465]: warning: /etc/hosts.allow, line 45: can't verify hostname: getaddrinfo(42.215.6.200.intelnet.net.gt, AF_INET) failed
Feb 14 07:29:15 mail sshd[53465]: refused connect from 200.6.215.42 (200.6.215.42)
Feb 14 08:01:57 mail sshd[58183]: refused connect from mail.rsib.net (12.46.46.98)
Feb 14 08:07:22 mail sshd[59017]: refused connect from mail.rsib.net (12.46.46.98)
Feb 14 09:50:00 mail su: bbump to root on /dev/ttyp0
pid 43464 (httpd), uid 80: exited on signal 6
pid 86995 (imapd), uid 2151: exited on signal 6
pid 85706 (httpd), uid 80: exited on signal 6
pid 87600 (imapd), uid 1376: exited on signal 6
pid 45621 (httpd), uid 80: exited on signal 6
pid 45617 (httpd), uid 80: exited on signal 6
Feb 14 11:28:36 mail inetd[48076]: imap4 from 208.107.161.82 exceeded counts/min (limit 60/min)
Feb 14 11:28:38 mail last message repeated 2 times
Feb 14 11:52:34 mail sm-mta[99563]: m1EHqX9u099563: SYSERR(root): collect: read timeout on connection from fulltimeconsult.com, from=<AARPMembership at wlq.fulltimsgeconsult.com>
Feb 14 13:06:27 mail su: bbump to root on /dev/ttyp0
pid 45995 (imapd), uid 3115: exited on signal 6
pid 46407 (imapd), uid 1873: exited on signal 6
pid 46418 (imapd), uid 2769: exited on signal 6
pid 46402 (imapd), uid 1873: exited on signal 6
pid 46651 (imapd), uid 2769: exited on signal 6
pid 46653 (imapd), uid 2769: exited on signal 6
pid 44499 (httpd), uid 80: exited on signal 6
pid 47035 (imapd), uid 1873: exited on signal 6
pid 46083 (httpd), uid 80: exited on signal 6
pid 46395 (httpd), uid 80: exited on signal 6
pid 46604 (httpd), uid 80: exited on signal 6
pid 46603 (httpd), uid 80: exited on signal 6
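To quantify the aborts, the "exited on signal" messages in a log like the one above can be tallied per program. The following is a minimal Python sketch, not anything used on this box; the function name tally_signal_exits and the SAMPLE list are made up for illustration, and the sample lines are copied from the log.

```python
import re
from collections import Counter

# Kernel messages of the form "pid 43464 (httpd), uid 80: exited on signal 6".
EXIT_RE = re.compile(r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)")


def tally_signal_exits(lines):
    """Count (program, signal) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # findall, not match: the message can follow other text on the same line.
        for _pid, prog, _uid, sig in EXIT_RE.findall(line):
            counts[(prog, int(sig))] += 1
    return counts


# A few lines copied from the log, used here as sample input (hypothetical name).
SAMPLE = [
    "Feb 14 09:50:00 mail su: bbump to root on /dev/ttyp0",
    "pid 43464 (httpd), uid 80: exited on signal 6",
    "pid 86995 (imapd), uid 2151: exited on signal 6",
    "pid 85706 (httpd), uid 80: exited on signal 6",
]

for (prog, sig), n in tally_signal_exits(SAMPLE).most_common():
    print(f"{prog:<16} signal {sig:<2} count {n}")
```

To run it against a real log, pass an open file object instead of SAMPLE; which file holds these messages depends on the syslog configuration (/var/log/messages is assumed to be the usual place).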
> what does
> netstat -ni
> give
-bash-2.05b$ netstat -ni
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
bge0 1500 <Link#1> 00:0f:1f:66:0e:e6 12511748 902 12025487 0 0
bge0 1500 208.107.160/2 208.107.161.82 17011211 - 16533277 - -
bge1 1500 <Link#2> 00:0f:1f:66:0e:e8 3523091 586 4089056 0 0
bge1 1500 10.1.1/24 10.1.1.1 3516790 - 4087415 - -
lo0 16384 <Link#3> 4659734 0 4659733 0 0
lo0 16384 fe80:3::1/64 fe80:3::1 0 - 0 - -
lo0 16384 ::1/128 ::1 2772 - 2772 - -
lo0 16384 127 127.0.0.1 147255 - 147255 - -
> and what options do you have on ifconfig ? Are the errors seen on
> your switch port as well or just in netstat -ni ?
ifconfig_bge0="inet 208.107.161.82 netmask 255.255.254.0 media 100baseTX mediaopt full-duplex"
ifconfig_bge1="inet 10.1.1.1 netmask 255.255.255.0 media 100baseTX mediaopt full-duplex"
No, the switch shows clear, they only show up as input errors on this box.
The box sitting under this one has an uptime of 621 days with 1 Oerr.
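For scale, the Ierrs counts above can be put against Ipkts: roughly 0.007% of input packets on bge0 and 0.017% on bge1. A small Python sketch of that arithmetic follows, assuming the column layout of the netstat -ni output shown above; the function name link_error_rates is made up for illustration.

```python
def link_error_rates(netstat_output):
    """Return {interface: (ipkts, ierrs, error fraction)} for <Link#N> rows."""
    rates = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Link-layer rows with a MAC address have nine columns:
        #   Name Mtu <Link#N> Address Ipkts Ierrs Opkts Oerrs Coll
        # lo0 has no address column, so its row is shorter and is skipped.
        if len(fields) != 9 or not fields[2].startswith("<Link#"):
            continue
        name, ipkts, ierrs = fields[0], int(fields[4]), int(fields[5])
        rates[name] = (ipkts, ierrs, ierrs / ipkts if ipkts else 0.0)
    return rates


# Rows copied from the netstat -ni output in this message.
NETSTAT = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
bge0 1500 <Link#1> 00:0f:1f:66:0e:e6 12511748 902 12025487 0 0
bge1 1500 <Link#2> 00:0f:1f:66:0e:e8 3523091 586 4089056 0 0
lo0 16384 <Link#3> 4659734 0 4659733 0 0
"""

for name, (ipkts, ierrs, frac) in link_error_rates(NETSTAT).items():
    print(f"{name}: {ierrs} input errors in {ipkts} packets ({frac:.4%})")
```

The counters are cumulative since boot, so a rate like this says nothing about whether the errors cluster around the busy periods; sampling netstat -ni twice and diffing would be needed for that.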
> Why are the processes sigabrting ? Is there anything in the
> application logs to indicate why they are exiting ?
>
> ---Mike
>
[Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
httpd in malloc(): error: recursive call
[Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
httpd in free(): error: recursive call
[Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
httpd in free(): error: recursive call
Memory. This is why I was willing to throw another 2gig of memory in it,
but why am I only seeing 268K of swap used?
Brett
More information about the freebsd-performance
mailing list