amr performance woes and a bright side [UPDATE]

Sven Willenberger sven at dmv.com
Wed Mar 23 13:45:39 PST 2005


On Tue, 2005-03-15 at 09:02 -0500, Sven Willenberger wrote:
> On Mon, 2005-03-14 at 21:59 -0700, Scott Long wrote:
> > Sven Willenberger wrote:
> > > I have been testing a new box for ultimate use as a PostgreSQL server:
> > > dual Opteron (2.2GHz), 8G RAM, LSI 320-2x MegaRAID (battery-backed
> > > memory) with 2 single 73 GB drives and an 8x146GB RAID 0+1 array
> > > (Hitachi U320 10k RPM). In doing so I have also tested the amd64
> > > 5.3-STABLE release against Gentoo x86_64 and Fedora FC3/x86_64.
> > > 
> > > First the bad news:
> > > 
> > > The Linux boxen were configured with the postgres data drives on the
> > > RAID 0+1 using XFS, with a separate pg_xlog on a different drive. Both
> > > Gentoo and FC3 were running 2.6.x kernels from the x86_64 distro.
> > > pgbench was initialized using no scaling factor (1 million rows), scaling
> > > 10 (10 million), and 100.
> > > With no scaling the Linux boxen hit about 160 tps using 10 connections
> > > and 1000-2000 transactions.
> > > The BSD system hit 100-120 tps. This is a difference I could potentially
> > > live with. Now enter the scaled tables:
> > > Linux systems hit marks of 450+ tps when pgbenching against millions of
> > > rows, while the BSD box stayed at 100 tps or worse, dipping as low as
> > > 90 tps.
> > > 
> > > Bonnie benchmarks:
> > > Linux:
> > > Sequential output: Per Char = 65000 K/sec, Block = 658782 K/sec, Rewrite
> > > = 639654 K/sec
> > > Sequential input: Per Char = 66240 K/sec, Block = 1278993 K/sec
> > > Sequential create: create 641/sec , read n/a, delete 205/sec
> > > Random create: create 735/sec, read n/a, delete 126/sec
> > > 
> > > BSD:
> > > Sequential output: Per Char = 370 K/sec (!!), Block = 132281 K/sec,
> > > Rewrite = 124070 K/sec
> > > Sequential input: Per Char = 756 K/sec, Block = 700402 K/sec
> > > Sequential create: create 139/sec, read 6308/sec, delete n/a
> > > Random create: create 137/sec, read 5877/sec, delete n/a
> > > 
> > > The bonnie tests were run several times with similar results.
> > > 
> > > It would seem to me that the pgbench marks and tests are being hampered
> > > by comparatively poor I/O to the RAID array and disks under the amr
> > > driver's control. I am hoping there are some tweaks I could make, or
> > > perhaps some patches to the driver in -CURRENT that could be
> > > applied/backported/MFC'ed, to try to improve this performance.
> > > 
> > > Oh, the "bright" side? FreeBSD is the only OS here that didn't kernel
> > > oops due to memory allocation issues, or whatever caused them (the
> > > backtrace showed kmalloc). That may be because of the XFS file system (I
> > > didn't try EXT3 or its kin), or because of issues between the LSI and the
> > > Linux kernel, or who knows what. I am hoping to get the stability and OS
> > > performance of FreeBSD and the raw disk performance witnessed on the
> > > Linux systems all rolled up into one. Help?
> > > 
> > > Sven
> > > 
> > 
> > First of all, are you using the same hardware and just switching the OS?
> > Are you sure that the RAID and disk cache settings are identical?
> > Second, some of the Linux numbers are very hard to believe; PCI-X has a 
> > theoretical bandwidth of 1066 MB/sec, so it's highly unlikely that you're 
> > going to get 1249 MB/sec out of it in the block read test.  bonnie is an 
> > excellent tool for testing the randomness of cache effects and memory 
> > bandwidth, but it's not so good at testing actual I/O performance =-)
> > 
> > So setting aside the bonnie tests, the PGSQL stats do indeed show a 
> > problem.  Is PGSQL threaded?  If so, you might be running into some of
> > the threading performance problems that are well known and are being 
> > worked on.  I don't know a whole lot about PGSQL or the tests that you 
> > are talking about, but if you had an easy recipe for duplicating your 
> > test environment, I'd like to experiment some myself.
> > 
> > Scott
> 
> Yes, these tests were done on the same hardware, with the same hardware
> RAID configuration and a fresh OS install for each battery of tests. The
> bonnie numbers do seem a bit out of whack upon closer scrutiny.
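In hindsight, the block numbers above likely say more about the page
cache than the array: with 8G of RAM, bonnie needs a working set well
past that before the figures mean much. If anyone wants to repeat the
test, something along these lines with bonnie++ should take the cache
out of the picture -- the directory and user here are placeholders, not
what was actually used:

  # file set of ~2x physical RAM so reads cannot be served from cache
  # (sizes are in MB: 16 GB of data files on an 8 GB box)
  bonnie++ -d /data/bonnie -s 16384 -r 8192 -u pgsql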
> 
> As far as setting up PGSQL, in each case it was set up from packages
> (FreeBSD ports, Gentoo emerge, FC3 yum) and the postgresql.conf file was
> adjusted to use the same set of values for memory, etc.
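The "same set of values" above refers to the usual memory knobs in
postgresql.conf. The exact figures are not recorded in this thread, so
the snippet below is only an illustrative sketch of the sort of
settings kept identical across the three installs:

  # postgresql.conf -- illustrative values, not the ones actually used
  max_connections = 100
  shared_buffers = 20000          # 8 KB pages of shared buffer cache (~160 MB)
  sort_mem = 16384                # KB per sort (named work_mem in 8.0)
  effective_cache_size = 400000   # 8 KB pages the planner assumes the OS caches
  wal_buffers = 64
  checkpoint_segments = 16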
> 
> pgbench (postgresql-contrib) was run as follows for testing:
> pgbench -i -U postgres/pgsql pgtest (where the user is either postgres
> or pgsql depending on the platform, and pgtest is the test db set up
> using createdb)
> pgbench -c 10 -t 1000 -U pgsql pgtest
> pgbench -c 10 -t 4000 -U pgsql pgtest
> pgbench -c 10 -t 10000 -U pgsql pgtest
> pgbench -i -s 10 -U pgsql pgtest (scaling factor of 10 to increase the
> table sizes for benchmarking)
> pgbench -c 10 -t 1000 etc ......
> pgbench -i -s 100 -U pgsql pgtest
> 
> Sven
> 
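For anyone who wants Scott's "easy recipe" in one piece, the whole run
above boils down to something like the following sketch (the database
name and users are taken from the commands quoted above; adjust to
taste):

  #!/bin/sh
  # Rough wrapper around the pgbench matrix described above.
  DB=pgtest
  USER=pgsql          # "postgres" on the Linux installs

  # create the test database (once)
  createdb -U $USER $DB

  for scale in 1 10 100; do
      # (re)populate the tables at this scale factor
      pgbench -i -s $scale -U $USER $DB
      for txns in 1000 4000 10000; do
          # 10 concurrent clients, $txns transactions per client
          pgbench -c 10 -t $txns -U $USER $DB
      done
  done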
Just thought I would share this: after moving around to a few more OSes
(including Solaris 10, which showed performance numbers similar to those
of FreeBSD), one of the hard drives in the array failed. I suspect it
was defective from the factory, as it was brand new. I am not sure
whether that impacted performance in the original tests, but it may
have, as the controller kept trying to skip past bad sectors.

Anyway, now with a battery-backed RAID controller in place, 6
functioning drives (striped across 3 pairs of mirrors), and write-back
enabled, I am seeing pgbench give readings of 650+ tps!! :-)

These numbers would indicate to me (as did the odd bonnie++ numbers)
that the 2.6 Linux kernel with XFS was doing some heavy-duty write
caching. I suspect the kernel panics I was seeing were the result of
poor handling of the defective drive when trying to fsync, or memory
mismanagement of the cached write data.
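That would square with pgbench being commit-bound: every pgbench
transaction ends in a commit, and each commit has to fsync the WAL, so
a controller that can acknowledge those writes from battery-backed
cache makes an enormous difference. A quick (and decidedly
production-unsafe) way to confirm that the gain is coming from cheap
synchronous writes, should anyone care to, is to rerun pgbench with
fsync disabled and see whether the tps moves by a similar amount:

  # postgresql.conf -- diagnostic only; never leave this off on real data
  fsync = off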

At any rate, I am glad I can stay with the FreeBSD option now.

Sven



