7.1-PRERELEASE: arcmsr write performance problem

Paul MacKenzie bsdlist at cogeco.ca
Mon Dec 15 10:10:26 PST 2008


> Replying to my own post ...
>
> I have done a test on the same machine comparing 6.3-p1 to 7.1-PRE. 
> The performance is the expected ~6MB/s (because of the lack of cache)
> on 6.3-p1, so the BIOS change doesn't seem to be at fault.
>
> This seems to be a regression somewhere between 6.3 to 7.1.  The Areca
> driver is the same in 6.3 and 7.1, so the problem seems to be elsewhere.
>
> I think this is more than just a "performance" problem.  The
> observations with gstat showing extremely high ms/w values (I have
> seen them as high as 22000) makes it look like IO completion
> interrupts are being lost.
>
> Any suggestions on where to look next?  Are there obvious candidates?
Hi,

I too am having a terrible time with this. I have a server running 7.0
(FreeBSD 7.0-STABLE#0: Sun Jun 29 00:32:38 EDT 2008) that I do not
believe has the same problem as the 7.1 system, and I am looking into
downgrading to that version as a possible short-term solution.

To give you an idea of how slow it gets: the two systems are nearly
identical in hardware, and a "make buildworld" on the 7.1 machine (FreeBSD
7.1-PRERELEASE #0: Sun Dec 14 09:25:21 EST 2008) took over 10 hours!!!
On the machine without the problem, which has a much higher load, it
took about 2 hours. Both systems have 16 GB of RAM with 2 quad-core CPUs
(8 CPUs @ 2.33GHz) on an S5000PAL (SR2500ALBRPNA) motherboard. There is a
slight difference in the RAID card, but both use the same RAID driver
(arcmsr). The system with the problems has an ARC-1130 with a 1 GB cache
chip in a RAID 1+0 with 4 drives. So one system took over 8 hours longer
to build the world, and it was visibly slow on the console when building.

No hardware errors are logged at all by the RAID card or by the S5000PAL
motherboard.

After weeks of working on this, I now believe that anything that taxes
writes to the hard drives causes the system CPU numbers to spike
through the roof (approx 80% usage), and the server grinds to a halt. I
also see wide swings in the system CPU usage. It reminds me of the
QUOTA issue I had with 6.2, where the system usage was really high and it
was the QUOTA code that was broken.
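
For anyone who wants to compare a good and a bad machine with the same
numbers, here is a rough sketch of a synchronous write probe. It is not
something from this thread and not a replacement for gstat; the function
name timed_sync_writes and its block_size and count parameters are made up
for illustration. It writes small blocks, calls fsync after each one, and
reports the overall throughput and the slowest single write, which is
roughly what a very high ms/w figure would show up as from userland.

```python
import os
import tempfile
import time


def timed_sync_writes(path, block_size=64 * 1024, count=256):
    """Write `count` blocks to `path`, fsync after each one.

    Returns (throughput in MB/s, slowest write+fsync in milliseconds).
    Hypothetical helper for illustration only.
    """
    block = b"\0" * block_size
    total_bytes = 0
    worst = 0.0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            total_bytes += os.write(fd, block)
            os.fsync(fd)  # force the block out to the device
            worst = max(worst, time.perf_counter() - t0)
    finally:
        os.close(fd)
    # Guard against a zero reading from a coarse clock.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / (1024 * 1024) / elapsed, worst * 1000.0


if __name__ == "__main__":
    # Assumption: the temporary directory lives on the array under test;
    # pass dir=... to TemporaryDirectory to point it somewhere else.
    with tempfile.TemporaryDirectory() as d:
        mbps, worst_ms = timed_sync_writes(os.path.join(d, "probe.bin"))
        print("throughput: %.1f MB/s, slowest write: %.1f ms" % (mbps, worst_ms))
```

Running the same script on both machines, on the arcmsr volume, should
give figures that can be compared directly, provided the load is similar
at the time.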

I have Polling, Quota, and the Lagg system enabled on both of the
systems and have tried to make them as similar as possible in the setup.

If someone is interested in looking into this issue, I am happy to help
diagnose it; just let me know what to look for and test. Otherwise, I will
shortly have to move back to 7.0 or 6.4 to try to solve the issue. I am
very worried about updating to 7.1 in the future, and about updating the
other machine, as I really have not seen many other people discussing
these problems. It effectively makes the system mostly unusable as a
webserver when it happens, as apache ends up needing to be stopped and
restarted to get it going again. To this point I have shut off anything
I can shut off to try to limit how often it happens.

Thanks,

Paul


More information about the freebsd-stable mailing list