Oliver Brandmueller ob at e-Gitt.NET
Fri Feb 25 11:05:45 GMT 2005


On Fri, Feb 25, 2005 at 09:33:39AM +0100, Francois Tigeot wrote:
> On Fri, Feb 25, 2005 at 12:31:22AM +0100, Oliver Brandmueller wrote:
> > We had problems here with 3ware + 72GB Raptors (10 krpm), so we moved to 
> What sort of problems ?
> I was planning to use some sort of 3Ware/Raptor combination with amd64
> -STABLE machines in the near future, and I am very interested by your
> experience...

Under heavy load (I/O load on the disks constantly over 200 tps, average
at about 250 tps, peaks over 600 tps) a random drive disconnects from
the RAID 10. After removing the drive from the config and rescanning the
bus, the drive does not show up anymore. The only way to get the drive
back is to unplug the drive (or switch the computer off, so that power
is removed). After that, the RAID rebuilds with that drive without any
problem.

-> It's not reproducible. The error occurs under high load, sometimes
   three times a week, sometimes not at all for 3 months.

-> It happens only with the Raptors.

-> It's always a random drive; no single drive disconnects more often
   than the others.

-> It happens with both the 8506 and the 9500 series of 3ware Escalade
   SATA controllers.

-> It does not depend on the firmware of the controller, we tried
   different versions

-> With the same drives, same OS, same motherboard, same drive bays
   but an ICP controller we never saw the error.

-> FreeBSD 5.1-CURRENT up to 5-STABLE as of mid-January

What we did not yet try:

- other OS

- other drives (in fact, the Raptors are the only SATA drives with
  10 krpm available - or at least they were when we bought the
  machines). Slower drives are not an option here.

We did not see this during testing, but the testing phase was very
short (only 2 weeks). During the tests we ran dd, bonnie++ and various
other things, but evidently none of the usual tools put enough constant
load on the disks. The machines are spam filters. As long as we have
more machines working (meaning a lower workload for each machine) or the
load goes down for other reasons, the errors don't occur anymore (we
almost never see a failed drive on a weekend, but on weekdays between
10 and 12 local time we see it more often). So I guess that most people
won't see this error in their setups, especially when they need the
disk performance only during peaks.
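
To illustrate the difference between a benchmark run and constant load,
here is a minimal sketch of a duration-based random I/O loop. It is not
what we ran; the function name sustained_io, the scratch file path and
all parameter defaults are assumptions for illustration only. Reads may
well be served from the buffer cache, so only the fsync'ed writes are
certain to reach the disks, and there is no evidence that a loop like
this would trigger the disconnects.

```python
import os
import random
import time


def sustained_io(path, file_size=64 * 1024 * 1024, block=4096,
                 seconds=60.0, seed=None):
    """Issue random block reads and fsync'ed writes for a fixed duration.

    Hypothetical helper: returns the average operations per second over
    the run. It measures issued operations, not what the disks really did.
    """
    rng = random.Random(seed)
    blocks = file_size // block
    ops = 0
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Size the scratch file so every random offset is valid.
        os.ftruncate(fd, file_size)
        payload = os.urandom(block)
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            offset = rng.randrange(blocks) * block
            if rng.random() < 0.5:
                os.pwrite(fd, payload, offset)
                os.fsync(fd)  # push the write out of the cache
            else:
                os.pread(fd, block, offset)
            ops += 1
    finally:
        os.close(fd)
    return ops / seconds
```

The point of the deadline loop is that the run length is set in hours or
days rather than by the amount of data, which is closer to what the
production machines do than a dd or bonnie++ run that finishes by itself.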

My experience with the ICP Vortex controllers has been very good so far.
They are fast and the management software is very comfortable. The only
thing I miss is the simplicity of tw_cli (the management tool for the
3wares), which lets you query the status of the RAID from a simple
script. The ICP software ("srcd") is more flexible, but only gives you
the option to execute a program on an event or send an SNMP trap. Both
are fine, but they are a little more complicated to integrate into
Nagios, for example.
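
As a rough sketch of the "simple script" approach, the following maps
the text output of a status command to the usual Nagios plugin exit
codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The status words and
the sample lines are assumptions, not the exact output of tw_cli or
srcd; adjust them to whatever your tool really prints.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed status words; check them against your controller tool's output.
WARN_STATES = {"REBUILDING", "INITIALIZING", "VERIFYING"}
CRIT_STATES = {"DEGRADED", "INOPERABLE"}


def check_raid(status_text):
    """Return (exit_code, message) for whitespace-separated status text."""
    words = set(status_text.upper().split())
    crit = sorted(words & CRIT_STATES)
    warn = sorted(words & WARN_STATES)
    if crit:
        return CRITICAL, "RAID CRITICAL: " + ", ".join(crit)
    if warn:
        return WARNING, "RAID WARNING: " + ", ".join(warn)
    if "OK" in words:
        return OK, "RAID OK"
    return UNKNOWN, "RAID UNKNOWN: no recognizable status found"


def main(stream=sys.stdin):
    """Read status text from a stream, print the message, return the code."""
    code, message = check_raid(stream.read())
    print(message)
    return code
```

A wrapper would pipe the status output of the management tool into this
and end with sys.exit(main()). With an event-driven tool the same check
could read a status file that the event handler program writes.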

- Oliver

| Oliver Brandmueller | Offenbacher Str. 1  | Germany       D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: |
|                I am the Internet. As sure as I help God.                 |
| Commercial use of any of the addresses contained here is not permitted!  |
