Very slow disk speed / mpt0: LSILogic SAS/SATA Adapter

Wed Jun 17 12:49:21 UTC 2009

On Wed, Jun 17, 2009 at 01:35:52PM +0200, Matej ?erc wrote:
> Hi,
> 
> we have an HP ProLiant server with an onboard RAID 0/1 controller. It is
> detected as mpt0 (I have attached a part of dmesg output at the end of the
> mail). As reported by some already (
> http://www.mail-archive.com/freebsd-performance@freebsd.org/msg02446.html),
> we are also getting extremely slow write speeds. I read somewhere that there
> are some improvements which could solve the situation in 7.2 (our system has
> 7.1 installed and I am currently unable to turn it off and it will stay so
> for at least 3 months).
> 
> There are reports that setting hw.mpt.enable_sata_wc=1 fixes the
> write speed (it does, I tested it!), but I would like to know more
> about how dangerous that option is. We are using softupdates and currently
> have hw.mpt.enable_sata_wc=0, after reading that sata_wc=1 might be very
> dangerous.

Not very dangerous at all, as long as you are not using background fsck.
The problem with write caching on standard IDE/SATA drives is that they
report that a write operation is finished even if it has only reached the
disk's cache.  This means that some of the guarantees that softupdates is
supposed to provide about the order in which data is written to the disk
cannot be fulfilled.

This essentially means that if you lose power to the machine unexpectedly
you might have some filesystem inconsistencies afterward that you would not
have had without the disks' cache being enabled. (A normal reset would not
cause this problem since the disks would still retain the contents of their
caches.)
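
To make the ordering problem concrete, here is a toy Python model. It is
only a sketch, not driver or filesystem code: ToyDisk, simulate and the
"inode"/"dirent" labels are made up for illustration. The filesystem writes
an inode and then the directory entry that points to it, and expects them to
become durable in that order. With the cache on, the drive acknowledges both
immediately and commits them in whatever order it likes.

```python
import random


class ToyDisk:
    """Toy model of a drive with an optional volatile write cache."""

    def __init__(self, write_cache):
        self.write_cache = write_cache
        self.platters = []  # writes that are durable
        self.cache = []     # writes acknowledged but still volatile

    def write(self, block):
        # With the cache on, success is reported before the data is durable.
        if self.write_cache:
            self.cache.append(block)
        else:
            self.platters.append(block)
        return "done"

    def drive_flushes_some(self, rng):
        # The drive picks which cached writes to commit, in its own order.
        rng.shuffle(self.cache)
        keep = len(self.cache) // 2
        self.platters.extend(self.cache[keep:])
        self.cache = self.cache[:keep]

    def power_loss(self):
        # Whatever is still in the cache is gone.
        self.cache = []
        return list(self.platters)


def ordering_violated(durable):
    # The filesystem required "inode" to be durable before "dirent".
    return "dirent" in durable and "inode" not in durable


def simulate(write_cache, seed):
    rng = random.Random(seed)
    disk = ToyDisk(write_cache)
    disk.write("inode")   # must reach the platters first
    disk.write("dirent")  # refers to the inode
    disk.drive_flushes_some(rng)
    return ordering_violated(disk.power_loss())
```

With write_cache=False the model never reports a violation. With
write_cache=True some seeds leave the directory entry on the platters
without its inode, which is the kind of inconsistency described above.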

If you are using background fsck this could be a big problem, since for
background fsck to work properly the only inconsistencies allowed on the
filesystem are blocks that are marked as in use when they actually are not.
(That is one of the guarantees that softupdates is supposed to provide, but
may not be able to provide due to the behaviour of the disks' cache.)  If
you do have other inconsistencies on the filesystem the whole system may
throw a kernel panic when it encounters one of them.
(A normal foreground fsck would fix all such inconsistencies before the
system starts running for real.)
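
The distinction between the harmless and the harmful kind of inconsistency
can be sketched in a few lines of Python. This is a simplified illustration
under my own assumptions, not how fsck is implemented: classify and its two
set arguments are hypothetical, and a block that is referenced but marked
free stands in for "other inconsistencies" in general.

```python
def classify(marked_in_use, referenced):
    """Compare the allocation map against what files actually reference.

    marked_in_use: set of block numbers the allocation map says are used.
    referenced:    set of block numbers that some file points to.
    """
    # Marked used but nothing points at them: wasted space only.
    leaked = marked_in_use - referenced
    # Pointed to by a file but marked free: could be handed out twice.
    dangling = referenced - marked_in_use
    return {
        "leaked": leaked,
        "dangling": dangling,
        # Background fsck can only cope with the "leaked" case.
        "background_fsck_safe": not dangling,
    }
```

For example, classify({1, 2, 3}, {1, 2}) reports block 3 as leaked and is
still safe for background fsck, while classify({1, 2}, {1, 2, 3}) reports
block 3 as dangling and is not.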

It is also the case that if your system is really busy writing to the disks
(with write caching enabled) and you lose power at exactly the wrong time
you could potentially lose a lot of data from the filesystem, since any
given write could theoretically get delayed indefinitely before it hits the
disk's platters.  (If the write that gets delayed is the creation of a
directory in which lots of writes happen later you could lose all of them.)
If you have write caching disabled you will not lose more than the last 30
seconds or so of updates.

Using a UPS is one obvious way of drastically reducing the number of times
the machine loses power unexpectedly, and if it is so important that this
server is not taken down I assume you already have a UPS, in which case
enabling the write caching is essentially risk-free.

> 
> I am really looking forward to getting more information about this; it is
> actually driving me nuts. We have a number of other servers and there are no
> problems with RAID controllers at all. And as I said, I cannot actually turn
> off this machine to install a new OS.
> 
> Thank you very much for your comments and thoughts,
> Matej
> 
> 
> The server model is ML110G5.
> 
> mpt0: <LSILogic SAS/SATA Adapter> port 0xd000-0xd0ff mem
> 0xfcefc000-0xfcefffff,0xfcee0000-0xfceeffff irq 16 at device 0.0 on pci5
> mpt0: [ITHREAD]
> mpt0: MPI Version=1.5.16.0
> mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
> mpt0: 1 Active Volume (2 Max)
> mpt0: 3 Hidden Drive Members (10 Max)

-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013 at student.uu.se