10.1 RC4 r273903 - zpool scrub on ssd mirror - ahci command timeout
Kai Gallasch
k at free.de
Tue Dec 9 08:34:19 UTC 2014
On Thu, 06 Nov 2014 01:20:47 +0000,
Steven Hartland <killing at multiplay.co.uk> wrote:
> Try recabling and re-seating. If it still happens, try to identify whether
> it's the disk or the backplane by moving it in the chassis. We had a
> machine here recently where it was a backplane issue, and simply
> replacing it fixed the issue.
Steven,
Over the last few weeks I took some time to track down the cause of the
AHCI timeouts on the two Samsung SSDs.
For the record, here is my original post in the FreeBSD mailing
list archive:
http://lists.freebsd.org/pipermail/freebsd-stable/2014-November/080914.html
I changed or tried the following to get rid of the AHCI timeouts, but
with no luck; they still show up :-/
Hardware:
- Replaced all four SATA cables with the cables from an identical spare
server
- Replaced all four SATA cables with certified SATA3 cables
- Replaced the 2.5" -> 3.5" drive converters with ones from another
manufacturer
- Replaced the drive backplane of the server
- Connected the two SSDs directly to the SATA connectors on the
mainboard
- As an experiment, installed an LSI 9212-4i4e PCIe SATA/SAS controller
in the server and connected the SATA cables to it
- Same as before, but using the certified SATA3 cables
- Same as before, but this time connecting the two SSDs directly to the
9212-4i4e
- Same as before, connecting the two SSDs directly to the 9212-4i4e, but
this time with the original SATA cables
BIOS:
- Temporarily disabled Power Management
- Tried disabling the "Enable Hot Plug" option
The difference between using the mainboard's SATA connectors and using
the LSI 9212-4i4e is that the LSI controller seems to be pickier about
CRC errors on the SATA bus, and bus problems show up even without
starting a zfs scrub. When scrubbing on the LSI controller there are
plenty of timeouts, and in one test one of the SSDs even disappeared
from the SATA bus.
Throughout all of this testing, the two Hitachi SATA hard drives
(non-SSD) showed no problems at all, even though they were connected to
the same mainboard or LSI controller.
So I now think the whole problem centers on the Samsung 850 PRO
512GB SSDs. Unfortunately I don't have the budget to simply buy two Intel
(or other) SSDs of similar size and see whether the timeouts disappear.
I wonder whether this is a firmware issue with the drive, or whether some
misguided power-saving feature of this particular drive model is causing
all the trouble.
Both drives have serial numbers close together, and smartctl reports no
errors on either SSD.
Any ideas (left)?
Regards,
Kai.
--
PGP-KeyID = 0xE401B671927D4A5C
I am not a robot.
More information about the freebsd-stable mailing list