LSI SAS 9300-8i weird ZFS checksum errors
George Kontostanos
gkontos.mail at gmail.com
Thu Dec 25 21:03:10 UTC 2014
On Thu, Dec 25, 2014 at 9:31 PM, Steven Hartland <killing at multiplay.co.uk>
wrote:
>
> On 25/12/2014 14:39, George Kontostanos wrote:
>
>> Hello, list and Merry Christmas to all
>>
>> I am facing some weird checksum errors during scrub. The configuration is
>> the following:
>>
>> Board: Supermicro Motherboard X10DRi-T4+ (
>> http://www.supermicro.com/products/motherboard/xeon/c600/x10dri-t4_.cfm)
>> Controller: LSI SAS 9300-8i (
>> http://www.lsi.com/products/host-bus-adapters/pages/lsi-sas-9300-8i.aspx)
>> HDD: 21 x 6TB Western Digital WD60EFRX
>> HDD: 2 x Intel SATA 600GB Solid-State Drive SSDSC2BB600G401 DC S3500
>> (SWAP, ZIL, CACHE)
>> Chassis: Supermicro 847BE1C-R1K28LPB 4U Storage Chassis
>> RAM: 64 GB
>>
>> I initially installed FreeBSD 10.1-RELEASE and created one pool consisting
>> of 3 x 7-disk vdevs in RAIDZ3. I used NFS to start copying some data. After
>> copying around 3TB I initiated a scrub.
>> The result was the following: http://pastebin.com/rswgCY2A and
>> http://pastebin.com/DQ2urGXk
>>
>> I tried to flash the controller but the LSI utility did not recognize it.
>> I then installed FreeBSD 9.3-RELEASE with LSI's mpslsi3 driver and was
>> able to flash the latest BIOS and firmware that way.
>>
>> LSI Corporation SAS3 Flash Utility
>> Version 07.00.00.00 (2014.08.14)
>> Copyright (c) 2008-2014 LSI Corporation. All rights reserved
>>
>> Adapter Selected is a LSI SAS: SAS3008(C0)
>>
>> Controller Number : 0
>> Controller : SAS3008(C0)
>> PCI Address : 00:82:00:00
>> SAS Address : 500605b-0-06ce-27e0
>> NVDATA Version (Default) : 06.03.00.05
>> NVDATA Version (Persistent) : 06.03.00.05
>> Firmware Product ID : 0x2221 (IT)
>> Firmware Version : 06.00.00.00
>> NVDATA Vendor : LSI
>> NVDATA Product ID : SAS9300-8i
>> BIOS Version : 08.13.00.00
>> UEFI BSD Version : 02.00.00.00
>> FCODE Version : N/A
>> Board Name : SAS9300-8i
>> Board Assembly : H3-25573-00E
>> Board Tracer Number : SV32928040
>>
>> I recreated the pool and started writing data via NFS again. After 3 TB
>> of data I started a scrub, and I am still getting checksum errors, though
>> there are no longer any messages about the drives in /var/log/messages.
>>
>> pool: Pool
>> state: ONLINE
>> status: One or more devices has experienced an unrecoverable error. An
>> attempt was made to correct the error. Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>> using 'zpool clear' or replace the device with 'zpool replace'.
>> see: http://illumos.org/msg/ZFS-8000-9P
>>
>> scan: scrub in progress since Thu Dec 25 08:46:21 2014
>> 2.28T scanned out of 5.54T at 816M/s, 1h9m to go
>> 11.9M repaired, 41.26% done
>> config:
>>
>> NAME                       STATE     READ WRITE CKSUM
>> Pool                       ONLINE       0     0     0
>>   raidz3-0                 ONLINE       0     0     0
>>     gpt/WD-WX41D94RN5A3    ONLINE       0     0    15  (repairing)
>>     gpt/WD-WX41D948YE1U    ONLINE       0     0    14  (repairing)
>>     gpt/WD-WX41D94RN879    ONLINE       0     0    16  (repairing)
>>     gpt/WD-WX21D947NC83    ONLINE       0     0    24  (repairing)
>>     gpt/WD-WX21D947NT77    ONLINE       0     0    15  (repairing)
>>     gpt/WD-WX41D948YAKV    ONLINE       0     0    19  (repairing)
>>     gpt/WD-WX21D9421SCV    ONLINE       0     0    20  (repairing)
>>   raidz3-1                 ONLINE       0     0     0
>>     gpt/WD-WX21D9421F6F    ONLINE       0     0    16  (repairing)
>>     gpt/WD-WX41D948YPN4    ONLINE       0     0    14  (repairing)
>>     gpt/WD-WX21D947NE2K    ONLINE       0     0    22  (repairing)
>>     gpt/WD-WX41D948Y2PX    ONLINE       0     0    19  (repairing)
>>     gpt/WD-WX41D94RNAX7    ONLINE       0     0    17  (repairing)
>>     gpt/WD-WX21D947N1RP    ONLINE       0     0    12  (repairing)
>>     gpt/WD-WX21D94216X7    ONLINE       0     0    20  (repairing)
>>   raidz3-2                 ONLINE       0     0     0
>>     gpt/WD-WX41D948YAHP    ONLINE       0     0    25  (repairing)
>>     gpt/WD-WX21D947N06F    ONLINE       0     0    18  (repairing)
>>     gpt/WD-WX21D947N3T1    ONLINE       0     0    21  (repairing)
>>     gpt/WD-WX41D94RNT7D    ONLINE       0     0     5  (repairing)
>>     gpt/WD-WX41D948Y9VV    ONLINE       0     0    18  (repairing)
>>     gpt/WD-WX41D94RNS62    ONLINE       0     0    24  (repairing)
>>     gpt/WD-WX21D9421ZP9    ONLINE       0     0    28  (repairing)
>> logs
>>   mirror-3                 ONLINE       0     0     0
>>     gpt/zil0               ONLINE       0     0     0
>>     gpt/zil1               ONLINE       0     0     0
>> cache
>>   gpt/cache0               ONLINE       0     0     0
>>   gpt/cache1               ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> This is really driving me crazy since smartmontools does not report any
>> errors on the drives.
>>
>> Any suggestions are most welcome!
>
> Check for bad hardware, first guess would be memory, next would be
> hotswap backplane.
>
> Regards
> Steve
>
Hi Steve,
Memory looks good in memtest. I am not sure what you mean regarding the
hotswap backplane.
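
For reference, here is a rough way to summarize the scrub output quoted above.
It is only a sketch: the function names are made up for illustration, and the
regular expression assumes the row layout seen in this particular
'zpool status' output rather than any documented format. It covers only the
three raidz3 vdevs. The point is that every disk in every vdev has a small
number of checksum errors, which seems to fit a shared component better than
any single failing drive.

```python
import re

# Config rows for the three raidz3 vdevs, copied from the scrub output above.
ZPOOL_CONFIG = """\
raidz3-0 ONLINE 0 0 0
gpt/WD-WX41D94RN5A3 ONLINE 0 0 15 (repairing)
gpt/WD-WX41D948YE1U ONLINE 0 0 14 (repairing)
gpt/WD-WX41D94RN879 ONLINE 0 0 16 (repairing)
gpt/WD-WX21D947NC83 ONLINE 0 0 24 (repairing)
gpt/WD-WX21D947NT77 ONLINE 0 0 15 (repairing)
gpt/WD-WX41D948YAKV ONLINE 0 0 19 (repairing)
gpt/WD-WX21D9421SCV ONLINE 0 0 20 (repairing)
raidz3-1 ONLINE 0 0 0
gpt/WD-WX21D9421F6F ONLINE 0 0 16 (repairing)
gpt/WD-WX41D948YPN4 ONLINE 0 0 14 (repairing)
gpt/WD-WX21D947NE2K ONLINE 0 0 22 (repairing)
gpt/WD-WX41D948Y2PX ONLINE 0 0 19 (repairing)
gpt/WD-WX41D94RNAX7 ONLINE 0 0 17 (repairing)
gpt/WD-WX21D947N1RP ONLINE 0 0 12 (repairing)
gpt/WD-WX21D94216X7 ONLINE 0 0 20 (repairing)
raidz3-2 ONLINE 0 0 0
gpt/WD-WX41D948YAHP ONLINE 0 0 25 (repairing)
gpt/WD-WX21D947N06F ONLINE 0 0 18 (repairing)
gpt/WD-WX21D947N3T1 ONLINE 0 0 21 (repairing)
gpt/WD-WX41D94RNT7D ONLINE 0 0 5 (repairing)
gpt/WD-WX41D948Y9VV ONLINE 0 0 18 (repairing)
gpt/WD-WX41D94RNS62 ONLINE 0 0 24 (repairing)
gpt/WD-WX21D9421ZP9 ONLINE 0 0 28 (repairing)
"""

# Groups: name, state, READ, WRITE, CKSUM. This row shape is an assumption
# based on the output in this thread, not a guaranteed format.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def cksum_by_vdev(text):
    """Map each raidz/mirror vdev name to the CKSUM counts of its member disks."""
    groups = {}
    current = None
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # headers, blank lines, section names such as "logs"
        name, cksum = match.group(1), int(match.group(5))
        if name.startswith(("raidz", "mirror")):
            current = name
            groups[current] = []
        elif current is not None:
            # Any other row is treated as a member of the most recent vdev.
            groups[current].append(cksum)
    return groups


def summarize(groups):
    """Return (vdev, disk count, disks with errors, total CKSUM) per vdev."""
    return [
        (vdev, len(counts), sum(1 for n in counts if n > 0), sum(counts))
        for vdev, counts in groups.items()
    ]


if __name__ == "__main__":
    for vdev, disks, bad, total in summarize(cksum_by_vdev(ZPOOL_CONFIG)):
        print(f"{vdev}: {bad}/{disks} disks with checksum errors, {total} total")
```

On the numbers above this reports 7 of 7 disks affected in each vdev, with
totals of 123, 120 and 139 checksum errors respectively.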
--
George Kontostanos