ZFS melting under postgres...

Benjamin Close Benjamin.Close at clearchain.com
Wed Dec 12 20:22:37 PST 2007


Hugo Silva wrote:
> Benjamin Close wrote:
>> Peter Losher wrote:
>>> Hi,
>>>
>>> As part of our testing of 7.0/ZFS we tried putting it through its paces,
>>> having ZFS act as our storage medium for some test pgsql db's (like for
>>> sqlgrey, etc), and in both BETA2 and BETA4 (amd64) we get the same
>>> results with a RAIDZ2 container:
>>>
>>> -=-
>>> Dec 12 14:24:12 nsa sqlgrey: fatal: setconfig error at
>>> /usr/local/sbin/sqlgrey line 186.
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa postgres[50527]: [5-1] PANIC:  could not write to
>>> log file 2, segment 53 at offset 7864320, length 8192: Input/output error
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
>>> Dec 12 16:49:53 nsa postgres[50596]: [1-1] FATAL:  the database system
>>> is starting up
>>> Dec 12 16:49:53 nsa kernel: pid 50527 (postgres), uid 70: exited on
>>> signal 6 (core dumped)
>>> -=-
>>>
>>> It basically corrupts the container from the inside until it fails
>>> completely (usually within 24-48 hours, depending on how busy the db is).
>>>
>>> I had thought it was a bad SATA replicator/controller, but we had that
>>> replaced w/ one from Supermicro.  So it's either the disks, or something
>>> in ZFS.  Anyone used ZFS to backend any db's (mysql or pgsql)?
>>>
>>> If you need more info, let me know...
>>>
>>>   
>> Try turning off the ZIL. Whilst I don't use a db, I have ZFS under high
>> load, and I've found that unless the ZIL is turned off I see checksum
>> corruption as well:
>>
>> /boot/loader.conf
>>
>> vfs.zfs.zil_disable=1
>>
>> Cheers,
>>    Benjamin
>
> Wouldn't it be a bad idea to disable the ZIL?
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29 
>

A good read is:

http://blogs.sun.com/perrin/entry/the_lumberjack

It explains why the ZIL exists.
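
In short, the ZIL is what lets ZFS honour synchronous writes such as
fsync(), which a database relies on for its write-ahead log. As a rough
illustration only (this is not PostgreSQL's code; the function name and
file name are made up, and the 8192-byte record just mirrors the length in
the PANIC line above), a log writer does something like this:

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> int:
    """Append record to path and block until the OS reports it is stable.

    Hypothetical helper for illustration; not taken from any database.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        written = 0
        # os.write may write fewer bytes than asked, so loop until done.
        while written < len(record):
            written += os.write(fd, view[written:])
        # The caller only treats the record as committed once fsync returns.
        # On ZFS this is the request that the intent log services.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        log_path = os.path.join(tmpdir, "wal-demo.log")
        durable_append(log_path, b"x" * 8192)
        print(os.path.getsize(log_path))
```

With vfs.zfs.zil_disable=1 the fsync() call can return before the data is
on stable storage, so a crash or power loss may lose transactions the
database already reported as committed. That is the trade-off to weigh
before using it as a workaround.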

Cheers,
    Benjamin


More information about the freebsd-current mailing list