ZFS Kernel Panic on 10.0-RELEASE

Mike Carlson mike at bayphoto.com
Wed Jun 4 20:25:06 UTC 2014


Thanks Steve, I'll write up a summary for the openzfs-developers mailing 
list.

FWIW, the other server that is identical to this working-1 system was 
built with a fresh 10.0-RELEASE install, and we haven't had any issues, 
and it's been up and running for months now.

On 6/4/2014 12:23 PM, Steven Hartland wrote:
> You mention mfi and 9.1, which rings alarm bells.
>
> They shouldn't be, but if your drives are > 2^32 sectors you'll
> have corruption:
> http://svnweb.freebsd.org/base?view=revision&revision=242497
>
> In addition to this I did a large number of fixes to mfi after
> this point which could result in all sorts of issues, but that
> doesn't explain issues with mps.
>
> Upgrading shouldn't have removed the cache file so I'm guessing
> that your initial install was already missing this.
>
> zdb is picky about having a cache file, which is something we
> should fix at some point as IIRC the changes avg or mav made,
> I can't remember which, mean that FreeBSD doesn't rely on the cache
> file being present as much as it did.
>
> Back to the corruption, unfortunately this could be any number
> of things so it's almost impossible to tell at which point the
> issue originally occurred :(
>
> It might well be worth emailing a summary of the issue to the
> openzfs mailing list to see if someone on there has any ideas
> where the DVA corruption could have occurred.
>
>    Regards
>    Steve
>
> ----- Original Message ----- From: "Mike Carlson" <mike at bayphoto.com>
> To: <freebsd-fs at freebsd.org>
> Sent: Wednesday, June 04, 2014 7:46 PM
> Subject: Re: ZFS Kernel Panic on 10.0-RELEASE
>
>
> Top-posting... sorry
>
> I'm going to have to roll this particular server back into production, 
> so I'll be rebuilding it from scratch
>
> That is okay with this particular system, but the other server that
> exhibited the same issue will have to have all 19TB of its usable data
> streamed off to temp storage (if we can get it) and rebuilt as well.
>
> Thank you Steve for being so helpful, and patient with me stumbling 
> through kgdb :)
>
>
> I have some lingering questions about the entire situation:
>
> First, these servers perform regular zpool scrubs (once a month), and
> have ECC memory. According to the additional logging information I
> was able to get from Steve's patch, it seems that even with these
> safeguards data was still corrupted. A scrub after the initial panic
> did not report any errors.
>
> Second, these two servers had an extra anomaly, and that was the
> missing zpool.cache. I say missing because zdb was unable to access
> the zpool until I ran "zpool set
> cachefile=/boot/zfs/zpool.cache <pool>". This was previously not an
> issue.
>
> The two servers were upgraded from 9.1 to 10 on the same morning,
> within minutes of each other. That is about it as far as 
> commonalities. Both have different drive types (900GB SAS vs 2TB 
> SATA), different controllers (Dell PERC (mfi) vs LSI (mps)), Dell vs 
> SuperMicro boards...
>
> We do use the aio kernel module, as well as some sysctl and
> loader.conf tuning. I've backed all of those out, so we're just 
> running a stock OS.
>
> Ideally, I would like to never run into this situation again. However, 
> I don't have any evidence to point to an upgrade misstep or some 
> catastrophic configuration error (kernel parameters, zpool create).
>
>
> Thanks everyone,
> Mike C
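
For reference, the 2^32-sector threshold Steve mentions for mfi can be
checked with simple arithmetic. The sketch below is only an illustration:
it assumes 512-byte logical sectors and takes the drive sizes quoted in
the thread as decimal (vendor) sizes. The thread does not say how the
controllers present the disks, so a RAID volume exported as a single
device could be larger than any one drive. The function names are
hypothetical.

```python
# Rough check of whether a device crosses the 2^32-sector threshold.
# Assumption: 512-byte logical sectors (not stated in the thread).
SECTOR_LIMIT = 2**32


def sector_count(size_bytes, sector_size=512):
    """Number of whole sectors in a device of the given size."""
    return size_bytes // sector_size


def exceeds_limit(size_bytes, sector_size=512):
    """True if the device has more than 2^32 sectors."""
    return sector_count(size_bytes, sector_size) > SECTOR_LIMIT


if __name__ == "__main__":
    # Drive sizes as quoted in the thread, read as decimal GB/TB.
    drives = (("900GB SAS", 900 * 10**9), ("2TB SATA", 2 * 10**12))
    for label, size in drives:
        print(label, sector_count(size), exceeds_limit(size))
```

Under those assumptions a single 900GB drive has 1,757,812,500 sectors
and a single 2TB drive has 3,906,250,000, both below 2^32
(4,294,967,296). Any volume larger than 2 TiB would cross the threshold.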

