Spurious reboot in 8.1-RELEASE when reading from ZFS pool with > 9 disks
Sean Thomas Caron
scaron at umich.edu
Wed Oct 20 20:45:34 UTC 2010
Hi Jeremy,
Thanks for the very helpful response!
I added all debugging options that you specified to my kernel and
rebuilt; then set the kernel parameters as you mention (I was being a
bit lazy earlier when I called them sysctls; I always tuned them in
loader.conf; just that you can view their values with sysctl).
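As a minimal sketch of checking what is actually set at boot, the helper below pulls a tunable's value out of a loader.conf-style file. The function name `loader_tunable` is hypothetical (it is not a FreeBSD utility), and the parsing assumes simple `name="value"` lines like the ones quoted further down.

```sh
#!/bin/sh
# Hypothetical helper: print the value assigned to a tunable in a
# loader.conf-style file. Usage: loader_tunable NAME [FILE]
loader_tunable() {
    # Escape the dots in the tunable name so they match literally.
    pat=$(printf '%s' "$1" | sed 's/\./\\./g')
    file=${2:-/boot/loader.conf}
    # Accept NAME="value" or NAME=value; commented-out lines do not
    # match, and the last assignment in the file wins.
    sed -n "s/^[[:space:]]*${pat}[[:space:]]*=[[:space:]]*\"\{0,1\}\([^\"#]*\)\"\{0,1\}.*/\1/p" "$file" | tail -n 1
}

# Example (on the FreeBSD box itself):
#   loader_tunable vfs.zfs.arc_max      # value requested at boot
#   sysctl -n vfs.zfs.arc_max           # value the running kernel reports
```

The two values may be printed in different units (a suffixed size in the file, a byte count from sysctl), so compare them after converting rather than as strings.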
Rebooted the system with the new kernel and set up an 11-disk raidz2
pool again, then started beating on it. At first it seemed to be a bit
more resilient with this set of kernel parameters, but eventually it
too failed.
Again I just got a straight-up reboot: no debugger, and no output
flashed by on the console as far as I can tell.
I don't have a serial console hooked up right now but it's probably
possible to do so through the ILOM or equivalent; I will have to look
into that further.
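For reference, a sketch of the loader side of a serial console, assuming the ILOM redirects the first serial port and runs at 9600 baud. Both are assumptions about this hardware, and the speed has to match whatever the ILOM is configured for.

```sh
# /boot/loader.conf fragment (assumed values; adjust to the hardware)
boot_multicons="YES"              # keep both consoles active
boot_serial="YES"                 # prefer the serial console for boot output
comconsole_speed="9600"           # must match the ILOM's serial speed
console="comconsole,vidconsole"   # serial first, VGA second
```

With output going to a serial line that is being logged, a panic message printed just before an automatic reboot would no longer be lost.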
This is pretty weird.
I am thinking there might be some memory starting to go in this
system. I've never seen failing memory in an ECC box cause reboots this
consistently and only under such specific conditions, but I suppose it
isn't completely out of the question. I'll talk to my customer and see
what they can do about the hardware; maybe they have some spares.
I will also try 8.1-STABLE when I have a chance and see if that works better.
But it's definitely helpful to know that folks have > 9 disk raidz
pools up and running on FreeBSD 8.x with no trouble - that it "should
work". And the list of tunables is very useful; nice to have something
to work with that I can have a bit more confidence in outside of my
own guessing :)
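A quick sanity check on the two size tunables from the list, as a sketch: the ARC cap should sit below the kmem size. The helper names `to_mib` and `arc_fits_kmem` are hypothetical, and the suffix handling assumes the usual K/M/G binary multipliers.

```sh
#!/bin/sh
# Convert a loader.conf-style size such as "16384M" or "16G" to MiB.
to_mib() {
    case $1 in
        *[Gg]) echo $(( ${1%[Gg]} * 1024 )) ;;
        *[Mm]) echo "${1%[Mm]}" ;;
        *[Kk]) echo $(( ${1%[Kk]} / 1024 )) ;;
        *)     echo $(( $1 / 1048576 )) ;;   # plain byte count
    esac
}

# Succeed only if the ARC cap ($2) is smaller than the kmem size ($1).
arc_fits_kmem() {
    kmem=$(to_mib "$1")
    arc=$(to_mib "$2")
    [ "$arc" -lt "$kmem" ]
}

# Values suggested in this thread.
if arc_fits_kmem 16384M 14336M; then
    echo "arc_max leaves $(( $(to_mib 16384M) - $(to_mib 14336M) )) MiB of kmem headroom"
fi
```

For the suggested values this reports 2048 MiB of headroom.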
I will report back to the list when I have more information.
Thanks!
-Sean
Quoting Jeremy Chadwick <freebsd at jdc.parodius.com>:
> There are users here using FreeBSD ZFS with *lots* of disks (I think
> someone was using 32 disks at one point) reliably. Some of them post
> here regularly (with other issues that don't consist of sporadic
> reboots).
>
> The kernel options may not be sufficient. I'm used to using these:
>
> # Debugging options
> options BREAK_TO_DEBUGGER # Sending a serial BREAK drops to DDB
> options KDB # Enable kernel debugger support
> options KDB_TRACE # Print stack trace automatically on panic
> options DDB # Support DDB
> options GDB # Support remote GDB
>
> And in /etc/rc.conf, setting:
>
> ddb_enable="yes"
>
> Next: arc_max isn't "technically" a sysctl, meaning it can't be changed
> in real-time, so I'm not sure how you managed to do that. Validation:
>
> sysctl: oid 'vfs.zfs.arc_max' is a read only tunable
> sysctl: Tunable values are set in /boot/loader.conf
>
> Your system may be reporting something relating to kmem exhaustion but
> is then auto-rebooting so fast that you can't see the message on VGA
> console. Do you have serial console?
>
> Please try setting the following tunables in /boot/loader.conf and
> reboot the machine, then see if the same problem persists.
>
> vm.kmem_size="16384M"
> vfs.zfs.arc_max="14336M"
> vfs.zfs.prefetch_disable="1"
> vfs.zfs.zio.use_uma="0"
> vfs.zfs.txg.timeout="5"
>
> I would also advocate you try 8.1-STABLE as there have been many changes
> in ZFS since then (and I'm not just referring to the v15 import),
> including how the ARC gets sized/adjusted. CURRENT is highly
> bleeding-edge, so I would start or stick with STABLE.
>
> Finally, there's always the possibility that the PSU has some sort of
> load problem with that many disks all being accessed at the same time.
> I imagine the power draw of that system is quite high. I can't imagine
> Sun shipping a box with an insufficient PSU, but then again power draw
> changes depending on the RPM of the disks used and many other things.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977. PGP: 4BD6C0CB |
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"