Spurious reboot in 8.1-RELEASE when reading from ZFS pool with > 9 disks

Sean Thomas Caron scaron at umich.edu
Wed Oct 20 20:45:34 UTC 2010


Hi Jeremy,

Thanks for the very helpful response!

I added all the debugging options you specified to my kernel and
rebuilt, then set the kernel parameters as you suggested. (I was being
a bit lazy earlier when I called them sysctls: I always set them in
loader.conf; it's just that you can view their values with sysctl.)
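These values only take effect from /boot/loader.conf at boot, so a typo or an inconsistent pair can go unnoticed until the next reboot. The following is a minimal Python sketch, not a FreeBSD tool. It parses name="value" lines and checks that vfs.zfs.arc_max stays below vm.kmem_size, using the values Jeremy suggested in the quoted message. The helper names, the K/M/G suffix handling, and the "arc_max below kmem_size" rule of thumb are assumptions made for illustration.

```python
import re

# Byte multipliers for size suffixes (assumption: only K, M and G are
# handled; anything else is rejected by to_bytes()).
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_loader_conf(text):
    """Return {name: value} for name="value" lines, skipping comments/blanks."""
    tunables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


def to_bytes(value):
    """Convert a size such as "16384M" into a byte count."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.upper())
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * _SUFFIX[m.group(2)]


def arc_headroom_mib(text):
    """MiB of vm.kmem_size left over above vfs.zfs.arc_max (hypothetical check)."""
    t = parse_loader_conf(text)
    kmem = to_bytes(t["vm.kmem_size"])
    arc = to_bytes(t["vfs.zfs.arc_max"])
    if arc >= kmem:
        raise ValueError("vfs.zfs.arc_max should be smaller than vm.kmem_size")
    return (kmem - arc) // 1024**2


# The tunables suggested in the quoted message.
SUGGESTED = '''
vm.kmem_size="16384M"
vfs.zfs.arc_max="14336M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zio.use_uma="0"
vfs.zfs.txg.timeout="5"
'''

print(arc_headroom_mib(SUGGESTED))  # prints 2048
```

With the suggested values the check passes and leaves 2048 MiB of kmem above the ARC ceiling.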

I rebooted the system with the new kernel, set up an 11-disk raidz2
pool again, and started beating on it. At first it seemed a bit more
resilient with this set of kernel parameters, but eventually it
failed too.
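For anyone trying to reproduce this, a sustained parallel read load is one way to "beat on" a pool. The following is a hypothetical Python sketch, not the workload used here (which isn't described). It walks a directory tree and reads every file in 1 MiB chunks from a thread pool. The default path /tank and the worker count are assumptions; point it at a dataset on the raidz2 pool instead.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # read in 1 MiB chunks


def read_file(path):
    """Read a file to the end and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    return total


def read_tree(root, workers=8):
    """Read every regular file under root concurrently; return total bytes."""
    paths = [os.path.join(d, name)
             for d, _, files in os.walk(root) for name in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_file, paths))


if __name__ == "__main__":
    # Hypothetical mount point; pass the real dataset path as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/tank"
    print(f"read {read_tree(root)} bytes from {root}")
```

Repeated runs over the same files may be served largely from the ARC rather than from the disks. A data set larger than vfs.zfs.arc_max is therefore more likely to keep all the disks busy.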

Again I just got a straight-up reboot: no debugger, and as far as I
can tell no output flashed by on the console.

I don't have a serial console hooked up right now but it's probably  
possible to do so through the ILOM or equivalent; I will have to look  
into that further.

This is pretty weird.

I am starting to think some of the memory in this system might be
going bad. I've never seen failing memory in an ECC box cause reboots
this consistently, and only under such specific conditions, but I
suppose it isn't completely out of the question. I'll talk to my
customer and see what they can do about the hardware; maybe they have
some spares.

I will also try 8.1-STABLE when I have a chance and see if that works better.

But it's definitely helpful to know that folks have raidz pools with
more than 9 disks up and running on FreeBSD 8.x with no trouble, and
that it "should work". The list of tunables is also very useful; it's
nice to have something to work with that I can be more confident in
than my own guessing :)

I will report back to the list when I have more information.

Thanks!

-Sean


Quoting Jeremy Chadwick <freebsd at jdc.parodius.com>:

> There are users here using FreeBSD ZFS with *lots* of disks (I think
> someone was using 32 disks at one point) reliably.  Some of them post
> here regularly (with other issues that don't consist of sporadic
> reboots).
>
> The kernel options may not be sufficient.  I'm used to using these:
>
> # Debugging options
> options         BREAK_TO_DEBUGGER       # Sending a serial BREAK drops to DDB
> options         KDB                     # Enable kernel debugger support
> options         KDB_TRACE               # Print stack trace automatically on panic
> options         DDB                     # Support DDB
> options         GDB                     # Support remote GDB
>
> And in /etc/rc.conf, setting:
>
> ddb_enable="yes"
>
> Next: arc_max isn't "technically" a sysctl, meaning it can't be changed
> in real-time, so I'm not sure how you managed to do that.  Validation:
>
> sysctl: oid 'vfs.zfs.arc_max' is a read only tunable
> sysctl: Tunable values are set in /boot/loader.conf
>
> Your system may be reporting something relating to kmem exhaustion but
> is then auto-rebooting so fast that you can't see the message on VGA
> console.  Do you have serial console?
>
> Please try setting the following tunables in /boot/loader.conf and
> reboot the machine, then see if the same problem persists.
>
> vm.kmem_size="16384M"
> vfs.zfs.arc_max="14336M"
> vfs.zfs.prefetch_disable="1"
> vfs.zfs.zio.use_uma="0"
> vfs.zfs.txg.timeout="5"
>
> I would also advocate you try 8.1-STABLE as there have been many changes
> in ZFS since 8.1-RELEASE (and I'm not just referring to the v15 import),
> including how the ARC gets sized/adjusted.  CURRENT is highly
> bleeding-edge, so I would start or stick with STABLE.
>
> Finally, there's always the possibility that the PSU has some sort of
> load problem with that many disks all being accessed at the same time.
> I imagine the power draw of that system is quite high.  I can't imagine
> Sun shipping a box with an insufficient PSU, but then again power draw
> changes depending on the RPM of the disks used and many other things.
>
> --
> | Jeremy Chadwick                                   jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"