More than 32 CPUs under 8.4-P
Dennis Glatting
dg at pki2.com
Mon May 20 02:55:15 UTC 2013
Minutes after I typed that message, the 2x16 system panicked with the
following backtrace:
kdb_backtrace
panic
vdev_deadman
vdev_deadman
vdev_deadman
spa_deadman
softclock
intr_event_execute_handlers
ithread_loop
fork_exit
fork_trampoline
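
For what it's worth, that trace is the ZFS "deadman" timer: spa_deadman()
runs from a callout (hence softclock in the trace), walks the vdev tree
via vdev_deadman(), and deliberately panics when an outstanding I/O has
been pending longer than its threshold. In other words, it points at hung
disk I/O rather than a ZFS logic bug. Assuming this build exposes the
tunable (it may not; the deadman code is fairly new), the threshold can
be inspected and raised from loader.conf:

root at iirc:~ # sysctl vfs.zfs.deadman_synctime
root at iirc:~ # echo 'vfs.zfs.deadman_synctime="2000"' >> /boot/loader.conf

(The units are seconds, and "2000" is a hypothetical value, roughly
double the usual default.)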
I had just created a memory disk when that happened:
root at iirc:~ # mdconfig -a -t swap -s 1g -u 1
root at iirc:~ # newfs -U /dev/md1
root at iirc:~ # mount /dev/md1 /mnt
root at iirc:~ # cp -p procstat kgdb /mnt
root at iirc:~ # cd /rescue/
root at iirc:/rescue # cp -p * /mnt
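
The point of the memory disk, for anyone following along: once ZFS I/O
wedges, only statically linked executables on a non-ZFS volume still
run, so the tools have to be staged in advance. It's worth verifying the
staged copies really are static (the /rescue binaries are; the procstat
and kgdb above are assumed to be static builds):

root at iirc:~ # file /mnt/procstat /mnt/kgdb

file(1) should report "statically linked" for each.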
On Sun, 2013-05-19 at 18:45 -0700, Dennis Glatting wrote:
> On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
> > On May 19, 2013, at 11:51 AM, Dennis Glatting <freebsd at pki2.com> wrote:
> >
> > > ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
> > > not hang under 8.4. This (and one other 4-socket system) is a production
> > > system.
> >
> > Can you be more specific? I have been running 9.0 and 9.1 systems,
> > multi-CPU and all ZFS, with no (CPU-related*) issues.
> >
>
> I have (down to) ten FreeBSD/ZFS systems. Five of them have multiple
> sockets populated. All use AMD CPUs of the 6200 series. Two of those
> multi-socket systems are simply workstations and don't do much file
> I/O, so I have yet to see them fault.
>
> The remaining three perform significant I/O on files in the 1-8TB
> range, often simultaneously, including sorting, compression, backup,
> etc. (ZFS compression is enabled on some data sets, as is dedup on a
> few minor ones). I also serve iSCSI and NFS from one of those systems.
>
> Simply put: if I run 9.1 on those three busy systems, ZFS will
> eventually hang under load (within ten hours to a few days), whereas it
> does not under 8.3/8.4. Of the multi-socket systems, two are 4x16
> cores, one is 2x16, and two are 2x8. Multiple simultaneous pbzip2 runs
> on individual 2-5TB ASCII files generally cause a hang within 10-20
> hours.
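>
> To be concrete, the load is roughly of this shape (the paths and
> worker count are hypothetical, not my exact jobs):
>
> for f in /data/big-*.txt; do
>     pbzip2 -p16 -k "$f" &
> done
> wait
>
> Here -p sets the number of compression threads and -k keeps the input
> files around for the next run.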
>
> "Hang" means the system is alive and on the network but disk I/O has
> stopped. Run any command except statically linked executables on a
> memory volume and they will not run (no output or return to command
> prompt). This includes "reboot," which never really reboots.
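>
> When it's in that state, a statically linked procstat staged ahead of
> time is about the only window in. A sketch:
>
> /mnt/procstat -kk -a
>
> dumps a kernel stack for every thread, which is the sort of evidence
> the deadlock-debugging wiki page (cited below) asks for.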
>
> The volumes where the work is performed are typically 12-33TB RAIDz2
> pools. For example:
>
> root at mc:~ # zpool list disk-1
> NAME     SIZE   ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
> disk-1  16.2T   5.86T  10.4T   36%  1.32x  ONLINE  -
>
> root at mc:~ # zpool status disk-1
>   pool: disk-1
>  state: ONLINE
>   scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55 2013
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         disk-1      ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>         cache
>           da0       ONLINE       0     0     0
>
> errors: No known data errors
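>
> (When one of these hangs, a statically linked zpool -- I believe
> /rescue ships one -- would allow "zpool iostat -v disk-1 5" to show
> whether the per-vdev I/O counters have gone flat. A sketch only.)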
>
>
> > * I say no CPU-related issues because I have run into SATA timeout
> > issues with an external SATA enclosure holding 4 drives (I know, SATA
> > port expanders are evil, but it is my best option here). Sometimes the
> > zpool hangs hard, sometimes it just becomes unresponsive for a while.
> > My "fix", such as it is, is to tune the ZFS per-vdev queue depth as
> > follows:
> >
> > vfs.zfs.vdev.min_pending="3"
> > vfs.zfs.vdev.max_pending="5"
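> >
> > (Those go in /boot/loader.conf. To check what a running system is
> > actually using:
> >
> > sysctl vfs.zfs.vdev.min_pending vfs.zfs.vdev.max_pending
> >
> > They may also be settable at runtime through sysctl; I apply them at
> > boot.)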
> >
>
> I've not tried those. Currently, these are mine:
>
> vfs.zfs.write_limit_override="1G"
> vfs.zfs.arc_max="8G"
> vfs.zfs.txg.timeout=15
> vfs.zfs.cache_flush_disable=1
>
> # Recommended from the net
> # April, 2013
> vfs.zfs.l2arc_norw=0 # Default is 1
> vfs.zfs.l2arc_feed_again=0 # Default is 1
> vfs.zfs.l2arc_noprefetch=0 # Default is 0
> vfs.zfs.l2arc_feed_min_ms=1000 # Default is 200
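>
> (All of those are set from /boot/loader.conf. For what it's worth,
> vfs.zfs.txg.timeout and the l2arc knobs can also be changed on a
> running system, e.g.
>
> sysctl vfs.zfs.txg.timeout=15
>
> whereas arc_max is boot-time only, as far as I know.)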
>
>
> > The defaults are 5 and 10 respectively, and when I run with those I
> > have the timeout issues, but only under very heavy I/O load. I only
> > generate such load when migrating large amounts of data, which
> > thankfully does not happen all that often.
> >
>
> Two days ago, when the 9.1 system hung, I was able to run a static
> procstat, and the kernel incidentally(?) printed on the console that
> da0 wasn't responding. Unfortunately I didn't have a static camcontrol
> ready, so I was unable to query the device.
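>
> (Next time I'll stage camcontrol on the memory disk as well; something
> like
>
> camcontrol devlist
> camcontrol tags da0 -v
>
> would at least show whether the device still answers and what its tag
> queue looks like. A sketch only -- if camcontrol isn't in /rescue it
> needs a static build.)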
>
> That said, according to the criteria at
> https://wiki.freebsd.org/AvgZfsDeadlockDebug, that hang isn't a true
> ZFS problem, yet hung it was.
>
> I have since (today) updated the firmware of most of the devices in
> that system, and it is currently running some tasks. Most of the disks
> in that system are Seagate, but the un-updated devices include three WD
> disks (RAID1 OS and a swap disk) -- un-updated because I haven't been
> able to figure out WD's firmware download process -- and an SSD whose
> manufacturer indicates the firmware diff is minor, though I plan to go
> back and flash it anyway.
>
> If my 4x16 system ever finishes its current work I will update its
> devices' firmware too, but it is an 8.4-P system and doesn't give me
> any trouble. Another 4x16 system gave me ZFS trouble under 9.1, but
> since I downgraded it to 8.4-P it has been stable as a rock for the
> past 22 days, often under heavy load.
>
--
Dennis Glatting <dg at pki2.com>