zpool create on md hangs

Curtis Villamizar curtis at ipv6.occnc.com
Sun Nov 16 05:23:10 UTC 2014

In message <546084FE.80300 at multiplay.co.uk>
Steven Hartland writes:
> On 10/11/2014 06:48, Andreas Nilsson wrote:
> > On Mon, Nov 10, 2014 at 7:37 AM, Curtis Villamizar <curtis at ipv6.occnc.com>
> > wrote:
> >
> >> The following shell program produces a hang.  It's reproducible (hangs
> >> every time).
> >>
> >>      #!sh
> >>
> >>      set -e
> >>      set -x
> >>
> >>      truncate -s `expr 10 \* 1024 \* 1024 \* 1024` /image-file
> >>      md_unit=`mdconfig -a -n -t vnode -f /image-file`
> >>      echo "md device is /dev/md$md_unit"
> >>      zpool create test md$md_unit
> >>
> >> The zpool command hangs.  Kill or kill -9 has no effect.  All
> >> filesystems are unaffected but any other zpool or zfs command will
> >> hang and be unkillable.  A reboot is needed.
> >>
> >> This is running on:
> >>
> >>     FreeBSD 10.0-STABLE (GENERIC) #0 r270645: Wed Aug 27 00:54:29 EDT 2014
> >>
> >> When I get a chance, I will try again with a 10.1 RC3 kernel I
> >> recently built.  If this still doesn't work, I'll build an r11 kernel
> >> since the code differs from 10.1, not having the svm code merged in.
> >> I'm asking before poking around further in case anyone has insights
> >> into why this might happen.
> >>
> >> BTW- The reason to create a zfs filesystem on a vnode-type md is to
> >> create an image that can run under bhyve using a zfs root fs.  This
> >> works quite nicely for combinations of geom types (gmirror, gstripe,
> >> gjournal, gcache) but zpool hangs when trying this with zfs.
> >>
> >> Curtis
> >>
> >> ps- please keep me on the Cc as I'm not subscribed to freebsd-fs.
> >>
> > Freezes here on 10.1-RC2-p1 (amd64) as well.
> > ^T says:
> > load: 0.21  cmd: zpool 74063 [zio->io_cv] 8.84r 0.00u 0.00s 0% 3368k
> >
> I suspect you're just seeing the delay as it trims the file, and it
> will complete in time.
> Try setting vfs.zfs.vdev.trim_on_init=0 before running the create and
> see if it completes quickly after that.
> I tested this on HEAD and confirmed it was the case there.
>      Regards
>      Steve


Thanks for the hint.

I'm doing some testing, so I run this quite a bit, but it's automated.
For a while I just let it take the 4-10 minutes.  While the trim is
happening, any zpool or zfs command hangs and doesn't respond to a
kill, or even a kill -9.  I've also had a few cases where a "shutdown
-r now" flushed buffers but never got to the reboot and the machine
had to be powered off, plus one apparent hang of the entire disk
subsystem.

I'm currently using FreeBSD 10.1-PRERELEASE #0 r274470.

All of these symptoms go away with vfs.zfs.vdev.trim_on_init=0, so I
put it in my sysctl.conf files.  Maybe that should be the default,
given the severity of the behavior with vfs.zfs.vdev.trim_on_init=1
(too late for 10.1).  Comments in the code call it an "optimization".
Does anyone know exactly what the trim does?  Is it useful or
necessary?
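For reference, the exact setting as I have it (this is the sysctl
Steve named; for the running system the same assignment can be passed
to sysctl(8) instead).  In /etc/sysctl.conf:

    # Disable TRIM of the whole vdev at pool-creation time; avoids the
    # multi-minute unkillable zpool create when the vdev is a
    # file-backed md.
    vfs.zfs.vdev.trim_on_init=0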


[fyi- unrelated]  I'm performance-testing disk configurations with a
vm under a compile load, using make -j maxjobs with various values of
maxjobs.  So far a build under bhyve takes about 50% longer than
native.  Natively, combinations of stripe, mirror, journal, and
cache, and zfs vs ufs, make little difference, about 5%.  Within a
vm, a disk stripe runs faster than a mirror (as expected); I'm still
early in the testing, but am trying the mirror or stripe on the host
vs in the vm, and other permutations.  I've been running mirrored
disks since about 1994 (with the original vinum, then gvinum, then
geom mirror, then zfs mirror) but I've never taken the time to check
performance.  I see a 15:1 difference between the CPUs I have (an old
single-core Intel no longer used, vs a 4-core Atom, vs a 4-core i3)
but so far only a 50% penalty for big compiles in a vm vs the same
processor native.  Above -j 4 there is a small performance gain with
zfs (which is generally slightly slower) but none for the others.  I
did a fair amount of testing for native disk.  I've only started
testing vm disk permutations, but in doing this testing I'm learning
a lot about bhyve and geom and zfs quirks.
