ZFS buggy in CURRENT? Stuck in [zio->io_cv] forever!

O. Hartmann ohartman at zedat.fu-berlin.de
Mon Oct 28 19:00:33 UTC 2013


On Sun, 27 Oct 2013 16:32:13 -0000
"Steven Hartland" <killing at multiplay.co.uk> wrote:


Hello all,

after a third attempt, I realised that some remnant labels seem to
be causing the problem.

Those labels didn't go away with "zpool create -f" or "zfs clearlabel
provider"; I had to issue "zfs destroy -F provider" to ensure that
everything was cleared out.
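
Just for the record, in case someone stumbles over this thread later:
what I will try next time to make sure stale labels are really gone
(assuming "zpool labelclear" is available on this CURRENT; gpt/disk00
is only an example provider) is something like

zpool labelclear -f /dev/gpt/disk00

or, more brutally, zeroing the start of the provider where the front
labels live (keeping in mind that ZFS also keeps two labels at the end
of the device, so labelclear is the cleaner option):

dd if=/dev/zero of=/dev/gpt/disk00 bs=1m count=1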

After the last unsuccessful attempt, I waited 14 hours for the "busy"
drives as reported; they still hadn't stopped doing whatever they were
doing, so I rebooted the box.

Besides the confusion about how to use ZFS properly (I miss
documentation written for a normal user rather than for someone who is
practically a core developer; several blogs carry outdated
information), there is still this nasty blocking of the whole system,
solvable only by a hard reset.

After the pool had been created successfully and a snapshot had been
received via the -vdF options, a reimport of the pool wasn't possible
as described below, and any attempt to list pools available for import
(zpool import) ended up in a stuck console that no kill or Ctrl-C
could interrupt. The damaged pool's drives showed some activity, but
even the pools considered unharmed didn't show up.
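
For concreteness, the step that was running before the hang was a
plain receive into the new pool, roughly (names are examples from
memory, not copied verbatim):

zfs send -R BACKUP00@backup | zfs recv -vdF POOL

and afterwards even the bare listing of importable pools,

zpool import

sits there forever.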

This total blockade also prevented the system from rebooting properly:
a "shutdown -r" or "reboot" ended up waiting for eternity after the
last blocks had been synchronised, and only a power-off or a full
reset could bring the box back to life. I think this is not intended
and can be considered a bug?
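
If it helps with debugging, the next time the box wedges like this I
can try to grab a kernel stack of the stuck zpool process from a
still-working terminal, e.g. with something like

procstat -kk <pid of the stuck zpool>

(assuming another shell still responds at that point), and post the
output here.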

Thanks for the patience.

oh


> 
> ----- Original Message ----- 
> From: "O. Hartmann" <ohartman at zedat.fu-berlin.de>
> > 
> > I have set up a RAIDZ pool comprising four 3 TB HDDs. To maintain 4k
> > block alignment, I followed the instructions given on several sites,
> > and I'll sketch them here for the record.
> > 
> > The operating system is 11.0-CURRENT and 10.0-BETA2.
> > 
> > Create a GPT scheme on each drive and add one partition covering the
> > whole disk:
> > 
> > gpart add -t freebsd-zfs -b 1M -l disk0[0-3] ada[3-6]
> > 
> > then put a 4k NOP provider on top of each labelled partition:
> > 
> > gnop create -S4096 gpt/disk0[0-3]
> > 
> > Because I was adding a disk to an existing RAIDZ, I exported the
> > former ZFS pool, then deleted the partition on each disk and
> > destroyed the GPT scheme. The former pool had a ZIL and a cache
> > residing on the same (partitioned) SSD; I didn't kill or destroy the
> > partitions on that SSD. To keep the 4k alignment, I also created NOP
> > overlays on the existing gpt/log00 and gpt/cache00 via
> > 
> > gnop create -S4096 gpt/log00|gpt/cache00
> > 
> > (one overlay for each of the two).
> > 
> > After I created a new pool via
> > 
> > zpool create POOL raidz gpt/disk0[0-3].nop log gpt/log00.nop cache
> > gpt/cache00.nop
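> > 
> > For completeness, the whole dance per disk was roughly the following
> > (typed from memory, so names and option spellings may differ
> > slightly from what I actually entered):
> > 
> > gpart create -s gpt ada3
> > gpart add -t freebsd-zfs -b 1M -l disk00 ada3
> > gnop create -S4096 gpt/disk00
> > 
> > likewise for ada4-ada6 (disk01-disk03), log00 and cache00, and then:
> > 
> > zpool create POOL raidz gpt/disk00.nop gpt/disk01.nop gpt/disk02.nop \
> >     gpt/disk03.nop log gpt/log00.nop cache gpt/cache00.nop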
> 
> You don't need any of the nop hacks in 10 or 11 any more, as they have
> proper sector size detection. The caveat is when you have a disk which
> advertises 512b sectors but is really 4k and we don't have a 4k quirk
> in the kernel for it yet.
> 
> If anyone comes across a case of this, feel free to drop me the
> details from camcontrol <identify|inquiry> <device>
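> (i.e. the output of e.g. "camcontrol identify ada3" for the ada drives
> in this case)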
> 
> If due to this you still need to use the gnop hack, then you only need
> to apply it to one device, as zpool create uses the largest ashift
> among the disks.
> 
> I would then, as the very first step, export and import, as at this
> point there is much less data on the devices to scan through; not
> that this should be needed, but...
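> 
> To sketch it (device names purely illustrative, and only needed at all
> if the 4k quirk is missing):
> 
> gnop create -S4096 gpt/disk00
> zpool create POOL raidz gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03
> zpool export POOL
> gnop destroy gpt/disk00.nop
> zpool import POOL
> 
> The single .nop member is enough to get ashift=12 on the whole raidz
> vdev, and the immediate export/import gets rid of the .nop device
> while there is hardly any data to scan.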
> 
> 
> > I "received" a snapshot taken and sent to another storage array,
> > after I the newly created pool didn't show up any signs of illness
> > or corruption.
> > 
> > After ~10 hours of receiving the backup, I exported that pool along
> > with the backup pool, destroyed the appropriate .nop device entries
> > via
> > 
> > gnop destroy gpt/disk0[0-3].nop
> > 
> > and the same for cache and log, and tried to check via
> > 
> > zpool import
> > 
> > whether my pool (as well as the backup pool) shows up. And here the
> > nasty mess starts!
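> > 
> > Spelled out, the teardown was roughly (again from memory):
> > 
> > zpool export POOL
> > zpool export BACKUP00
> > gnop destroy gpt/disk00.nop gpt/disk01.nop gpt/disk02.nop gpt/disk03.nop
> > gnop destroy gpt/log00.nop gpt/cache00.nop
> > zpool import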
> > 
> > The "zpool import" command issued on console is now stuck for hours
> > and can not be interrupted via Ctrl-C! No pool shows up! Hitting
> > Ctrl-T shows a state like
> > 
> > ... cmd: zpool 4317 [zio->io_cv]: 7345.34r 0.00 [...]
> > 
> > Looking with 
> > 
> > systat -vm 1
> > 
> > at the throughput of the CAM devices, I realise that two of the four
> > drives comprising the RAIDZ show activity, with 7000 - 8000 tps and
> > ~30 MB/s bandwidth - the other two show zero!
> > 
> > And the pool is still inactive, the console is stuck.
> > 
> > Well, this made my day! At this point, I am trying to understand
> > what's going wrong and to recall what I did differently the last
> > time, when the same procedure with three disks on the same hardware
> > worked for me.
> > 
> > Now, after a 10-hour copy orgy and needing the working array back, I
> > am starting to believe that using ZFS on FreeBSD is still peppered
> > with too many development-stage flaws, rendering it risky.
> > Colleagues I consulted who work with ZFS on Solaris have never seen
> > the kind of stuck behaviour I am seeing right now.
> 
> While we only run 8.3-RELEASE currently, as we've decided to skip 9.X
> and move straight to 10 once we've tested it, we've found ZFS is not
> only very stable but has now become critical to the way we run things.
> 
> > I do not want to repeat the procedure again. There must be a way to
> > import the pool - even the backup pool, which is working and
> > untouched by all of this, should be importable - but it isn't. While
> > this crap "zpool import" command is still blocking the console, not
> > willing to die even with "killall -9 zpool", I cannot import the
> > backup pool via "zpool import BACKUP00" either. That console gets
> > stuck immediately and for eternity without any notice. Hitting
> > Ctrl-T says something like
> > 
> > load: 3.59  cmd: zpool 46199 [spa_namespace_lock] 839.18r 0.00u
> > 0.00s 0% 3036k
> > 
> > which means I cannot even import the backup facility, and that is
> > really no fun.
> 
> I'm not sure there's enough information here to determine where the
> issue may lie, but as a guess it could be that ZFS is having trouble
> locating the changed devices and is scanning the entire disk to try
> to determine that. This would explain the IO on the one device but
> not the others.
> 
> Did you perchance have one of the disks in use for something else, so
> that it may have old label information on it that wasn't cleaned off?
> 
>     Regards
>     Steve
> 

