(ZFS?): panic: lockmgr: locking against myself
peter.schuller at infidyne.com
Tue Jan 1 10:18:02 PST 2008
(quoting last post for convenience; more history at
> > vnode 0xffffff00037473e0: tag devfs, type VDIR
> > usecount 0, writecount 0, refcount 1 mountedhere 0xffffff0003745ca0
> > flags (VV_ROOT)
> > lock type devfs: EXCL (count 1) by thread 0xffffff00010e6680 (pid 1)
> Some additional facts:
> Looking at the printouts, there is always a sequence of three or more
> (three at least twice; more than three at least once) vrele():s of the same
> vnode, in both the successful case and the panicking case. There are no
> vrele():s of any other vnodes in either case.
> Inserting enter/exit debug printouts in mountcheckdirs() confirms that all
> calls occur within the bounds of a single call to mountcheckdirs(). Does
> not this imply there is some locking mismatch in the non-ZFS specific code?
> I must admit I find the locking confusing; with several locking/unlocking
> functions/macros intermixed at different levels in the callstack. My
> (incorrect) reading was that this panic should always be happening, which
> is obviously not the case.
> Running with vfs.zfs.debug=1 confirms that vdev_geom open/attach/detach is
> happening prior to any vrele() even in the panicking case (i.e., zfs pool
> discovery seems to complete).
> In the case of an expected provider not being found, vd->vdev_devid is NULL
> in vdev_geom_open(), based on the "provider not found" debug printout
> (perhaps normal?).
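For readers unfamiliar with the panic message in the subject: as far as I
understand it, it fires when a thread that already holds a lock exclusively
tries to take the same lock again without recursion being allowed. Below is a
minimal toy model of that check. It is not FreeBSD's lockmgr code; the names
toy_lock, toy_lock_excl and toy_unlock are made up for illustration, and a
pointer stands in for the thread identity.

```c
#include <stddef.h>

/* Toy model only: an exclusive, non-recursive lock that records its owner. */
struct toy_lock {
    const void *owner;   /* NULL while unlocked, else the owning "thread" */
};

enum toy_result {
    TOY_OK,              /* lock acquired */
    TOY_BUSY,            /* held by a different thread; a real lock would sleep */
    TOY_SELF_DEADLOCK    /* held by the caller; the real kernel would panic here */
};

/* Try to take the lock exclusively on behalf of `thread`. */
static enum toy_result
toy_lock_excl(struct toy_lock *lk, const void *thread)
{
    if (lk->owner == thread)
        return TOY_SELF_DEADLOCK;   /* "locking against myself" */
    if (lk->owner != NULL)
        return TOY_BUSY;
    lk->owner = thread;
    return TOY_OK;
}

/* Release the lock; only the owner may do so in this model. */
static void
toy_unlock(struct toy_lock *lk, const void *thread)
{
    if (lk->owner == thread)
        lk->owner = NULL;
}
```

In this model, a second toy_lock_excl() by the same owner reports
TOY_SELF_DEADLOCK rather than waiting forever, which would be consistent with
a kernel lacking such checks hanging where a debug kernel panics.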
I *think* I just experienced the same problem on 7.0-BETA3, except the kernel
does not have WITNESS/INVARIANTS so I just get a hang instead of a panic. I
wanted to post with the information I have for completeness; I realize what
follows is a bunch of anecdotal mumbo-jumbo.
The boot-up process hangs right before the would-be "trying to mount root
from..." message, after all the glabel tasting has completed.
This was on a completely different system than the one in the original post,
but it also has root-on-zfs (this time on a 5 disk raidz2). It's a dual core
amd64 machine with a low-end mobo and low-end SATA controllers (SiI and some
built-in nVidia chipset).
It all started when I was booting back into FreeBSD after having Windows
booted for a while. It wouldn't boot. I fiddled some with vfs.zfs.debug=1 and
with removing a CD in the drive (in case it affected timing), but it did not
help. I did not try the boot-7-live cd trick this time as I did originally on
the other machine.
I looked carefully to make sure all drives were detected, including geom
tasting on all but one of them that are in the zfs pool. The I/O indicator
LEDs on the respective drives that are part of the zfs pool did not indicate
any I/O after the hang. I waited 5+ minutes at least once in the hope that it
was a drive timing out.
After several attempts I turned off the machine and let it do a cold boot - at
this point the system booted fine.
This is different from before, in that previously the behavior was seemingly
triggered by changes in system configuration (loss of a drive, etc). This
time it was just a reboot. I *did* touch a bunch of cables in between, and
blew some air on components (for reasons not relating to this) which I
originally figured could explain the problem.
Before this incident, the system had booted with root-on-zfs several times (at
least 25, probably more like 50+) without any kind of problem, ever.
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org