Reproducible ZFS crash when starting a jail in 8-stable

Martin Matuska mm at FreeBSD.org
Sat Jun 29 22:07:49 UTC 2013


Fixed in r252380 (head), MFC scheduled for July 2, 2013

On 2013-06-29 12:54, Andreas Longwitz wrote:
> The problem occurs after an update of 8-stable from r248120 to r252111.
>
> My server has system disks da0, da1 with gmirror/gjournal for rootfs,
> usr, var and home partitions and glabeled data disks da2, da3 with zfs
> for prod and backup. Applications run in two jails on the zfs disks with
> nullfs mounts:
>
> At boot the server crashes when /etc/rc.d/jail tries to start the jails.
> On the console I see (I use ddb.conf to have ddb handle the crash):
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x0
> fault code              = supervisor read instruction, page not present
> instruction pointer     = 0x20:0x0
> stack pointer           = 0x28:0xffffff8245853930
> frame pointer           = 0x28:0xffffff82458539e0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 4411 (initial thread)
> [thread pid 4411 tid 100460 ]
> Stopped at      0:      *** error reading from address 0 ***
> db:0:kdb.enter.default> watchdog
> No argument provided, disabling watchdog
> db:0:kdb.enter.default>  call doadump
> Dumping 472 out of 8179 MB:..4%..11%..21%..31%..41%..51%..61%..72%..82%..92%
> Dump complete
>
> From kgdb output I see pid 4411 is the zfs/initial thread:
>
> (kgdb) where
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:266
> #1  0xffffffff801f877c in db_fncall (dummy1=<value optimized out>,
> dummy2=<value optimized out>,
>     dummy3=<value optimized out>, dummy4=<value optimized out>) at
> /usr/src/sys/ddb/db_command.c:548
> #2  0xffffffff801f8a2d in db_command (last_cmdp=0xffffffff8086b5c0,
> cmd_table=<value optimized out>, dopager=0)
>     at /usr/src/sys/ddb/db_command.c:445
> #3  0xffffffff801fd0e3 in db_script_exec (scriptname=0xffffffff80657b9e
> "kdb.enter.default", warnifnotfound=0)
>     at /usr/src/sys/ddb/db_script.c:302
> #4  0xffffffff801fd1de in db_script_kdbenter (eventname=<value optimized
> out>) at /usr/src/sys/ddb/db_script.c:325
> #5  0xffffffff801fadc4 in db_trap (type=<value optimized out>,
> code=<value optimized out>)
>     at /usr/src/sys/ddb/db_main.c:230
> #6  0xffffffff80432981 in kdb_trap (type=12, code=0,
> tf=0xffffff8245853880) at /usr/src/sys/kern/subr_kdb.c:654
> #7  0xffffffff805dbbed in trap_fatal (frame=0xffffff8245853880,
> eva=<value optimized out>)
>     at /usr/src/sys/amd64/amd64/trap.c:844
> #8  0xffffffff805dbf6e in trap_pfault (frame=0xffffff8245853880,
> usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
> #9  0xffffffff805dc32b in trap (frame=0xffffff8245853880) at
> /usr/src/sys/amd64/amd64/trap.c:457
> #10 0xffffffff805c2534 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:228
> #11 0x0000000000000000 in ?? ()
>
> (kgdb) info thread
> * 412 Thread 100460 (PID=4411: zfs/initial thread)  doadump () at
> /usr/src/sys/kern/kern_shutdown.c:266
>   411 Thread 100461 (PID=4404: sh)  sched_switch (td=0xffffff009e26d000,
> newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
> ....
>   221 Thread 100272 (PID=7: zfskern/txg_thread_enter)  sched_switch
> (td=0xffffff0002c95000, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   220 Thread 100271 (PID=7: zfskern/txg_thread_enter)  sched_switch
> (td=0xffffff0002c95470, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   219 Thread 100069 (PID=7: zfskern/l2arc_feed_thread)  sched_switch
> (td=0xffffff0002957000, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   218 Thread 100068 (PID=7: zfskern/arc_reclaim_thread)  sched_switch
> (td=0xffffff0002957470,
>     newtd=<value optimized out>, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1932
> ...
>   156 Thread 100270 (PID=0: kernel/zfs_vn_rele_taskq)  sched_switch
> (td=0xffffff0002c958e0, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   155 Thread 100269 (PID=0: kernel/zio_ioctl_intr)  sched_switch
> (td=0xffffff0002d92470, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   154 Thread 100268 (PID=0: kernel/zio_ioctl_issue)  sched_switch
> (td=0xffffff0002d9e000, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   153 Thread 100267 (PID=0: kernel/zio_claim_intr)  sched_switch
> (td=0xffffff0002d9e8e0, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
>   152 Thread 100266 (PID=0: kernel/zio_claim_issue)  sched_switch
> (td=0xffffff0002d9a8e0, newtd=<value optimized out>,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
> ....
>
> From the kerneldump I can give backtraces of all threads or any other
> information.
>
> Some more information:
>
> === root@serv07 (pts/1) -> gmirror status
>          Name    Status  Components
> mirror/gmsv07  COMPLETE  da0 (ACTIVE)
>                          da1 (ACTIVE)
>
> === root@serv07 (pts/1) -> glabel status
>           Name  Status  Components
> label/9241A7D4     N/A  da2
> label/C2477N17     N/A  da3
>
> === root@serv07 (pts/2) -> zpool status
>   pool: mpool
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME                STATE     READ WRITE CKSUM
>         mpool               ONLINE       0     0     0
>           mirror-0          ONLINE       0     0     0
>             label/9241A7D4  ONLINE       0     0     0
>             label/C2477N17  ONLINE       0     0     0
>
> errors: No known data errors
>
> === root@serv07 (pts/2) -> zfs list
> NAME                    USED  AVAIL  REFER  MOUNTPOINT
> mpool                   108G   806G    31K  /mpool
> mpool/backup            545M   806G   485M  /backup
> mpool/jail_deb_backup    33K   806G    33K  /backup/jail/deb
> mpool/jail_deb_prod    32,6G   806G  32,6G  /prod/jail/deb
> mpool/jail_pvz_backup    32K   806G    32K  /backup/jail/pvz
> mpool/jail_pvz_prod    54,7G   806G  54,7G  /prod/jail/pvz
> mpool/prod             20,0G   806G  20,0G  /prod
>
> cat /etc/fstab.deb (fstab.pvz is analogous):
> # Device           Mountpoint        FStype  Options   Dump    Pass#
> /usr/jail/deb      /jail/deb/usr     nullfs  rw        0       0
> /var/jail/deb      /jail/deb/var     nullfs  rw        0       0
> /home/jail/deb     /jail/deb/home    nullfs  rw        0       0
> /tmp/jail/deb      /jail/deb/tmp     nullfs  rw        0       0
> /prod/jail/deb     /jail/deb/prod    nullfs  rw        0       0
> /backup/jail/deb   /jail/deb/backup  nullfs  rw        0       0
>
> On servers without ZFS the problem does not occur; jails on r252111
> run fine there.
>
> Andreas Longwitz

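A note on reading the trap message quoted above. This is an interpretation, not something stated in the thread: on amd64, a fault code of "supervisor read instruction" together with an instruction pointer of 0x20:0x0 means the kernel tried to execute code at address 0, which is what usually happens after a call through a NULL function pointer. Frame #11 of the backtrace (0x0000000000000000 in ?? ()) is consistent with that. The sketch below is a hypothetical helper (classify_trap is not a FreeBSD tool) that applies this rule to a console excerpt on stdin. It assumes the field labels appear exactly as in the report.

```shell
#!/bin/sh
# Hypothetical helper, not part of FreeBSD: classify a "Fatal trap 12"
# console excerpt read from stdin. Assumes the field labels printed by
# the amd64 trap handler as they appear in the report above.
classify_trap() {
    awk -F'=' '
        /fault virtual address/ { gsub(/[ \t]/, "", $2); va = $2 }
        /fault code/            { code = $2 }
        /instruction pointer/   { gsub(/[ \t]/, "", $2); ip = $2 }
        END {
            # ip has the form selector:offset, for example 0x20:0x0
            split(ip, parts, ":")
            if (code ~ /instruction/ && parts[2] == "0x0" && va == "0x0")
                print "null-jump"      # instruction fetch from address 0
            else if (va == "0x0")
                print "null-deref"     # data access at address 0
            else
                print "other"
        }'
}
```

Fed the three relevant lines from the report (fault virtual address, fault code, instruction pointer), classify_trap prints null-jump. The quoting prefix "> " does not need to be stripped, since the patterns are not anchored to the start of the line.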

