ZFS and deadlock with {nullfs,NFS}
Dan Mack
mack at macktronics.com
Wed Jun 20 16:57:09 UTC 2007
On Wed, 20 Jun 2007, Kris Kennaway wrote:
<snip>
>
> 404 at the moment, but look for processes involving zil* in the
> backtrace. I had to disable zil (vfs.zfs.zil_disable=1 tunable) to
> prevent low-memory deadlocks on my machines. Since then it's been
> fine.
>
> You may also wish to use my patches (see the archives) to improve
> performance and low-memory behaviour.
>
> Kris
>
Does someone have these recommended tunables collected in an example
/boot/loader.conf yet? Here is mine; does it look reasonable for keeping ZFS
from running into the kmem_map panics? I have no idea whether I found all
of your recommendations, so it would be nice if they were summarized in one
place.
| # /boot/loader.conf i386 / 1GB memory / SMP
| kern.maxvnodes="50000"
| vm.kmem_size_max="268435456"
| vfs.zfs.prefetch_disable="1"
| vfs.zfs.zil_disable="1"
+--------------------------
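The byte value given to vm.kmem_size_max is easier to judge in MiB. A
minimal sketch in Python, using only the settings listed above; the helper
name parse_loader_conf is hypothetical and is not a FreeBSD tool:

```python
# Sketch only: parse loader.conf-style lines and show vm.kmem_size_max in MiB.
# parse_loader_conf is a hypothetical helper, not a FreeBSD tool.

LOADER_CONF = '''
# /boot/loader.conf i386 / 1GB memory / SMP
kern.maxvnodes="50000"
vm.kmem_size_max="268435456"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zil_disable="1"
'''


def parse_loader_conf(text):
    """Return a dict of name -> value for simple name="value" lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


settings = parse_loader_conf(LOADER_CONF)
kmem_max_mib = int(settings["vm.kmem_size_max"]) // (1024 * 1024)
print(f"vm.kmem_size_max = {kmem_max_mib} MiB")
```

With the values above this reports 256 MiB, roughly a quarter of the 1 GB
in the machine.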
I just added vfs.zfs.zil_disable="1" today, and we'll see if that helps.
I am running 7.0-CURRENT on a dual-processor Dell 2450 (P3-1GHz) with
1024MB of real memory:
CPU: Intel Pentium III (993.33-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x686 Stepping = 6
Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,
MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 1073733632 (1023 MB)
avail memory = 1041432576 (993 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 1
cpu1 (AP): APIC ID: 0
I have 14 fibre channel drives on one loop being used by ZFS. I have a
single raidz and 3 mirrors. So far, I've not lost any data, and I have a
bunch of things on ZFS, including /usr/obj, /var/crash (hehe .. I might as
well use ZFS to store its own crash dumps) and my postgres database, whose
logs and data I split per the recommendations on Sun's ZFS site. I am also
running some Ruby on Rails apps on the box and, so far, have seen no data
corruption issues.
Last night I had my first drive failure (see below) and will attempt to
spare it out tonight when I get home. At the end of this message, I've
attached a summary of the crashes I've had. I am hoping that the
vfs.zfs.zil_disable="1" tunable will help reduce these panics.
ZFS Configuration Info:
borg# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
m0             1.64G  31.8G    18K  /m0
m0/ports       1.14G  31.8G  1.14G  /usr/ports
m0/usr_src      510M  31.8G   510M  /usr/src
m1             16.2M  16.8G    18K  /m1
m1/pg_log      16.1M  16.8G  16.1M  /pg/data/pg_xlog
m2             35.6M  33.4G    18K  /m2
m2/pg_data     35.5M  33.4G  35.5M  /pg/data
rz0            1.54G   165G  29.9K  /rz0
rz0/crash       654M   165G   654M  /var/crash
rz0/home       35.2M   165G  35.2M  /home
rz0/usr_local   228M   165G   228M  /usr/local
rz0/usr_obj     658M   165G   658M  /usr/obj
borg# zpool list
NAME     SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
m0        34G  1.64G  32.4G   4%  ONLINE  -
m1      17.1G  16.3M  17.1G   0%  ONLINE  -
m2        34G  35.6M  34.0G   0%  ONLINE  -
rz0      204G  1.85G   202G   0%  ONLINE  -
borg# zpool status
  pool: m0
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        m0          ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE      60     0     0

errors: No known data errors

  pool: m1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        m1          ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0

errors: No known data errors

  pool: m2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        m2          ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da11    ONLINE       0     0     0

errors: No known data errors

  pool: rz0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rz0         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0

errors: No known data errors
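
The failing drive shows up only as a non-zero READ counter on da7 in pool
m0. A minimal sketch in Python of picking such rows out of zpool status
text; devices_with_errors is a hypothetical helper, not part of ZFS, and it
assumes the counters are plain integers as they are in the output here:

```python
# Sketch only: flag devices with non-zero error counters in `zpool status` text.
# devices_with_errors is a hypothetical helper, not part of ZFS.

STATUS_EXCERPT = '''
NAME STATE READ WRITE CKSUM
m0 ONLINE 0 0 0
mirror ONLINE 0 0 0
da6 ONLINE 0 0 0
da7 ONLINE 60 0 0
'''


def devices_with_errors(text):
    """Return (name, read, write, cksum) for rows whose counters are not all zero."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # Device rows have exactly five fields, the last three being integers;
        # this also skips the NAME/STATE header row.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[2:]):
            continue
        name = fields[0]
        read, write, cksum = (int(f) for f in fields[2:])
        if read or write or cksum:
            flagged.append((name, read, write, cksum))
    return flagged


for name, read, write, cksum in devices_with_errors(STATUS_EXCERPT):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

On the m0 excerpt this prints a single line for da7 with 60 read errors.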
---
I have been bitten by the "kmem_map too small" panic about 5 times. It
sometimes happens during buildworld:
borg# cat info.*
Dump header from device /dev/da14s1b
Architecture: i386
Architecture Version: 2
Dump Length: 345595904B (329 MB)
Blocksize: 512
Dumptime: Thu Jun 14 19:02:33 2007
Hostname: borg.macktronics.com
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 7.0-CURRENT #0: Thu Jun 14 12:40:34 CDT 2007
    root at borg.macktronics.com:/usr/obj/usr/src/sys/BORG
Panic String: kmem_malloc(16384): kmem_map too small: 268419072 total allocated
Dump Parity: 183602033
Bounds: 0
Dump Status: good
Dump header from device /dev/da14s1b
Architecture: i386
Architecture Version: 2
Dump Length: 343498752B (327 MB)
Blocksize: 512
Dumptime: Tue Jun 19 13:49:00 2007
Hostname: borg.macktronics.com
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 7.0-CURRENT #0: Thu Jun 14 12:40:34 CDT 2007
    root at borg.macktronics.com:/usr/obj/usr/src/sys/BORG
Panic String: kmem_malloc(131072): kmem_map too small: 266354688 total allocated
Dump Parity: 1409491284
Bounds: 1
Dump Status: good
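
Both panic strings can be compared against the vm.kmem_size_max value from
the loader.conf above. A minimal sketch in Python, assuming the kmem map was
actually sized at that limit (the dump headers do not confirm this); the
helper name headroom is hypothetical:

```python
# Sketch only: compare the "total allocated" figure in each panic string with
# the configured vm.kmem_size_max. Assumes the map was sized at that limit.
import re

KMEM_SIZE_MAX = 268435456  # vm.kmem_size_max from the loader.conf above

PANICS = [
    "kmem_malloc(16384): kmem_map too small: 268419072 total allocated",
    "kmem_malloc(131072): kmem_map too small: 266354688 total allocated",
]

PATTERN = re.compile(
    r"kmem_malloc\((\d+)\): kmem_map too small: (\d+) total allocated"
)


def headroom(panic, limit=KMEM_SIZE_MAX):
    """Return (requested, allocated, limit - allocated), all in bytes."""
    match = PATTERN.search(panic)
    if match is None:
        raise ValueError(f"unrecognised panic string: {panic!r}")
    requested, allocated = int(match.group(1)), int(match.group(2))
    return requested, allocated, limit - allocated


for panic in PANICS:
    requested, allocated, left = headroom(panic)
    print(f"requested {requested} B, allocated {allocated} B, "
          f"{left} B below the limit")
```

Under that assumption, the first panic happened with 16384 bytes left below
the limit and the second with 2080768 bytes left, so in both cases the map
was within about 2 MB of the 256 MiB ceiling.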
Dan