ZFS crashes on heavy threaded environment

Lorenzo Perone lopez.on.the.lists at yellowspace.net
Tue Nov 18 10:43:01 PST 2008


For what it's worth, I have similar problems on a comparable system
(amd64/8GB, 7.1-PRERELEASE #3: Sun Nov 16 13:39:43), which I wouldn't
call heavily threaded yet (there is only one mysql51 instance running,
plus courier-mta/imap with at most 15 users right now).
Perhaps worth a note: Bjoern's multi-IP jail patches are applied on
this system.

The setup is as follows: one ZFS filesystem is mounted into a jail
that handles only mail (and of that, just the root of the mail files),
and a script on the main host rotates the snapshots hourly, creating a
new one and destroying the oldest.
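
For reference, the rotation script is roughly the following (a
simplified sketch; the dataset name and retention count below are just
placeholders for what the real script uses):

#!/bin/sh
# hourly snapshot rotation: take a new snapshot, drop the oldest one
# (sketch only -- dataset name and retention count are placeholders)
DATASET="hkpool/mail"
KEEP=24

zfs snapshot "${DATASET}@hourly-$(date +%Y%m%d%H)"

# snapshots come out oldest-first thanks to -s creation; destroy the
# first one once we hold more than KEEP of them
COUNT=`zfs list -H -t snapshot -o name -s creation | grep -c "^${DATASET}@hourly-"`
if [ "$COUNT" -gt "$KEEP" ]; then
        OLDEST=`zfs list -H -t snapshot -o name -s creation | grep "^${DATASET}@hourly-" | head -n 1`
        zfs destroy "$OLDEST"
fi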

After about 8-24 hours of production:

- mysqld is stuck in sbwait state;
- messages start filling up with
   kernel: vm_thread_new: kstack allocation failed
- almost any attempt to fork a process fails with
   Cannot allocate memory.

No panic so far, at least not since I introduced
vfs.zfs.prefetch_disable="1".  Before that, I experienced several
panics upon shutdown.

If I still have an open shell, I can send around some -TERMs and
-KILLs and more or less regain control; after that, if I run
"zfs umount -a", kernel memory usage drops drastically and I can
resume the services.  But not for long: after about 1-2 hours of
production the kernel starts whining in the messages again about
failed kstack allocations, and soon thereafter it all repeats.  Only
rebooting buys another 12-24 hours of operation.
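
(For the record, the "recovery" is nothing more sophisticated than
roughly the following, run from a shell that is still alive; the rc.d
script names depend on the installed ports and are only illustrative
here:)

# stop the jail's services (or kill -TERM / -KILL their pids by hand);
# script names below are illustrative, not necessarily the real ones
/usr/local/etc/rc.d/mysql-server stop
/usr/local/etc/rc.d/courier-imap-imapd stop

# this is the step where kernel memory usage visibly drops
zfs umount -a
zfs mount -a

/usr/local/etc/rc.d/courier-imap-imapd start
/usr/local/etc/rc.d/mysql-server start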

What I've tracked down so far:
- zfs destroy'ing old snapshots definitely makes those failures show
  up earlier;
- I've been collecting some data around the time of the memory
  problems, which I post below.

Since this is a production machine (I know, I shouldn't - but hey, you
gave us a taste of it and now we want more! So yes, I confirm, you
definitely _are_ evil! ;)), I'm almost ready to move it back to UFS.

But if it would be useful for debugging, I'd be willing to set up a
zabbix agent or the like to track whatever values could be helpful
over a day or two.  If, on the other hand, these bugs (leaks, or
whatever they are) are likely to be fixed by the recent commit, I'll
just move back to UFS until it is ported to -STABLE.
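
In the meantime, even something as crude as the following, run from
cron every few minutes, would give a usable history (just a sketch:
the log path is arbitrary, and kstat.zfs.misc.arcstats.size should be
dropped from the list if it doesn't exist on a given build):

#!/bin/sh
# append kernel memory / vnode / ARC figures to a log for later graphing
LOG=/var/log/zfs-mem.log
{
        date
        # adjust the OID list as needed; arcstats.size may not exist everywhere
        sysctl vm.kmem_size vfs.numvnodes kstat.zfs.misc.arcstats.size
        # same calculation as the DATA value shown further below
        vmstat -m | sed 's/K//' | awk '{a+=$3}; END {print "vmstat -m MemUse total:", a*1024, "bytes"}'
} >> $LOG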

Here is some data about memory usage (strangely, I never saw this get
even halfway to the 1.5 GB, but it's really almost voodoo to me, so I
leave the analysis to others):

# kernel text: sum the module sizes reported by kldstat (hex -> decimal)
TEXT=`kldstat | tr a-f A-F | awk 'BEGIN {print "ibase=16"}; NR > 1 {print $4}' | bc | awk '{a+=$1}; END {print a}'`
# kernel malloc(9) usage: sum the MemUse column of vmstat -m, in bytes
DATA=`vmstat -m | sed 's/K//' | awk '{a+=$3}; END {print a*1024}'`
TOTAL=`echo $DATA $TEXT | awk '{print $1+$2}'`

TEXT=13102280, 12.4953 MB
DATA=470022144, 448.248 MB
TOTAL=483124424, 460.743 MB

sysctl -a | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 2380
vfs.wantfreevnodes: 25000
vfs.numvnodes: 43982

As mentioned, the box has 8 GB of RAM and the loader.conf below; at
the time of the lockups there were still about 5 GB of free userland
memory available.

My loader.conf:
vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="768M"
vfs.zfs.prefetch_disable="1"
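
(The corresponding values can be read back at runtime to check that
the tunables actually took effect; as far as I can tell they are all
exported as read-only sysctls on this build:)

sysctl vm.kmem_size vm.kmem_size_max
sysctl vfs.zfs.arc_min vfs.zfs.arc_max vfs.zfs.prefetch_disable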

As for the filesystem, I only changed the recordsize, the compression,
and the mountpoint; the rest is default:

[horkheimer:lopez] root# zfs get all hkpool/mail
NAME         PROPERTY       VALUE                  SOURCE
hkpool/mail  type           filesystem             -
hkpool/mail  creation       Fri Oct 31 13:28 2008  -
hkpool/mail  used           5.50G                  -
hkpool/mail  available      386G                   -
hkpool/mail  referenced     4.33G                  -
hkpool/mail  compressratio  1.05x                  -
hkpool/mail  mounted        yes                    -
hkpool/mail  quota          none                   default
hkpool/mail  reservation    none                   default
hkpool/mail  recordsize     4K                     local
hkpool/mail  mountpoint     /jails/mail/mail       local
hkpool/mail  sharenfs       off                    default
hkpool/mail  checksum       on                     default
hkpool/mail  compression    on                     local
hkpool/mail  atime          on                     default
hkpool/mail  devices        on                     default
hkpool/mail  exec           on                     default
hkpool/mail  setuid         on                     default
hkpool/mail  readonly       off                    default
hkpool/mail  jailed         off                    local
hkpool/mail  snapdir        hidden                 default
hkpool/mail  aclmode        groupmask              default
hkpool/mail  aclinherit     secure                 default
hkpool/mail  canmount       on                     default
hkpool/mail  shareiscsi     off                    default
hkpool/mail  xattr          off                    temporary
hkpool/mail  copies         1                      default
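
(In other words, the dataset was set up with roughly the following;
reconstructed from the property list above, not from the actual
command history:)

zfs create hkpool/mail
zfs set recordsize=4K hkpool/mail
zfs set compression=on hkpool/mail
zfs set mountpoint=/jails/mail/mail hkpool/mail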

The pool is using a partition on a hardware RAID1:

[horkheimer:lopez] root# zpool status
   pool: hkpool
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         hkpool      ONLINE       0     0     0
           da0s1f    ONLINE       0     0     0



Regards, and thanks a lot for bringing us ZFS,

Lorenzo



On 18.11.2008, at 10:20, Chao Shin wrote:

> On Mon, 17 Nov 2008 23:58:35 +0800, Pawel Jakub Dawidek <pjd at freebsd.org> wrote:
>
>> On Thu, Nov 13, 2008 at 06:53:41PM -0800, Xin LI wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Xin LI wrote:
>>> > Hi, Pawel,
>>> >
>>> > We can still reproduce the ZFS crash (threading+heavy I/O load) on a
>>> > fresh 7.1-STABLE build, in a few minutes:
>>> >
>>> > /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>>> >
>>> > I have included a backtrace output from my colleague, who has his
>>> > hands on the test environment.  Should more information be
>>> > necessary, please let us know; we are glad to help with this.
>>>
>>> A further datapoint.  The system used to run with an untuned
>>> loader.conf, and my colleague just reported that with the following
>>> loader.conf the problem can be triggered sooner:
>>>
>>> vm.kmem_size_max=838860800
>>> vm.kmem_size_scale="2"
>>>
>>> The system is running FreeBSD/amd64 7.1-PRERELEASE, equipped with
>>> 8GB of RAM, with a GENERIC kernel.
>>
>> With new ZFS I get:
>>
>> Memory allocation failed:: Cannot allocate memory
>>
>> Is this expected?
>>
>
> First of all, congratulations and thanks for your work; well done!
>
> I used this command on a FreeBSD 7.1-PRERELEASE amd64 box with 8GB of
> memory; I didn't get output like that, but a kernel panic.
> Maybe you should lower the thread count and the file size, for example:
>
> /usr/local/bin/iozone -M -e -+u -T -t 64 -S 4096 -L 64 -r 4k -s 2g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>
> Actually, we used this command to test 8-CURRENT with the ZFS v12
> patch in July, and there were no more panics. So we hope ZFS v13 can
> be MFCed as soon as possible, because we really need it now.
> -- 
> The Power to Serve


