Improving ZFS performance for large directories

Kevin Day toasty at dragondata.com
Tue Jan 29 23:20:19 UTC 2013


I'm trying to improve performance when using ZFS in large (>60000 files) directories. A common activity is to use "getdirentries" to enumerate all the files in the directory, then call "lstat" on each one to get information about it. Doing an "ls -l" in a large directory like this can take 10-30 seconds to complete. To figure out why, I ran:

ktrace ls -l /path/to/large/directory
kdump -R |sort -rn |more

to see which system calls were taking the most time. I ended up with:

 69247 ls       0.190729 STRU  struct stat {dev=846475008, ino=46220085, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196714, stime=1201004393, ctime=1333196714.547566024, birthtime=1333196714.547566024, size=30784, blksize=31232, blocks=62, flags=0x0 }
 69247 ls       0.180121 STRU  struct stat {dev=846475008, ino=46233417, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333197088, stime=1209814737, ctime=1333197088.913571042, birthtime=1333197088.913571042, size=3162220, blksize=131072, blocks=6409, flags=0x0 }
 69247 ls       0.152370 RET   getdirentries 4088/0xff8
 69247 ls       0.139939 CALL  stat(0x800d8f598,0x7fffffffcca0)
 69247 ls       0.130411 RET   __acl_get_link 0
 69247 ls       0.121602 RET   __acl_get_link 0
 69247 ls       0.105799 RET   getdirentries 4064/0xfe0
 69247 ls       0.105069 RET   getdirentries 4068/0xfe4
 69247 ls       0.096862 RET   getdirentries 4028/0xfbc
 69247 ls       0.085012 RET   getdirentries 4088/0xff8
 69247 ls       0.082722 STRU  struct stat {dev=846475008, ino=72941319, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1348686155, stime=1348347621, ctime=1348686155.768875422, birthtime=1348686155.768875422, size=6686225, blksize=131072, blocks=13325, flags=0x0 }
 69247 ls       0.070318 STRU  struct stat {dev=846475008, ino=46211679, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196475, stime=1240230314, ctime=1333196475.038567672, birthtime=1333196475.038567672, size=829895, blksize=131072, blocks=1797, flags=0x0 }
 69247 ls       0.068060 RET   getdirentries 4048/0xfd0
 69247 ls       0.065118 RET   getdirentries 4088/0xff8
 69247 ls       0.062536 RET   getdirentries 4096/0x1000
 69247 ls       0.061118 RET   getdirentries 4020/0xfb4
 69247 ls       0.055038 STRU  struct stat {dev=846475008, ino=46220358, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196720, stime=1274282669, ctime=1333196720.972567345, birthtime=1333196720.972567345, size=382344, blksize=131072, blocks=773, flags=0x0 }
 69247 ls       0.054948 STRU  struct stat {dev=846475008, ino=75025952, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1351071350, stime=1349726805, ctime=1351071350.800873870, birthtime=1351071350.800873870, size=2575559, blksize=131072, blocks=5127, flags=0x0 }
 69247 ls       0.054828 STRU  struct stat {dev=846475008, ino=65021883, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1335730367, stime=1332843230, ctime=1335730367.541567371, birthtime=1335730367.541567371, size=226347, blksize=131072, blocks=517, flags=0x0 }
 69247 ls       0.053743 STRU  struct stat {dev=846475008, ino=46222016, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196765, stime=1257110706, ctime=1333196765.206574132, birthtime=1333196765.206574132, size=62112, blksize=62464, blocks=123, flags=0x0 }
 69247 ls       0.052015 RET   getdirentries 4060/0xfdc
 69247 ls       0.051388 RET   getdirentries 4068/0xfe4
 69247 ls       0.049875 RET   getdirentries 4088/0xff8
 69247 ls       0.049156 RET   getdirentries 4032/0xfc0
 69247 ls       0.048609 RET   getdirentries 4040/0xfc8
 69247 ls       0.048279 RET   getdirentries 4032/0xfc0
 69247 ls       0.048062 RET   getdirentries 4064/0xfe0
 69247 ls       0.047577 RET   getdirentries 4076/0xfec
(snip)

The STRU lines are the stat structures returned by lstat() calls.

It looks like both getdirentries and lstat are taking quite a while to return. The shortest return for any lstat() call is 0.000004 seconds, the maximum is 0.190729 seconds, and the average is around 0.0004 seconds. At >60000 files, lstat() alone accounts for over 20 seconds of the "ls" run.
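Those per-call times can be totalled straight from the kdump output (a sketch; it assumes the ktrace.out from the run above is in the current directory, and the awk field numbers match the kdump -R format shown):

```shell
# Sum the elapsed times kdump -R reports for each lstat() return
# (the STRU lines).  Field 3 is the elapsed time, field 4 the record type.
awkprog='$4 == "STRU" { n++; total += $3 }
    END {
        if (n) printf "%d lstat returns, %.2f s total, %.6f s avg\n", n, total, total / n
        else   print "no lstat returns found"
    }'
kdump -R 2>/dev/null | awk "$awkprog"
```

The same filter with $4 == "RET" && $5 == "getdirentries" would total the getdirentries side instead.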

I'm prepared to try an L2ARC cache device (with secondarycache=metadata), but I'm having trouble determining how large a device I'd need. We've got >30M inodes on this filesystem now, including some files with extremely long names. Is there some way to determine the amount of metadata on a ZFS filesystem?
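One thing I'm planning to try for sizing (a sketch; "tank" is a placeholder for the real pool name, and I haven't verified this on our pool): zdb's block statistics break pool usage down by object type, which should give at least an upper bound on the metadata footprint. Note that it traverses every block in the pool, so on 30M inodes it will be slow and I/O-heavy:

```shell
# Per-object-type block statistics; "tank" is a placeholder pool name.
# This walks the entire pool, so expect it to take a while.
zdb -bb tank
```

The summary table at the end breaks ASIZE down by type; roughly everything other than "ZFS plain file" data is metadata of one sort or another.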



More information about the freebsd-fs mailing list