Filesystem snapshots dog slow

Tue Oct 16 04:30:47 PDT 2007

Ever since the snapshot code (e.g. mksnap_ffs(8) and friends) was
introduced, dump(8) has been modified to nag you if you don't use the -L
argument.  "Um, okay, I'd better use -L" is what came out of my mouth,
and I'm sure out of a lot of other administrators' mouths too when they
saw this message.

But it seems that making a snapshot is an incredibly slow, I/O-intensive
task.  The documentation I've read says that making a snapshot "is
incredibly fast"; in my experience, it isn't.  At the very least, it's
nowhere near as fast as on, say, a NetApp filer.

I've found three threads (from 2003, 2005, and 2007) about this problem:

http://lists.freebsd.org/pipermail/freebsd-current/2003-August/009135.html
http://lists.freebsd.org/pipermail/freebsd-fs/2005-July/001216.html
http://lists.freebsd.org/pipermail/freebsd-stable/2007-January/031882.html

This issue is still present on RELENG_7, and I can confirm it on
multiple machines (some with *completely* different hardware from the
others).

osiris# df -ki /disk2
Filesystem  1024-blocks Used     Avail Capacity iused    ifree %iused  Mounted on
/dev/ad6s1d   236511738    4 217590796     0%       2 30570492    0%   /disk2

osiris# time mksnap_ffs /disk2 /disk2/mysnapshot
0.000u 1.012s 5:12.23 0.3%      5+1149k 7803+18819io 0pf+0w
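The tcsh `time` line above packs several numbers into one string. Assuming tcsh's default format (`%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww`: user CPU, system CPU, elapsed wall-clock time, CPU percentage, memory, block input+output operations, page faults+swaps), a minimal Python sketch for pulling out the interesting fields might look like this. `parse_tcsh_time` and `TcshTime` are hypothetical names, not part of any existing tool. Applied to the run above, it gives about 1 second of system CPU against roughly 312 seconds of wall-clock time, i.e. mksnap_ffs spends almost the whole run waiting on the disk rather than computing.

```python
import re
from typing import NamedTuple


class TcshTime(NamedTuple):
    """Selected fields from one line of tcsh `time` output."""
    user_s: float      # %U: user CPU seconds
    system_s: float    # %S: system CPU seconds
    elapsed_s: float   # %E: wall-clock time, converted to seconds
    inputs: int        # %I: block input operations
    outputs: int       # %O: block output operations


# Assumes tcsh's default format "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww".
_TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+"
    r"[\d.]+%\s+\d+\+\d+k\s+(?P<inp>\d+)\+(?P<out>\d+)io"
)


def elapsed_to_seconds(field: str) -> float:
    """Convert an elapsed field such as "5:12.23" or "1:02:03.5" to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_tcsh_time(line: str) -> TcshTime:
    """Parse one line of tcsh `time` output; raise ValueError if it doesn't match."""
    match = _TIME_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised time output: {line!r}")
    return TcshTime(
        user_s=float(match["user"]),
        system_s=float(match["sys"]),
        elapsed_s=elapsed_to_seconds(match["elapsed"]),
        inputs=int(match["inp"]),
        outputs=int(match["out"]),
    )


if __name__ == "__main__":
    run = parse_tcsh_time("0.000u 1.012s 5:12.23 0.3%      5+1149k 7803+18819io 0pf+0w")
    print(f"{run.elapsed_s:.2f}s elapsed, {run.system_s:.3f}s system CPU")
```

The same function works on the rm and second mksnap_ffs lines below, which makes it easy to compare runs numerically instead of by eye.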

While mksnap_ffs runs, the process sits in the "wdrain" wait state, and
gstat(8) shows the disk pegged at 100% busy.  ms/r (milliseconds per
read) occasionally jumps to 1100 or higher, but usually hovers around
40-60.

osiris# gstat -I500ms -f'ad6'
dT: 0.501s  w: 0.500s  filter: ad6
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    2     80     52    830   38.6     28    447   22.4  100.2| ad6
    2     80     52    830   38.6     28    447   22.4  100.2| ad6s1
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1c
    2     80     52    830   38.6     28    447   22.4  100.2| ad6s1d

Now for snapshot removal:

osiris# time rm /disk2/mysnapshot
override r--r-----  root/operator snapshot for /disk2/mysnapshot? y
0.000u 0.285s 1:58.03 0.2%      16+1161k 7456+7456io 0pf+0w

While rm runs, the process sits in the "biord" wait state.

During either of these operations, the system occasionally goes into a
"stalled" state, where all other disk operations block until the
mksnap_ffs or rm finishes.

I ran a second mksnap_ffs "just to see" what would happen.  Between the
two runs, *nothing* happened on the filesystem apart from removing the
first snapshot (no other disk reads or writes, AFAIK):

osiris# time mksnap_ffs /disk2 /disk2/mysnapshot
0.016u 1.352s 10:13.73 0.2%     5+1164k 14501+27931io 0pf+0w

The elapsed time nearly doubled, from 5:12 to 10:13.  This isn't good.
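For anyone trying to reproduce this, the create/remove/create sequence above can be scripted. The following is a minimal Python sketch, not a tool from the FreeBSD base system: `time_snapshot_cycles` is a hypothetical helper, and it assumes it runs as root on a FreeBSD machine where `mksnap_ffs mountpoint snapshot_name` and `rm -f` behave as in the transcripts above (`-f` suppresses the override prompt shown earlier). It records wall-clock durations with `time.monotonic()`; with the numbers above, the second creation took 613.73 / 312.23, or about 1.97 times as long as the first.

```python
import subprocess
import time
from typing import List, Sequence, Tuple


def time_snapshot_cycles(
    mountpoint: str,
    snapshot: str,
    runs: int = 2,
    create_cmd: Sequence[str] = ("mksnap_ffs",),
    remove_cmd: Sequence[str] = ("rm", "-f"),
) -> List[Tuple[float, float]]:
    """Create and then remove a snapshot `runs` times.

    Returns one (create_seconds, remove_seconds) pair per run, measured as
    wall-clock time. Raises subprocess.CalledProcessError if a command fails.
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        # e.g. "mksnap_ffs /disk2 /disk2/mysnapshot"
        subprocess.run([*create_cmd, mountpoint, snapshot], check=True)
        created = time.monotonic()
        # e.g. "rm -f /disk2/mysnapshot"
        subprocess.run([*remove_cmd, snapshot], check=True)
        removed = time.monotonic()
        results.append((created - start, removed - created))
    return results
```

Calling `time_snapshot_cycles("/disk2", "/disk2/mysnapshot")` as root would repeat the experiment; if each pair's creation time keeps climbing relative to the first run on an otherwise idle filesystem, that reproduces the slowdown described here.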

Disks are getting larger, filesystems are growing, and people are
storing more data.  Hitachi, for example, has guaranteed 4TB disks by
the end of 2011.  If this problem has already sat idle for at least 4
years, we'll be in a lot of trouble come 2011.  And let's not forget
that every piece of FreeBSD documentation tells admins to "use dump,
it's the best!".  This issue is a good reason to consider using tools
like rsync or tar instead.  :-(

I will gladly work with anyone who wishes to tackle this, either by
providing hardware (motherboards, disks, etc.) for free, or by giving
them access to a box with a serial console and a serial debugger.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |