Looking for a Text on ZFS

Mon Feb 4 04:15:58 PST 2008

On Sun, 3 Feb 2008 17:55:12 +0100 (CET) Wojciech Puchar wrote:

> that's like 64-bit soundcards that have to be "better" than 32-bit, while 
> most of them was unable to actually get past 13-14 bit (most past 12) with 
> it's signal to noise ratio.

Maybe that's not quite the same thing. :-)

However. Even a 64bit filesystem still has gigantic reserves of space and
although filling that may not cause the oceans to boil, any storage device
that can actually store all the data that a 64bit fs can allocate will be
"pretty big" in terms of volume and mass and will also use a good deal of
energy - even if this is calculated in the same minimalistic way as for
ZFS: http://en.wikipedia.org/wiki/Zfs#Capacity
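
To put a rough number on "pretty big", here is a back-of-the-envelope
sketch. It assumes byte-granular 64bit addressing and hypothetical 1 TB
drives; neither figure comes from the discussion above, and a real 64bit
fs may well address blocks rather than bytes.

```python
# Back-of-the-envelope: how many drives would a full 64-bit address space need?
# Assumptions (illustrative only): byte-granular 64-bit addressing and
# hypothetical 1 TB (10**12 byte) drives.

ADDRESSABLE_BYTES = 2 ** 64
DRIVE_BYTES = 10 ** 12  # assumed drive size


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Number of whole drives required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // drive_bytes)


if __name__ == "__main__":
    print(ADDRESSABLE_BYTES // 2 ** 60, "EiB addressable")
    print(drives_needed(ADDRESSABLE_BYTES, DRIVE_BYTES), "drives of 1 TB each")
```

Under these assumptions the result is on the order of eighteen million
drives, before any redundancy is taken into account.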

Now I am not sure if the world actually needs a file system that can never
be filled - with the limits set not by feeble estimates, like when we
thought we'd never get a 1GB drive full, but by quantum physics. But the
fact that its maximum theoretical size has "a few reserves" isn't a
problem in itself. I do have serious doubts that the computers of today are
ready for the overhead at all, and that the overhead is worth the bother.

> 1) you make create 1000 of "filesystems" without partitioning. so lots of 
> "admins" that think more partitions=better are happy. you may set quota 
> for each "filesystem"

Well, actually I am an admin who believes this within limits. I have
separate file systems for /, /usr, /var, /tmp, /home and /usr/obj. The
reasons for this are numerous. I have /usr/obj on a different drive than
/usr to spread the load while making worlds and I mount /usr/obj
asynchronously to increase write speed. With several filesystems I can
spread the load the way I want it and decide where the data goes. And one
broken fs doesn't screw up the others in the process.

I do know the drawbacks of this: Storage is pretty static. Correcting
wrong estimates about the needed fs-sizes is a big problem. That is why I
keep /usr/home on one big fs. If the users require (for example) 20MB
each, then it doesn't matter if one user needs 25MB, as long as 5 others
only use 19MB each.
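
The trade-off can be sketched in a few lines. The numbers are purely
illustrative (six users, a 20MB estimate each), and the function names
are made up for this example.

```python
# Pooled storage versus fixed per-user partitions.
# All numbers and names here are illustrative assumptions.

def fits_in_pool(usage_mb, estimate_mb):
    """One big fs: only the total has to stay within the combined estimate."""
    capacity_mb = estimate_mb * len(usage_mb)
    return sum(usage_mb) <= capacity_mb


def fits_in_fixed_partitions(usage_mb, estimate_mb):
    """One partition per user: every single user has to stay within the estimate."""
    return all(used <= estimate_mb for used in usage_mb)


if __name__ == "__main__":
    usage = [25, 19, 19, 19, 19, 19]  # one heavy user, five light ones
    print("one big fs:      ", fits_in_pool(usage, 20))
    print("fixed partitions:", fits_in_fixed_partitions(usage, 20))
```

With this usage pattern the shared fs still fits, while the fixed layout
fails for the one user at 25MB.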

If ZFS gives us the best of both worlds, that would actually be an
improvement.

> 2) it takes many drives to the pool and you may add then new drives.
> same as gconcat+growfs.

I read about this. However, I didn't find anything conclusive as to how
well the drives can still live on their own if they are ever separated.
Now I don't think they will be addressed as a RAID0 with all the risks of
that. But what happens if one of four drives breaks down? Does it make a
difference if the broken drive is the first one, the last one or a middle
one?

> 3) it doesn't have per user quota, which creates a problem that is 
> "solved" by 1), and you have to create at least one filesystem/user, which 
> then is said to relieve admininstrator from work ;)

This doesn't have to be a problem either. Quotas are used instead of
partitions to tackle two problems: The number of partitions is very
limited and resizing a partition is a major issue. By changing the quota
you can give one user (or one service) more room and take it away from
some of the others that seem to need less than was anticipated.

If each user or service can be confined to its own fs, that would also be
good. A news server running with tradspool needs lots of inodes, most other
applications far less. I do see a drawback though: If you change the size
of the filesystems a few times, you could wind up with a new sort of
fragmentation. New because this sort (a filesystem that is patched
together over a drive) hasn't really been encountered yet and it will be
very interesting to see what effects this may have.

> 4) ZFS says that hardware checksums are not enough and disk hardware may 
> be buggy so then "solve" this problem checking everything with CPU.

This also creates a lot of overhead and CPU load. I tried this with GELI
on a fs that needed to be intact in a paranoid sense - I get like that
sometimes. :-) I did it once and once only. The performance was just not
good enough. Granted, I didn't do this on a really new computer but I'm
not likely to throw away all my old ones either, just so my paranoia can
be met with a good speed. :-)
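
For readers unfamiliar with the idea behind point 4, here is a minimal
sketch of software checksumming on every read. It is not how ZFS
implements it: the class, its methods and the choice of SHA-256 are
assumptions made for illustration. It does show where the CPU cost comes
from, since every block is hashed on write and again on read.

```python
import hashlib


class ChecksummedStore:
    """Toy block store that keeps a checksum apart from the data (illustrative only)."""

    def __init__(self):
        self._blocks = {}  # block number -> data, stands in for the disk
        self._sums = {}    # block number -> checksum, kept separately

    def write(self, block_no: int, data: bytes) -> None:
        self._blocks[block_no] = data
        self._sums[block_no] = hashlib.sha256(data).digest()

    def read(self, block_no: int) -> bytes:
        data = self._blocks[block_no]
        # The CPU re-hashes the block on every read; this is the overhead.
        if hashlib.sha256(data).digest() != self._sums[block_no]:
            raise IOError(f"checksum mismatch in block {block_no}")
        return data

    def corrupt(self, block_no: int) -> None:
        """Simulate a drive that silently flips one bit in a non-empty block."""
        damaged = bytearray(self._blocks[block_no])
        damaged[0] ^= 0x01
        self._blocks[block_no] = bytes(damaged)


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write(0, b"important data")
    store.corrupt(0)
    try:
        store.read(0)
    except IOError as err:
        print("detected:", err)
```

A drive that reports its own read errors would never reach the mismatch
branch; the check only pays off for corruption that the hardware does not
report.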

> while i've had failing drives many times i never seen it reading bad data 
> and not reporting error.

Same here. Since I started using HDs on my computers, I have had about 20
to 25 drives break down over the years. OK, I used many of the drives long
after other people took similar drives out of their machines and used them
as door stops. Basically, I made these drives work until they dropped
dead. :-)
None of these drives *ever* gave strange data back. The only time I had
that was when the driver for a controller was broken and that issue was
there right from the beginning.

> 5) you don't have to wait for fsck. one of the few real adventages.
> Anyway - FreeBSD doesn't crash like windoze, so it's not that big thing.

Wrong! Crashes occur and they do so quite frequently. Even if FreeBSD
is stable in itself, a blackout, an idiot user or broken hardware can
cause a system to crash. I had an S-ATA cable in one of my machines that
wasn't quite tight. The light vibrations of the drive caused it to
disconnect after a while. You couldn't see any difference, you couldn't
feel any either and it took me a long time to figure out what the problem
was. The cable wasn't so loose that it felt funny.

The computer crashed several times before I changed the cable. I was quite
glad that FreeBSD always managed to recover from that without any
problems.

> 7) there is no per file encryption, while it's said it will be SOON ready.

I don't really see that as the file system's job anyway.

> 8) ZFS is clear winner on artifical tests like creating miliion of small 
> files and then deleting them etc..

The key word being "artificial".

> 9) ZFS is very fast, just add more RAM and faster CPU.

Isn't everything? :-)

> there was a lot of excitement here after ZFS was ported, but i think it's 
> time too see that 20 (or more?) year old UFS is still a winner.

I intend to stick with UFS for a fair while yet. It does its job and
that's what I want it to do.

> i think some changes in UFS, like larger cylinder groups (so there won't 
> be 10000 of then on big filesystem), possibly dynamic allocation of 
> inodes, would be good. but not critical :)

There is always something to improve. I personally would like more flexible
rights, so I can decide who gets what (sort of) access to a file or
directory in more detail.

Regards
Chris