Alternate Data Stream Support in FreeBSD (was Re: O_XATTR support in FreeBSD?)

Tue Nov 26 17:08:34 UTC 2013

On Nov 26, 2013, at 4:27 AM, Lionel Cons <lionelcons1972 at gmail.com> wrote:

>> I don’t know if I’d go so far as to say “you do not need more syscalls”;
>> there are additional functions for manipulating EAs that go well beyond
>> the Solaris extensions to the directory and file I/O functions.  Assuming you
>> want to be able to get/set as well as enumerate or remove EAs, then
>> you might just as well add getxattr(2), listxattr(2), removexattr(2), setxattr(2)
>> too and follow the herd (Linux and OS X, so far).
> 
> You mean 'follow the lemmings down into the abyss'? :)

Well, I don’t know that it’s an “abyss” - EAs may or may not be useful, depending on how you employ them!

In the first version of OS X to support them, in fact, I believe they were limited in size to 4883 bytes (don’t ask me why that number) and they were still used to apply various “tags” to files (Finder metadata, some index values into the search database, etc).  General pressure to use them for more things eventually got this size bumped up to 128K, and now it’s actually 2GB(!) (http://support.apple.com/kb/HT5983) so I think it’s fair to say that EAs in OS X are now essentially equivalent to forked files, more or less.

> Could we first agree what we are talking about, please? I'm a bit new
> to this thread, but AFAIK we are talking about the Windows Alternate
> Data Streams as they appear in networked filesystem like NFSv4 and
> CIFS and physical filesystems like NTFS, ZFS and Solaris UFS, right?
> ACLs have no direct relation to those streams.

Actually, I didn’t think we were talking about alternate data streams myself.  Conceptually they’re equivalent, I guess, but I’ve always thought they were somewhat overkill and I’ve yet to encounter an application that seriously uses them.  I’m sure they’re out there somewhere, but even back in the days when EAs were limited to just over 4K, we found them very useful for what was essentially their original purpose - an extension to the file attribute data that Unix already provides.  The only reason that ACLs crept into the discussion is because of where they’re stored.  I don’t know about Linux, but Apple has chosen to store ACLs in EAs, which is pretty useful because this gives you an easy way of serializing the ACLs too - you just serialize them from a suitably privileged process.

The main point I was trying to make is that if you’re going to have EAs at all, you need to commit fully.  The various Unix tools need to support them (we’ve already talked about the archivers and compressors) and tools like ls(1) need to be able to show them on files.  You need a way of dealing with them on foreign filesystems that don’t support EAs.  Most folks just cram EAs into the filesystem, add a few decorations to existing system calls and then shout “done!” and do a victory dance.  Then when nobody actually uses EAs, they go “See?  I always told you EAs were crap!  Terrible idea!  Never should have added them!”

This is tantamount to building a car with an engine but no wheels, dashboard or steering wheel and then declaring that the world just isn’t ready for cars since they’re not buying yours.

I know you cite Solaris’ integration as an example of such a “full solution,” and maybe the “~@” syntax was awesome in practice, I dunno, all I can say is that the only namespace trick we needed to pull at Apple was the AppleDouble (._) sidecar files.  There was an earlier filename/..rfork/ syntax for addressing resource forks, which predated EAs in OS X, and some folks used it quite a bit, but it was eventually deprecated in favor of the single sidecar file.  I never found a need to “cd” into the namespace of a file’s EAs when I had the xattr(1) command so handy for deleting / changing them, and ls’s -@ argument would also display them for me.  I suppose it is all a matter of taste.  If someone wanted to do the namespace thing in FreeBSD, I wouldn’t argue against it.
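
To make the sidecar idea a bit more concrete, here is a rough Python sketch of the fallback: try a native EA write, and if the filesystem refuses, park the attribute in a "._" file next to the original.  Everything here is an assumption for illustration only: the function name set_attr_with_sidecar is made up, the sidecar is plain JSON with base64 values (NOT the real AppleDouble format), and the default backend is Python's os.setxattr, which only exists on Linux.

```python
import base64
import errno
import json
import os


def set_attr_with_sidecar(path, name, value, setx=None):
    """Try a native EA write; fall back to a '._' sidecar file.

    Returns "native" or "sidecar" so the caller knows where the
    attribute ended up.  `setx` is injectable; the default
    (os.setxattr) is Linux-only.
    """
    setx = setx or os.setxattr
    try:
        setx(path, name, value)
        return "native"
    except OSError as exc:
        # Only fall back when the filesystem lacks EA support;
        # any other failure (permissions, etc.) is re-raised.
        if exc.errno not in (errno.ENOTSUP, errno.EOPNOTSUPP):
            raise

    # Hypothetical sidecar layout: JSON mapping name -> base64 value.
    directory, base = os.path.split(path)
    sidecar = os.path.join(directory, "._" + base)
    attrs = {}
    if os.path.exists(sidecar):
        with open(sidecar) as fh:
            attrs = json.load(fh)
    attrs[name] = base64.b64encode(value).decode("ascii")
    with open(sidecar, "w") as fh:
        json.dump(attrs, fh)
    return "sidecar"
```

A real implementation would also need the read, list and remove paths, plus tools that know to move the sidecar along with its file, which is the same integration problem all over again.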

I also wouldn’t argue against fully parallel “forks” being a superset of EAs, since I guess at CERN where folks are routinely looking at petabytes of data from a detector like ATLAS or CMS, anything that puts size constraints on their data is just the devil, but again, that wasn’t actually the point I was trying to make.  I was simply trying to say that NFSv4 or ZFS “native EA support” is the easy part.  The harder part is in making sure that the EAs don’t get stripped out in transit or during routine file manipulations, and this requires that everything from cp(1) to rsync(1) becomes EA-aware.  Most of the implementations I’ve seen don’t bother to do that last mile of integration, and as a result EAs are just basically untrustworthy beasts that users shy away from.
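
As a small illustration of what "EA-aware" means for a copy tool, here is a minimal Python sketch: copy the data, then walk the source's attribute list and re-apply each one to the destination.  The function name copy_with_xattrs and its three injectable callables are hypothetical; the defaults are Python's os.listxattr / os.getxattr / os.setxattr, which are Linux-only, so treat this as a sketch under those assumptions rather than how cp(1) does it anywhere.

```python
import os
import shutil


def copy_with_xattrs(src, dst, listx=None, getx=None, setx=None):
    """Copy file data, then carry over every extended attribute.

    Returns the list of attribute names that were copied.  The
    callables are parameters so a different EA backend can be
    substituted; the os.* defaults exist only on Linux.
    """
    listx = listx or os.listxattr
    getx = getx or os.getxattr
    setx = setx or os.setxattr

    # copyfile() moves the data only; this is the "naive cp" step
    # that silently drops EAs if nothing else is done.
    shutil.copyfile(src, dst)

    copied = []
    for name in listx(src):
        setx(dst, name, getx(src, name))
        copied.append(name)
    return copied
```

The loop is trivial; the point is that every tool that touches files has to remember to include it, and has to decide what to do when the destination refuses the attribute.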

- Jordan