GSoC: Semantic File System

Robert Watson rwatson at FreeBSD.org
Thu Apr 2 10:26:10 PDT 2009


On Thu, 2 Apr 2009, Gabriele Modena wrote:

> On Sun, Mar 22, 2009 at 6:52 PM, Robert Watson <rwatson at freebsd.org> wrote:
>> We are certainly not uninterested in projects along these lines, but I 
>> think the trick will be creating a convincing proposal that argues that (a) 
>> you can do the work in a summer and (b) there's a compelling use case for 
>> including the results in FreeBSD, and then (c) finding a mentor who can 
>> supervise you in this project.
>
> Thanks, I will keep it in mind when writing the proposal. How do you suggest 
> I proceed to find a mentor?
>
> By the way, this is a project that I'm very probably going to carry on even 
> without GSoC support (even though that would be very useful).

Well, I think the first step is to write the proposal, and we can see about 
shopping it around for a potential mentor.

>> What sort of semantic file system do you have in mind?  How would you feel 
>> about a middle-ground project along the lines of Mac OS X Spotlight or 
>> similar efficient userspace indexing of a file system based on feedback 
>> from the file system about what has changed, or something BeOS-like, in 
>> which indexing takes place for extended attributes rather than for 
>> contents?
>
> At the moment I am also considering a userspace approach similar to 
> Spotlight/Beagle, but I don't know how I could propose this as a FreeBSD 
> GSoC project.

I think that would make a fine GSoC proposal.  Keep in mind that one of the 
premises of Spotlight is the fsevents kernel feature, and fseventsd, which 
allow Spotlight to subscribe to changes in trees and kick off reindexing as 
required.  Porting the fsevents API to FreeBSD is fairly straightforward, 
with one exception: HFS+ offers a much more reliable notion of vnode->path 
mapping, but it would be interesting to see how well our current vnode->path 
mapping mechanisms would suffice in practice (since a lot of the edge cases 
that don't work well with our mapping system are exactly that -- edge cases).

Between kernel and userspace parts there's quite a bit to do, but one 
possibility would be to borrow parts from Mac OS X/etc that we need.  For 
example, do a literal port of the fsevents mechanism from XNU, provide our own 
implementation that provides a similar API, or provide a new mechanism that 
meets fseventsd's semantic requirements for monitoring.
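
To make the "subscribe to changes in trees" idea concrete, here is a minimal 
userspace sketch in Python.  It is not the fsevents API: ChangeFeed, subscribe, 
and publish are hypothetical names, and publish stands in for whatever the 
kernel would report.  It only illustrates delivering a change to every 
subscriber whose watched root contains the changed path.

```python
import posixpath
from collections import defaultdict


class ChangeFeed:
    """Hypothetical stand-in for an fsevents-like change feed."""

    def __init__(self):
        # Watched tree root -> list of callbacks to notify.
        self._subscribers = defaultdict(list)

    def subscribe(self, root, callback):
        """Register callback for changes at or below root."""
        self._subscribers[posixpath.normpath(root)].append(callback)

    def publish(self, path):
        """Deliver a changed path to matching subscribers; return the count."""
        path = posixpath.normpath(path)
        delivered = 0
        for root, callbacks in self._subscribers.items():
            # Match on whole path components, so /home/alice does not
            # accidentally match /home/alicex.
            prefix = root.rstrip("/") + "/"
            if path == root or path.startswith(prefix):
                for callback in callbacks:
                    callback(path)
                    delivered += 1
        return delivered
```

An indexer would call subscribe once per tree it cares about and queue each 
reported path for reindexing.  How reliably the kernel can produce that path 
from a vnode is exactly the open question raised above.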

> What I have in mind at the moment would be an indexing based on contents 
> rather than extended fs attributes. I did not know about the BeOS semantics 
> capabilities, I will surely have a look at that.

I'm probably blending reality with imagination here, but my vague recollection 
is that the model was a slightly different blend of user vs. application 
involvement in indexing.  For systems like Spotlight, there are no 
kernel-maintained indexes; the kernel simply provides a change list so that 
the userspace indexer can go through and apply file type-specific indexers to 
all files that have changed.  So, for example, there are indexers for Word 
files, plain text files, PDFs, and so on.
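
The userspace half of that split can be sketched roughly as follows, in 
Python.  The names here (INDEXERS, index_text, reindex) are hypothetical and 
not taken from Spotlight; the change list is assumed to arrive as a plain 
list of paths, and only a plain-text indexer is shown.

```python
import posixpath


def index_text(data):
    """Plain-text indexer: return the set of lowercased words in data."""
    return set(data.decode("utf-8", errors="replace").lower().split())


# File extension -> type-specific indexer.  Indexers for other formats
# (Word, PDF, ...) would be registered here.
INDEXERS = {".txt": index_text}


def reindex(changed, read_file, index):
    """Reindex each changed path; return the paths that had no indexer.

    changed   -- iterable of paths reported as changed
    read_file -- callable mapping a path to its contents as bytes
    index     -- dict updated in place: path -> set of terms
    """
    skipped = []
    for path in changed:
        ext = posixpath.splitext(path)[1].lower()
        indexer = INDEXERS.get(ext)
        if indexer is None:
            skipped.append(path)
            continue
        index[path] = indexer(read_file(path))
    return skipped
```

The point of the design is that the kernel never needs to understand file 
formats; it only has to say which files changed.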

In the BeOS model, or my reinterpretation based on something I read a long 
time ago and then presumably had dreams about, the split is a bit different: 
the file system maintains indexes of extended attributes, which are written by 
applications in order to expose searchable material.  For example, a mail 
application might write out each message as a file, and attach a series of 
extended attributes, such as subject line, date, author, etc.  These extended 
attributes are then indexed automatically by the file system in order to allow 
queries to be evaluated.  I don't recall how queries and results are 
expressed, and in particular, whether the queries are processed by the file 
system (possibly exposed via special APIs or the name space) or userspace 
(accessing special files maintained by the kernel that are the indexes).
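
A toy version of that attribute-indexing model, again in Python, might look 
like the sketch below.  It is an in-memory illustration under my own 
assumptions, not BFS's actual design: AttributeIndex, set_attr, and query are 
hypothetical names, and queries are limited to exact matches on attribute 
values.

```python
from collections import defaultdict


class AttributeIndex:
    """In-memory index over per-file extended attributes."""

    def __init__(self):
        self._attrs = {}                # path -> {attribute name: value}
        self._index = defaultdict(set)  # (name, value) -> set of paths

    def set_attr(self, path, name, value):
        """Set an attribute on a file and keep the index in step."""
        attrs = self._attrs.setdefault(path, {})
        if name in attrs:
            # Drop the stale index entry before recording the new value.
            self._index[(name, attrs[name])].discard(path)
        attrs[name] = value
        self._index[(name, value)].add(path)

    def query(self, **criteria):
        """Return the paths whose attributes match every name=value pair."""
        sets = [self._index.get(item, set()) for item in criteria.items()]
        return set.intersection(*sets) if sets else set()
```

In the mail example, the application would call set_attr for the subject, 
date, and author of each message file, and a search for one author reduces to 
a single index lookup rather than a scan of every message.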

It's also worth observing that one of the authors of BFS was Dominic 
Giampaolo, who now works on Apple's HFS+, and implemented fsevents there as 
part of their Spotlight project.

Robert N M Watson
Computer Laboratory
University of Cambridge


More information about the freebsd-hackers mailing list