Cluster Filesystem for FreeBSD - any interest?

Wed Jul 20 02:37:26 GMT 2005

On Tue, 2005-07-19 at 21:16 -0500, Eric Anderson wrote:
> Bakul Shah wrote:
> [..snip..]
> >>:) I understand.  Any nudging in the right direction here would be
> >>appreciated.
> > 
> > 
> > I'd probably start with modelling a single filesystem and how
> > it maps to a sequence of disk blocks (*without* using any
> > code or worrying about details of formats but capturing the
> > essential elements).  I'd describe various operations in
> > terms of preconditions and postconditions.  Then, I'd extend
> > the model to deal with redundancy and so on.  Then I'd model
> > various failure modes. etc.  If you are interested _enough_
> > we can take this offline and try to work something out.  You
> > may even be able to use perl to create an `executable'
> > specification:-)
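
To make the "executable specification" idea concrete, here is a minimal
sketch in Python rather than perl. It is only an illustration under my own
assumptions: the class ModelFS, its fields, and the lowest-free-block
allocation rule are all hypothetical and describe no real on-disk format.
Each operation states its precondition and postcondition as assertions.

```python
# Toy model (hypothetical): a filesystem maps file names to sequences
# of disk block numbers. No on-disk format is implied.
class ModelFS:
    def __init__(self, nblocks):
        self.free = set(range(nblocks))   # unallocated block numbers
        self.files = {}                   # name -> ordered list of block numbers
        self.disk = {}                    # block number -> bytes stored there

    def invariant(self):
        used = [b for blocks in self.files.values() for b in blocks]
        # No block belongs to two files, and no used block is also free.
        return len(used) == len(set(used)) and not (set(used) & self.free)

    def create(self, name):
        assert name not in self.files                 # precondition
        self.files[name] = []
        assert self.files[name] == [] and self.invariant()   # postcondition

    def append_block(self, name, data):
        assert name in self.files and self.free       # precondition
        before = len(self.files[name])
        b = min(self.free)        # assumed policy: lowest free block first
        self.free.remove(b)
        self.disk[b] = data
        self.files[name].append(b)
        # postcondition: the file grew by exactly one block
        assert len(self.files[name]) == before + 1 and self.invariant()
        return b

    def remove(self, name):
        assert name in self.files                     # precondition
        for b in self.files.pop(name):
            self.free.add(b)
            self.disk.pop(b, None)
        assert name not in self.files and self.invariant()   # postcondition
```

Redundancy and failure modes could then be modelled by extending the
invariant, for example by requiring every block to have a mirror and
checking what still holds after one copy is dropped.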
> 
> I've done some research, and read some books/articles/white papers since 
> I started this thread.
> 
> First, porting GFS might be a more universal effort, and might be 
> 'easier'.  However, that doesn't get us a clustered filesystem with BSD 
> license (something that sounds good to me).

It has been said that it would be a seven man-month effort for an FS expert.

> 
> Clustering UFS2 would be cool.  Here's what I'm looking for:

That is exactly how "Lustre" does its work, though it builds itself on
Ext3, and Lustre is aimed at http://www.lustre.org/docs/SGSRFP.pdf .

> 
> A clustered filesystem (or layer?) that allows all machines in the 
> cluster to see the same filesystem as if it were local, with read/write 
> access.  The cluster will need cache coherency across all nodes, and 
> there will need to be some sort of lock manager on each node to 
> communicate with all the other nodes to coordinate file locking.  The 
> filesystem will have to support journaling.
> 
> I'm wondering if one could make a pseudo filesystem something like 
> nullfs that sits on top of a UFS2 partition, and essentially monitors 
> all VFS operations to the filesystem, and communicates them over TCP/IP 
> to the other nodes in the cluster.  That way, each node would know which 
> inodes and blocks are changing, so they can flush those buffers, and 
> they would know which blocks (or partial blocks) to view as locked as 
> another node locks it. This could be done via multicast, so all nodes in 
> the cluster would have to be running a distributed lock manager daemon 
> (dlmd) that would coordinate this.  I think also that the UFS2 
> filesystem would have to have a bit set upon mount that tracked its 
> mount as a 'clustered' filesystem mount.  The reason for that is so that 
> we could modify mount to only mount 'clustered' filesystems (mount -o 
> clustered) if the dlmd was running, since that would be a dependency for 
> stable coherent file control on a mount point.
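
The coordination described above could be sketched roughly as follows. This
is a toy, in-memory stand-in and not a design for dlmd: the class
ToyLockManager and its methods are hypothetical, there is no networking, and
the multicast notification is replaced by a plain loop over the known nodes.
It only shows the bookkeeping: who holds a block lock, and which nodes must
flush a cached copy when another node takes it.

```python
# Hypothetical in-memory model of the lock-manager idea. One shared table
# stands in for the state that per-node daemons would have to agree on.
class ToyLockManager:
    def __init__(self):
        self.owners = {}   # (inode, block) -> node holding the lock
        self.caches = {}   # node -> set of (inode, block) it has cached

    def join(self, node):
        self.caches[node] = set()

    def read(self, node, inode, block):
        key = (inode, block)
        owner = self.owners.get(key)
        if owner is not None and owner != node:
            return False              # locked by another node: refuse
        self.caches[node].add(key)    # node now holds a cached copy
        return True

    def lock(self, node, inode, block):
        key = (inode, block)
        owner = self.owners.get(key)
        if owner is not None and owner != node:
            return False              # someone else already holds it
        self.owners[key] = node
        # Stand-in for the multicast: every other node flushes its copy.
        for other, cached in self.caches.items():
            if other != node:
                cached.discard(key)
        return True

    def unlock(self, node, inode, block):
        key = (inode, block)
        if self.owners.get(key) != node:
            return False              # only the holder may release
        del self.owners[key]
        return True
```

A real implementation would also have to cope with lost messages and with
nodes that die while holding a lock, which this sketch ignores entirely.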
> 
> Does anyone have any insight as to whether a layer would work?  Or maybe 
> I'm way off here and I need to do more reading :)
> 
> Eric
> 
> 
> 
-- 
yf-263 <yfyoufeng at 263.net>
Unix-driver.org