JUFS update, and questions.

Fri Mar 12 13:05:17 PST 2004

On Thu, 2004-03-11 at 01:08, Manuel Petit wrote:

> I assume that you are taking care that the blocks freed by truncate are
> not recycled until the transaction is written to the journal.
Yes, though the exact mechanism is still being worked out.  The idea is
to mark the blocks "dirty" with the TID of the operation that freed
them; when one of them is reallocated, the journal is flushed up to that
TID before the call is allowed to return.
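
A minimal sketch in C of the "dirty with TID" idea described above.  This
is an illustration only, not JUFS code: the structure, the function
names, and the toy 64-block volume are all hypothetical, and the journal
flush is a stand-in that only records how far the journal has been
written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 64                /* toy volume size (assumption) */

typedef uint64_t tid_t;

/* Hypothetical in-memory state; none of these names come from JUFS. */
struct jufs_sketch {
    tid_t    freed_tid[NBLOCKS];  /* 0 = clean, else TID that freed the block */
    tid_t    next_tid;            /* TID the next operation will receive */
    tid_t    flushed_tid;         /* every TID <= this is in the on-disk journal */
    unsigned flushes;             /* forced flushes so far, for illustration */
};

static void sketch_init(struct jufs_sketch *fs)
{
    memset(fs, 0, sizeof(*fs));
    fs->next_tid = 1;             /* TID 0 is reserved to mean "clean" */
}

/* Stand-in for the real journal write: just advance the flushed mark. */
static void journal_flush_to(struct jufs_sketch *fs, tid_t tid)
{
    if (fs->flushed_tid < tid) {
        fs->flushed_tid = tid;
        fs->flushes++;
    }
}

/* Free a block under a new transaction and tag it with that TID. */
static tid_t block_free(struct jufs_sketch *fs, unsigned blk)
{
    assert(blk < NBLOCKS);
    tid_t tid = fs->next_tid++;
    fs->freed_tid[blk] = tid;
    return tid;
}

/* Reuse a block: flush the journal up to the freeing TID before returning. */
static void block_realloc(struct jufs_sketch *fs, unsigned blk)
{
    assert(blk < NBLOCKS);
    tid_t tid = fs->freed_tid[blk];
    if (tid != 0) {
        journal_flush_to(fs, tid);
        fs->freed_tid[blk] = 0;   /* block is clean again */
    }
}
```

In this sketch, reusing a block whose freeing TID is already in the
journal costs nothing extra; only a block freed by a still-unwritten
transaction forces a flush.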

> 
> Additionally regarding the volume not being rw-mountable if the journal
> is not empty... i like much better the way it was done in BFS: the
> filesystem replayed the journal on a rw-mount. Also you seem to avoid
> the topic of ro-mounting, for ro-mounting is reasonable to avoid
> mounting if the journal is not empty (since the journal cannot be
> replayed)... or it could be done like the BeOS bootstrapper did: it
> built a block relocation table that redirected blocks to the contents
> of the journal (the bootstrapper did not attempt to replay the log...
> but just to walk the fs for finding the kernel)
> 
If the FS is not allowed to be mounted-ro until the journal is replayed
then "/" could never be journaled.  I also like the idea of allowing the
admin to "safely" look at the filesystem before potentially damaging
operations take place, even if only to do a dump/tar of the data
someplace else.

By "ignoring" the ro-mount I had intended to treat it the same way the
kernel treats it now: allow it.  This makes it roughly equivalent to the
RO mount of an async FS that went boom, only it's fully recoverable when
the FSCK runs.  There may be issues here on the root-fs, and a sync
journal option may be needed.  I can think of a few ways to accomplish
this; the easiest would be to keep re-writing the same journal block,
appending data onto it until it's full, then move on.
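
One way the "re-write the same journal block until it is full" option
could look, as a rough C sketch.  Everything here is assumed for
illustration: the 512-byte block size, the structure layout, and the
function names are hypothetical, and the synchronous disk write is
reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define JBLOCK_SIZE 512           /* journal block size (assumption) */

/* Hypothetical state for a synchronous journal; not taken from JUFS. */
struct sync_journal {
    unsigned char block[JBLOCK_SIZE]; /* image of the block being filled */
    size_t        used;               /* bytes of records already in it */
    unsigned long blkno;              /* journal block currently in use */
    unsigned long writes;             /* synchronous writes issued so far */
};

/* Stand-in for a synchronous write of the current block to disk. */
static void write_block_sync(struct sync_journal *j)
{
    j->writes++;
}

/*
 * Append one record and write the block out before returning.  The same
 * block is rewritten on every call until a record no longer fits, at
 * which point the journal moves on to the next block.  Returns the block
 * number the record landed in, or -1 if it can never fit in one block.
 */
static long journal_append_sync(struct sync_journal *j,
                                const void *rec, size_t len)
{
    assert(j != NULL && rec != NULL);
    if (len > JBLOCK_SIZE)
        return -1;
    if (j->used + len > JBLOCK_SIZE) {
        j->blkno++;                   /* current block is full: move on */
        j->used = 0;
        memset(j->block, 0, JBLOCK_SIZE);
    }
    memcpy(j->block + j->used, rec, len);
    j->used += len;
    write_block_sync(j);              /* record is on disk before we return */
    return (long)j->blkno;
}
```

The cost in this sketch is one synchronous write per record, in exchange
for never having an operation that exists only in memory.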

> >
> > (3) There is another problem here, files that were held open when the
> >     system crashed.  They could have a reference count of zero, but
> >     still have allocated data.  It seems that an fsck would still be
> >     required to walk the inode tables and put these files "somewhere",
> >     or just free the blocks they were using.  Can anyone think of a
> >     better way to do this?
> 
> Yes. On unlink if the reference count is 0 relink it to a ghost
> directory that gets purged on mount. The file also gets purged when is
> finally closed... it is a bit hacky since the file is linked to that
> directory while keeping its reference count to 0; but on close you know
> that if reference count is zero it is linked to the ghost directory and
> unlinking from it can be handled specially.

This was an idea I had as well; the problem is the filesystem-full case.
Consider a filesystem that is 100% full (and what better time to delete
files than at 100% full).  To delete the file you then need to allocate
a block to link the file to this phantom directory (even with a
pre-allocated structure, that structure may need to grow).  The idea
then becomes to pre-allocate something of the maximum possible size...
and then isn't that just equivalent to an inode table?  At this point I
don't see walking the inode table as much of a problem: it's _very_
quick to do, inodes are just 256 bytes each (UFS2), and we are only
looking for the case of refcount=0 and free!=0.
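
The inode-table walk described above might be sketched in C as follows.
This is not the UFS2 on-disk inode: the two-field structure and all
names are hypothetical, and the "free!=0" test is read here as "the
inode still holds allocated blocks", which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down inode; real UFS2 inodes are 256 bytes. */
struct sketch_inode {
    uint16_t nlink;               /* on-disk link (reference) count */
    uint64_t blocks;              /* blocks still allocated to the inode */
};

/* Callback invoked for each orphan found during the walk. */
typedef void (*orphan_fn)(size_t ino, struct sketch_inode *ip, void *arg);

/*
 * Walk the table once and report every inode that has no links left but
 * still owns blocks, i.e. a file that was open and unlinked at the time
 * of the crash.  Returns the number of such inodes.
 */
static size_t scan_orphans(struct sketch_inode *tab, size_t n,
                           orphan_fn fn, void *arg)
{
    size_t found = 0;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].nlink == 0 && tab[i].blocks != 0) {
            found++;
            if (fn != NULL)
                fn(i, &tab[i], arg);
        }
    }
    return found;
}

/* Example policy: free the orphan's blocks and total what was reclaimed. */
static void release_blocks(size_t ino, struct sketch_inode *ip, void *arg)
{
    (void)ino;
    *(uint64_t *)arg += ip->blocks;
    ip->blocks = 0;
}
```

Passing a different callback would cover the other policy mentioned
earlier in the thread, putting the files "somewhere" instead of freeing
their blocks.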

What do other people think?

-- 
David E. Cross