defrag

Thu Mar 1 22:21:59 UTC 2007

In response to Ivan Voras <ivoras at fer.hr>:

> Bill Moran wrote:
> > In response to Ivan Voras <ivoras at fer.hr>:
> 
> >> 352462 files, 2525857 used, 875044 free (115156 frags, 94986 blocks,
> >> 3.4% fragmentation)
> 
> > 
> > Just to reiterate:
> > "Fragmentation" on a Windows filesystem is _not_ the same as "fragmentation"
> > on a unix file system.  They are not comparable numbers, and do not mean
> > the same thing.  The only way to avoid fragmentation on a unix file system
> > is to make every file you create equal to a multiple of the block size.
> 
> Ok, my point was that 3.4% is a low number for a long used system, but,
> for education sake, what is the difference between Windows'
> "fragmentation" and Unix's "fragmentation"?
> 
> I believe that a "fragmented file" in common usage refers to a file
> which is not stored continuously on the drive - i.e. it occupies more
> than one continuous region. How is UFS fragmentation different than
> fragmentation on other kinds of file systems?

That common usage refers to Windows filesystems.

In unix filesystems, fragmentation refers to the number of blocks that have
been broken down into fragments, either to hold files smaller than a block
or (as you mentioned) to use the space at the end of a file that doesn't fit
exactly in a block.
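
As a rough check on the fsck line quoted at the top, here is a small Python
sketch. It rests on two assumptions of mine that the fsck output itself does
not state: that a block holds 8 fragments, and that the percentage is the
number of free fragments divided by the total of used plus free. Both are
consistent with the numbers quoted, but treat this as one reading of the
output rather than a definition.

```python
# Figures copied from the fsck summary line quoted above.
used = 2525857        # "used" count from the summary
free = 875044         # "free" count from the summary
free_frags = 115156   # "frags": free fragments outside whole free blocks
free_blocks = 94986   # "blocks": whole free blocks

# Assumption: 8 fragments per block. Not stated in the fsck output.
FRAGS_PER_BLOCK = 8


def free_total(frags, blocks, frags_per_block=FRAGS_PER_BLOCK):
    """Free space in fragment units: loose fragments plus whole blocks."""
    return frags + blocks * frags_per_block


def fragmentation_pct(frags, used_count, free_count):
    """Assumed formula: free fragments as a share of used plus free."""
    return 100.0 * frags / (used_count + free_count)


# The "free" figure matches frags + 8 * blocks under the assumption above.
assert free_total(free_frags, free_blocks) == free

print(f"{fragmentation_pct(free_frags, used, free):.1f}% fragmentation")
# prints "3.4% fragmentation"
```

Note that nothing in this calculation looks at whether any file is stored
contiguously.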

> UFS has cylinder groups, blocks and block fragments. Obviously, a file
> larger than a cylinder group will get fragmented to spill over to
> another cylinder group. Block fragments only occur at the end of files.

Yes, and UFS _intentionally_ creates what Windows users would call
"fragmentation."  There's no way I know of to measure this, however.

The key to understanding this is that not all fragmentation is bad.
Typically, files are accessed in chunks.  Your OS seldom grabs an
entire 50M file all at once.  It grabs (perhaps) 16 blocks' worth, sends
them to the requesting program, then grabs another 16 blocks' worth, and
so on.  The time between chunks is long enough that letting the heads
reposition to a different cylinder group doesn't cause a significant
performance problem.  As a result, the OS _intentionally_ switches to a
different cylinder group after a certain number of blocks have been
written (this is tunable with tunefs).  The result is that a large file
will typically be strewn about the disk.
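
To make the switching concrete, here is a toy Python model. It is not how
UFS actually picks the next cylinder group (the real allocator uses its own
policy). The round-robin choice and the names place_blocks,
max_blocks_per_group and count_runs are made up for illustration. The
tunable I have in mind is, I believe, the one tunefs sets with -e (maxbpg).

```python
def place_blocks(n_blocks, max_blocks_per_group, n_groups):
    """Return the cylinder group index used for each block of one file.

    Hypothetical model: move to the next group (round-robin) every time
    max_blocks_per_group blocks have been written.
    """
    placement = []
    group = 0
    for i in range(n_blocks):
        if i > 0 and i % max_blocks_per_group == 0:
            group = (group + 1) % n_groups
        placement.append(group)
    return placement


def count_runs(placement):
    """Count runs of consecutive blocks that share a cylinder group."""
    return sum(
        1 for i, g in enumerate(placement)
        if i == 0 or g != placement[i - 1]
    )


layout = place_blocks(n_blocks=10, max_blocks_per_group=4, n_groups=3)
print(layout)              # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(count_runs(layout))  # 3
```

A Windows-style tool would report this file as being in 3 pieces, yet each
piece is a long sequential run, which is the point of the scheme.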

But this also makes it _easy_ for the filesystem to avoid causing the type
of fragmentation that _does_ degrade performance.  That is the type where
the first block is on track 10, the next block is on track 20, then we're
back to track 10 again, then over to track 35, and so on.
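
One crude way to see why that pattern hurts is to add up how far the heads
travel.  The Python sketch below uses the track numbers from the example
above.  The head_travel function and the idea of scoring a layout by total
track distance are my own simplification, since real seek times are not
linear in distance.

```python
def head_travel(tracks):
    """Sum of track-to-track distances when blocks are read in file order."""
    return sum(abs(b - a) for a, b in zip(tracks, tracks[1:]))


scattered = [10, 20, 10, 35]   # block order from the example above
grouped = sorted(scattered)    # same tracks, laid out in ascending order

print(head_travel(scattered))  # 45
print(head_travel(grouped))    # 25
```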

Keep in mind that throughout the discussion of cylinder groups above, I was
using the "Windows" definition of "fragmentation."

-- 
Bill Moran
Collaborative Fusion Inc.