On-disk indexing for "Project Ideas" page
Nikolay Pavlov
qpadla at gmail.com
Sat Sep 8 03:07:54 PDT 2007
Recently, while reading "The Design and Implementation of the FreeBSD
Operating System" by Marshall Kirk McKusick and gnn, I found a very
interesting paragraph about the UFS2 implementation, indexing, and B-trees.
According to it, on-disk indexes were not implemented, but some structures
were reserved for future development. Maybe some SoC students could
implement this in the future. How about adding this to the Project Ideas page?
Let me quote from section "8.3 Naming":
Finding of Names in Directories
A common request to the filesystem is to look up a specific name in a
directory. The kernel usually does the lookup by starting at the beginning
of the directory and going through, comparing each entry in turn. First,
the length of the sought-after name is compared with the length of the
name being checked. If the lengths are identical, a string comparison of
the name being sought and the directory entry is made. If they match, the
search is complete; if they fail, either in the length or in the string
comparison, the search continues with the next entry. Whenever a name is
found, its name and containing directory are entered into the systemwide
name cache described in Section 6.6. Whenever a search is unsuccessful, an
entry is made in the cache showing that the name does not exist in the
particular directory. Before starting a directory scan, the kernel looks
for the name in the cache. If either a positive or negative entry is
found, the directory scan can be avoided.
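To make the lookup path above concrete, here is a minimal sketch in Python. It is only a toy model, not the kernel code: the Directory and NameCache classes, the scan function, and the comparisons counter are all hypothetical names invented for illustration. It shows the length check before the string comparison, and a cache that records both positive and negative results.

```python
# Toy model (assumption): a directory is a list of (name, inode) entries.
class Directory:
    def __init__(self, entries):
        self.entries = list(entries)
        self.comparisons = 0  # string comparisons actually performed


def scan(directory, name):
    """Linear scan: compare lengths first, then the strings."""
    for entry_name, inode in directory.entries:
        if len(entry_name) != len(name):
            continue  # cheap length check failed, skip the string compare
        directory.comparisons += 1
        if entry_name == name:
            return inode
    return None


class NameCache:
    """Hypothetical name cache holding positive and negative entries."""

    def __init__(self):
        self._cache = {}

    def lookup(self, directory, name):
        """Return (inode_or_None, served_from_cache)."""
        key = (id(directory), name)
        if key in self._cache:
            return self._cache[key], True  # scan avoided
        inode = scan(directory, name)
        self._cache[key] = inode  # None is stored as a negative entry
        return inode, False
```

A second lookup of the same name, whether it exists or not, is answered from the cache without touching the directory.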
Another common operation is to look up all the entries in a directory. For
example, many programs do a stat system call on each name in a directory
in the order that the names appear in the directory. To improve
performance for these programs, the kernel maintains the directory offset
of the last successful lookup for each directory. Each time that a lookup
is done in that directory, the search is started from the offset at which
the previous name was found (instead of from the beginning of the
directory). For programs that step sequentially through a directory with n
files, search time decreases from Order(n^2) to Order(n).
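The effect of remembering the last offset can be sketched by counting how many entries are examined. This is a simplified model under stated assumptions: the directory is a plain list of names, and lookup_from and total_examined are hypothetical helpers, not kernel functions. The search wraps around from the starting offset, so a miss at the end continues from the beginning.

```python
def lookup_from(entries, name, start):
    """Search circularly from `start`; return (index, entries_examined)."""
    n = len(entries)
    for step in range(n):
        i = (start + step) % n
        if entries[i] == name:
            return i, step + 1
    return None, n


def total_examined(entries, names, remember_offset):
    """Count entries examined while looking up each name in turn."""
    start = 0
    total = 0
    for name in names:
        index, examined = lookup_from(entries, name, start)
        total += examined
        if remember_offset and index is not None:
            start = index  # next search begins where this name was found
    return total


entries = ["file%03d" % i for i in range(600)]
without_offset = total_examined(entries, entries, remember_offset=False)
with_offset = total_examined(entries, entries, remember_offset=True)
```

For 600 names looked up in directory order, this model examines 600 * 601 / 2 entries without the offset and 2 * 600 - 1 with it, which is the quadratic versus linear behaviour described above.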
One quick benchmark that demonstrates the maximum effectiveness of the
cache is running the ls -l command on a directory containing 600 files. On
a system that retains the most recent directory offset, the amount of
system time for this test is reduced by 85 percent. Unfortunately, the
maximum effectiveness is much greater than the average effectiveness.
Although the cache is 90 percent effective when hit, it is applicable to
only about 25 percent of the names being looked up. Despite the amount of
time spent in the lookup routine itself decreasing substantially, the
improvement is diminished because more time is spent in the routines that
that routine calls. Each cache miss causes a directory to be accessed
twice—once to search from the middle to the end and once to search from
the beginning to the middle.
These caches provide good directory lookup performance but are ineffective
for large directories that have a high rate of entry creation and
deletion. Each time a new directory entry is created, the kernel must scan
the entire directory to ensure that the entry does not already exist. When
an existing entry is deleted, the kernel must scan the directory to find
the entry to be removed. For directories with many entries these linear
scans are time-consuming.
The approach to solving this problem in FreeBSD 5.2 is to introduce dynamic
directory hashing that retrofits a directory indexing system to UFS [Dowse
& Malone, 2002]. To avoid repeated linear searches of large directories,
the dynamic directory hashing builds a hash table of directory entries on
the fly when the directory is first accessed. This table avoids directory
scans on later lookups, creates, and deletes. Unlike filesystems
originally designed with large directories in mind, these indices are not
saved on disk and so the system is backward compatible. The drawback is
that the indices need to be built the first time that a large directory is
encountered after each system reboot. The effect of the dynamic directory
hashing is that large directories in UFS cause minimal performance
problems.
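The idea of dynamic directory hashing can be sketched as follows. This is a heavily simplified model and not the dirhash code itself: HashedDirectory, its builds counter, and the reboot method are hypothetical, a Python dict stands in for the hash table, and the sketch indexes every directory rather than only large ones. The point is that the index lives only in memory, is built on first access, and is kept up to date on creates and deletes.

```python
class HashedDirectory:
    def __init__(self, entries):
        # Linear "on-disk" format: (name, inode) tuples, None for a freed slot.
        self.entries = list(entries)
        self.index = None  # in-memory only, never written to disk
        self.builds = 0    # how many times the index had to be built

    def _ensure_index(self):
        if self.index is None:
            self.index = {
                entry[0]: slot
                for slot, entry in enumerate(self.entries)
                if entry is not None
            }
            self.builds += 1

    def lookup(self, name):
        self._ensure_index()
        slot = self.index.get(name)
        return None if slot is None else self.entries[slot][1]

    def create(self, name, inode):
        self._ensure_index()
        if name in self.index:  # no linear scan needed for the duplicate check
            raise FileExistsError(name)
        self.entries.append((name, inode))
        self.index[name] = len(self.entries) - 1

    def delete(self, name):
        self._ensure_index()
        slot = self.index.pop(name)  # raises KeyError if the name is absent
        self.entries[slot] = None

    def reboot(self):
        # The index is not persistent, so it is lost across a reboot.
        self.index = None
```

After reboot() the first access pays the cost of rebuilding the table, which is the drawback mentioned above.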
When we built UFS2, we contemplated solving the large directory update
problem by changing to a more complex directory structure such as one that
uses B-trees. This technique is used in many modern filesystems such as
XFS [Sweeney et al., 1996], JFS [Best & Kleikamp, 2003], ReiserFS [Reiser,
2001], and in later versions of Ext2 [Phillips, 2001]. We decided not to
make the change at the time that UFS2 was first implemented for several
reasons. First, we had limited time and resources, and we wanted to get
something working and stable that could be used in the time frame of
FreeBSD 5.2. By keeping the same directory format, we were able to reuse
all the directory code from UFS1, did not have to change numerous
filesystem utilities to understand and maintain a new directory format,
and were able to produce a stable and reliable filesystem in the time
frame available to us. The other reason that we felt that we could retain
the existing directory structure is because of the dynamic directory
hashing that was added to FreeBSD.
Borrowing the technique used by the Ext2 filesystem, a flag was also added
to show that an on-disk indexing structure is supported for directories
[Phillips, 2001]. This flag is unconditionally turned off by the existing
implementation of UFS. In the future, if an implementation of an on-disk
directory-indexing structure is added, the implementations that support it
will not turn the flag off. Index-supporting kernels will maintain the
indices and leave the flag on. If an old non-index-supporting kernel is
run, it will turn off the flag so that when the filesystem is once again
run under a new kernel, the new kernel will discover that the indexing
flag has been turned off and will know that the indices may be out of date
and have to be rebuilt before being used. The only constraint on an
implementation of the indices is that they have to be an auxiliary data
structure that references the old linear directory format.
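The flag protocol described above can be sketched like this. Everything here is a hypothetical model for illustration: the Filesystem class, the index_flag field, and the old_kernel_* and new_kernel_* functions do not correspond to real UFS structures or kernel interfaces. The index is an auxiliary structure that maps names to slots in the unchanged linear directory.

```python
class Filesystem:
    def __init__(self):
        self.index_flag = False  # "on-disk index is valid" flag
        self.entries = []        # linear directory format (authoritative)
        self.on_disk_index = {}  # auxiliary: name -> slot in self.entries


def old_kernel_mount(fs):
    # A non-index-supporting kernel unconditionally clears the flag.
    fs.index_flag = False


def old_kernel_create(fs, name, inode):
    # Only the linear directory is updated; the index goes stale.
    fs.entries.append((name, inode))


def new_kernel_mount(fs):
    """Rebuild the index if the flag is off; return True if rebuilt."""
    if fs.index_flag:
        return False
    fs.on_disk_index = {name: slot for slot, (name, _) in enumerate(fs.entries)}
    fs.index_flag = True  # index-supporting kernels leave the flag on
    return True


def new_kernel_create(fs, name, inode):
    # Index-supporting kernels maintain both structures together.
    fs.entries.append((name, inode))
    fs.on_disk_index[name] = len(fs.entries) - 1
```

Because the linear directory stays authoritative, an old kernel can always use the filesystem, and a new kernel only needs the flag to know whether the index can be trusted.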
--
======================================================================
- Best regards, Nikolay Pavlov. <<<-----------------------------------
======================================================================