[PATCH] fadvise(2) system call
John Baldwin
jhb at freebsd.org
Fri Oct 28 18:26:02 UTC 2011
I have been working for the last week or so on a patch to add an fadvise(2)
system call. It is somewhat similar to madvise(2) except that it operates on
a file descriptor instead of a memory region. It also only really makes sense
for regular files and does not apply to other file descriptor types.
Just as with madvise(2) there are two types of advice that can be given. One
set specifies the access pattern for a specific region of the file while the
second set results in immediate action. The first set consists of FADV_NORMAL,
FADV_SEQUENTIAL, FADV_RANDOM, and FADV_NOREUSE. For these operations what I
have done is to add an optional "advice region" to a file descriptor. When a
read(2) or write(2) is performed on a file, if the requested region falls
completely within an active "advice region", then the associated advice is
used to modify the IO_* flags passed down with the request. FADV_NORMAL just
uses the current IO_* flags including using sequential_heuristic() to
determine the amount of read-ahead and/or clustering to perform. FADV_RANDOM
always passes a sequential count of zero to prevent read-ahead.
FADV_SEQUENTIAL is the same as FADV_NORMAL for now (perhaps it should always
be setting the maximum sequential count?). FADV_NOREUSE passes a sequential
count of zero and sets IO_DIRECT (as if the operation were performed on a file
opened with O_DIRECT).
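
The mapping from advice to I/O flags described above can be sketched roughly as follows. This is a standalone illustration, not code from the patch: every sk_* and SK_* name, the struct layout, and the flag values are stand-ins invented for the example, and the heuristic_seq argument stands in for whatever sequential_heuristic() would have returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the four access-pattern advice values. */
enum sk_advice { SK_NORMAL, SK_SEQUENTIAL, SK_RANDOM, SK_NOREUSE };

#define SK_IO_DIRECT	0x10	/* hypothetical value for a direct-I/O flag */
#define SK_SEQSHIFT	16	/* hypothetical shift for the sequential count */

/* The single optional advice region attached to a file descriptor. */
struct sk_region {
	int		active;
	int64_t		start;	/* first byte covered */
	int64_t		end;	/* last byte covered, inclusive */
	enum sk_advice	advice;
};

/*
 * Compute the flags for a request of len bytes at offset off.  The advice
 * applies only if the request lies completely inside the active region;
 * otherwise the heuristic count is used unchanged.
 */
int
sk_ioflags(const struct sk_region *r, int64_t off, int64_t len,
    int heuristic_seq)
{
	int seq = heuristic_seq;
	int flags = 0;

	assert(off >= 0 && len > 0);
	if (r != NULL && r->active &&
	    off >= r->start && off + len - 1 <= r->end) {
		switch (r->advice) {
		case SK_NORMAL:
		case SK_SEQUENTIAL:	/* same as normal for now */
			break;
		case SK_RANDOM:		/* no read-ahead */
			seq = 0;
			break;
		case SK_NOREUSE:	/* no read-ahead, bypass the cache */
			seq = 0;
			flags |= SK_IO_DIRECT;
			break;
		}
	}
	return (flags | (seq << SK_SEQSHIFT));
}
```

A request that straddles the edge of the region falls through to the default path, which matches the "falls completely within" rule in the description.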
To simplify the implementation, only a single "advice region" is maintained
for now (unlike madvise(2) which will split up vm map entries if necessary to
ensure all requests are honored). Since the advice is only advisory, I think
this is an ok approach for now. If we really had a valid use case, we could
maybe add a list of advice regions, but then you have to deal with possibly
splitting up read(2) or write(2) requests that span multiple advice regions,
etc. I didn't feel that this extra complexity was warranted for now.
The other two operations (FADV_WILLNEED and FADV_DONTNEED) are implemented via
a new VOP_ADVISE(). The patch includes a default implementation
(vop_stdadvise()) which is a nop for FADV_WILLNEED (I couldn't come up with a
filesystem-independent way to trigger an async read-ahead). For FADV_DONTNEED
it has a functional implementation which flushes all clean buffers from the
vnode (via a new V_CLEANONLY mode for vinvalbuf()) and then moves any clean,
unwired pages in the specified range of the file to the cache page queue
(using a new vm_object_page_cache() routine).
Various versions of this patch have already been reviewed and/or glanced at by
alc@, kib@, and mdf@, but I'd like to open it for wider review before
committing it. I will likely also MFC it back to 8 after 9.0 is released.
The patch can be found at www.freebsd.org/~jhb/patches/fadvise.patch
You can read the description of posix_fadvise() (which this implements) here:
http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_fadvise.html
--
John Baldwin