[PATCH] fadvise(2) system call

Peter Wemm peter at wemm.org
Thu Nov 10 01:47:59 UTC 2011


On Wed, Nov 9, 2011 at 5:25 PM, Alfred Perlstein <alfred at freebsd.org> wrote:
> * Paul Saab <ps at mu.org> [111109 16:32] wrote:
>> On Wed, Nov 9, 2011 at 3:44 PM, Bruce Cran <bruce at cran.org.uk> wrote:
>> > On 08/11/2011 13:00, John Baldwin wrote:
>> >>
>> >> I think it would be fine to add flags to applications like 'tar' to allow
>> >> users to alter their behavior in specific use cases when it makes sense.
>> >> However, I think there are more workloads for 'tar' than the ones you are
>> >> thinking of and we should be hesitant to change applications to use non-
>> >> default settings.
>> >
>> > Someone's done that for GNU tar on Linux, adding a --no-oscache switch:
>> > http://www.mysqlperformanceblog.com/2010/04/02/fadvise-may-be-not-what-you-expect/
>>
>> So adding this support is good, but not for general purpose.  It's
>> really only good when you're pumping gigs of data through tar.  I did
>> this for libarchive (plus other work for O_DIRECT reading and
>> creating the archive) for copying large amounts of data without
>> impacting a running system. It worked great for this, but then it
>> absolutely fails when extracting a tar archive with millions of little
>> files because of all the sync operations.
>
> I've thought about this and it almost makes sense to have a secondary
> LRU that such pages would wind up in that is much smaller than the system
> one.  I'm pretty sure there are a number of papers on this, but I've not
> looked over them in a long while.

We actually do have fairly extensive anti-swamping mechanisms in
place, but they are in somewhat obscure places or are side effects of
other policies elsewhere.  E.g., the vmio kva mapping space is limited
and the dirty fraction is enforced there.  This provides the
anti-swamping back pressure for file writes.

This of course is completely bypassed with mmap reads/writes.  That's
why writing to a huge mmap file hits like a truck, but doing write()
to the same file does not.
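
To illustrate the difference, here is a minimal sketch of how a userland
program might apply its own back pressure when writing through a shared
mapping: it flushes each chunk with msync(2) before dirtying the next one.
The helper name fill_via_mmap() and the 1 MiB SYNC_CHUNK size are
assumptions made for illustration only; nothing here is taken from the
patch or from the kernel's write() path.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; assumed to be a multiple of the page size. */
#define SYNC_CHUNK ((size_t)1024 * 1024)

/*
 * Hypothetical helper: create `path` with `len` bytes (len > 0), all set
 * to `value`, writing through a shared mapping.  Each chunk is flushed
 * with msync(MS_SYNC) before the next one is dirtied, so at most about
 * SYNC_CHUNK bytes of this mapping are dirty at a time.
 * Returns 0 on success, -1 on failure.
 */
int fill_via_mmap(const char *path, size_t len, unsigned char value)
{
    if (len == 0)
        return -1;              /* mmap() rejects zero-length mappings */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    int rc = 0;
    for (size_t off = 0; off < len; off += SYNC_CHUNK) {
        size_t n = (len - off < SYNC_CHUNK) ? len - off : SYNC_CHUNK;

        memset(p + off, value, n);      /* dirty one chunk of pages */
        if (msync(p + off, n, MS_SYNC) != 0) {  /* wait for write-out */
            rc = -1;
            break;
        }
    }

    if (munmap(p, len) != 0)
        rc = -1;
    return rc;
}
```

Without the msync() call the loop can dirty pages as fast as memset() runs,
which is the case described above.  With it, the writer stalls on each
chunk, roughly as a write() caller would once the dirty limit is reached.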

>>
>> Anyway, this is a good option to enable and has very practical uses
>> out there, but it should be turned on with an option and not on by
>> default.
>
> What about the operation of just reading the tar archive itself?

Personally, I really don't want to blow away useful cache contents
with a tar file unless I explicitly say so.  I'm generally more
worried about keeping usefully cached random bits of files that are
scattered all over the drive than with a sequentially readable tar
file that is a best case scenario for file reads.

I'd rather read a 2GB file sequentially twice than kick out 2GB of
random access data.

In any case, fadvise() is a tool that we should have.  Deciding on
tar(1) policy is an entirely separate thing.
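
As a rough sketch of the kind of opt-in use discussed in this thread, the
following reads a large file sequentially and tells the kernel that the
pages just consumed will not be needed again.  It assumes the interface
looks like POSIX posix_fadvise(); the interface in the actual patch is not
shown in this message and may differ.  The helper name
stream_file_nocache() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` from start to end, advising the kernel
 * after each read that the range just consumed will not be reused.
 * Returns the number of bytes read, or -1 on error.
 */
long long stream_file_nocache(const char *path)
{
    char buf[64 * 1024];
    long long total = 0;
    off_t done = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Advice only: the kernel is free to ignore it, so errors are not fatal. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* A real program would process buf[0..n) here. */
        total += n;

        /* Mark the range we just read as no longer needed. */
        (void)posix_fadvise(fd, done, (off_t)n, POSIX_FADV_DONTNEED);
        done += n;
    }

    close(fd);
    return (n < 0) ? -1 : total;
}
```

A caller would only invoke this when the user has explicitly asked for it,
e.g. behind a command-line switch, since POSIX_FADV_DONTNEED may also
discard pages that some other process was relying on.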

-- 
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell


More information about the freebsd-arch mailing list