posix_fadvise noreuse disables file caching

Rick Macklem rmacklem at uoguelph.ca
Wed Feb 1 01:34:02 UTC 2012


John Baldwin wrote:
> On Tuesday, January 31, 2012 12:21:07 pm Ulrich Spörlein wrote:
> > On Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> > > On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > > > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > > > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans
> > > > >> wrote:
> > > > >>> I recently noticed that multimedia/vlc generates a lot of
> > > > >>> disk IO when
> > > > >>> playing media files. For instance, when playing a 320kbps
> > > > >>> mp3 gstat
> > > > >>> reports about 1250kBps (=10000kbps). That's quite a lot of
> > > > >>> overhead.
> > > > >>>
> > > > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire
> > > > >>> file and
> > > > >>> reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as
> > > > >>> if
> > > > >>> O_DIRECT was specified during open(2), i.e. it disables all
> > > > >>> caching.
> > > > >>> That means every 1028 byte read turns into a 32KiB read (new
> > > > >>> default
> > > > >>> block size in 9.0) which explains the above numbers.
> > > > >>>
> > > > >>> I've copied the relevant vlc code below
> > > > >>> (modules/access/file.c:Open()).
> > > > >>> It's interesting to see that on OSX it sets F_NOCACHE which
> > > > >>> disables
> > > > >>> caching too, but combined with F_RDAHEAD there's still
> > > > >>> read-ahead
> > > > >>> caching.
> > > > >>>
> > > > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT.
> > > > >>> It should
> > > > >>> still cache data (and even do read-ahead if F_RDAHEAD is
> > > > >>> specified),
> > > > >>> and once data is fetched from the cache, it can be marked
> > > > >>> WONTNEED.
> > > > >>
> > > > >> POSIX doesn't specify O_DIRECT, so it's not clear what it
> > > > >> asks for.
> > > > >>
> > > > >>> Is it possible to implement it this way, or if not to just
> > > > >>> ignore
> > > > >>> the NOREUSE hint for now?
> > > > >>
> > > > >> I think it would be good to improve NOREUSE, though I had
> > > > >> sort of
> > > > >> assumed that applications using NOREUSE would do their own
> > > > >> buffering
> > > > >> and read full blocks. We could perhaps reimplement NOREUSE by
> > > > >> doing
> > > > >> the equivalent of POSIX_FADV_DONTNEED after each read to free
> > > > >> buffers
> > > > >> and pages after the data is copied out to userland. I also
> > > > >> have an
> > > > >> XXX about whether or not NOREUSE should still allow
> > > > >> read-ahead as it
> > > > >> isn't very clear what the right thing to do there is. HP-UX
> > > > >> (IIRC)
> > > > >> has an fadvise() that lets you specify multiple policies, so
> > > > >> you
> > > > >> could specify both NOREUSE and SEQUENTIAL for a single region
> > > > >> to
> > > > >> get read-ahead but still release memory once the data is read
> > > > >> once.
> > > > >
> > > > > So I've come up with this untested patch. It uses
> > > > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE
> > > > > region, and
> > > > > leaves read-ahead caching enabled for NOREUSE. FADV_DONTNEED
> > > > > doesn't
> > > > > do any good really for writes (it only flushes clean buffers),
> > > > > so I've
> > > > > left write(2) operations as using IO_DIRECT still. Does this
> > > > > sound
> > > > > reasonable? I've not yet tested this at all:
> > > >
> > > > The patch drastically improves vlc, but there's still a tiny
> > > > overhead.
> > > > Without NOREUSE the disk is read in chunks of 128KiB (F_RDAHEAD
> > > > buffer
> > > > size). With NOREUSE there's an extra transfer of 32KiB (block
> > > > size).
> > >
> > > This is probably because vlc is not reading on block boundaries,
> > > so the
> > > noreuse is throwing away partial blocks at the end of a read that
> > > then have to
> > > be re-read. We could maybe fix this by making FADV_DONTNEED only
> > > throw
> > > away completely-contained blocks rather than completely-contained
> > > pages.
> > > However, this will probably result in NOREUSE not actually
> > > throwing away
> > > anything at all if an app always reads sub-blocksize chunks.
> > >
> > > We could maybe make the case of vlc work ok in this case though by
> > > allowing
> > > an extension where you can do 'posix_fadvise(SEQUENTIAL |
> > > NOREUSE)', and
> > > in this case we could make the VOP_ADVISE(DONTNEED) in read() use
> > > an offset
> > > of 0 rather than the start of the read request.
> > >
> > > However, posix_fadvise() really is going to work best if the
> > > userland
> > > application reads aligned FS blocks.
> >
> > I find it questionable in general that an application can tell the
> > system what to do wrt. caching. Perhaps I'm running 100s of VLC
> > players
> > all on the same file and actually *do* want reads to be cached?
> >
> > What happens if I seek back in the file? It has to do a potentially
> > high-latency read again. The system has a better overview of blocks
> > that
> > are frequently being requested than any individual application.
> >
> > I fully understand the intention, and in 99.99% of the cases, this
> > data
> > *is* just being read once so there's no need to cache any reads for
> > actually requested data. But as the example shows, requested data is
> > not
> > necessarily the data that lower layers have to fetch from the disk.
> >
> > Perhaps talking to VLC people on why they think this is useful and
> > where
> > it actually, measurably helped them would be interesting.
> >
> > Sorry if this is all perfectly obvious
> 
> There are certainly cases where the user can choose to run specific
> apps in
> such a way where this makes sense, so the OS needs this functionality.
> As
> to whether or not specific apps should use these APIs or if they
> should make
> use of these APIs configurable, that is a question for each app (e.g.
> vlc).
> However, the OS should provide the tools.
> 
I'd agree. However, there might be an argument for a sysctl that tells
the OS to ignore the hints, so that a sysadmin can work around a case
where an app runs poorly in their environment because of the hint.
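
To illustrate the idea, here is a minimal userland sketch of such an
override. It is only an analogue of the knob, not a kernel patch: no such
sysctl exists as far as this thread shows. The environment variable
IGNORE_FADVISE and the function names hints_ignored() and
advise_unless_ignored() are made up for the example. When the override is
set the hint is dropped, otherwise it is passed through to
posix_fadvise(2) unchanged.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise(), setenv() */
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical override switch: IGNORE_FADVISE=1 in the environment
 * stands in for the admin-controlled knob discussed above.
 */
static int
hints_ignored(void)
{
	const char *v = getenv("IGNORE_FADVISE");

	return (v != NULL && strcmp(v, "1") == 0);
}

/*
 * Pass the advice on to posix_fadvise(2) unless the override is set.
 * Returns 0 when the hint is ignored or accepted; otherwise returns the
 * error number from posix_fadvise(), which reports errors through its
 * return value rather than through errno.
 */
int
advise_unless_ignored(int fd, off_t offset, off_t len, int advice)
{
	if (hints_ignored())
		return (0);
	return (posix_fadvise(fd, offset, len, advice));
}
```

An app would call advise_unless_ignored(fd, 0, 0, POSIX_FADV_NOREUSE) in
place of the direct posix_fadvise() call. A real sysctl would make the
same decision inside the kernel, so it would also cover apps that cannot
be rebuilt.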

rick


