(in)appropriate uses for MAXBSIZE

Rick Macklem rmacklem at uoguelph.ca
Thu Apr 15 02:34:33 UTC 2010



On Wed, 14 Apr 2010, Bruce Evans wrote:

> On Sun, 11 Apr 2010, Rick Macklem wrote:
>
>> On Sun, 11 Apr 2010, Bruce Evans wrote:
>> 
>>> Er, the maximum size of buffers in the buffer cache is especially
>>> irrelevant for nfs.  It is almost irrelevant for physical disks because
>>> clustering normally increases the bulk transfer size to MAXPHYS.
>>> Clustering takes a lot of CPU but doesn't affect the transfer rate much
>>> unless there is not enough CPU.  It is even less relevant for network
>>> i/o since there is a sort of reverse-clustering -- the buffers get split
>>> up into tiny packets (normally 1500 bytes less some header bytes) at
>>> the hardware level.  ...
>> 
[stuff snipped]
>
> Indeed, I was only caring about a LAN environment.  Especially with
> LANs optimized for latency (50-100 uS), nfs performance is poor for
> small files, at least for the old nfs client, mainly due to close to
> open consistency defeating caching, but not a problem for bulk transfers.
>

And I'll admit I was thinking that for a low-latency LAN, a large 
read/write RPC wouldn't have a negative impact, but it sounds like
you've found 16KB to be optimal for this case.

For NFSv4, if the client has a delegation for the file, it doesn't
have to worry about close-to-open consistency, so there is some hope
w.r.t. small files for this case.
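
To make that concrete, here's a rough sketch of what the open-time
check could look like. This is not the actual client code;
nfs_has_delegation(), nfs_getattr_rpc() and nfs_mtime_changed() are
made-up names used only to illustrate the idea, so only the general
shape matters:

	/*
	 * Illustrative sketch only: with a delegation the server must
	 * recall it before another client can change the file, so the
	 * open-time GETATTR (and possible cache flush) required for
	 * close-to-open consistency can be skipped.
	 */
	static int
	nfs_open_consistency(struct vnode *vp, struct ucred *cred,
	    struct thread *td)
	{
		struct vattr va;
		int error;

		if (nfs_has_delegation(vp))
			return (0);	/* cached data is still valid */

		/* Close-to-open: get fresh attributes from the server. */
		error = nfs_getattr_rpc(vp, &va, cred, td);
		if (error != 0)
			return (error);

		/* If the file changed behind our back, toss the cache. */
		if (nfs_mtime_changed(vp, &va))
			error = nfs_vinvalbuf(vp, V_SAVE, td, 1);
		return (error);
	}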

>
> Clustering is currently only for the local file system, at least for
> the old nfs server.  nfs just does a VOP_READ() into its own buffer,
> with ioflag set to indicate nfs's idea of sequentialness.  (User reads
> are similar except their uio destination is UIO_USERSPACE instead of
> UIO_SYSSPACE and their sequentialness is set generically and thus not
> so well (but the nfs setting isn't very good either).)  The local file
> system then normally does a clustered read into a larger buffer, with
> the sequentialness affecting mainly startup (per-file), and virtually
> copies the results to the local file system's smaller buffers.  VOP_READ()
> completes by physically copying the results to nfs's buffer (using
> bcopy() for UIO_SYSSPACE and copyout() for UIO_USERSPACE).  nfs can't
> easily get at the larger clustering buffers or even the local file
> system's buffers.  It can more easily benefit from larger MAXBSIZE.
> There is still the bcopy() to take a lot of CPU and memory bus resources,
> but that is insignificant compared with WAN latency.  But as I said in
> a related thread, even the current MAXBSIZE is too large to use
> routinely, due to buffer cache fragmentation causing significant latency
> problems, so any increase in MAXBSIZE and/or routine use of buffers
> of that size needs to be accompanied by avoiding the fragmentation.
> Note that the fragmentation is avoided for the larger clustering buffers
> by allocating them from a different pool.
>
Ah, now I know what you were referring to w.r.t. clustering. I haven't
looked at the mechanism used to allocate buffer space in the buffer
cache, so I'll just take your word for it w.r.t. fragmentation. It
sounds like the allocation mechanism needs to be thought about if/when
MAXBSIZE gets increased.
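
For anyone else following the thread, the read path Bruce describes
boils down to something like the following. This is a condensed
sketch, not the actual nfsserver code (the real thing lives in
nfs_serv.c and deals with mbufs, NFSv2 vs. v3, etc.); nfsd_buf, cnt,
off and seqcount stand in for values the server computes:

	struct iovec iv;
	struct uio io;
	int ioflag, error;

	iv.iov_base = nfsd_buf;		/* nfsd's own buffer */
	iv.iov_len = cnt;		/* size of this read request */
	io.uio_iov = &iv;
	io.uio_iovcnt = 1;
	io.uio_offset = off;
	io.uio_resid = cnt;
	io.uio_segflg = UIO_SYSSPACE;	/* kernel buffer: VOP_READ ends in bcopy() */
	io.uio_rw = UIO_READ;
	io.uio_td = NULL;

	/* nfs's guess at how sequential this file's reads are. */
	ioflag = IO_NODELOCKED | (seqcount << IO_SEQSHIFT);

	/*
	 * The local file system does the clustered read into its own
	 * (possibly larger) buffers and then copies the result out to
	 * nfsd_buf; nfs never sees the clustering buffers themselves.
	 */
	error = VOP_READ(vp, &io, ioflag, cred);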

Thanks for your input, and I hope I didn't upset you when I jumped on
the "I care about WANs" bandwagon while basically ignoring the LAN case.

rick


