maxphys and block sizes on slices

Mon Feb 25 07:08:15 UTC 2008

On Mon, 25 Feb 2008, Chris wrote:

> I got a server that is primarily handling large files not massive
> files but files that are 15meg+ in size and very few smaller files.
>
> So I decided to use the following options in newfs.
>
> -f 4096 -b 32768
>
> Eventually I realised this was a bad decision especially when I
> noticed vfs.bufdefragcnt growing.
>
> In addition I have noticed all servers that are using the default
> settings have 128kbytes per transfer and appear to use what maxphys is
> set to whilst the ones with the custom newfs options are locked to
> 64kb/transfer even if dfltphys and maxphys are increased.  I did

ATA drives had a DMA limit of 64K when I last looked.  SCSI (da) drives
have a more bogus limit of DFLTPHYS (default 64K), so increasing
DFLTPHYS is likely to break some drives.  There seem to be reports of
USB drives that can't even handle 64K, so the default DFLTPHYS breaks
them.

There is also MAXBSIZE (default 64K).  Non-clustered i/o must use this.
Only clustered i/o can use MAXPHYS, and then only if the drive supports
it of course.
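
A minimal sketch of the limits described above, as a simplified model
only (not kernel code).  The helper name max_transfer_size is
hypothetical, and the 128K value for MAXPHYS is an assumption taken
from the transfer size reported on the default-configured servers.

```python
KIB = 1024

# Assumed defaults for illustration: MAXBSIZE is the 64K stated above;
# MAXPHYS is taken as 128K, matching the transfers seen on the
# default-configured servers.
MAXBSIZE = 64 * KIB
MAXPHYS = 128 * KIB


def max_transfer_size(clustered, drive_limit, maxbsize=MAXBSIZE, maxphys=MAXPHYS):
    """Largest single transfer, in bytes, under this simplified model.

    Hypothetical helper: non-clustered i/o is capped by MAXBSIZE,
    clustered i/o by MAXPHYS, and both by whatever the drive supports.
    """
    software_limit = maxphys if clustered else maxbsize
    return min(software_limit, drive_limit)


if __name__ == "__main__":
    # Drive with a 64K limit: clustering cannot raise the transfer size.
    print(max_transfer_size(True, 64 * KIB) // KIB)
    # Drive that accepts 128K: clustered i/o can use all of MAXPHYS.
    print(max_transfer_size(True, 128 * KIB) // KIB)
    # Same drive, non-clustered i/o: capped by MAXBSIZE.
    print(max_transfer_size(False, 128 * KIB) // KIB)
```

Under these assumptions, raising MAXPHYS alone changes nothing for a
drive whose own limit is 64K.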

> increase BKVASIZE to 32768 to stop the bufdefragcnt tho.  My lesson is
> learned tho new servers I setup I will keep the default block sizes
> unless someone has experience of better settings.  For now I want to
> make the best of the settings I got in place.

I've never seen block sizes above 32K work better (on low end hardware).
Sizes above about 1M work worse even for read/write(2) since they ensure
thrashing of the L2 cache, and small sizes like 512 can work better for
read/write(2) because they fit in the L1 cache.

> 1 - is the 64kB per transfer not adjustable and is a penalty for
> choosing the large block size?  It is nearly always penned at 64kB
> with 100s transfers per second.

AFAIK (not far, but I tried increasing it), it cannot be increased for
ATA drives.  ATA drives in PIO mode used to support block sizes of
256 or 255 sectors (128K or 128K-512), but this seems to be broken,
and PIO mode is too slow for any drive less than 10-12 years old.
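
The sector counts above convert to bytes as follows.  This is just the
arithmetic, assuming 512-byte sectors; transfer_bytes is a hypothetical
helper name.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed, the traditional ATA size)


def transfer_bytes(sectors, sector_size=SECTOR_SIZE):
    """Size in bytes of a transfer of the given number of sectors."""
    return sectors * sector_size


if __name__ == "__main__":
    for sectors in (256, 255):
        size = transfer_bytes(sectors)
        # Print the size and how far it falls short of 128K.
        print(sectors, size, 128 * 1024 - size)
```

So 256 sectors is exactly 128K (131072 bytes), and 255 sectors is 512
bytes short of that (130560 bytes).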

> 2 - is there a way to adjust the block sizes without wiping the data?

Low-level sizes can be changed.  Clustered i/o then uses larger sizes.

> 3 - How big an impact does a growing vfs.bufdefragcnt make on
> performance? after I fixed it I have noticed no difference.

It used to be very expensive.  Seems not so bad now.

> 4 - Is there anything in general recommended to set for a server that
> handles large files but not many of them.

Implement extents.

> 5 - What are the recommended values on newfs for large files, the
> defaults? and does the 1/8th rule have to apply for frag size vs block
> size?

Don't know.  I only care about small files :-).

> 6 - finally I have read vfs.hirunningspace boosts write speeds by
> buffering more but it can be detrimental to read speeds is this true?

I don't know of any especially bad interactions, but in general, if
there are, say, 50MB of writes pending on a 50MB/sec disk, then reads
will sometimes have to wait a second or more.  In congested cases,
getblk() takes more than a second sometimes (mainly under load with
large fs block sizes, and for slow devices like DVDs).  I haven't
determined if it is waiting for the disk or the software.
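
The 50MB example above is a simple division.  A rough sketch, assuming
the pending writes are all drained ahead of the read at the disk's full
sequential rate (no seeks, no reordering); read_wait_seconds is a
hypothetical helper name.

```python
MB = 1000 * 1000


def read_wait_seconds(pending_write_bytes, disk_bytes_per_sec):
    """Estimated delay before a read is serviced, in seconds.

    Assumes every pending write is issued before the read and that the
    disk sustains disk_bytes_per_sec throughout.
    """
    if disk_bytes_per_sec <= 0:
        raise ValueError("disk bandwidth must be positive")
    return pending_write_bytes / disk_bytes_per_sec


if __name__ == "__main__":
    # 50MB of pending writes on a 50MB/sec disk.
    print(read_wait_seconds(50 * MB, 50 * MB))
```

Seeks only make this worse, so the estimate is closer to a lower bound
than a typical value under that assumption.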

Bruce