amd64/89202: Kernel crash when accessing filesystem

Thu Nov 17 15:32:14 PST 2005

On Thu, 17 Nov 2005, Ivo Janssen wrote:

> I'm sure you've thought about this, but I can see the following
> improvements to be made:
>
> - make dirsize 64bit
> - add checks to the multiplication operation to make sure it doesn't
>  overflow at runtime
> - add logic to tunefs and newfs so that user cannot set values
>  that will lead to kernel panics
> - add at the very least huge warnings to the newfs and tunefs
>  manpages, or mention why their usefulness is limited.

I prefer just adding limits to newfs.  newfs already enforces other limits.
ffs does very little runtime checking except via fsck, and if it ever does
more directly it should start with more important parameters.

> This particular partition is used for a huge postgres database, which
> typically use files holding the actual tables. We assumed tuning the
> fs would gain us some improvements...

It would be interesting to know if these parameters actually do give
improvements.  They came with the dirpref changes, and were initially
undocumented except in the log message.  The log message still documents
them much better than the man page.  From the log for ffs_alloc 1.55:

%   The maxcontigdirs is a maximum number of directories which may be created
% without an intervening file creation. I found in my tests that the best
% performance occurs when I restrict the number of directories in one cylinder
% group such that all its files may be located in the same cylinder group.
% There may be some deterioration in performance if all the file inodes
% are in the same cylinder group as its containing directory, but their
% data partially resides in a different cylinder group. The maxcontigdirs
% value is calculated to try to prevent this condition. Since there is
% no way to know how many files and directories will be allocated later
% I added two optimization parameters in superblock/tunefs. They are:
% 
%         int32_t  fs_avgfilesize;   /* expected average file size */
%         int32_t  fs_avgfpdir;      /* expected # of files per directory */
% 
% These parameters have reasonable defaults but may be tweeked for special
% uses of a filesystem. They are only necessary in rare cases like better
% tuning a filesystem being used to store a squid cache.

So the usefulness of these parameters is limited to cases where there are
largish files with frequent inode updates, where tuning prevents the
inodes being in different cylinder groups than the data, and where the
reduction in seeks from this is actually significant, i.e., where the
cylinder groups aren't so large that seeks within them are almost
as slow as inter-cg seeks and where the working set consists of only 1
cg.  I doubt that there are many such cases.  You either have a small
working set which is fast enough to access because it is small, or a
larger one which will require large seeks to access.

Also, settings whose product (fs_avgfilesize * fs_avgfpdir) is larger
than the size of a cylinder group are not useful; the size of a cg is
also int32_t, so newfs just needs to check that the product doesn't
exceed the size of a cg.
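
A minimal sketch of such a check, not taken from newfs: the helper name
dirsize_fits_cg() and its cgsize argument (the cg size in bytes) are
hypothetical, and real code would take the values from the superblock.
The product is computed in 64 bits so that the comparison itself cannot
overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not existing newfs code).
 * Returns true if avgfilesize * avgfpdir is positive and no larger
 * than the cylinder group size, all sizes being in bytes.
 */
static bool
dirsize_fits_cg(int32_t avgfilesize, int32_t avgfpdir, int32_t cgsize)
{
	int64_t dirsize;

	/* Reject nonsensical (zero or negative) inputs up front. */
	if (avgfilesize <= 0 || avgfpdir <= 0 || cgsize <= 0)
		return (false);

	/* Widen one operand first so the multiplication is done in 64 bits. */
	dirsize = (int64_t)avgfilesize * avgfpdir;

	return (dirsize <= cgsize);
}
```

With a check like this, a pair of values such as 1048576 and 4096, whose
product is 2^32 and does not fit in int32_t, would be rejected when the
filesystem is created instead of reaching the kernel.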

Bruce