[rfc] Replacing FNV and hash32 with Paul Hsieh's SuperFastHash

Sun Dec 26 17:47:27 UTC 2010

On (26/12/2010 15:20), Ivan Voras wrote:
> On 26 December 2010 14:24, Gleb Kurtsou <gleb.kurtsou at gmail.com> wrote:
> > On (25/12/2010 20:29), Ivan Voras wrote:
> >> On 23.12.2010 23:46, Gleb Kurtsou wrote:
> >>
> >> > For testing I've used dbench with 16 processes on a 1 GB swap-backed
> >> > md device, UFS + SoftUpdates:
> >> > Old hash (Mb/s): 599.94  600.096 599.536
> >> > SFH hash (Mb/s): 612.439 612.341 609.673
> >> >
> >> > It's just ~1% improvement, but dbench is not a VFS metadata intensive
> >> > benchmark. Subjectively it feels faster accessing maildir mailboxes
> >> > with ~10.000 messages : )
> >>
> >> Try blogbench if you need metadata-intensive operations, or even fsx.
> 
> > blogbench should be good, but I've always had a hard time interpreting its
> > results. Besides, results tend to vary a lot, and there is no way to set a
> > seed value like in fsx, so I can't run exactly the same test in different
> > configurations.
> 
> I think the exact sequence of blogbench operations depends on duration
> of previous operations (it's multithreaded) so from that angle you are
Why should it? Operation order in dbench or fsx doesn't depend on
duration of previous operations.

> right - you can't do a repeatable run except in the trivial cases. On
> the other hand, it uses rand() without seeding it with
> srand()/sranddev() so this part is actually very repeatable :)
I once tried to make its behaviour more predictable. I can't find the
patch and can't recall the specifics, but there were architectural
issues. You are right, setting a seed and calling rand() should give
stable results; that's what I was trying to achieve.
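
To illustrate the point about seeding, here is a minimal sketch in C. The
helper names (draw, sequence_repeats) are made up for this example and are
not taken from blogbench. It only shows that reseeding with the same value
replays the same rand() sequence; the C standard also says that calling
rand() before any srand() behaves as if srand(1) had been called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill out[0..n-1] with the next n values returned by rand(). */
static void
draw(int *out, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = rand();
}

/*
 * Return 1 if seeding twice with `seed` yields the same n-value sequence
 * both times, 0 otherwise.  n is capped at 64 to keep the sketch simple.
 */
static int
sequence_repeats(unsigned seed, size_t n)
{
    int a[64], b[64];

    if (n > 64)
        n = 64;
    srand(seed);
    draw(a, n);
    srand(seed);
    draw(b, n);
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (0);
    return (1);
}
```

This only covers the random number stream. In a multithreaded benchmark
the order in which threads consume those numbers can still differ from run
to run.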

The other way to work around this "limitation" is to run a sufficiently
large number of tests, which requires patience :)
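
As a rough sketch of how repeated runs could be summarized, the C helpers
below compute the mean, the sample variance and the relative gain between
two sets of runs. The function and array names are hypothetical; the
numbers are the three dbench runs per configuration quoted earlier in this
thread.

```c
#include <assert.h>
#include <stddef.h>

/* dbench throughput in Mb/s, copied from the figures quoted above. */
static const double old_runs[3] = { 599.94, 600.096, 599.536 };
static const double sfh_runs[3] = { 612.439, 612.341, 609.673 };

/* Arithmetic mean of n samples (n must be > 0). */
static double
mean(const double *x, size_t n)
{
    double sum = 0.0;

    assert(x != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return (sum / (double)n);
}

/* Unbiased sample variance (divides by n - 1; n must be > 1). */
static double
sample_variance(const double *x, size_t n)
{
    double m, acc = 0.0;

    assert(x != NULL && n > 1);
    m = mean(x, n);
    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return (acc / (double)(n - 1));
}

/* Relative change of mean(b) over mean(a), in percent. */
static double
relative_gain_percent(const double *a, size_t na, const double *b, size_t nb)
{
    double ma = mean(a, na);

    return (100.0 * (mean(b, nb) - ma) / ma);
}
```

Calling relative_gain_percent(old_runs, 3, sfh_runs, 3) gives the gain for
the runs above. With only three samples per configuration the variance
estimate is crude, which is the reason to collect more runs.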

> > fsx is a different beast, it reads/writes/truncates at random offsets -
> > great tool for debugging mmap/truncate issues. Patch doesn't improve it
> > in any way.
> 
> It depends on what metadata operations you require - blogbench will
> create, find and write files (if we ignore atime); fsx will create a
> decent amount of traffic with file size and mtime changes. In your
> case you'll probably need to run it on a memory file system or tmpfs
> due to sensitivity to disk IO latencies (if your improvement is on
> the order of a few percent).
I meant create/readdir/remove as metadata-intensive operations; blogbench
is very good for those. fsx creates a single file.

Most people will only notice the changes in vfs_cache.c and UFS dirhash;
that's the roughly 600 Mb/s vs 612 Mb/s improvement I wrote about above.

I'd appreciate it if someone could benchmark if_lagg. It was using hash32
on binary data, which could result in poor hash table usage and
possibly send most of the traffic over a single interface. There would
hardly be any performance improvement, though, because network bandwidth
is the limit. Besides, the old hash32 is faster than the new SFH.
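
One way to check the distribution concern without real traffic is to hash
synthetic binary keys and count how many land on each port. The sketch
below is only an illustration under stated assumptions: mul33_hash is a
generic multiply-by-33 byte hash used as a stand-in (it is not copied from
the kernel's hash32), the keys are made-up 4-byte values, and the port is
assumed to be picked as hash modulo the number of ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 8

typedef uint32_t (*hash_fn)(const void *, size_t, uint32_t);

/* Stand-in byte hash (h = h * 33 + byte); an assumption for illustration. */
static uint32_t
mul33_hash(const void *buf, size_t len, uint32_t seed)
{
    const unsigned char *p = buf;
    uint32_t h = seed;

    while (len-- > 0)
        h = h * 33 + *p++;
    return (h);
}

/*
 * Hash nkeys synthetic 4-byte keys (base, base + stride, ...) with fn and
 * count how many keys map to each of nports ports.
 */
static void
spread(hash_fn fn, uint32_t base, uint32_t stride, size_t nkeys,
    unsigned nports, size_t counts[MAX_PORTS])
{
    assert(nports > 0 && nports <= MAX_PORTS);
    memset(counts, 0, MAX_PORTS * sizeof(counts[0]));
    for (size_t i = 0; i < nkeys; i++) {
        uint32_t key = base + (uint32_t)i * stride;

        counts[fn(&key, sizeof(key), 0) % nports]++;
    }
}

/* Largest per-port count; nkeys / nports would be a perfectly even spread. */
static size_t
max_load(const size_t counts[MAX_PORTS], unsigned nports)
{
    size_t max = 0;

    for (unsigned i = 0; i < nports; i++)
        if (counts[i] > max)
            max = counts[i];
    return (max);
}
```

Any other function with the same signature can be passed as fn, so two
hashes can be compared on identical keys. A max_load close to nkeys would
mean that most keys end up on a single port.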