what is fsck's "slowdown"?

Sat Sep 4 11:34:46 PDT 2004

On  4 Sep, Matthew Dillon wrote:
> :This sort of thing was my initial thought, but the posted CPU usage
> :statistics show that fsck is burning up most of its CPU cycles in
> :userland.
> :
> :>> load: 0.99  cmd: fsck 67 [running] 15192.26u 142.30s 99% 184284k
> :
> :Increasing MAXBUFSPACE looks like it would make the problem worse
> :because getdatablk() does a linear search.
> 
>     Oh my. I didn't even notice.  That code dates all the way back to 1994
>     so I won't bash the author too badly, but it is pretty awful coding.

At least it moves the buffer to the front of the list so that repeated
accesses of the same buffer are fast.
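
For illustration, the move-to-front behavior can be sketched roughly as
below.  This is a simplified stand-in, not the actual fsck source:
struct buf, the buflist_* functions and the steps counter are all names
made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer header (not fsck's real structure). */
struct buf {
	struct buf *next, *prev;
	long blkno;
};

/* The list is circular and doubly linked, with a sentinel head. */
static void
buflist_init(struct buf *head)
{
	head->next = head->prev = head;
	head->blkno = -1;
}

static void
buflist_insert_front(struct buf *head, struct buf *bp)
{
	bp->prev = head;
	bp->next = head->next;
	head->next->prev = bp;
	head->next = bp;
}

/*
 * Linear search for blkno.  On a hit the buffer is unlinked and moved to
 * the front, so an immediate repeat lookup costs one comparison.  Returns
 * NULL on a miss.  *steps (optional) accumulates the comparisons made.
 */
static struct buf *
buflist_lookup(struct buf *head, long blkno, unsigned long *steps)
{
	struct buf *bp;

	assert(head != NULL);
	for (bp = head->next; bp != head; bp = bp->next) {
		if (steps != NULL)
			(*steps)++;
		if (bp->blkno == blkno) {
			bp->prev->next = bp->next;
			bp->next->prev = bp->prev;
			buflist_insert_front(head, bp);
			return (bp);
		}
	}
	return (NULL);
}
```

A miss still walks the whole list, which is where the userland CPU time
would go if most lookups are for blocks that are not cached.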

>     Hashing the buffer cache is trivial.  I'll do it for DragonFly and post
>     the patch as a template for you guys to do it in FreeBSD (or you could just
>     do it on your own, it really does look trivial).
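
For reference, a hashed lookup along these lines might look something
like the sketch below.  It is not the DragonFly patch; HASHSIZE, struct
hbuf and the hash_* functions are hypothetical names, and the table size
is an arbitrary power of two.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 1024		/* hypothetical; must be a power of two */

/* Hypothetical buffer header carrying a hash-chain link. */
struct hbuf {
	struct hbuf *hnext;
	long blkno;
};

static struct hbuf *hashtbl[HASHSIZE];

static unsigned int
hash_blkno(long blkno)
{
	return ((unsigned int)((unsigned long)blkno & (HASHSIZE - 1)));
}

/* Link a buffer at the head of its hash chain. */
static void
hash_insert(struct hbuf *bp)
{
	unsigned int h;

	assert(bp != NULL);
	h = hash_blkno(bp->blkno);
	bp->hnext = hashtbl[h];
	hashtbl[h] = bp;
}

/* Walk only the one chain that blkno hashes to. */
static struct hbuf *
hash_lookup(long blkno)
{
	struct hbuf *bp;

	for (bp = hashtbl[hash_blkno(blkno)]; bp != NULL; bp = bp->hnext)
		if (bp->blkno == blkno)
			return (bp);
	return (NULL);
}

/* Unlink a buffer, e.g. before it is reused for a different block. */
static void
hash_remove(struct hbuf *bp)
{
	struct hbuf **pp;

	pp = &hashtbl[hash_blkno(bp->blkno)];
	while (*pp != NULL && *pp != bp)
		pp = &(*pp)->hnext;
	if (*pp != NULL)
		*pp = bp->hnext;
}
```

The existing list would presumably still be needed to pick a buffer to
recycle; the hash only replaces the search.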

You might want to instrument the code to find out how much reuse there
is of the cached data.  I would not expect much reuse within a pass, in
which case increasing MAXBUFSPACE would not help much.
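
The instrumentation could be as small as a few counters, roughly as
sketched below.  The struct and function names are hypothetical, and the
sketch assumes the lookup routine can report whether it hit and how many
list entries it examined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-pass counters for the buffer cache. */
struct cache_stats {
	unsigned long lookups;	/* calls to the lookup routine */
	unsigned long hits;	/* lookups satisfied from the cache */
	unsigned long steps;	/* list entries examined in total */
};

/* Call once per lookup with the outcome and the comparisons it cost. */
static void
stats_record(struct cache_stats *st, int hit, unsigned long steps)
{
	assert(st != NULL);
	st->lookups++;
	if (hit)
		st->hits++;
	st->steps += steps;
}

/* Fraction of lookups that found the block already cached. */
static double
stats_hit_rate(const struct cache_stats *st)
{
	return (st->lookups ? (double)st->hits / (double)st->lookups : 0.0);
}

/* Print a one-line summary, e.g. at the end of each pass. */
static void
stats_report(const struct cache_stats *st, const char *pass, FILE *fp)
{
	fprintf(fp, "%s: %lu lookups, %lu hits (%.1f%%), %lu entries scanned\n",
	    pass, st->lookups, st->hits, 100.0 * stats_hit_rate(st),
	    st->steps);
}
```

A low hit rate together with a large scan count would point at the
search cost rather than at the cache size.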

Doing read-ahead might help, especially if it can be hinted (do
read-ahead when reading inode blocks but not indirect blocks).  The
read-ahead already done by the drive gets rid of the latency caused by
mechanical issues, but explicitly doing read-ahead in the fsck code
would eliminate some kernel overhead and drive command latency.
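
A minimal sketch of hinted read-ahead, assuming the caller knows what
kind of block it is fetching and has a buffer large enough to hold the
extra data.  The enum, RA_BLOCKS and both function names are made up for
illustration; only pread() is a real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hint supplied by the caller. */
enum blk_kind { BLK_INODE, BLK_INDIRECT };

#define RA_BLOCKS 8		/* hypothetical read-ahead depth, in blocks */

/*
 * Decide how much to read: several blocks for inode blocks, which are
 * laid out contiguously, and a single block for indirect blocks.  The
 * result never exceeds the caller's buffer size.
 */
static size_t
readahead_len(enum blk_kind kind, size_t blksize, size_t bufsize)
{
	size_t want;

	assert(blksize > 0);
	want = (kind == BLK_INODE) ? blksize * RA_BLOCKS : blksize;
	return (want < bufsize ? want : bufsize);
}

/* Issue one pread() covering the requested block plus any read-ahead. */
static ssize_t
read_hinted(int fd, off_t off, void *buf, size_t bufsize, size_t blksize,
    enum blk_kind kind)
{
	return (pread(fd, buf, readahead_len(kind, blksize, bufsize), off));
}
```

One larger read replaces several system calls and drive commands, which
is the overhead the drive's own read-ahead cannot remove.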

As I mentioned before, profiling the code is likely to be interesting.