fsck recoveries, configuration

Polytropon freebsd at edvax.de
Sat Aug 18 01:05:45 UTC 2012


On Fri, 17 Aug 2012 14:19:07 -0600, Gary Aitken wrote:
> 1.  It appears to me that the file system (ufs) is not writing
> stuff out when things are idle.  If I do a sync manually and
> leave the machine idle and it crashes later, it comes up clean. 
> If I don't do a sync manually and it crashes later, it often
> comes up needing fsck.  Is there a way to configure the filesystem
> to cache but still write cached stuff at low priority?

Note that even if the OS orders a data write, it's up to the
disk driver to actually pass that request on to the disk, and
the disk then _has_ to carry it out. These stages of the "task
line" are not tightly coupled in time, even though one would
assume that they happen immediately.

On a mostly idle system, you could keep a tool such as "top -S"
running to watch the system processes that could be responsible
for writes (or for missing writes).



> 2.  When my machine hung (could not rlogin or ping), I powered
> off and rebooted.

Does the machine have a "soft power button", and is it configured
to issue a "shutdown -p now" (which is quite common)? When you
have access to the machine, try that. Even if the machine does
not accept network logins, this mechanism might still work.



> Reboot did a deferred fsck. 

Is this intended? Personally, I'd rather wait some time and boot
into a fully checked file system environment than deal with the
uncertain situation of snapshots and background FS check activity.
In the worst case, I want to be prompted by fsck if a major defect
has been found that requires administrator attention.

Put

	background_fsck="NO"

into /etc/rc.conf to get this behaviour. Note that as long as fsck
is running, you can't enter any interactive commands, and it will
happen _prior_ to allowing any network connections. Also note that
this is in single user mode, so you can't switch VTs.
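
For reference, here is a slightly larger sketch of the relevant
/etc/rc.conf knobs. Only background_fsck is what I suggested above;
the other two are shown with what I believe are their defaults (see
rc.conf(5) on your system), so treat them as assumptions, not as
something you need to set.

```shell
# /etc/rc.conf fragment (sh syntax)
background_fsck="NO"        # run fsck in the foreground during boot
background_fsck_delay="60"  # seconds before a background fsck starts;
                            # has no effect while background_fsck="NO"
fsck_y_enable="NO"          # assumed default: do not answer "yes" to
                            # everything if the initial preen fails
```

With fsck_y_enable left at "NO", a failed preen should leave the
decision to the administrator instead of repairing blindly.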



> After it booted I logged in, and also logged in on another system. 
> On the remote system I could do a ping but rlogin returned
> "connection reset by peer", even though I could log in locally. 

Does rlogin work when you "give the system some time to recover"?



> I presume that is because the background fscks were not complete?

Possible. Background fsck is uncertain per se, so for diagnostics
it is better to leave it aside and use the perhaps "less comfortable"
foreground method. This is easy when you have local access to the
machine in question.



> I then did a 
>   ps ax | grep fsck
> and saw only the "logger" process for the deferred fsck's.
> I did a 
>   man logger
> which appeared to hang -- no output.  I'm guessing because it needed
> the filesystems which hadn't yet fsck'd.

Just a guess: Maybe you're experiencing a file system defect and fsck,
even though running in background, needs input? I'm not really sure
about this, because I'm _intentionally_ not using fsck that way.



> I then attempted to switch consoles using
>   <alt>fn
> but could not.

That would imply you're still stuck in single-user mode (SUM). A
strange situation, given that fsck appears to be running in the
background.



> I then attempted to kill the man logger process using ^C with no success.

Waiting / hanging process?



> Can someone shed light on the above sequence of events?  It's highly
> likely some of them occurred before the 60 second delay for fsck
> timed out, but I'd like to understand what the heck is going on.

Try to construct a more _defined_ situation for further diagnostics.
You could also boot the system into single-user mode (use "boot -s"
at the loader prompt) and then perform fsck manually, just to make
sure your disks are fine.
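
A minimal sketch of what that manual check could look like once you
are at the single-user shell. The "run" helper and the DRY_RUN
variable are hypothetical, made up for this example; by default the
script only prints the commands. The choice of "fsck -p" and the
mount steps are my assumption of a typical sequence, not the only
way to do it.

```shell
#!/bin/sh
# "run" is a hypothetical helper, not a FreeBSD tool.
# With DRY_RUN=1 (the default) it only prints what it would do.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Steps for the single-user shell; stop at the first failure.
single_user_check() {
    run fsck -p &&          # preen the file systems from /etc/fstab
    run mount -u / &&       # remount the root file system read-write
    run mount -a -t ufs     # mount the remaining UFS file systems
}

single_user_check
```

Set DRY_RUN=0 only when you really are root in single-user mode. If
"fsck -p" stops because of a major defect, run plain "fsck" on the
affected file system and answer its questions yourself.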




-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


More information about the freebsd-questions mailing list