how to royally mess up a -stable system

Mikhail Teterin mi+kde at aldan.algebra.com
Sun Jul 18 00:03:39 PDT 2004


Have ImageMagick try to load a really big image file -- big enough to
overflow your /var/tmp ...

Here is the state of the box (after the libMagick process was killed),
as shown by `systat -pigs':

                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   |||||||||||||||||||||||||||||||||||||||||||||||||| 10.5

                    /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root         syncer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root     pagedaemon XXXXX  
             <idle> X
[...]

The machine is almost entirely unresponsive. When tcsh echoes commands
back at all, it cannot execute them for many minutes. The kernel then
tries to log each and every error:

[...]
Jul 18 02:06:33 corbulon /kernel: vnode_pager_putpages: residual I/O 65536 at 18777
Jul 18 02:06:33 corbulon /kernel: pid 7 (syncer), uid 0 on /var: file system full
Jul 18 02:06:33 corbulon /kernel: vnode_pager_putpages: I/O error 28
Jul 18 02:06:33 corbulon /kernel: vnode_pager_putpages: residual I/O 65536 at 18778
Jul 18 02:06:34 corbulon /kernel: pid 7 (syncer), uid 0 on /var: file system full
Jul 18 02:06:34 corbulon /kernel: vnode_pager_putpages: I/O error 28
Jul 18 02:06:34 corbulon /kernel: vnode_pager_putpages: residual I/O 65536 at 18779
Jul 18 02:06:34 corbulon /kernel: pid 7 (syncer), uid 0 on /var: file system full
Jul 18 02:06:35 corbulon /kernel: vnode_pager_putpages: I/O error 28
[...]
Jul 18 02:09:43 corbulon /kernel: vnode_pager_putpages: residual I/O 40960 at 8179
Jul 18 02:09:43 corbulon /kernel: pid 3 (pagedaemon), uid 0 on /var: file system full
Jul 18 02:09:43 corbulon /kernel: vnode_pager_putpages: I/O error 28
Jul 18 02:09:43 corbulon /kernel: vnode_pager_putpages: residual I/O 40960 at 8179
Jul 18 02:09:43 corbulon /kernel: pid 3 (pagedaemon), uid 0 on /var: file system full
Jul 18 02:09:43 corbulon /kernel: vnode_pager_putpages: I/O error 28
Jul 18 02:09:43 corbulon /kernel: vnode_pager_putpages: residual I/O 40960 at 8179
Jul 18 02:09:43 corbulon /kernel: pid 3 (pagedaemon), uid 0 on /var: file system full
[...]

which really chokes the box even though /var/log is on a different
device from /var/tmp ...

Yesterday my 4.8-stable machine had to be cold-rebooted after almost
a year of uptime because of this -- existing processes (sshd, webmin)
were still responding some of the time, but were unable to launch any
new processes -- such as a shell (in the case of sshd) or even
/sbin/reboot (in the case of webmin).

Why is a fast-writing program (not run by root) able to hang a server?

Perhaps these errors logged by the kernel could be made less specific
and fit on one line -- that way syslogd would at least be able to cope
with them better?
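
To illustrate what "less specific and on one line" could look like, here
is a rough userland sketch in Python, not kernel code. It only counts the
"file system full" messages from the log excerpt above and prints one
summary line per process and file system. The names FULL_RE and
summarize() are made up for this example, and the output format is just
one possibility.

```python
import re
from collections import Counter

# Matches the "file system full" message as it appears in the log excerpt
# above, once wrapped lines are joined. FULL_RE is a hypothetical name.
FULL_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), uid (?P<uid>\d+) "
    r"on (?P<fs>\S+): file system full"
)


def summarize(lines):
    """Collapse repeated 'file system full' messages.

    Returns one summary string per (process name, pid, file system),
    in order of first appearance, with a repeat count appended.
    Lines that do not match FULL_RE are ignored.
    """
    counts = Counter()
    for line in lines:
        m = FULL_RE.search(line)
        if m:
            counts[(m["name"], int(m["pid"]), m["fs"])] += 1
    return [
        f"{name} (pid {pid}) on {fs}: file system full [{n}x]"
        for (name, pid, fs), n in counts.items()
    ]


if __name__ == "__main__":
    sample = [
        "Jul 18 02:06:33 corbulon /kernel: pid 7 (syncer), uid 0 on /var: file system full",
        "Jul 18 02:06:33 corbulon /kernel: vnode_pager_putpages: I/O error 28",
        "Jul 18 02:06:34 corbulon /kernel: pid 7 (syncer), uid 0 on /var: file system full",
    ]
    for summary in summarize(sample):
        print(summary)
```

The point is only that hundreds of interleaved three-line reports carry
no more information than a single line with a count; whether the kernel
or syslogd is the right place to do the collapsing is a separate question.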

	-mi


