vnode_pager_putpages errors and DOS?
Uwe Doering
gemini at geminix.org
Wed Nov 3 15:14:57 PST 2004
Igor Sysoev wrote:
> On Sat, 9 Oct 2004, Uwe Doering wrote:
>>[...]
>>I wonder whether the unresponsiveness is actually just the result of the
>>kernel spending most of the time in printf(), generating warning
>>messages. vnode_pager_generic_putpages() doesn't return any error in
>>case of a write failure, so the caller (syncer in this case) isn't aware
>>that the paging out failed, that is, it is supposed to carry on as if
>>nothing happened.
>>
>>So how about limiting the number of warnings to one per second? UFS has
>>similar code in order to curb "file system full" and the like. Please
>>consider trying the attached patch, which applies cleanly to 4-STABLE.
>>It won't make the actual application causing these errors any happier,
>>but it may eliminate the DoS aspect of the issue.
>
> I have just tried your patch. To test I ran the program from
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/67919
>
> The patch allows me to log in to the machine while the system reports
> "vnode_pager_putpages: I/O error 28". However, file system access is
> very limited and after some time the system became unresponsive.
Limited file system access is to be expected, since
vnode_pager_putpages() keeps the number of dirty buffers
('numdirtybuffers') near its upper limit ('hidirtybuffers'). However,
the unresponsiveness may be caused by another shortcoming I found in the
meantime.

When 'numdirtybuffers' is greater than or equal to 'hidirtybuffers', the
function bwillwrite() blocks until 'numdirtybuffers' drops below some
threshold value. bwillwrite() gets called in a number of places that
deal with writing data to disk.
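
As a rough illustration of that blocking condition, here is a minimal
user-space sketch. It is not the kernel code: 'struct bufstate',
writer_must_wait() and steps_until_unblocked() are hypothetical names, and
using 'hidirtybuffers' itself as the wake-up threshold is an assumption
made only for this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's counters. */
struct bufstate {
	int numdirtybuffers;	/* current number of dirty buffers */
	int hidirtybuffers;	/* upper limit for dirty buffers */
};

/* True if a writer entering the throttle would have to sleep. */
static bool
writer_must_wait(const struct bufstate *bs)
{
	assert(bs->hidirtybuffers > 0);
	return (bs->numdirtybuffers >= bs->hidirtybuffers);
}

/*
 * Model a flusher that cleans one buffer per step and count how many
 * steps pass before the waiting writer may proceed.  In this model the
 * writer is released as soon as the count falls below the limit.
 */
static int
steps_until_unblocked(struct bufstate *bs)
{
	int steps = 0;

	while (writer_must_wait(bs)) {
		bs->numdirtybuffers--;
		steps++;
	}
	return (steps);
}
```

If something keeps refilling the dirty buffers as fast as they are
cleaned, the loop in this model would not terminate, which corresponds to
a writer that stays blocked.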

Two of these places are dofilewrite() (which is in turn called by
write() and pwrite()) and writev(). There, bwillwrite() gets called if
the file descriptor is of type DTYPE_VNODE. Unfortunately, this
doesn't take into account that ttys, including pseudo ttys, and even
/dev/null and friends, are character device nodes and therefore vnodes
as well, but have nothing to do with writing data to disk. That is, in
case of heavy disk write activity, write attempts to these device nodes
get blocked, too! As a consequence, the system appears to become
unresponsive at the shell prompt, or reacts only sporadically. Even
daemonized processes that happen to log data to /dev/null (on stdout &
stderr, for example) will block.
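
That /dev/null and the ttys really are character device nodes can be
checked from user space. The following is a small sketch using stat(2);
the helper name is_char_device() is made up for this example and has
nothing to do with the kernel code path discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return true if 'path' names a character device node.
 * Returns false for any other file type and also if stat() fails.
 */
static bool
is_char_device(const char *path)
{
	struct stat sb;

	assert(path != NULL);
	if (stat(path, &sb) != 0)
		return (false);
	return (S_ISCHR(sb.st_mode));
}
```

Calling is_char_device("/dev/null") should report a character device,
although writes to it never reach a disk.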
What we need here is an additional test that makes sure that in case of
a character device bwillwrite() gets called only if the device is in
fact a disk. Please consider trying out the attached patch. It will
not reduce the heavy disk activity (which is, after all, legitimate),
but it is supposed to enable you to operate the system at shell level
and kill the offending process, or do whatever is necessary to resolve
the problem.
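
The intended decision can be modeled in user space as follows. This is
only a sketch of the logic, not the patch itself: the 'model_' types and
should_throttle() are hypothetical, and the 'is_disk' flag stands in for
the D_DISK test. The NULL checks on the device pointers that the real
code needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's descriptor and vnode types. */
enum model_ftype { MODEL_DTYPE_VNODE, MODEL_DTYPE_SOCKET };
enum model_vtype { MODEL_VREG, MODEL_VCHR };

struct model_file {
	enum model_ftype f_type;	/* kind of file descriptor */
	enum model_vtype v_type;	/* vnode type, if f_type is a vnode */
	bool is_disk;			/* models the D_DISK device flag */
};

/* True if a write on this descriptor should go through the throttle. */
static bool
should_throttle(const struct model_file *fp)
{
	assert(fp != NULL);
	if (fp->f_type != MODEL_DTYPE_VNODE)
		return (false);		/* sockets etc. were never throttled */
	if (fp->v_type == MODEL_VCHR && !fp->is_disk)
		return (false);		/* tty, /dev/null: no disk I/O */
	return (true);			/* regular files and disk devices */
}
```

In this model only the second test is new; the first one corresponds to
the existing DTYPE_VNODE check.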
Uwe
--
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
gemini at geminix.org | http://www.escapebox.net
-------------- next part --------------
--- src/sys/kern/sys_generic.c.orig Tue Sep 14 19:56:53 2004
+++ src/sys/kern/sys_generic.c Sun Sep 26 13:13:46 2004
@@ -48,6 +48,7 @@
 #include <sys/filio.h>
 #include <sys/fcntl.h>
 #include <sys/file.h>
+#include <sys/vnode.h>
 #include <sys/proc.h>
 #include <sys/signalvar.h>
 #include <sys/socketvar.h>
@@ -78,6 +79,23 @@
 static int dofilewrite __P((struct proc *, struct file *, int,
 		    const void *, size_t, off_t, int));
 
+static __inline int
+isndchr(vp)
+	struct vnode *vp;
+{
+	struct cdevsw *dp;
+
+	if (vp->v_type != VCHR)
+		return (0);
+	if (vp->v_rdev == NULL)
+		return (0);
+	if ((dp = devsw(vp->v_rdev)) == NULL)
+		return (0);
+	if (dp->d_flags & D_DISK)
+		return (0);
+	return (1);
+}
+
 struct file*
 holdfp(fdp, fd, flag)
 	struct filedesc* fdp;
@@ -403,7 +420,7 @@
 	}
 #endif
 	cnt = nbyte;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, flags, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||
@@ -496,7 +513,7 @@
 	}
 #endif
 	cnt = auio.uio_resid;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, 0, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||