misc/91606: sha1 /dev is suspended
Bruce Evans
bde at zeta.org.au
Thu Jan 19 05:50:14 PST 2006
The following reply was made to PR bin/91606; it has been noted by GNATS.
From: Bruce Evans <bde at zeta.org.au>
To: Kris Kennaway <kris at obsecurity.org>
Cc: freebsd-gnats-submit at FreeBSD.org
Subject: Re: misc/91606: sha1 /dev is suspended
Date: Fri, 20 Jan 2006 00:42:55 +1100 (EST)
On Thu, 19 Jan 2006, Kris Kennaway wrote:
> On Tue, Jan 10, 2006 at 05:52:31PM +0000, Dominik Karczmarski wrote:
> > >Description:
> > When I run (on FreeBSD 5.4)
> > > sha1 /dev=20
> > or
> > > md5 /dev
> >=20
> > checksum is NOT calculated and is suspended.
> > there is no problem on FreeBSD 4.X.
>
> What do you expect this to do? There are no files in /dev suitable
> for checksumming.
This shows that:
- md5 doesn't detect EOF properly
- devfs doesn't implement EOF properly
- neither of these bugs have been merged into FreeBSD-4.x.
Variants of this show that md5 doesn't read pipes or special files
properly. Most disk files in /dev are suitable for checksumming.
md5 was broken in rev.1.14 of libmd/mdXhl.c (4.x just missed the bug --
it is at 1.13):
% Index: mdXhl.c
% ===================================================================
% RCS file: /home/ncvs/src/lib/libmd/mdXhl.c,v
% retrieving revision 1.13
% retrieving revision 1.14
% diff -u -2 -r1.13 -r1.14
% --- mdXhl.c 28 Aug 1999 00:05:07 -0000 1.13
% +++ mdXhl.c 17 Mar 2001 10:00:50 -0000 1.14
% @@ -7,9 +7,10 @@
% * ----------------------------------------------------------------------------
% *
% - * $FreeBSD: src/lib/libmd/mdXhl.c,v 1.13 1999/08/28 00:05:07 peter Exp $
% + * $FreeBSD: src/lib/libmd/mdXhl.c,v 1.14 2001/03/17 10:00:50 phk Exp $
% *
% */
%
% #include <sys/types.h>
% +#include <sys/stat.h>
% #include <fcntl.h>
% #include <unistd.h>
% @@ -44,17 +45,38 @@
% MDXFile(const char *filename, char *buf)
% {
% + return MDXFileChunk(filename, buf, 0, 0);
% +}
% +
% +char *
% +MDXFileChunk(const char *filename, char *buf, off_t ofs, off_t len)
% +{
% unsigned char buffer[BUFSIZ];
% MDX_CTX ctx;
% - int f,i,j;
% + struct stat stbuf;
% + int f, i, e;
% + off_t n;
%
% MDXInit(&ctx);
% - f = open(filename,O_RDONLY);
% + f = open(filename, O_RDONLY);
% if (f < 0) return 0;
% - while ((i = read(f,buffer,sizeof buffer)) > 0) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MDXFile() used to detect EOF properly -- it used an almost(*)-sufficiently
large buffer and stopped reading after reading 0 in the above.
% - MDXUpdate(&ctx,buffer,i);
% - }
% - j = errno;
% + if (fstat(f, &stbuf) < 0) return 0;
% + if (ofs > stbuf.st_size)
% + ofs = stbuf.st_size;
% + if ((len == 0) || (len > stbuf.st_size - ofs))
% + len = stbuf.st_size - ofs;
Rev.14 adds an fstat() to bogusly determine the size. The size can be
given by fstat can be wrong for various reasons, e.g.:
- the size of a regular file or directory can grow or shrink.
Shrinkage should be still detected by reading EOF (as above), but
that is broken too (see below).
- the size in stbuf.st_size is likely to be wrong for other file types,
especially for pipes.
- some files (mainly ones in procfs) claim to be regular although they
are very irregular. Determining their current actual size is
problematic and not done on every stat().
- all directories in devfs claim to have size 512 but few have that size,
since directories in devfs are irregular like the irregular regular
files in procfs with less excuse since the directory contents don't
change often. devfs just reports a nominal st_size and it takes
reading to the end to determine the actual size. E.g., "wc /dev" on
sledge now shows that the size of /dev is 1116. It is not even a
multiple of 512 so other programs may be confused. md5 is especially
confused, as follows:
- it first tries to read the nominal size (512) (it would try up to 1024
if the nominal size were larger).
- devfs read() returns 504 since it doesn't return partial directory
entries.
- md5 now tries to read only what it thinks is the residual size (8).
This is too small for the next directory entry for the same reason
that 512 was too small for another entry, so read() returns 0. This
result is very irregular (it's a bug in devfs) since EOF has not
been reached.
- md5 doesn't understand the irregular EOF, so it doesn't handle the
problem by expanding the read, and it doesn't understand ordinary
EOF so it isn't confused into mishandling the problem by quitting
before it reaches real EOF, so it loops doing the previous step
forever.
% + if (lseek(f, ofs, SEEK_SET) < 0) return 0;
Rev.1.14 also tries to break pipes here. However, "cat | md5" still
works because the top level of md5 doesn't call here -- it uses
MDFilter() which uses fread() which has non-broken detection of EOF.
"cat | md5 /dev/stdin" fails here and then the top level warn()'s about
the illegal seek. For non-pipes, the above isn't even wrong, since
lseek() succeeds on unseekable files that aren't pipes, e.g. for most
device files. st_size is also garbage for most device files. This
breaks things like "md5 /dev/ad0" (md5 reads 0 bytes so /dev/ad0
appears to be the same as /dev/null).
(*) I think the fstat() in the above is good only for determining the
size of a sufficiently-large buffer (st_blksize (**)). E.g, for
directory entries on file systems that like devf only return non-truncated
entries, it is essential for the read size to be at least as large as
the largest directory entry. Using st_blksize should (**) give this
automatically, but using an arbitrary buffer size like BUFSIZ might not.
stdio tries to do the right thing by using st_blksize in most cases, but
some cases didn't work right and now all cases don't work right (**).
(**):
- st_blksize used to work right for regular files, at least above the
level of individual filesystems. Filesystems return their preferred
block size in va_blocksize, and vn_stat() used to actually return
va_blocksize to userland. This is unbroken in FreeBSD-4.
- st_blksize used to work right for directories, just like for files.
For ffs, va_blocksize is the same for directories as for files (it
is fs_bsize). This is unbroken in FreeBSD-3. In FreeBSD-4, it was
broken misconverting the default of a case statement in rev.1.81
of vfs_vnops.c.
- st_blocksize used to work not-so right for some special files (mainly
disks). Drivers were given a chance to set a good size in si_bsize_best.
A few disk drivers actually did so.
- -current ignores va_blocksize and doesn't have si_bsize_best. It sets
st_blksize to PAGE_SIZE in all cases.
- stdio attempts to use st_blksize for everything except ttys. This used
to work right for disks. It last worked in FreeBSD-3. Stdio doesn't
know of any good way to determine which devices are ttys. It just
classifies all cdevs as "could be tty". Since block disk devices were
lost in FreeBSD-4, stdio hasn't used st-blksize for disks since FreeBSD-3,
and the partially working cases for disks in vn_stat() in FreeBSD-4 were
never actually used by stdio.
% + n = len;
% + while (n > 0) {
% + if (n > sizeof(buffer))
% + i = read(f, buffer, sizeof(buffer));
% + else
% + i = read(f, buffer, n);
% + if (i < 0) break;
^^^^^^^^^^^^^^^^^
The new function MDXFileChunk() (and thus MDXFile() too) loops forever
if it unexpectedly hits EOF.
For reasons given above, the fix is not just to change the break
condition to (i <= 0):
- the buffer size should not be reduced to n, so that irregular EOFs
can't happen. Then the MD5File() case would just work like it used
to provided the lseek() and the fstat() are also removed in that case.
In the nondegenerate MD5FileChunk() case, the read might read past
(offset + len), but that wouldn't cause any new problems since we
close the file so our file pointer is irrelevant.
- fixing unseekable files is harder, since lseek() doesn't tell you if
a file is seekable (it only fails for pipes, and thos can be classified
just as easily using the result of the fstat()). The degenerate case
shouldn't seek, and the nondengenerate case should try harder to abort
for unseekable files. dd(1) has a fair amount of code to guess
seekability.
- I think using st_size should work for regular files and directories
only, but the case of procfs shows that it doesn't actually work for
"regular" files, and the case of devfs shows that it does't actually
work for directories. I'm not sure if it is really needed. All of
its current uses are dubious:
- it is just harmful for the degenerate case.
- if it is larger than the offset, then we reduce the offset. Why not
an error?
- if the size from the offset to st_size is < len, we reduce len? Why
not an error? Without errors, there is no way for the caller to
tell if the chunk read was anywhere near the chunk requested. Without
fixups to avoid the errors, the errors are easy to detect by just
seeking to `offset' and trying to read `len' bytes, provided lseek()
works.
% + MDXUpdate(&ctx, buffer, i);
% + n -= i;
% + }
% + e = errno;
% close(f);
% - errno = j;
% + errno = e;
% if (i < 0) return 0;
% return MDXEnd(&ctx, buf);
A workaround for pipes, devices and directories is to only use stdin
(not by name).
Bruce
More information about the freebsd-bugs
mailing list