svn commit: r265132 - in head: share/man/man4 sys/dev/null
brde at optusnet.com.au
Thu May 1 02:00:09 UTC 2014
On Wed, 30 Apr 2014, Matthew Fleming wrote:
> On Wed, Apr 30, 2014 at 7:48 AM, Ian Lepore <ian at freebsd.org> wrote:
>> For some reason this reminded me of something I've been wanting for a
>> while but never get around to writing... /dev/ones, it's just
>> like /dev/zero except it returns 0xff bytes. Useful for dd'ing to wipe
>> out flash-based media.
> dd if=/dev/zero | tr "\000" "\377" | dd of=<xxx>
Why all these processes and i/o's?
tr </dev/zero "\000" "\377"
The dd's may be needed for controlling the block sizes.
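A runnable sketch of that pipeline (block size and count here are placeholders for demonstration; the real of= target would be the flash device):

```shell
# tr turns the zero stream into 0xff bytes; a trailing dd would reblock
# the output for the device.  od shows the first 16 bytes produced.
tr "\000" "\377" </dev/zero | dd bs=64k count=1 2>/dev/null | od -An -tx1 | head -n 1
```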
> But it's not quite the same.
It is better, since it is not limited to 0xff bytes :-).
Oops, perhaps not. tr not only uses stdio to pessimize the i/o; it uses
wide characters 1 at a time. It used to use only characters 1 at a time.
yes(1) is limited to newline bytes, or newlines mixed with strings. It
also uses stdio to pessimize the i/o, but not wide characters yet.
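For illustration, yes(1)'s output is just its argument (or "y") followed by a newline, repeated forever; head bounds the stream here:

```shell
# yes repeats "ones\n" indefinitely; head takes the first 3 lines.
yes ones | head -n 3
```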
stdio's pessimizations begin with naively believing that st_blksize gives
a good i/o size. For most non-regular files, including all (?) devices
and all (?) pipes, st_blksize is PAGE_SIZE. For disks, this has been
broken significantly since FreeBSD-4 where it was the disk's si_bsize_best
(usually 64K). For pipes, this has been broken significantly since
FreeBSD-4 where it was pipe_buffer.size (either PIPE_SIZE = 16K or
BIG_PIPE_SIZE = 64K).
So standard utilities tend to be too slow to use on disks. You have to
use dd and relatively complicated pipelines to get adequate block sizes.
Sometimes dd or a special utility is needed to get adequate control and
error handling. I have such a special utility for copying disks
with bad sectors, but prefer to use just cp for copying disks. cp
doesn't use stdio, and doesn't use mmap() above a certain small size; it
uses read/write() with a fixed block size of 64K or maybe larger in
-current, so it works OK for copying disks.
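A sketch of the fixed-block read/write loop described above, done with dd on scratch files (the paths and sizes are placeholders, not a real disk copy):

```shell
# Create a small stand-in "disk", then copy it with a fixed 64k block
# size, similar in spirit to cp's read/write loop described above.
dd if=/dev/zero of=/tmp/srcdisk.img bs=64k count=4 2>/dev/null
dd if=/tmp/srcdisk.img of=/tmp/dstdisk.img bs=64k 2>/dev/null
cmp /tmp/srcdisk.img /tmp/dstdisk.img && echo copied-ok
```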
The most broken utilities that I use often for disk devices are:
- md5. This (really libmd/mdXhl.c) has been broken on all devices (really
on all non-regular files) since ~2001. It is broken by misusing
st_size instead of by trusting st_blksize. st_size is only valid
for regular files, but is used on other file types to break them.
pts/21:bde at freefall:~> md5 /dev/null
MD5 (/dev/null) = d41d8cd98f00b204e9800998ecf8427e
pts/21:bde at freefall:~> md5 /dev/zero
MD5 (/dev/zero) = d41d8cd98f00b204e9800998ecf8427e
Similarly for disk devices. All devices are seen as empty by md5.
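The digest printed above for both devices is exactly the MD5 of zero bytes of input, which any MD5 tool confirms (md5sum is used here as a portable stand-in for FreeBSD's md5):

```shell
# MD5 of empty input: the same digest md5 reports for every device,
# showing it read zero bytes from them.
printf '' | md5sum
```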
The workaround is to use a pipeline, or just stdin. "cat /dev/zero | md5"
and even "md5 </dev/zero" confuse md5 into using a different input method
that works. OTOH, "md5 /dev/fd/0" sees an empty device file, and
"cat /dev/zero | md5 /dev/fd/0" fails immediately with a seek error.
Pipes have st_size == 0 too, so the input method that stats the file
would see an empty file too, so it must not be reached in the working
case. "md5 /dev/fd/0" apparently just stats the device file, and this
appears to be empty. I'm not sure if it is the tty device file or
/dev/fd/0 that is seen. "cat /dev/zero | md5 /dev/fd/0" apparently
reaches the buggy code, but somehow gets further and fails trying to seek.
To get adequate block sizes for disks, use dd in the pipeline that must
be used for other reasons.
I only recently noticed that pipes have st_blksize = PAGE_SIZE, so
if you pipe to stdio utilities then the i/o will be pessimized, and
reblocking using another dd in the pipeline is needed to get back to an
adequate size. PAGE_SIZE is large enough to not be very pessimal for some uses.
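The reblocking pattern looks like this (sizes here are illustrative; the byte count shows the payload is unchanged):

```shell
# A trailing dd regroups the pipe's output into 64k writes; wc -c
# confirms all 4k*64 = 262144 bytes pass through.
dd if=/dev/zero bs=4k count=64 2>/dev/null | dd bs=64k 2>/dev/null | wc -c
```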
- cmp. cmp uses mmap() excessively for regular files, but for device files
it uses per-char stdio excessively.
More on md5. The i/o routine for the working case is in the application
(md5/md5.c). This uses fread() with the bad block size BUFSIZ. This
is still 1024. It is more broken than st_blksize. However, fread()
is not per-char, so it is reasonably efficient. stdio uses st_blksize
for read() from the file. When the file is regular, the block size
is again relatively unimportant provided the file system has a large
enough block size or does clustering. For device files, clustering
might occur at levels below the file system, but usually doesn't for
disks. Instead, small i/o's get relatively slower with time except
on high-end SSDs with high transactions per second, because clustering
at low levels takes too many transactions.
The i/o routine for the non-working case is in the library
(libmd/mdXhl.c). It uses read(), but with the silly stdio block
size of BUFSIZ. libmd files have several includes of <stdio.h>, but
don't seem to use stdio except for bugs like this. The result is that
the i/o is especially pessimized for the usual regular file case.
Buffering in the kernel limits this pessimization.
The device file case for cmp just uses getc()/putc(). This first
gets the st_blksize pessimization. Then it gets the slow per-char
i/o from using getc()/putc(). For disks, the first pessimization
tends to dominate but the second one is noticeable. For fast
input devices it is very noticeable. On freefall now:
"dd if=/dev/zero bs=1m count=4k of=/dev/null": speed is 21GB/sec;
"dd if=/dev/zero bs=1m count=4k | cmp - /dev/zero": speed is 187MB/sec.
The overhead is a factor of 110. With iron disks, the overhead would
be a factor of about 1/2.
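A bounded version of that comparison can be reproduced locally (scratch file path is a placeholder; prepend time(1) to measure the speed, which will vary by machine):

```shell
# Compare a finite zero stream against an equal-sized zero file; unlike
# the unbounded pipeline above, both sides end, so cmp exits cleanly.
dd if=/dev/zero of=/tmp/zeros.img bs=64k count=16 2>/dev/null
dd if=/dev/zero bs=64k count=16 2>/dev/null | cmp - /tmp/zeros.img && echo identical
```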
The loop in cmp for regular files is slow too, but only in comparison
with the memcpy() that is (essentially) used for reading /dev/zero
and with the memcmp() that should be used by cmp. It just compares
bytewise and has mounds of bookkeeping to count characters and lines
for the rare cases that fail. The usual case should just use mmap()
of the whole file (if not read()) and memcmp() on that.
I recently noticed a very bad case for cmp on regular files too. I
was comparing large files on a cd9660 file system on a DVD, under
an old version of FreeBSD. cmp mmap()s the whole file. The i/o
for this is done by vm, and vm generated only minimal i/o's with
the cd9660 block size of 2K. read() would have done clustering
to a block size of 64K. Perhaps vm is better now, but it is hard
to see how it could do as well as read() without doing the same
clustering as read().
One workaround for this is to prefetch files into the buffer (vmio)
cache using read(). It is hard to avoid thrashing of the cache
with this, so I used workarounds like diff'ing the files instead
of cmp'ing them. diff is much heavier weight, but it runs faster
since it doesn't use mmap() (gnu diff seems to use fread() and
suffers from stdio using st_blksize).
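The prefetch workaround sketched above looks like this on scratch files (paths and sizes are placeholders for the demo):

```shell
# Pull both files through read() (via dd) to warm the buffer cache
# before cmp mmap()s them; on a real DVD this avoids the minimal 2k
# i/o's that vm generates for the mapped pages.
dd if=/dev/zero of=/tmp/f1.img bs=64k count=8 2>/dev/null
cp /tmp/f1.img /tmp/f2.img
dd if=/tmp/f1.img of=/dev/null bs=64k 2>/dev/null
dd if=/tmp/f2.img of=/dev/null bs=64k 2>/dev/null
cmp /tmp/f1.img /tmp/f2.img && echo files-match
```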