kern/163076: It is not possible to read in chunks from linprocfs
and procfs.
Bruce Evans
brde at optusnet.com.au
Mon Dec 5 15:07:05 UTC 2011
On Mon, 5 Dec 2011, Petr Salinger wrote:
>> Description:
> It is not possible to read in chunks from linprocfs and procfs.
> It is a regression against stable-8.
> I suspect it is due to changes of sbuf implementation between 8 and 9.
>
> Some files are rather big (over 4KB) and it is really standard to read them in blocks.
>> How-To-Repeat:
> "dd if=$FILE bs=1", with FILE any file in procfs or linprocfs
> The result is empty output.
I don't remember this ever working. The correct way to fix it is
unclear (start by not claiming that the highly irregular files in
procfs are regular), but empty output is unnecessarily bad - I
would expect to get at least 1 byte. Under FreeBSD-~5.2, I get
the following file sizes:
file      dd (1 byte)  dd (10k)  dd (1m)  wc | cut...  wc -c   stat
--------  -----------  --------  -------  -----------  ------  ------
cmdline   0            6         EIO      6            0       0
ctl       EBADF        EBADF     EBADF    EBADF        ctl     0
dbregs    hangs        hangs     hangs    hangs        0       0
etype     0            14        EIO      14           0       0
file@     575712       575712    575712   575712       575712  575712
fpregs    hangs        hangs     hangs    hangs        0       0
map       0            1150      EIO      1150         0       0
mem       EBADF        EBADF     EBADF    EBADF        0       0
note      EBADF        EBADF     EBADF    EBADF        0       0
notepg    EBADF        EBADF     EBADF    EBADF        0       0
regs      hangs        hangs     hangs    hangs        0       0
rlimit    0            65        EIO      65           0       0
status    0            94        EIO      94           0       0
The irregularity is so large that it confuses wc -c into not working,
while plain wc works. This is apparently because wc -c believes the
claim that the file is regular, so it stats the file to get its size
and finds 0, while plain wc reads the whole file using block size 64K.
(md5 is another utility that is broken on such files, but it
is broken even for files that don't claim to be regular. E.g.,
md5 on /dev/zero (or any device file that you can open) gives
the same result as md5 on /dev/null, because it just stats the
file, although this is completely wrong for device files. md5
is unbroken on pipes, so you can apply it to device files using
the apparent beginner's pessimization "cat /dev/foo | md5".
This method works for the irregular regular files in procfs
too. You would have to use dd instead of cat to control the
block size, and choose a size that is large enough to work and
small enough to avoid EIO.)
The *regs files don't block doing the read(), but just loop endlessly
trying to read an infinite amount. This is because the uio offset is
reset to 0 after each read. ISTR this being done for some other file
types. This is a different feeble attempt to fix the problem in this
PR. The basic problem is that seeking is not implemented for many
files, so there is no way to continue reading from the previous uio
offset, so the new offset must be either infinity (for most files) or
0 (for regs files).
I can now explain more of the above irregularities:
- for tiny files, seeking is easy to implement by sprintf()ing the
whole file and using an offset in the string. The string constant
should be either invariant or the previously generated string must
be saved across reads (saving the string is only reasonable if it
is tiny). This (except possibly for sufficient invariance/saving)
is done. But some bug breaks reads of size 1. Perhaps this is fixed
in -current, or was fixed and has been broken again. dd seems to
work with block sizes between 2 and 128k inclusive in cases where it
works with a block size of 10k in the above. The 128k limit would
be explained by the misimplementation of attempting to malloc() the
user-specified read size instead of the tiny size actually needed.
The user must not be allowed to malloc() large sizes and there is
an arbitrary limit of 128k.
- the regs files are small although not tiny. But they are highly
variable so they should be read atomically using read() syscalls.
Thus seeking in them is not useful. This should probably be enforced
by only allowing the uio offset to be 0 or EOF. Instead, it is only
partially enforced by resetting the offset to 0 after each read (I
think applications can mess this up by lseek()ing between reads), so
callers don't need to do an lseek() for this. This API was invented
before pread() existed. pread() should be used now. This API results
in casual observers reading the same data endlessly. I sometimes
look at these files using hd and would prefer that EOF worked normally
for them.
Bruce
More information about the freebsd-bugs
mailing list