head behaviour
Bakul Shah
bakul at bitblocks.com
Mon Jun 7 00:06:09 UTC 2010
On Mon, 07 Jun 2010 00:13:28 +0200 Dag-Erling Smørgrav <des at des.no> wrote:
>
> The reason why head(1) doesn't work as expected is that it uses buffered
> I/O with a fairly large buffer, so it consumes more than it needs. The
> only way to make it behave as the OP expected is to use unbuffered I/O
> and never read more bytes than the number of lines left, since the worst
> case is input consisting entirely of empty lines. We could add an
> option to do just that, but the same effect can be achieved more
> portably with read(1) loops:
Except read doesn't do it quite right:
$ ps | (read a; echo $a ; grep zsh)
PID TT STAT TIME COMMAND
 1196  p0  Is     0:02.23 -zsh (zsh)
 1209  p1  Is     0:00.35 -zsh (zsh)
The alignment of the column titles is messed up. Using egrep we can
get the right alignment, but the egrep process itself also shows up.
$ ps | egrep 'TIME|zsh'
  PID  TT  STAT      TIME COMMAND
 1196  p0  Is     0:02.23 -zsh (zsh)
 1209  p1  Is     0:00.35 -zsh (zsh)
71945  p2  DL+    0:00.01 egrep TIME|zsh
A small point but it is not trivial to get it exactly right.
head -n directly expresses what one wants.
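Getting the read version right takes some care: with the default IFS,
read strips leading blanks, and in POSIX shells an unquoted $a has its
runs of blanks collapsed when it is expanded. A minimal sketch that
keeps the header intact, assuming a POSIX shell (header_then_grep is a
hypothetical helper name, not an existing utility):

```shell
# Print the first line of stdin unchanged, then filter the rest with grep.
# header_then_grep is a made-up name used only for this illustration.
header_then_grep() {
    # IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
    # Quoting "$header" prevents field splitting on expansion.
    IFS= read -r header && printf '%s\n' "$header"
    # grep inherits the same stdin and sees whatever read left behind.
    grep -- "$1"
}

# Sample input standing in for ps output.
printf '  PID  TT  COMMAND\n 1196  p0  -zsh (zsh)\n   42  p1  vi\n' |
    header_then_grep zsh
```

This relies on the shell's read consuming no more than one line from a
pipe, which is exactly the property under discussion.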
But there is a deeper point.
Several people pointed out alternatives for the examples
given but in general you can't use a single command to
replace a sequence of commands where each operates on part of
the shared input in a different way.
The reason such sequences break is that commands buffer their input
for efficiency. Usually there is no further use for the buffered but
unconsumed input and it can be safely thrown away. So this is
almost always the right thing to do, but not when there *is*
further use for the unconsumed input. Some programs already
do the right thing (dd, for instance, as you pointed out).
Some other commands give you this option in a limited way.
Look at "man grep" and you will find:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
>>>> output, grep ensures that the standard input is positioned to
>>>> just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search.
So for instance
$ < /usr/share/dict/words (grep -m 1 ''; grep -m 1 '')
A
a
But pipe the file in and see what you get:
$ cat /usr/share/dict/words | (grep -m 1 ''; grep -m 1 '')
A
nterectasia
Grep does the right thing for files but not pipes! Now I do
understand *why* this happens (grep can seek back over the unused
part of its buffer on a regular file, but not on a pipe) but still,
it is annoying. So
I believe there is value in providing an option to read *as
much as needed* but not more. It will be slower but will
handle the cases we are discussing. This will enhance
*composability* -- supposedly part of the unix philosophy.
The slow-but-read-just-as-much-as-needed option is to be used
when you need a certain kind of composability and there is no
other way. And yes, I now think this is useful not just
for head but also for any other program that quits before reading
to the end!
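To make the idea concrete, here is a rough sketch of the
read-just-as-much-as-needed behaviour built from a shell read loop.
It assumes a POSIX shell; take_lines is a hypothetical name, and this
is not how head is implemented:

```shell
# Copy exactly N lines from stdin to stdout and leave the rest unread.
# take_lines is a made-up helper used only for this illustration.
take_lines() {
    n=$1
    # read consumes only up to the next newline, so nothing beyond
    # line N is taken from the shared input.
    while [ "$n" -gt 0 ] && IFS= read -r line; do
        printf '%s\n' "$line"
        n=$((n - 1))
    done
    # Note: a final line with no trailing newline is dropped here.
}

# Two consumers sharing one pipe, each taking its own part.
printf 'A\na\naa\n' | { take_lines 1; take_lines 1; }
```

Because the shell has to read a pipe in very small pieces to avoid
overshooting, this is slow, which is the trade-off described above.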
[cc'ed Rob in case he wishes to chime in]
More information about the freebsd-hackers mailing list