flush file descriptor in a sh script while passing through a pipe

Oliver Fromme olli at lurza.secnetix.de
Mon May 26 17:59:51 UTC 2008


Mathieu Prevot wrote:
 > Let's be an example. With 1) I have a progress bar that is updated
 > regularly but with 2) I have to wait for the end of the download (the
 > next \n character ?) ...
 > 
 > 1)
 > wget http://tinyurl.com/5ztnb2
 > 
 > 2)
 > wget http://tinyurl.com/5ztnb2 --progress=bar:force 2>&1 | sed
 > '/^Location/d;/^HTTP/d;/^--/d'
 > 
 > I would like the progress bar to be updated through sed ... how can I
 > flush the file descriptor from sh or with a tiny command/signal ?

There are two problems with that.

The first problem is the fact that wget's output is fully
buffered when output is a pipe (not a tty).  Some programs
have an option for unbuffered or line-buffered output
(e.g. tcpdump -l), but unfortunately wget does not.  In
fact most programs don't have such an option.
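
A program usually picks its buffering mode by checking
whether its standard output is a terminal.  The same check
can be made from sh with "[ -t 1 ]".  The following is only
a small sketch to illustrate the difference; the function
name stdout_kind is made up for this example:

```shell
#!/bin/sh
# Report whether standard output (file descriptor 1) is a terminal.
stdout_kind() {
    if [ -t 1 ]; then
        echo "tty"
    else
        echo "not a tty"
    fi
}

stdout_kind          # typed at an interactive terminal: prints "tty"
stdout_kind | cat    # stdout is a pipe: prints "not a tty"
```

A program in the second situation typically collects its
output in a large buffer before writing anything.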

There's a little trick that "emulates" a tty environment
for a process so it thinks that its standard output is a
tty, so output will no longer be fully buffered:  Simply
run it inside script(1).  I've needed that for myself
occasionally so I've made a small alias for this:

alias intty='script -qt0 /dev/null </dev/null'

So you can simply type "intty wget http://... | sed ...".
(The alias works with /bin/sh, zsh and bash syntax; it'll
have to look slightly different for other shells.)
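
Where aliases are inconvenient, the same idea could also be
written as a shell function.  This is just a sketch that
assumes the same script(1) flags as the alias above; only
the function form is new here:

```shell
#!/bin/sh
# Function form of the alias: run the given command under script(1),
# discarding the typescript file and reading stdin from /dev/null.
# The flags are the same ones used in the alias.
intty() {
    script -qt0 /dev/null "$@" </dev/null
}
```

It is called the same way, e.g. "intty wget http://... | sed ...".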

But in your case it will still not work, because there's
another problem:  sed(1) works on whole lines only.  That
means it'll read a complete line (newline-terminated) from
stdin, then apply the rules, then output it according to
the rules.

However, wget's progress bar does not consist of multiple
lines -- it's rather one huge line that consists of several
segments that are separated by carriage-return codes so
the row on the screen is overwritten from the left.
sed(1) waits for the final newline and then outputs the
whole line all at once.
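
This can be seen without wget.  In the sketch below, the
made-up function "frames" stands in for a progress bar: three
segments separated by carriage returns, with one newline at
the very end.  sed counts that as a single line:

```shell
#!/bin/sh
# Three progress "frames" separated by CR, terminated by one newline.
frames() {
    printf '10%%\r50%%\r100%%\n'
}

# '$=' makes sed print the number of the last line it read.
frames | sed -n '$='
```

Since there is only one newline, sed has nothing to process
until the whole "line" has arrived.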

This isn't easy to work around.  It's probably best to
translate all carriage-return codes to newline codes
first.  You can do this with tr(1) with the -u option:

intty wget http://... | tr -u '\r' '\n' | sed ...
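
The effect of the translation can be checked on a small
fixed input.  In this sketch the made-up function "frames"
again stands in for the progress bar, and -u is left out
because buffering does not matter for such a short input:

```shell
#!/bin/sh
# Three progress "frames" separated by CR, terminated by one newline.
frames() {
    printf '10%%\r50%%\r100%%\n'
}

# After tr, every frame is a line of its own, so sed can handle
# each of them as soon as it arrives.
frames | tr '\r' '\n' | sed -n '$='
```

With the carriage returns translated, sed counts three lines
instead of one.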

That works, but the parts of the progress display are now
written as a bunch of separate lines, which is ugly and
probably not what you intended.  So we have to undo the
conversion for the lines that make up the progress bar.

The following piece of awk code will do that.  Note that
we now have to pass the -l option to sed so its output
isn't buffered either, because it's now going into a pipe
(to awk) instead of your tty.

intty wget http://... | tr -u '\r' '\n' | sed -l ... |
awk '{if($0=="")print;else printf"%s\r",$0;fflush()}'

That's much better.  Note that awk can also do filtering
like sed, so you can get rid of sed altogether:

intty wget http://... | tr -u '\r' '\n' |
awk '!/^(Location|HTTP|--)/{if($0=="")print;else printf"%s\r",$0;fflush()}'
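
To see what the awk part does, it can be fed a few sample
lines by hand.  This is only a sketch: the input is invented,
the function name progress_filter is made up, and the final
tr replaces each carriage return with "@" so it is visible:

```shell
#!/bin/sh
# The awk filter from above, wrapped in a function for testing.
progress_filter() {
    awk '!/^(Location|HTTP|--)/{if($0=="")print;else printf"%s\r",$0;fflush()}'
}

# One header line, two progress lines and an empty line as input.
printf 'HTTP request sent\n10%%\n50%%\n\n' |
progress_filter |
tr '\r' '@'
```

The header line is dropped, both progress lines now end in a
carriage return (shown as "@"), and the empty line comes
through as a plain newline.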

It's a good idea to put that into a small shell script for
later reuse.  When doing that, it's worth adding some
spaces and better formatting.  The following script can be
called with the URL as first argument:

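#!/bin/sh -

# Run a command inside script(1) so it believes stdout is a tty.
alias intty='script -qt0 /dev/null </dev/null'

# awk pattern: skip the header lines that should not be shown.
FILTER='!/^(Location|HTTP|--)/'

intty wget "$1" |
tr -u '\r' '\n' |
awk "$FILTER"'{
	# Empty line: pass the newline through.
	# Anything else: end with CR so the next line overwrites it.
	if ($0 == "")
		print
	else
		printf "%s\r", $0
	fflush()
}'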

Best regards
   Oliver


