scripting tip needed

Wed Jul 1 22:12:52 UTC 2009

On Wed, 1 Jul 2009 22:02:48 +0200 (CEST), Wojciech Puchar <wojtek at wojtek.tensor.gdynia.pl> wrote:
>> Using an interactive language like Python you can actually *test* the
>> code as you are writing it.  This is a major win most of the time.
>
> could you explain what you mean? You can, and have to, test code in any
> language, be it bash, ksh, Python or C

Yes.  I mean that one can interact directly with the interpreter at a REPL
prompt, doing things like:

    >>> import re
    >>> devre = re.compile(r'(/dev/\S+)\s+(\S+)\s.*$')
    >>> devre
    <_sre.SRE_Pattern object at 0x28462780>
    >>> devre.match('/dev/ad0s1d 1012974 390512 541426 42% /var')
    <_sre.SRE_Match object at 0x28432e78>
    >>> devre.match('/dev/ad0s1d 1012974 390512 541426 42% /var').groups()
    ('/dev/ad0s1d', '1012974')
    >>> devre = re.compile(r'(/dev/\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+).*$')
    >>> devre.match('/dev/ad0s1d 1012974 390512 541426 42% /var').groups()
    ('/dev/ad0s1d', '1012974', '390512', '541426', '42%', '/var')

See how I am 'refining' the initial regular expression without ever
leaving the Python prompt?  That sort of interactivity is entirely lost
when you have to edit a file, save it, switch screen(1) windows or type
^Z to background the editor, run a script, watch it fail and repeat.
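As an aside that was not part of the session above: Python's re module also supports named groups, written (?P<name>...), so later code can fetch fields by name rather than by position. This is a minimal sketch for Python 3. The group names are my own, arbitrary choices.

```python
import re

# The six-field pattern from the session above, with each group named.
DEVRE = re.compile(
    r'(?P<device>/dev/\S+)\s+(?P<size>\S+)\s+(?P<used>\S+)\s+'
    r'(?P<avail>\S+)\s+(?P<capacity>\S+)\s+(?P<mount>\S+).*$'
)

m = DEVRE.match('/dev/ad0s1d 1012974 390512 541426 42% /var')
# Fields can now be fetched by name instead of by index.
print(m.group('device'), m.group('mount'))    # -> /dev/ad0s1d /var
print(m.groupdict()['capacity'])              # -> 42%
```

Lines that do not start with /dev/ (the df header, for example) simply return None from match().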

Then I can keep testing bits and pieces of code:

    >>> from subprocess import Popen, PIPE
    >>> pipe = Popen(['df', '-k'], shell=False, stdout=PIPE).stdout
    >>> for l in pipe:
    ...     m = devre.match(l)
    ...     if m:
    ...         print "device %s, size %ld KB" % (m.group(1), long(m.group(2)))
    ...
    device /dev/ad0s1a, size 1012974 KB
    device /dev/ad0s1d, size 1012974 KB
    device /dev/ad0s1e, size 2026030 KB
    device /dev/ad0s1f, size 10154158 KB
    device /dev/ad0s1g, size 284455590 KB
    device /dev/md0, size 19566 KB
    >>>
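The session here is Python 2. A rough Python 3 equivalent, assuming Python 3.7 or later for the text= argument, has to change three things: print is a function, there is no separate long type, and Popen's stdout yields bytes unless text mode is requested. The device_report() helper is a hypothetical name, split out so the parsing can be tried on any list of lines.

```python
import re
from subprocess import Popen, PIPE

devre = re.compile(r'(/dev/\S+)\s+(\S+)\s.*$')

def device_report(lines):
    """Yield one 'device ..., size ... KB' string per matching df line."""
    for l in lines:
        m = devre.match(l)
        if m:
            # Python 3 has no separate long type; int is arbitrary-precision.
            yield "device %s, size %d KB" % (m.group(1), int(m.group(2)))

if __name__ == '__main__':
    # text=True makes proc.stdout yield str lines instead of bytes,
    # so the str pattern above can match them.
    with Popen(['df', '-k'], stdout=PIPE, text=True) as proc:
        for report in device_report(proc.stdout):
            print(report)
```

Run as a script, it prints one line per /dev entry in the local df -k output; the devices will of course differ from the laptop shown here.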

So feeding df output to a bit of Python code works!  That's nice.  Then,
once I have a 'rough idea' of how I want the script to work, I can
factor out the repetitive bits:

    >>> def devsize(line):
    ...     m = devre.match(line)
    ...     if m:
    ...         return (m.group(1), m.group(2))
    ...
    >>> devsize('/dev/ad0s1d 1012974 390512 541426 42% /var')
    ('/dev/ad0s1d', '1012974')

So here's a short function that returns a nice 2-item tuple (device name,
number of 1 KB blocks), or None for lines that don't match.  Can we pipe
df output through it?

    >>> pipe = Popen(['df', '-k'], shell=False, stdout=PIPE).stdout
    >>> map(devsize, pipe.readlines())
    [ None, ('/dev/ad0s1a', '1012974'), None, ('/dev/ad0s1d', '1012974'),
      ('/dev/ad0s1e', '2026030'), ('/dev/ad0s1f', '10154158'),
      ('/dev/ad0s1g', '284455590'), None, None, None, None, None, None,
      None, None, None, None, None, None, None, None,
      ('/dev/md0', '19566'), None]
    >>>

It looks like we can do that too, but the list of tuples may be more
useful if we drop the None items in the process:

    >>> pipe = Popen(['df', '-k'], shell=False, stdout=PIPE).stdout
    >>> [t for t in map(devsize, pipe.readlines()) if t]
    [ ('/dev/ad0s1a', '1012974'), ('/dev/ad0s1d', '1012974'),
      ('/dev/ad0s1e', '2026030'), ('/dev/ad0s1f', '10154158'),
      ('/dev/ad0s1g', '284455590'), ('/dev/md0', '19566') ]

So there it is.  A nice structure, supported by the core of the
language, using a readable, easy syntax, and listing every mounted /dev
device on my laptop along with its size in KB.
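Python 3 also changed map(): it now returns a lazy iterator, so typing map(devsize, ...) at the prompt shows a map object rather than a list like the one above. This is a minimal sketch of the same two steps in Python 3, run on made-up sample lines instead of real df output.

```python
import re

devre = re.compile(r'(/dev/\S+)\s+(\S+)\s.*$')

def devsize(line):
    """Return (device, size) for a df line naming a /dev node, else None."""
    m = devre.match(line)
    if m:
        return (m.group(1), m.group(2))
    return None  # explicit here; the original relies on the implicit None

# Made-up sample lines standing in for `df -k` output.
lines = [
    'Filesystem  1024-blocks    Used   Avail Capacity  Mounted on',
    '/dev/ad0s1d 1012974 390512 541426 42% /var',
    'devfs 1 1 0 100% /dev',
]

# map() is lazy in Python 3, so list() is needed to see the results.
print(list(map(devsize, lines)))
# -> [None, ('/dev/ad0s1d', '1012974'), None]

# The comprehension consumes the iterator directly, so it works unchanged.
print([t for t in map(devsize, lines) if t])
# -> [('/dev/ad0s1d', '1012974')]
```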

The entire thing was built 'piece by piece', in the same Python session,
and I now have not only a 'rough idea' of how the code should work, but
also a working copy of the code in my history.

Note the complete *lack* of care about how to append to a list, how to
build device-name/size pairs on the fly, how to map all elements of a
list through a function, and, more importantly, the complete and utter
lack of any sort of '"${[]}"' quoting for variable names, values,
nested expansions, and so on.

That's what I am talking about.  Shell scripts are nice, but if we are
not constrained for some reason to use only /bin/sh or ksh, there's no
excuse for wasting hours upon hours deciphering cryptic quoting rules
and the edge cases of "black quoting magic", just to get a short job
done.  Being able to _easily_ use higher-level structures than a plain
'stream of bytes' is nice :)