awk question

Polytropon freebsd at edvax.de
Mon Oct 5 23:22:14 UTC 2015


On Mon, 05 Oct 2015 18:57:54 -0400, Quartz wrote:
> >> It's not very much like sh or C syntax (or
> >> any other syntax) and new users tend to get really confused.
> >
> > Hmmm... I don't know, could you provide an example where you
> > would say, like, "this is not intuitive" or even "this does
> > something totally strange"?
> 
> Things I've noticed new users bump into all the time:
> 
> 
> Statements must be wrapped in curly braces, ie;
>  > awk '{print $1}{print $2}'
> I think awk is one of the few languages to do this.

This is equivalent to awk '{ print $1; print $2 }'.
Just like in C, C++, even Java and JavaScript, { ... }
is used to "group statements".
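
A quick way to convince yourself of that equivalence (a small sketch; the
two-line sample input and the variable names are made up for illustration):

```shell
# Hypothetical sample input: two fields per line.
input='a b
c d'

# Two blocks without a prefix: both run, in order, for every input line.
two_blocks=$(printf '%s\n' "$input" | awk '{print $1}{print $2}')

# One block holding two statements separated by a semicolon.
one_block=$(printf '%s\n' "$input" | awk '{ print $1; print $2 }')

# Both variables now hold the same four lines: a, b, c, d.
printf '%s\n' "$two_blocks"
```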



> Because of the above, having to type:
>  > awk '{print $1}'
> instead of just:
>  > awk 'print $1'
> .. in other words both the quotes and the curly braces are required. For 
> most other shell utilities one is enough.

If you consider my previous post about what { ... }
means, enlightenment will quickly follow: They define
a block, and what's in front of this block states _when_
the block should be executed. This "prefix" is important.
That's why it's necessary to understand the basic
"dataflow" within awk:

	BEGIN { ... }		# before any data
	/pattern/ { ... }	# when pattern is found
	(condition) { ... }	# when condition is true
	{ ... }			# always!
	END { ... }		# after all data

There can be multiple pattern matching and conditional
blocks, of course.
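
Here is a minimal sketch of that dataflow in one program. The input lines,
the "count" variable and the printed labels are invented for the example;
only the five kinds of blocks come from the list above.

```shell
# Hypothetical input, chosen only so that each kind of block fires.
input='alpha 1
beta 22
gamma 333'

result=$(printf '%s\n' "$input" | awk '
    BEGIN      { print "start" }            # once, before any data
    /^beta/    { print "pattern: " $0 }     # lines matching the regex
    ($2 > 100) { print "condition: " $0 }   # lines where the test is true
               { count++ }                  # no prefix: every line
    END        { print "lines: " count }    # once, after all data
')

printf '%s\n' "$result"
# start
# pattern: beta 22
# condition: gamma 333
# lines: 3
```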

The common form is awk '{ ... }' to process all input lines.
If you only want those lines which are not empty and not
comments, awk '/^[^#]/ { ... }' would be used; for only
lines with more than 10 characters: awk '(length > 10) { ... }';
or for just the 5th line: awk '(NR == 5) { ... }'
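
The three filters can be tried on a small sample. This is only a sketch:
the five input lines are made up, and each block simply prints the line
it selected.

```shell
# Hypothetical input: a comment, an empty line and three text lines.
input='# note

short
a line longer than ten characters
fifth line'

# Lines whose first character exists and is not "#".
not_comment=$(printf '%s\n' "$input" | awk '/^[^#]/ { print }')

# Lines with more than 10 characters.
long_lines=$(printf '%s\n' "$input" | awk '(length > 10) { print }')

# Only the 5th input line.
fifth=$(printf '%s\n' "$input" | awk '(NR == 5) { print }')

printf '%s\n' "$long_lines"   # a line longer than ten characters
printf '%s\n' "$fifth"        # fifth line
```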



> People assume that awk prints string literals like (ba)sh:
>  > echo "$1$2$3"
> and
>  > awk '{print $1$2$3}'
> both yield fields with nothing between them. So far so good, right? but:
>  > echo "$1,$2,$3"
> yields results with commas between them, but:
>  > awk '{print $1,$2,$3}'
> yields results with spaces.

Yes, this is a difference, but once you know it, and
especially if you want more precise control over the
output, you'll quickly resort to awk's printf, which
works like printf() in C.
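
To make the difference concrete, here is a short comparison. It is a
sketch with a made-up input line; the string concatenation and OFS
variants are not from the discussion above, they are standard awk
features shown next to printf for completeness.

```shell
line='a b c'   # hypothetical input with three fields

# A comma in print separates arguments; awk joins them with OFS (a space).
with_comma=$(echo "$line" | awk '{ print $1, $2, $3 }')          # a b c

# Writing values next to each other concatenates them.
adjacent=$(echo "$line" | awk '{ print $1 $2 $3 }')              # abc

# Literal commas have to be quoted strings.
literal=$(echo "$line" | awk '{ print $1 "," $2 "," $3 }')       # a,b,c

# Or change the output field separator once.
ofs=$(echo "$line" | awk 'BEGIN { OFS = "," } { print $1, $2, $3 }')  # a,b,c

# Or spell out the whole format with printf.
fmt=$(echo "$line" | awk '{ printf("%s,%s,%s\n", $1, $2, $3) }') # a,b,c
```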



> OK, so it's not like sh. Maybe it's like 
> Javascript then?
>  >  awk '{print $1+","+$2+","+$3}'
> ... nope, now all they get is a huge list of mostly zeros, because awk 
> doesn't overload operators.

Of course not, because that would be stupid. :-)
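
What happens instead is plain arithmetic: "+" always adds numbers, and a
string that doesn't look like a number counts as 0. A small sketch with
made-up input shows where the zeros come from:

```shell
# Letters are not numbers, so every operand counts as 0.
letters=$(echo 'a b c' | awk '{ print $1 + "," + $2 + "," + $3 }')

# Numeric fields are added up; each "," still counts as 0.
digits=$(echo '1 2 3' | awk '{ print $1 + "," + $2 + "," + $3 }')

echo "$letters $digits"   # prints: 0 6
```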

As I said, when you want concatenation with a custom
separator, use printf(): awk '{ printf("%s+%s+%s\n", $1, $2, $3); }'
which provides good flexibility; like in C, you can even
add formatting options for the arguments (see "man 3 printf"
for comparison), like string length manipulation, numeric
output format, or even the use of control characters.
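
A sketch of such formatting options follows. The input line and the
chosen widths are arbitrary, and the output assumes a locale that uses
"." as the decimal point.

```shell
# Hypothetical input: a name, an integer and a decimal number.
out=$(echo 'apple 3 2.5' |
    awk '{ printf("%-8s|%3d|%6.2f\t%s\n", $1, $2, $3, "end") }')

# %-8s   left-justify the string in 8 columns
# %3d    right-justify the integer in 3 columns
# %6.2f  6 columns wide, 2 digits after the decimal point
# \t     a tab as control character
printf '%s\n' "$out"
```

With the input above this prints "apple   |  3|  2.50", a tab, and "end".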



> > Yes, this is true, but keep in mind what awk is: a "pattern-directed
> > scanning and processing language". If you want higher precision
> > math, use system("<math stuff>  | dc") and incorporate the result;
> > awk isn't really for math, but integer math is usually fine. :-)
> 
> Right, but it's just something that makes people shy away from awk, for 
> better or worse.

Reading "man awk" gives you a quite good introduction to
what awk is and what it can do, and of course what it cannot
do (or at least what it's bad at). Choosing the right tool
for the job is _key_ to writing good code. awk is not
a "one size fits all" kind of tool: if you need to process
numbers with high precision, it's a bad tool. And when a few
simple calls to grep, cut, sed, tr etc. will do the job
similarly well, those could be considered instead.
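
As an illustration of that last point, here is the same small task done
both ways. It is only a sketch: the colon-separated sample data is made
up, and the task (first field of every non-comment line) is just one
where either approach works.

```shell
# Hypothetical colon-separated input with one comment line.
input='root:x:0:0
# comment
alice:x:1001:1001'

# awk: skip comment lines, print the first field.
with_awk=$(printf '%s\n' "$input" | awk -F: '/^[^#]/ { print $1 }')

# grep and cut: drop comment lines, then cut out the first field.
with_tools=$(printf '%s\n' "$input" | grep -v '^#' | cut -d: -f1)

printf '%s\n' "$with_awk"   # root, then alice
```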


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

