awk programming question

Thu Jan 23 20:46:03 UTC 2014

--On January 23, 2014 at 12:20:26 PM -0800 dteske at FreeBSD.org wrote:

>
>
>> -----Original Message-----
>> From: RW [mailto:rwmaillists at googlemail.com]
>> Sent: Thursday, January 23, 2014 10:56 AM
>> To: freebsd-questions at freebsd.org
>> Subject: Re: awk programming question
>>
>> On Thu, 23 Jan 2014 09:30:35 -0700 (MST) Warren Block wrote:
>>
>> > On Thu, 23 Jan 2014, Paul Schmehl wrote:
>> >
> >> > > I'm kind of stubborn.  There are lots of different ways to skin a cat,
>> > > but I like to force myself to use the built-in utilities to do
>> > > things so I can learn more about them and better understand how they
>> > > work.
>> > >
>> > > So, I'm trying to parse a file of snort rules, extract two string
>> > > values and insert a double pipe between them to create a sig-msg.map
> >> > > file.
>> > >
>> > > Here's a typical rule:
>> > >
> >> > > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; content:"|00 01|"; depth:2; classtype:bad-unknown; sid:2008120; rev:1;)
>> > >
>> > > Here's a typical sig-msg.map file entry:
>> > >
>> > > 9624 || RPC UNIX authentication machinename string overflow attempt
>> > > UDP
>> > >
>> > > So, from the above rule I would want to create a single line like
>> > > this:
>> > >
>> > > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
>> > >
>> > > There are several ways I can extract one or the other value, and
>> > > I've figured out how to extract the sid and add the double pipe, but
>> > > for the life of me I can't figure out how to extract and print out
>> > > sid || msg.
>> > >
>> > > This prints out the sid and the double pipe:
>> > >
> >> > > echo `awk 'match($0,/sid:[0-9]*;/) {print substr($0,RSTART,RLENGTH)" || "}' /tmp/mtc.rules | tr -d ";sid"`
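Note that tr -d ";sid" deletes characters, not a string: every ";", "s", "i" and "d" it sees is removed, while the ":" after "sid" survives. A minimal sketch that does the cleanup inside awk instead, assuming the sid is always written as sid:NNN; (sid_pipe is a hypothetical wrapper name):

```shell
# Hypothetical helper: print "<sid> ||" for each line that has a sid.
# gsub() strips only the non-digits from the matched "sid:NNN;" text,
# so no other characters on the line are touched.
sid_pipe() {
  awk 'match($0, /sid:[0-9]+;/) {
         sid = substr($0, RSTART, RLENGTH)  # e.g. "sid:2008120;"
         gsub(/[^0-9]/, "", sid)            # keep the digits only
         print sid, "||"
       }' "$@"
}

echo 'classtype:bad-unknown; sid:2008120; rev:1;)' | sid_pipe   # prints: 2008120 ||
```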
>> > >
>> > > It seems I could put the results into a variable rather than
> >> > > printing them out, and then print var1 || var2, but my Google-fu
>> > > hasn't found a useful example.
>> > >
>> > > Surely there's a way to do this using awk?  I can use tr for
>> > > cleanup.  I just need to get close to the right result.
>> > >
>> > > How about it awk experts?  What's the cleanest way to get this done?
>> >
> >> > Not an awk expert, but you can do math on the RSTART and RLENGTH
> >> > variables to get just the sid number:
>> >
>> > echo "sid:2008120;" \
>> >    | awk '{ match($0, /sid:[0-9]*;/) ; \
>> >  	ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
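To see the arithmetic: for the input sid:2008120; the match starts at position 1 (RSTART) and is 12 characters long (RLENGTH). Skipping the four characters of "sid:" gives RSTART+4, and dropping those four plus the trailing ";" gives RLENGTH-5, which leaves the seven digits. A small sketch that prints all three values (show_offsets is a hypothetical helper name):

```shell
# Hypothetical helper: show where match() found "sid:NNN;" and what
# the RSTART+4 / RLENGTH-5 arithmetic extracts from it.
show_offsets() {
  awk '{ match($0, /sid:[0-9]*;/)
         print RSTART, RLENGTH, substr($0, RSTART + 4, RLENGTH - 5) }'
}

echo "sid:2008120;" | show_offsets   # prints: 1 12 2008120
```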
>> >
>> > Closer to what you want:
>> >
> >> > echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
> >> >    | awk '{ match($0, /sid:[0-9]*;/) ; \
>> >  	ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>> >  	match($0, /msg:.*;/) ; \
>> >  	msg = substr($0, RSTART+4, RLENGTH-5) ; \
>> >  	print ymd, "||", msg }'
>> >
>> > Note the error that the too-greedy regex creates, and the inability of
>> > awk to capture regex sub-expressions.  awk does not have a way to
> >> > reduce the greediness, at least not that I'm aware of.  You may be able to
>> > work around that, like if the message is always the same length.
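With the greedy /msg:.*;/, the match runs to the last ";" on the line, so the example above prints 2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120. In the OP's sample the msg value is wrapped in double quotes, so one workaround is to match up to the closing quote with a negated bracket expression, which also drops the quotes that sig-msg.map does not want. A sketch, assuming msg is always quoted and never contains an escaped \" (sid_msg is a hypothetical name):

```shell
# Hypothetical helper: print "<sid> || <msg>" for each rule line.
# Assumes msg:"..." is double-quoted and has no escaped quotes inside.
sid_msg() {
  awk '{
    if (!match($0, /sid:[0-9]+;/)) next
    sid = substr($0, RSTART + 4, RLENGTH - 5)   # drop "sid:" and ";"
    if (!match($0, /msg:"[^"]*"/)) next         # stop at the closing quote
    msg = substr($0, RSTART + 5, RLENGTH - 6)   # drop msg: and both quotes
    print sid, "||", msg
  }' "$@"
}

echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' | sid_msg
# prints: 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
```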
>>
>>
>> $ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' | \
>>  awk '{ match($0, /sid:[0-9]+;/) ;  ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>>       match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ; \
>>       print ymd, "||", msg }'
>>
>> 2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"
>>
>> Note that awk supports +, but not newfangled things like the non-greedy *?.
>
> With respect to regex, what awk really needs is the interval (brace) quantifier syntax...
>
> * = {0,} = zero or more
> + = {1,} = one or more
> {x,y} = any quantity from x up to y, inclusive
> {x,} = any quantity from x or more
>
> sed supports it -- e.g., echo "aaa" | sed -e 's/a\{1,2\}//' # produces "a"
> sed -E (aka sed -r) supports it -- e.g., echo "aaa" | sed -E 's/a{1,2}//' # produces "a"
> grep supports it -- e.g., echo "aaa" | grep 'a\{2,\}' # match printed
> grep -E (aka egrep) supports it -- e.g., echo "aaa" | grep -E 'a{2,}' # match printed
> perl supports it -- obviously (in the modern regex form, lacking backslashes)
> nvi supports it -- e.g., :%s/a\{1,2\}//
> vim supports it -- obviously (and uses the backslash form, even with nocompatible set)
>
> onetrueawk however does NOT support it -- example given...
> echo aaa | awk '/a{2,}/{print}' # no match printed
> echo aaa | awk '/a\{2,\}/{print}' # no match printed
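Where interval expressions aren't available, small counts can still be spelled out with the operators every awk has: a{2,} is the same as aa+ (or aaa*), and a{1,2} is the same as aa?. A sketch mirroring the sed and grep examples above (the function names are hypothetical):

```shell
# a{2,} written as aa+ : two or more a's.
at_least_two_a() {
  awk '/aa+/ { print }'
}

# a{1,2} written as aa? : one or two a's; sub() removes the first,
# leftmost-longest match, like sed 's/a\{1,2\}//'.
strip_one_or_two_a() {
  awk '{ sub(/aa?/, ""); print }'
}

echo aaa | at_least_two_a       # match printed: aaa
echo a | at_least_two_a         # no match printed
echo aaa | strip_one_or_two_a   # produces "a"
```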
>
> There are a couple of other nits here...
>
> 1. Per the OP, the sig-msg.map file shouldn't contain the quotes that are
>    present in the snort rule input.
> 2. It doesn't skip lines of no interest, such as comments and blank lines
>    (see http://oreilly.com/pub/h/1393).
> NB: The return value of match() is ignored; I don't think the program should
> output known-bad sig-msg.map lines (for example, where no sid is given, since
> the sid appears to be the key for the sig-msg.map file).
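On failure match() returns 0 and sets RSTART to 0 and RLENGTH to -1, so code that ignores the return value goes on to call substr() with meaningless offsets. A sketch of a guard that checks the return value and skips blank and comment lines; check_sids and the sample rules are made up for illustration, and [ \t] stands in for [[:space:]] in case the local awk lacks POSIX character classes:

```shell
# Hypothetical helper: report lines without a sid instead of emitting
# a bogus sig-msg.map entry for them.
check_sids() {
  awk '
    /^[ \t]*(#|$)/ { next }   # blank lines and comments
    {
      # On failure match() returns 0 and sets RSTART=0, RLENGTH=-1.
      if (!match($0, /sid:[0-9]+;/)) {
        print "no sid (RSTART=" RSTART ", RLENGTH=" RLENGTH ")"
        next
      }
      print "sid", substr($0, RSTART + 4, RLENGTH - 5)
    }' "$@"
}

printf '%s\n' \
  '# alert tcp any any -> any any (msg:"commented out"; sid:1;)' \
  '' \
  'alert tcp any any -> any any (msg:"no sid here"; rev:1;)' \
  'alert udp any any -> any 69 (msg:"has a sid"; sid:2008120; rev:1;)' |
  check_sids
# prints:
# no sid (RSTART=0, RLENGTH=-1)
# sid 2008120
```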
>
> I gather that a more complete solution would be as follows:
>
> awk '!/^[[:space:]]*(#|$)/{if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next;
> sid = substr($0, RSTART + RLENGTH - 1); sub(/[^0-9].*/, "", sid);
> if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next;
> buf = substr($0, RSTART + RLENGTH); quoted = substr(buf, 1, 1) == "\"";
> split(buf, msg, quoted ? "\"" : FS); print sid, "||", msg[quoted ? 2 : 1]}' rules_file
>
> Where "rules_file" is the name of the file you want to parse.
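The quoted-msg step can be tried on its own. Splitting on a double quote leaves an empty field 1 (the text before the opening quote) and the message in field 2, which is why the program picks msg[quoted ? 2 : 1]. awk string positions start at 1, so substr(buf, 1, 1) is the portable way to take the first character; what a start position of 0 returns is not something to rely on across awk implementations. A sketch with a sample buf value (split_demo is a hypothetical name):

```shell
# Hypothetical demo of the quoted/split trick on a sample buf value.
split_demo() {
  awk 'BEGIN {
    buf = "\"E3[rb] ET POLICY Outbound TFTP Read Request\"; content:\"|00 01|\"; depth:2;"
    quoted = substr(buf, 1, 1) == "\""   # 1 if buf starts with a quote
    n = split(buf, msg, "\"")            # fields between the quotes
    print quoted, n
    print "[" msg[1] "]"                 # empty: nothing before the quote
    print msg[2]                         # the message itself
  }'
}

split_demo
# prints:
# 1 5
# []
# E3[rb] ET POLICY Outbound TFTP Read Request
```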
>
> Putting this into a script, we can clean it up so that it's readable...
>
> #!/bin/sh
> awk '
> !/^[[:space:]]*(#|$)/ {
> 	if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next
> 	sid = substr($0, RSTART + RLENGTH - 1)
> 	sub(/[^0-9].*/, "", sid)
> 	if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next
> 	buf = substr($0, RSTART + RLENGTH)
> 	quoted = substr(buf, 1, 1) == "\""
> 	split(buf, msg, quoted ? "\"" : FS)
> 	print sid, "||", msg[quoted ? 2 : 1]
> }' "$@"

Thanks so much!  In the end I opted to use perl, because I had more 
pressing matters to attend to, but I'm pleased to know that it's doable with 
awk, and I will test your script (and endeavor to more fully understand it) 
when I have the time to do so.

-- 
Paul Schmehl, Senior Infosec Analyst