awk programming question

Thu Jan 23 21:57:28 UTC 2014

> -----Original Message-----
> From: RW [mailto:rwmaillists at googlemail.com]
> Sent: Thursday, January 23, 2014 1:34 PM
> To: freebsd-questions at freebsd.org
> Subject: Re: awk programming question
> 
> On Thu, 23 Jan 2014 13:57:03 -0700 (MST) Warren Block wrote:
> 
> > On Thu, 23 Jan 2014, dteske at FreeBSD.org wrote:
> >
> > >> From: RW [mailto:rwmaillists at googlemail.com]
> > >> Note that awk supports +, but not newfangled things like *.
> > >
> > > With respect to regex, what awk really needs is the quantifier
> > > syntax...
> > >
> > > * = {0,} = zero or more
> > > + = {1,} = one or more
> > > {x,y} = any quantity from x inclusively up to y
> > > {x,} = any quantity from x or more
> >
> > I think RW meant to type that awk did not have the newfangled "?" for
> > non-greedy matches.
> 
> No I meant it doesn't support *, which had been used in all the previous
> awk examples in this thread, and would have been interpreted as a literal
> "*".
> 
> $ echo "sid:2008120; re" | awk ' {match($0,/[0-9]+/) ; \
>         s=substr($0,RSTART,RLENGTH) ; print "_",s,"_"} '
> _ 2008120 _
> 21:12 (bob) ~
> $ echo "sid:2008120; re" | awk ' {match($0,/[0-9]*/) ; \
>         s=substr($0,RSTART,RLENGTH) ; print "_",s,"_"} '
> _  _
> 

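Awk does support "*", but /[0-9]*/ can also match the empty string, and
match() takes the leftmost match: a zero-length one at the very first
character (RSTART=1, RLENGTH=0). You have to give match() something to
"anchor" to. For example...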

$ echo "sid:2008120; re" | awk '{match($0,/[0-9][0-9]*/); \
	s=substr($0,RSTART,RLENGTH); print "_",s,"_"}'
_ 2008120 _
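
To see what match() actually reports in each case, a quick check along
these lines may help. This is only a sketch; the where() helper is a name
made up for this illustration and passes the pattern in as a dynamic regex.

```shell
# Hypothetical helper: print RSTART and RLENGTH for a given pattern.
where() {
	awk -v re="$1" '{ match($0, re); print RSTART, RLENGTH }'
}

echo "sid:2008120; re" | where '[0-9]*'        # prints: 1 0
echo "sid:2008120; re" | where '[0-9][0-9]*'   # prints: 5 7
echo "sid:2008120; re" | where '[0-9]+'        # prints: 5 7
```

The first pattern succeeds at position 1 with length 0, so substr() returns
an empty string; the other two cannot match until they reach a digit.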

> 
> On Thu, 23 Jan 2014 12:20:26 -0800
> dteske at FreeBSD.org wrote:
> 
> > 1. sig-msg.map file according to OP shouldn't have the quotes that are
> > present from the snort rule input
> > 2. Doesn't ignore lines of disinterest
> 
> I know nothing about snort - I was just going on the previous posts, but
> FWIW removing the quotes is just a matter of changing:
> 
>     msg = substr($0,RSTART+4, RLENGTH-5)
> 
> to
> 
>     msg = substr($0,RSTART+5, RLENGTH-6)

The match() that preceded that (going back in the thread) was:

	match($0, /msg:[^;]+;/)

That was bad for a couple of reasons.

1. The msg could be the last property in the ( ... ) set, meaning the msg
would either be empty or contain too much (going beyond the closing
parenthesis and on to the next semicolon).

2. If the msg is not double-quoted, you'll end up shaving off the first and
last byte unexpectedly.
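
Both problems can be reproduced with a small sketch. The extract_msg()
helper and the three input lines are made up for illustration, and the
offsets RSTART+5 / RLENGTH-7 are my assumption for dropping "msg:", both
quote characters and the trailing semicolon.

```shell
# Hypothetical helper: fixed-offset extraction built on /msg:[^;]+;/.
extract_msg() {
	awk '{
		if (match($0, /msg:[^;]+;/))
			# skip over msg:" and drop the closing quote and semicolon
			print substr($0, RSTART + 5, RLENGTH - 7)
		else
			print "(no match)"
	}'
}

echo '(msg:"quoted text"; sid:1;)' | extract_msg   # prints: quoted text
echo '(msg:bare text; sid:1;)'     | extract_msg   # prints: are tex
echo '(sid:1; msg:"last one")'     | extract_msg   # prints: (no match)
```

The unquoted message loses its first and last byte, and the message in
last position is not found at all because no semicolon follows it.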

That is why I test the first byte of the msg and then split on a conditional
field separator, later extracting the appropriate field based on traditional
parsing logic (a little googling helped when it came to finding out how
simple vs. complex the snort rules file was).
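
A rough sketch of that idea follows. It is not the script posted earlier in
the thread; the msg_of() name and the sample rules are made up for
illustration, and it assumes a quoted msg contains no escaped quotes.

```shell
# Hypothetical helper: choose the separator from the first byte of the msg.
msg_of() {
	awk '{
		i = index($0, "msg:")
		if (i == 0) next              # line of disinterest: no msg property
		rest = substr($0, i + 4)      # everything after "msg:"
		if (substr(rest, 1, 1) == "\"") {
			split(rest, f, "\"")      # f[1] is empty, f[2] is the quoted text
			print f[2]
		} else {
			split(rest, f, "[;)]")    # a bare msg ends at ";" or ")"
			print f[1]
		}
	}'
}

echo '(msg:"quoted text"; sid:1;)' | msg_of   # prints: quoted text
echo '(msg:bare text; sid:1;)'     | msg_of   # prints: bare text
echo '(sid:1; msg:"last one")'     | msg_of   # prints: last one
```

Because the separator depends on the first byte, the quoted and unquoted
forms both come out whole, and a msg in last position no longer needs a
trailing semicolon.
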
-- 
Devin
