sed(1) regular expression gurus

Rob listone at deathbeforedecaf.net
Mon Jul 14 08:19:51 PDT 2003


OK, here's a solution using awk. It may be possible in sed, but awk has
more control statements for this kind of thing:

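awk --posix -F'[^0-9A-Za-z.]+' '
  {
    # Strip leading separators (such as the indentation on a folded
    # Received: line) so that "by" ends up in $1 rather than $2
    sub(/^[^0-9A-Za-z.]+/, "")
  }
  $1 == "by" {
    result = $2                  # hostname following "by"
    for (i = 3; i <= NF; i++) {
      # keep any later field that looks like a dotted quad
      if ($i ~ /^([0-9]+\.){3}[0-9]+$/) {
        result = result " " $i
      }
    }
    print result
  }'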

* Use the field separator to throw away anything that isn't a digit,
letter or period, so brackets and parentheses no longer need handling
* Match lines starting with 'by' and save the second word (which should
be a hostname)
* Check the following words - if they match an IP address, they're saved
too
* Then print the result!
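A quick way to see what that field separator produces is to print each field on its own line. This is a minimal sketch assuming a POSIX shell and any POSIX awk; 'show_fields' is a hypothetical helper name, not part of the script above. The second sample line is indented, as folded Received: continuation lines are, and a regex field separator treats those leading spaces as a separator, leaving the first field empty.

```shell
# show_fields is a hypothetical helper: print every field that the
# '[^0-9A-Za-z.]+' separator produces, numbered, one per line.
show_fields() {
  awk -F'[^0-9A-Za-z.]+' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Brackets and parentheses vanish; the fields are by, the hostname,
# 123.4.56.789, id and 3A4E07B03.
echo 'by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03' | show_fields

# Indented input: the leading spaces count as a separator, so field 1
# is empty and "by" becomes field 2.
echo '  by some.host.at.a.com (8.11.6) ESMTP;' | show_fields
```

This is why a test on the first field for 'by' only works on indented input once the leading separators have been stripped.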

There may be 'neater' ways of doing it, but it's the most concise
example I could come up with.
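For comparison, a minimal sed sketch, assuming a sed that accepts -E (the FreeBSD sed from the question, or a recent GNU sed); 'received_by_sed' is a hypothetical name. POSIX regular expressions have no general "anything that isn't this pattern" operator, and in the original attempt the '{0}' quantifier asks for zero repetitions of the address group, so that branch can only ever match a literal '()'. This sketch instead tries to capture a dotted quad inside the first pair of parentheses and, when that substitution succeeds, uses 't' to skip the hostname-only fallback.

```shell
# Sketch only; received_by_sed is a hypothetical helper name.
# First s///: hostname plus a dotted quad in the first (...), with
#   optional [ ] around it.
# t: if that matched, branch to the end (nothing more is printed).
# Second s///: otherwise print just the hostname.
received_by_sed() {
  sed -E -n \
    -e 's/^[[:space:]]*by[[:space:]]+([^[:space:](]+)[^(]*\(\[?([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)]?\).*/\1 \2/p' \
    -e 't' \
    -e 's/^[[:space:]]*by[[:space:]]+([^[:space:](]+).*/\1/p'
}

# Usage: prints "some.host.at.another.com 123.4.56.789"
echo '  by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03' | received_by_sed
```

Unlike the awk script, this only looks for an address in the first parenthesised comment after the hostname.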

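You need to include the --posix option to get the '{3}' interval notation
to work in GNU awk (the option itself is peculiar to GNU awk; --re-interval
also works, and gawk 4.0 and later enable interval expressions by default).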
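If your awk is not GNU awk, or does not support interval expressions, the '{3}' test can be replaced by splitting each candidate on '.' and checking for exactly four all-digit parts. A minimal sketch, assuming a POSIX shell and awk; 'received_by' and 'is_dotted_quad' are hypothetical names. Like the original regex, it accepts any run of digits in each part, so 123.4.56.789 passes even though 789 is not a valid octet.

```shell
# Sketch for awks without interval expressions such as '{3}'.
# received_by and is_dotted_quad are hypothetical names.
received_by() {
  awk -F'[^0-9A-Za-z.]+' '
    # 1 if s is exactly four dot-separated runs of digits, else 0
    function is_dotted_quad(s,    parts, n, j) {
      n = split(s, parts, ".")   # single-char separator, taken literally
      if (n != 4) return 0
      for (j = 1; j <= 4; j++)
        if (parts[j] !~ /^[0-9]+$/) return 0
      return 1
    }
    {
      # drop leading separators (indentation) so "by" lands in $1
      sub(/^[^0-9A-Za-z.]+/, "")
    }
    $1 == "by" {
      result = $2
      for (i = 3; i <= NF; i++)
        if (is_dotted_quad($i)) result = result " " $i
      print result
    }'
}

# Usage: prints "some.host.at.yet.another.com 123.4.56.789"
echo '  by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03' | received_by
```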

----- Original Message -----
From: "D J Hawkey Jr" <hawkeyd at visi.com>
Subject: Attn: sed(1) regular expression gurus


> Hi all.
>
> I'm getting really frustrated by a seemingly simple problem. I'm doing
> this under FreeBSD 4.5.
>
> Given these portions of an e-mail's multi-line Received header as tests:
>
>   by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03
>   by some.host.at.a.com (8.11.6) ESMTP;
>   by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;
>   by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03
>   by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03
>
> I want to isolate the addresses (one for the 1st through 3rd, two for
> the 4th and 5th). Here's the sed(1) command I'm playing with:
>
>   echo "by nospam.mc.mpls.visi.com (Postfix) with ESMTP id 3A4E07B03" \
>       |sed -E \
>         -e "s/by[[:space:]]+//" \
>         -e "s/(\((\[?([0-9]{1,3}\.){3}[0-9]{1,3}\]?){0}\)|id|with|E?SMTP).*//"
>
> In all cases, the parenthetical word is returned, when only the last
> two should return the parenthetical word. The idea behind the first
> branch of the second sed(1) command is to match anything that isn't a
> "digits.digits.digits.digits" pattern. I've tried simpler expressions
> like "\(\[?[^0-9.]+\]?\)", but it fails on the third example.
>
> What the devil am I doing wrong?? Am I exercising known bugs in GNU's
> sed(1)? Can anyone dream up a different solution - please, no Perl, but
> awk(1) is fine.
>
> Thanks,
> Dave



More information about the freebsd-questions mailing list