Attn: sed(1) regular expression gurus

Mon Jul 14 07:08:19 PDT 2003

Hi all.

I'm getting really frustrated by a seemingly simple problem. I'm doing
this under FreeBSD 4.5.

Given these portions of an e-mail's multi-line Received header as tests:

  by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03
  by some.host.at.a.com (8.11.6) ESMTP;
  by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;
  by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03
  by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03

I want to isolate the addresses (one for the 1st through 3rd, two for
the 4th and 5th). Here's the sed(1) command I'm playing with:

  echo "by nospam.mc.mpls.visi.com (Postfix) with ESMTP id 3A4E07B03" \
      |sed -E \
        -e "s/by[[:space:]]+//" \
        -e "s/(\((\[?([0-9]{1,3}\.){3}[0-9]{1,3}\]?){0}\)|id|with|E?SMTP).*//"

In all cases, the parenthetical word is returned, when only the last
two should return the parenthetical word. The idea behind the first
branch of the second sed(1) command is to match anything that isn't a
"digits.digits.digits.digits" pattern. I've tried simpler expressions
like "\(\[?[^0-9.]+\]?\)", but it fails on the third example.

What the devil am I doing wrong?? Am I exercizing known bugs in GNU's
sed(1)? Can anyone dream up a different solution - please, no Perl, but
awk(1) is fine.

Thanks,
Dave

-- 
  ______________________                         ______________________
  \__________________   \    D. J. HAWKEY JR.   /   __________________/
     \________________/\     hawkeyd at visi.com    /\________________/
                      http://www.visi.com/~hawkeyd/