Script to merge mailinglist archives

Giorgos Keramidas keramida at ceid.upatras.gr
Thu Jan 27 10:07:12 PST 2005


On 2005-01-27 19:40, Mikko Heiskanen <mikko at whitecortex.net> wrote:
> Thanks everyone for the answers, but none of you really understood what
> I meant :)
> So let me rephrase.
> I have archived mailinglist mbox format files on my hd, downloaded from
> http://docs.freebsd.org/mail/
>
> #ls
> 20021215.freebsd-questions 20040125.freebsd-questions
> 20021222.freebsd-questions 20040201.freebsd-questions
> 20021229.freebsd-questions 20040208.freebsd-questions
> 20021231.freebsd-questions 20040215.freebsd-questions
> 20030105.freebsd-questions 20040222.freebsd-questions
> 20030112.freebsd-questions 20040229.freebsd-questions
> 20030119.freebsd-questions 20040307.freebsd-questions
> 20030126.freebsd-questions 20040314.freebsd-questions
> 20030202.freebsd-questions 20040321.freebsd-questions
> 20030209.freebsd-questions 20040328.freebsd-questions
> 20030211.freebsd-questions 20040404.freebsd-questions
> 20030216.freebsd-questions 20040411.freebsd-questions
> 20030223.freebsd-questions 20040418.freebsd-questions
> 20030302.freebsd-questions 20040425.freebsd-questions
> 20030309.freebsd-questions 20040502.freebsd-questions
> 20030316.freebsd-questions 20040509.freebsd-questions
> 20030323.freebsd-questions 20040802.freebsd-questions
> 20030330.freebsd-questions 20041101.freebsd-questions
> 20030406.freebsd-questions 20041502.freebsd-questions
> 20030413.freebsd-questions 20041801.freebsd-questions
> 20030420.freebsd-questions 20042501.freebsd-questions
>
> Now as you can see, there are multiple mbox files per month.  I would
> like to make a one-liner/script which would merge all the mboxes of
> every month to a single mbox representing that month.

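A script is called for.  Save the following mini script as "jlist" and
try running it in the directory shown above.  It only prints the names
of the files that match a given list, year and month, but it will give
you an idea of how joining multiple files may be done: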

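    #!/bin/sh

    # Print the archive files that match a list name, year and month.
    if [ $# -lt 1 ] || [ $# -gt 3 ]; then
            echo "usage: $(basename "$0") listname [year [month]]" >&2
            exit 1
    fi

    list="$1"
    year="$2"
    month="$3"

    # With year or month omitted, the glob widens to cover all of them.
    for fname in "${year}${month}"*."${list}"; do
            # An unmatched glob is left as a literal string, so test it.
            if [ -f "${fname}" ]; then
                    echo "${fname}"
            fi
    done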
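To go from listing the files to actually joining them, the echo can be
swapped for a cat(1) that appends to one output file per month.  What
follows is only a minimal sketch: the function name "merge_month" and
the output name YEAR-MONTH.LIST.mbox are made up for illustration, and
it assumes every input mbox ends with a newline, so that plain
concatenation keeps the messages separated.

```shell
#!/bin/sh

# merge_month LIST YEAR MONTH  (hypothetical helper, not part of jlist)
# Concatenate every YEARMONTHdd.LIST file in the current directory into
# YEAR-MONTH.LIST.mbox.  The output name is an assumption; it is chosen
# so that it does not match the eight-digit input pattern.
merge_month() {
    list="$1"
    year="$2"
    month="$3"
    out="${year}-${month}.${list}.mbox"

    : > "${out}"        # start from an empty output file

    # The shell expands the glob in sorted order, which is also
    # chronological order for names that start with YYYYMMDD.
    for fname in "${year}${month}"[0-9][0-9]."${list}"; do
        # An unmatched glob is left as a literal string, so test it.
        if [ -f "${fname}" ]; then
            cat "${fname}" >> "${out}"
        fi
    done
}
```

Calling "merge_month freebsd-questions 2003 01" in the archive
directory should then leave the January 2003 files joined in
2003-01.freebsd-questions.mbox, with the originals untouched.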

If you want to discover all the different months, it may be as easy as:

    % cd ~/mail-archive
    % ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].* | \
	cut -c 1-6 | sort | uniq | sed -e 's/^..../& /'
    2002 12
    2003 01
    2003 02
    2003 03
    2003 04
    2004 01
    2004 02
    2004 03
    2004 04
    2004 05
    2004 08
    2004 11
    2004 15
    2004 18
    2004 25
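The same list can also be built without parsing ls(1) output, by
letting the shell expand the glob and trimming each name with parameter
expansion.  This is a sketch of that alternative, not a replacement for
the pipeline above; the function name "list_months" is invented here.

```shell
#!/bin/sh

# list_months  (hypothetical helper)
# Print one "YYYY MM" line for every month that has at least one
# YYYYMMDD.* file in the current directory.
list_months() {
    for f in [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*; do
        [ -f "$f" ] || continue     # glob matched nothing
        stamp="${f%%.*}"            # 20030105.freebsd-questions -> 20030105
        ym="${stamp%??}"            # 20030105 -> 200301
        echo "${ym%??} ${ym#????}"  # 200301 -> "2003 01"
    done | sort -u                  # sort and drop duplicate months
}
```

Run in the same directory, "list_months" should print the same
year/month pairs as the ls | cut | sort | uniq | sed pipeline.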


Once you have the list of year and month pairs, you can feed it to a
while loop that "generates" a shell script to call the "jlist" script
shown above for each pair:

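    % ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].* | \
        cut -c 1-6 | sort | uniq | sed -e 's/^..../& /' | \
        while read year month ;do
            echo sh jlist freebsd-questions "$year" "$month"
        done
    sh jlist freebsd-questions 2002 12
    sh jlist freebsd-questions 2003 01
    sh jlist freebsd-questions 2003 02
    sh jlist freebsd-questions 2003 03
    sh jlist freebsd-questions 2003 04
    sh jlist freebsd-questions 2004 01
    sh jlist freebsd-questions 2004 02
    sh jlist freebsd-questions 2004 03
    sh jlist freebsd-questions 2004 04
    sh jlist freebsd-questions 2004 05
    sh jlist freebsd-questions 2004 08
    sh jlist freebsd-questions 2004 11
    sh jlist freebsd-questions 2004 15
    sh jlist freebsd-questions 2004 18
    sh jlist freebsd-questions 2004 25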

Then, pipe this final output into an sh(1) invocation and let it do all
the magic "jlist" knows how to do for each year/month pair ;-)
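For illustration, the generate-then-run step could be wrapped up as
below.  This is a sketch under assumptions: "gen_commands" is a name
invented here, and "jlist" is assumed to take the list name as its
first argument, as its usage message states.

```shell
#!/bin/sh

# gen_commands LIST  (hypothetical helper)
# Print one "sh jlist LIST YEAR MONTH" line for every month that has
# archive files for LIST in the current directory.
gen_commands() {
    list="$1"
    ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]."${list}" |
        cut -c 1-6 | sort -u | sed -e 's/^..../& /' |
        while read -r year month; do
            echo "sh jlist ${list} ${year} ${month}"
        done
}

# Dry run, to review the generated commands first:
#     gen_commands freebsd-questions
# Real run, letting sh(1) execute each generated line:
#     gen_commands freebsd-questions | sh
```

Printing the commands before piping them into sh(1) gives you a chance
to read what is about to run, which is the main appeal of generating a
script instead of calling "jlist" directly from the loop.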

- Giorgos


