Optimize shell

Olivier Nicole on at cs.ait.ac.th
Mon Feb 6 20:30:29 PST 2006


I am setting up a machine to work as a mail back-up. It receives a copy
of every email for every user. When the disk is almost full, I want to
delete the oldest messages, up to a total size of 4000000000 bytes.

Messages are stored in /home/sub_home/user/Maildir/cur in maildir format.

Message names are of the form 1137993135.86962_0.machine.cs.ait.ac.th
where the first number is a Unix timestamp.
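
As a small illustration of that naming scheme, the timestamp can be pulled
out of a message name with plain shell parameter expansion. This is only a
sketch; msg_timestamp is a made-up helper name, not part of the script below.

```shell
# Hypothetical helper: print the leading Unix timestamp of a maildir file name.
msg_timestamp() {
    name=${1##*/}                 # strip any leading directory part
    printf '%s\n' "${name%%.*}"   # keep everything before the first dot
}

msg_timestamp /home/sub_home/user/Maildir/cur/1137993135.86962_0.machine.cs.ait.ac.th
# prints: 1137993135
```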

I came up with the following shell loop to find the messages of all users,
sort them by date and compute the total size up to 4 GB.

for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
    /bin/rm "$i"
done

find /home -mindepth 5 -ls makes a list of all files and directories at
     a depth of 5 and more, because my directory structure is such that
     messages are stored at level 6

grep /Maildir/cur/ because Courier-IMAP tends to put things in other
     directories it creates when it needs to
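
As a rough check of what these two stages select, the sketch below builds a
throwaway tree shaped like my layout and lists path names only (no -ls). The
scratch directory, the file names and the list_cur helper are made up for
illustration.

```shell
# Build a scratch tree that mimics /home/sub_home/user/Maildir/{cur,new}.
root=$(mktemp -d)
mkdir -p "$root/home/java/on/Maildir/cur" "$root/home/java/on/Maildir/new"
: > "$root/home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th"
: > "$root/home/java/on/Maildir/new/1137994700.87680_0.mail.cs.ait.ac.th"

# Hypothetical helper: the same find + grep stages, printing paths only.
list_cur() {
    find "$1" -mindepth 5 | grep /Maildir/cur/
}

# Only the message under cur/ is listed; the one under new/ is filtered out.
list_cur "$root/home"

# Remove the scratch tree with: rm -rf "$root"
```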

These two commands give me a list of the form:

1397490    8 -rw-------    1 on               staff            3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th

where 3124 is the size

The sed command transforms the line into date, size, filename:

1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
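
To see the transformation in isolation, the same sed expression can be fed
the sample find -ls line shown earlier. ls_to_record is a hypothetical
wrapper name used only for this sketch.

```shell
# Hypothetical wrapper around the sed expression used in the loop above:
# turns "find -ls" lines into "timestamp size path" records.
ls_to_record() {
    sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/'
}

# Sample line copied from the find -ls output above.
line='1397490    8 -rw-------    1 on               staff            3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th'

printf '%s\n' "$line" | ls_to_record
# prints: 1138350182 3124 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th
```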

Then it sorts on the date field and awk is used to sum on the size
field and print the filename until the total of 4 GB is reached.
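
The sort and awk stages can be tried on a few hand-made records. In this
sketch the records, the oldest_up_to helper name and the tiny limit of 350
bytes are all invented for the demonstration, and the sort key is written
as -k 1,1, the POSIX spelling of the older +0 -1 form.

```shell
# Hypothetical helper: read "timestamp size path" records on stdin and
# print the oldest paths while the running size total stays below $1.
oldest_up_to() {
    sort -n -k 1,1 | awk -v limit="$1" '{ sum += $2; if (sum < limit) print $3 }'
}

# Three made-up records, deliberately out of date order.
printf '%s\n' \
    '1137994700 300 /tmp/c' \
    '1137994623 100 /tmp/a' \
    '1137994650 200 /tmp/b' | oldest_up_to 350
# prints /tmp/a then /tmp/b (running totals 100 and 300); /tmp/c would
# bring the total to 600, which is over the limit.
```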

That works OK, but it is damn slow: for 200 users, 7800 messages and
302 MB it takes something like 3+ minutes... For 25 GB of email it
should take more than 4 hours, which is too much.

It seems that the slow part is the sort:

without sort
time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |  cat /dev/null
0.026u 0.035s 0:07.67 0.6%      51+979k 0+0io 0pf+0w

with sort
time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat /dev/null
0.281u 0.366s 3:44.75 0.2%      39+1042k 0+0io 0pf+0w

Any idea how to speed things up?

Thanks in advance,


More information about the freebsd-questions mailing list