Batch file question - average size of file in directory
Kurt Buff
kurt.buff at gmail.com
Wed Jan 3 10:48:06 PST 2007
On 1/3/07, Ian Smith <smithi at nimnet.asn.au> wrote:
> > Message: 17
> > Date: Tue, 2 Jan 2007 19:50:01 -0800
> > From: James Long <list at museum.rain.com>
>
> > > Message: 28
> > > Date: Tue, 2 Jan 2007 10:20:08 -0800
> > > From: "Kurt Buff" <kurt.buff at gmail.com>
>
> > > I don't even have a clue how to start this one, so am looking for a little help.
> > >
> > > I've got a directory with a large number of gzipped files in it (over
> > > 110k) along with a few thousand uncompressed files.
>
> If it were me I'd mv those into a bunch of subdirectories; things get
> really slow with more than 500 or so files per directory .. anyway ..
I just store them for a while - delete them after two weeks if they're
not needed again. The overhead isn't enough to worry about at this
point.
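If it ever does become a problem, I figure something like this would
spread them out - just a rough, untested sketch that buckets on the
first two characters of each filename (and assumes no spaces in the
names):

# untested sketch: move each .gz file into a subdirectory named after
# the first two characters of its filename
for f in *.gz; do
    sub=$(echo "$f" | cut -c1-2)
    mkdir -p "$sub"
    mv "$f" "$sub/"
done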
> > > I'd like to find the average uncompressed size of the gzipped files,
> > > and ignore the uncompressed files.
> > >
> > > How on earth would I go about doing that with the default shell (no
> > > bash or other shells installed), or in perl, or something like that.
> > > I'm no scripter of any great expertise, and am just stumbling over
> > > this trying to find an approach.
> > >
> > > Many thanks for any help,
> > >
> > > Kurt
> >
> > Hi, Kurt.
>
> And hi, James,
>
> > Can I make some assumptions that simplify things? No kinky filenames,
> > just [a-zA-Z0-9.]. My approach specifically doesn't like colons or
> > spaces, I bet. Also, you say gzipped, so I'm assuming it's ONLY gzip,
> > no bzip2, etc.
> >
> > Here's a first draft that might give you some ideas. It will output:
> >
> > foo.gz : 3456
> > bar.gz : 1048576
> > (etc.)
> >
> > find . -type f | while read fname; do
> > file $fname | grep -q "compressed" && echo "$fname : $(zcat $fname | wc -c)"
> > done
>
> % file cat7/tuning.7.gz
> cat7/tuning.7.gz: gzip compressed data, from Unix
>
> Good check, though grep "gzip compressed" excludes bzip2 etc.
>
> But you REALLY don't want to zcat 110 thousand files just to wc 'em,
> unless it's a benchmark :) .. may I suggest a slight speedup, template:
>
> % gunzip -l cat7/tuning.7.gz
> compressed  uncompr. ratio uncompressed_name
>      13642     38421 64.5% cat7/tuning.7
>
> > If you really need a script that will do the math for you, then
> > pipe the output of this into bc:
> >
> > #!/bin/sh
> >
> > find . -type f | {
> >
> > n=0
> > echo scale=2
> > echo -n "("
> > while read fname; do
> - > if file $fname | grep -q "compressed"
> + if file $fname | grep -q "gzip compressed"
> > then
> - > echo -n "$(zcat $fname | wc -c)+"
> + echo -n "$(gunzip -l $fname | grep -v comp | awk '{print $2}')+"
> > n=$(($n+1))
> > fi
> > done
> > echo "0) / $n"
> >
> > }
> >
> > That should give you the average decompressed size of the gzip'ped
> > files in the current directory.
>
> HTH, Ian
Ah - yes, I think that's much better. I should have thought of awk.
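For the record, I saved Ian's version as avg_gz.sh (just a name I
picked) and ran it as

sh avg_gz.sh | bc

and now that awk is in the picture, I suppose bc could be skipped
entirely and awk could keep the running total - something like this
(untested sketch, same gzip-only assumption as before):

# untested sketch: one pass, average printed by awk instead of bc
find . -type f | while read fname; do
    file "$fname" | grep -q "gzip compressed" &&
        gunzip -l "$fname" | grep -v comp | awk '{print $2}'
done | awk '{ sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'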
At some point, I'd like to do a bit more processing of file sizes,
such as trying to find out the number of IP packets each file would
take during an SMTP transaction, so that I could categorize overhead a
bit, but for now the average uncompressed file size is good enough.
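Just to sketch what I mean (very rough and untested - it assumes 1460
bytes of payload per packet and ignores SMTP commands, headers and any
base64 expansion entirely), something like this would give a per-file
packet estimate from the on-disk size:

# rough estimate: packets = ceiling(file size / 1460-byte payload)
find . -type f | while read fname; do
    file "$fname" | grep -q "gzip compressed" &&
        echo "$fname $(wc -c < "$fname")"
done | awk '{ print $1, int(($2 + 1459) / 1460), "packets" }'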
Thanks again for your help!
Kurt