Scripting question
Jonathan McKeown
jonathan+freebsd-questions at hst.org.za
Fri Sep 14 00:27:45 PDT 2007
On Thursday 13 September 2007 20:35, Roland Smith wrote:
> On Thu, Sep 13, 2007 at 10:16:40AM -0700, Kurt Buff wrote:
> > I'm trying to do some text file manipulation, and it's driving me nuts.
[snip]
> > I've looked at sort and uniq, and I've googled a fair bit but can't
> > seem to find anything that would do this.
> >
> > I don't have the perl skills, though that would be ideal.
> >
> > Any help out there?
>
> #!/usr/bin/perl
> while (<>) {
>     # Assuming no whitespace in addresses; kill everything after the first
>     # space
>     s/ .*$//;
>     # Store the name & count in a hash
>     $names{$_}++;
> }
> # Go over the hash
> while (($name,$count) = each(%names)) {
>     if ($count == 1) {
>         # print unique names.
>         print $name, "\n";
>     }
> }
Another approach in Perl would be:
#!/usr/bin/perl
my (%names, %dups);
while (<>) {
    my ($key) = split;
    $dups{$key} = 1 if $names{$key};
    $names{$key} = 1;
}
delete @names{keys %dups};
#
# keys %names is now an unordered list of only non-repeated elements
# keys %dups is an unordered list of only repeated elements
With no arguments, split splits $_ on whitespace, returning a list of fields
which can be assigned to a list of variables. Here we only want to capture the
first field: split is more efficient for this than using a regex. The first
occurrence of $key is in parens because it's actually a list of one variable
name. That puts split in list context, so $key gets the first field rather
than the number of fields.
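To make the list-versus-scalar point concrete, here is a small sketch. The sample line is invented for the example, and the scalar-context behaviour shown assumes a reasonably modern perl (5.12 or later), where split in scalar context just returns the field count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line; the address and trailing text are invented.
$_ = "user\@example.com some trailing text\n";

# List assignment: split runs in list context, $key gets the first field.
my ($key) = split;

# Scalar assignment: split runs in scalar context and returns the
# number of fields instead of the first field.
my $count = split;

print "key=$key count=$count\n";
```

Without the parens, the script above would silently store a count in $key and key the hashes by numbers rather than names.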
We build two hashes: %names, keyed by the original names (this is the
classic way to reduce duplicates to single occurrences, since a repeated key
simply overwrites the earlier entry), and %dups, whose keys are names that
were already in %names when we saw them again - the duplicated entries.
Having done that, we use a hash slice to delete from %names all the keys of
%dups. That leaves the keys of %names holding all the entries which appear
only once (and the keys of %dups holding all the duplicated entries, if
that's useful).
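For anyone who wants to watch the two hashes at work, the following is a minimal sketch of the same technique run against a few made-up lines instead of a file. The @lines data, the defined check for blank lines, and the sort (added only to make the output order predictable) are assumptions for the example, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample input: the first field is the name, the rest is ignored.
my @lines = (
    "alice\@example.com first",
    "bob\@example.com second",
    "alice\@example.com third",
    "carol\@example.com fourth",
);

my (%names, %dups);
for (@lines) {
    my ($key) = split;
    next unless defined $key;          # skip blank lines
    $dups{$key} = 1 if $names{$key};   # seen before: record as duplicate
    $names{$key} = 1;
}

# Hash slice: remove every duplicated key from %names in one statement.
delete @names{keys %dups};

# Sort only so the output order is predictable; hash order is not.
my @unique   = sort keys %names;
my @repeated = sort keys %dups;
print "unique:   @unique\n";
print "repeated: @repeated\n";
```

Here alice appears twice, so she ends up only in %dups, while bob and carol remain in %names.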
Jonathan