Script to clean text files
Kristian Vaaf
vaaf at broadpark.no
Sun Feb 12 02:53:51 PST 2006
At 22:45 11.02.2006, Parv wrote:
>in message <7.0.1.0.2.20060211172807.0214a4b8 at broadpark.no>,
>wrote Kristian Vaaf thusly...
> >
> >
> > Among other things, this script is supposed to add an empty line at
> > the bottom of a file.
> >
> > But somehow it always removes the first line in a text file,
> > how do I stop this?
>
>Can you provide a small sample file complete w/ things that you
>want to remove?
>
>
> > #!/usr/local/bin/bash
> > #
> > # Remove CRLF, trailing whitespace and double lines.
>
>What are "double lines"?
>
>
> > # $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
> > #
> > for file in `find -s . -type f -not -name ".*"`; do
> > if file -b "$file" | grep -q 'text'; then
> > echo >> "$file"
> > perl -i -pe 's/\015$//' "$file"
> > perl -i -pe 's/[^\S\n]+$//g' "$file"
>
>Why do you have two perl runs? More importantly, you will remove
>anything which is not whitespace or not newline. That means, in the
>end, you should have a file filled w/ whitespace only.
>
> >
> > perl -pi -00 -e 1 "$file"
> > echo "$file: Done"
> > fi
> > done
>
>To remove CRLF, trailing whitespace, and 2 consecutive blank lines
>...
>
> {
> tr -d '\r' < "$file" \
> | sed -E -e 's/[[:space:]]+$//' \
> | cat -s - > "${file}.tmp"
> } && mv -f "${file}.tmp" "$file"
>
>
> - Parv
>
>--
Hello Parv!
Yes, I meant blank lines :)
I've used the script for a long time now.
The only error is that it removes any blank lines at the top of a file,
which is a bit annoying. It's fine for scripts with shebangs, but not
for custom laid-out documents and the like.
I just wanted to know where that error was.
I use the Perl runs because those were the only runs people gave me.
You know how it is, you enter a FreeBSD help channel and ask how you
do this or that, and the upper gentlemen always reply "Learn Perl," and
then they go on giving you Perl runs :)
Your suggestion looks very very good.
So is this alright?
#!/usr/local/bin/bash
#
# Remove CRLF, trailing whitespace and blank lines.
# $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
#
for file in `find -s . -type f -not -name ".*"`; do
    if file -b "$file" | grep -q 'text'; then
        echo >> "$file"
        tr -d '\r' < "$file"
        sed -E -e 's/[[:space:]]+$//'
        cat -s - > "${file}.tmp" && mv -f "${file}.tmp" "$file"
        echo "$file: Done"
    fi
done
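As posted, the tr, sed and cat lines run one after another instead of as a pipeline: tr writes the file to the terminal, sed and cat wait on standard input, and the temporary file never receives the cleaned text. A minimal sketch of the per-file step with the pipes from Parv's suggestion restored; clean_file is a hypothetical helper name, and a plain && chain stands in for the braces:

```shell
#!/bin/sh
# clean_file is a hypothetical helper, not part of the original script.
# It strips carriage returns and trailing whitespace, and squeezes runs
# of blank lines into one, without touching a leading blank line.
clean_file() {
    file=$1
    # Append an empty line at the bottom, as the original script does.
    echo >> "$file"
    # Each filter feeds the next through a pipe; the original is only
    # replaced if the last command of the pipeline (cat) succeeded.
    tr -d '\r' < "$file" \
        | sed -E -e 's/[[:space:]]+$//' \
        | cat -s - > "${file}.tmp" \
        && mv -f "${file}.tmp" "$file"
}
```

Inside the loop this would be called as clean_file "$file". Note that tr -d '\r' deletes every carriage return in the file, not only those that are part of a CRLF line ending.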
All the best man,
Vaaf