Script to clean text files

Kristian Vaaf vaaf at broadpark.no
Sun Feb 12 02:53:51 PST 2006


At 22:45 11.02.2006, Parv wrote:
>in message <7.0.1.0.2.20060211172807.0214a4b8 at broadpark.no>,
>wrote Kristian Vaaf thusly...
> >
> >
> > Among other things, this script is supposed to add an empty line at
> > the bottom of a file.
> >
> > But somehow it always removes the first line in a text file,
> > how do I stop this?
>
>Can you provide a small sample file complete w/ things that you
>want to remove?
>
>
> > #!/usr/local/bin/bash
> > #
> > #   Remove CRLF, trailing whitespace and double lines.
>
>What are "double lines"?
>
>
> > #   $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
> > #
> > for file in `find -s . -type f -not -name ".*"`; do
> >       if file -b "$file" | grep -q 'text'; then
> >               echo >> "$file"
> >               perl -i -pe 's/\015$//' "$file"
> >               perl -i -pe 's/[^\S\n]+$//g' "$file"
>
>Why do you have two perl runs?  More importantly, you will remove
>anything which is not whitespace or not newline.  That means, in the
>end, you should have a file filled w/ whitespace only.
>
> >
> >               perl -pi -00 -e 1 "$file"
> >               echo "$file: Done"
> >       fi
> > done
>
>To remove CRLF, trailing whitespace, and 2 consecutive blank lines
>...
>
>   {
>     tr -d '\r' < "$file" \
>     | sed -E -e 's/[[:space:]]+$//' \
>     | cat -s - > "${file}.tmp"
>   } && mv -f "${file}.tmp" "$file"
>
>
>   - Parv
>
>--

Hello Parv!

Yes I meant blank lines :)

I've used the script for a long time now.
The only error is that it removes the blank line at the top of a file,
if there is one.

Which is a bit annoying. It's fine for scripts with shebangs but not
for custom laid out documents etc.

I just wanted to know where that error was.
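
A likely source of the lost first line is the `perl -pi -00 -e 1` step: with `-00` Perl reads in paragraph mode, which skips empty lines at the very start of the input as well as squeezing runs of them elsewhere. The sketch below compares that with `cat -s`, which only squeezes. The function names are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper names; both read stdin and write stdout.

# Paragraph mode: squeezes blank-line runs AND skips leading empty lines.
squeeze_paragraph_mode() {
    perl -00 -pe 1
}

# cat -s: squeezes blank-line runs but keeps a single leading empty line.
squeeze_cat() {
    cat -s -
}

# Small demonstration on input that starts with an empty line.
if command -v perl >/dev/null 2>&1; then
    echo "--- perl -00 ---"
    printf '\nfirst\n\n\nsecond\n' | squeeze_paragraph_mode
fi
echo "--- cat -s ---"
printf '\nfirst\n\n\nsecond\n' | squeeze_cat
```

In the first output the text should begin directly with `first`; in the second the leading empty line is still there.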

I use the Perl runs because those were the only runs people gave me.
You know how it is, you enter a FreeBSD help channel and ask how you
do this or that, and the upper gentlemen always reply "Learn Perl," and
then they go on giving you Perl runs :)

Your suggestion looks very very good.

So is this alright?

#!/usr/local/bin/bash
#
#   Remove CRLF, trailing whitespace and repeated blank lines.
#   $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
#

for file in `find -s . -type f -not -name ".*"`; do

         if file -b "$file" | grep -q 'text'; then

                 echo >> "$file"

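                 # Strip CRs, strip trailing whitespace, squeeze repeated
                 # blank lines; the three commands must be joined by pipes.
                 tr -d '\r' < "$file" \
                 | sed -E -e 's/[[:space:]]+$//' \
                 | cat -s - > "${file}.tmp" && mv -f "${file}.tmp" "$file"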

                 echo "$file: Done"

         fi

done
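
One caveat with the loop above: the output of the backtick `find` is split on whitespace, so a file name containing a space would be handled as two separate names. A minimal sketch of one way around that follows, assuming a `while read` loop is acceptable. `clean_stream`, `clean_file` and `clean_tree` are hypothetical names, and the FreeBSD-specific `find -s` is left out so the sketch is not tied to one `find`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original clean.sh.

# Filter stdin to stdout: terminate the last line, drop CRs, strip
# trailing whitespace, then squeeze runs of blank lines into one.
clean_stream() {
    { cat; echo; } \
    | tr -d '\r' \
    | sed -E -e 's/[[:space:]]+$//' \
    | cat -s -
}

# Clean one file in place, but only if file(1) reports it as text.
clean_file() {
    target=$1
    file -b "$target" | grep -q 'text' || return 0
    clean_stream < "$target" > "${target}.tmp" \
        && mv -f "${target}.tmp" "$target"
    echo "$target: Done"
}

# Walk a directory; reading one name per line avoids splitting on spaces.
# Names containing a newline are still not handled by this sketch.
clean_tree() {
    find "$1" -type f ! -name '.*' -print | while IFS= read -r path; do
        clean_file "$path"
    done
}
```

Calling `clean_tree .` would then play the role of the `for` loop in the script.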

All the best man,
Vaaf


