what's the easiest way to de-html-ize files?
Gary Kline
kline at tao.thought.org
Wed May 16 01:35:13 UTC 2007
On Tue, May 15, 2007 at 03:34:14PM +1000, Ian Smith wrote:
> On Sat, 12 May 2007 14:34:52 -0700 Gary Kline <kline at tao.thought.org> wrote:
> > On Mon, May 14, 2007 at 12:09:07PM -0700, Chuck Swiger wrote:
> > > On May 12, 2007, at 12:54 PM, Gary Kline wrote:
> > > > This is for those of us who appreciate ASCII or straight
> > > > ISO_8859-15 rather than marked up files. I have slapped together
> > > > a crude C program that does scotch (or *cleanse*) text of
> > > > <B></B> and so on. Still... is there some standalone converter
> > > > that gets rid of markup more elegantly? Something where I
> > > > can say
> > > >
> > > > % cmd file_1.html ... file_N.html and output file_1.text ...
> > > > file_N.text?
> > >
> > > Perhaps:
> > >
> > > lynx -dump file1.html ... > file.text
> > >
> > > ...?
> >
> > Hm, maybe I need Bill Campbell's -force_html switch.
> >
> > Yes, seems that way. Using just -dump got most of them, but
> > using the -force_html caught all. Need to script something to
> > reformat, but the worst of it's done!
>
> Also, if using Mozilla (so, I would assume, Firefox) the 'Save Page As'
> dialog offers a picklist for 'Files of Type' that includes 'Text Files'.
>
> This does a pretty decent job of producing text from HTML files, and is
> quicker than firing up lynx (or links) if you're already viewing a page.
Oh sure; I've been saving html in text, ascii/8859-1 for years.
But what I've got, and there are more saved **somewhere**, are
files that are saved by default in markup. I have a slew of
these on different boxen and have been moving them to one place.
Problem is: how to de-html the bunch.
I'm too lazy to write something that would automate what can be
automated--markup like "&foo;" is problematic. So probably the
easiest way would be to create a dehtml.sh script that is just a
wrapper around lynx.
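A minimal sketch of what such a dehtml.sh might look like, assuming lynx is
installed and on the PATH. The function names outname and dehtml are made up
for illustration, and the only lynx switches used are the -dump and
-force_html ones mentioned earlier in the thread:

```shell
#!/bin/sh
# dehtml.sh: hypothetical wrapper around lynx (a sketch, not a finished tool).
# Usage: dehtml.sh file_1.html ... file_N.html
# Writes file_1.text ... file_N.text next to the inputs.

# Map an input name to an output name: foo.html or foo.htm -> foo.text.
# Anything else just gets ".text" appended.
outname() {
    case "$1" in
        *.html) printf '%s\n' "${1%.html}.text" ;;
        *.htm)  printf '%s\n' "${1%.htm}.text" ;;
        *)      printf '%s\n' "$1.text" ;;
    esac
}

# Run lynx once per file; -force_html makes lynx treat the input as HTML
# even when the name or content would not otherwise suggest it.
dehtml() {
    status=0
    for f in "$@"; do
        out=$(outname "$f")
        if ! lynx -dump -force_html "$f" > "$out"; then
            echo "dehtml: lynx failed on $f" >&2
            rm -f "$out"        # do not leave a partial output file behind
            status=1
        fi
    done
    return $status
}

dehtml "$@"
```

Run as "sh dehtml.sh file_1.html ... file_N.html", this should leave
file_1.text ... file_N.text beside the originals. Since lynx does the
rendering, character entities are left to lynx rather than to the script.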
I don't think I'm the only hacker who wants just-plain-ascii, so
this might make a good project for somebody who's new to C or
perl. That's my two pennies' worth!
gary
>
> Cheers, Ian
>
--
Gary Kline kline at thought.org www.thought.org Public Service Unix
More information about the freebsd-questions
mailing list