docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Jeroen Ruigrok van der Werven
asmodai at in-nomine.org
Sun May 13 15:20:10 UTC 2007
The following reply was made to PR docs/50211; it has been noted by GNATS.
From: Jeroen Ruigrok van der Werven <asmodai at in-nomine.org>
To: bug-followup at FreeBSD.org
Subject: Re: docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Date: Sun, 13 May 2007 16:59:23 +0200
A long overdue update, I guess.
Neither links nor elinks will help in the multibyte environments of Chinese,
Japanese, Korean, and the like. They simply do not understand encodings such
as EUC-JP, SJIS, GB18030, GB2312, EUC-KR, or UTF-8.
Using www/w3m-m17n I can at least view Japanese pages.
Running 'w3m -dump http://website > dump.txt' on an EUC-JP encoded page
produces a UTF-8 encoded plain text file.
The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EUC-KR
(Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), and VISCII (Vietnamese).
I also tried some ISO-8859 dumps (8859-6 and 8859-7, for example), and it
all works fine.
So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.
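As a rough sketch of what that change might look like in doc.docbook.mk: only
HTML2TXT is named in this PR. The HTML2TXTOPTS variable, the ${PREFIX}/bin
install path, and the choice to force UTF-8 output are assumptions for
illustration and have not been checked against the current makefile. The
-dump, -T and -O options are standard w3m options; -O requires the m17n build.

```make
# Sketch only, not a tested patch.
# HTML2TXT is the variable discussed in this PR; HTML2TXTOPTS and the
# ${PREFIX}/bin path are assumed here and may differ in doc.docbook.mk.
HTML2TXT?=	${PREFIX}/bin/w3m
# -dump:          write the rendered page to stdout instead of browsing
# -T text/html:   treat the input as HTML regardless of file name
# -O UTF-8:       emit UTF-8 text (assumed choice of output encoding)
HTML2TXTOPTS?=	-dump -T text/html -O UTF-8
```

Whether a single output encoding such as UTF-8 suits every translation, or
whether each language should keep its own, is a separate decision.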
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Reality is an illusion, grimmer. The dreamlands are like masks within
masks, and Time has no dominion beyond the Shroud...