docs/50211: [PATCH] doc.docbook.mk: fix textfile creation

Jeroen Ruigrok van der Werven asmodai at in-nomine.org
Sun May 13 15:20:10 UTC 2007


The following reply was made to PR docs/50211; it has been noted by GNATS.

From: Jeroen Ruigrok van der Werven <asmodai at in-nomine.org>
To: bug-followup at FreeBSD.org
Cc:  
Subject: Re: docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Date: Sun, 13 May 2007 16:59:23 +0200

 A long overdue update, I guess.
 
 Neither links nor elinks will help for the multibyte environments of Chinese,
 Japanese, Korean, and the like. They simply do not understand encodings such
 as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.
 
 Using www/w3m-m17n I can at least view Japanese pages.
 Running 'w3m -dump http://website > dump.txt' on an EucJP-encoded page
 produces a UTF-8 encoded plain text file.
 
 The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR
 (Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), VISCII (Vietnamese), and
 KOI8-U (Russian).
 
 I also tried some ISO-8859 dumps (8859-6 and 8859-7, for example), and they
 all work fine.
 
 So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.
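
As a rough sketch of what such a conversion step could look like from the shell, assuming w3m from www/w3m-m17n is installed: the helper name html2txt and the file names are hypothetical, and the source charset is passed explicitly here rather than detected. The actual change to doc.docbook.mk may well look different.

```shell
#!/bin/sh
# Hypothetical helper (not part of doc.docbook.mk): render a local HTML
# file to UTF-8 plain text with w3m.
# Usage: html2txt INPUT.html SOURCE_CHARSET > OUTPUT.txt
html2txt() {
    input=$1      # path to the HTML file (illustrative)
    charset=$2    # encoding of the HTML file, e.g. EUC-JP

    # -dump          write the rendered page to standard output
    # -T text/html   treat the input as HTML regardless of its file name
    # -I charset     charset of the input document
    # -O UTF-8       charset of the generated text
    w3m -dump -T text/html -I "$charset" -O UTF-8 "$input"
}
```

For example, 'html2txt article.html EUC-JP > article.txt' would write the rendered text of a EucJP page to article.txt.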
 
 -- 
 Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
 イェルーン ラウフロック ヴァン デル ウェルヴェン
 http://www.in-nomine.org/ | http://www.rangaku.org/
 Reality is an illusion, grimmer. The dreamlands are like masks within
 masks, and Time has no dominion beyond the Shroud...


