ports/122524: www/links1 uses 7-bit us-ascii codepage only when using "-dump"

Alexander Zagrebin alexz at visp.ru
Mon Apr 7 10:40:03 UTC 2008


>Number:         122524
>Category:       ports
>Synopsis:       www/links1 uses 7-bit us-ascii codepage only when using "-dump"
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 07 10:40:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Zagrebin
>Release:        7.0-RELEASE
>Organization:
-
>Environment:
>Description:
In interactive mode, links 0.98 (www/links1) works fine.
But when it is used to dump an HTML page to stdout (links -dump ...), it always assumes 7-bit us-ascii encoding for the output.
This causes problems when the HTML page uses a non-us-ascii encoding.
For example:
1. Some programs (mail/mutt, misc/mc, etc.) can use "links -dump ..." as an HTML-to-text converter. When the HTML uses a non-us-ascii encoding, the output is unreadable in most cases.
2. The FreeBSD Documentation Project uses links to convert HTML documentation to plain text. As a result, the plain-text documentation for ru_RU.KOI8-R, for example, is unreadable.
>How-To-Repeat:
Convert an HTML source containing 8-bit (or UTF-8) characters with
"links -dump source.html" and compare the result with "links source.html".
The output from -dump will contain only us-ascii characters.
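
One way to confirm the symptom is to check whether captured -dump output contains any byte with the high bit set. The helper below is only a sketch for that check; the name is_7bit_only is hypothetical and is not part of links or of the attached patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of links): return 1 if every byte in buf
 * is 7-bit (0x00..0x7F), 0 if at least one byte has the high bit set.
 */
static int is_7bit_only(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] & 0x80)
			return 0;	/* found an 8-bit byte */
	}
	return 1;	/* nothing above 0x7F (also true for empty input) */
}
```

Running this over the saved output of "links -dump source.html" should return 1 for an unpatched links 0.98, even when source.html itself contains 8-bit characters.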
>Fix:
I have added a -dump-codepage <codepage> command-line parameter (see the patch).
It sets the output codepage used when links runs in "dump" mode.
I use the koi8-r encoding and, with this patch applied, I can run links like this:
"links -dump -dump-codepage koi8-r source.html"
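
The option handling described above can be sketched in isolation as follows. This is a simplified stand-in, not the links code: the demo_* names and the small codepage table are hypothetical, the case-insensitive comparison is an assumption, and treating index 0 as 7-bit us-ascii is inferred from the behaviour this report describes. In the actual patch, get_cp_index() and get_token() from links do this work.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table; the real links codepage list and its indices differ. */
static const char *const demo_codepages[] = {
	"us-ascii", "iso-8859-1", "koi8-r", "utf-8"
};

/* Selected output codepage; 0 is assumed to mean 7-bit us-ascii. */
static int demo_dump_codepage = 0;

/* Case-insensitive string equality (assumed matching rule). */
static int demo_name_equal(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;	/* equal only if both strings ended together */
}

/* Stand-in for get_cp_index(): table index, or -1 if the name is unknown. */
static int demo_get_cp_index(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_codepages) / sizeof(demo_codepages[0]); i++) {
		if (demo_name_equal(name, demo_codepages[i]))
			return (int)i;
	}
	return -1;
}

/*
 * Mirrors the shape of dump_codepage_rd() in the patch: returns NULL on
 * success or an error message, and only updates the setting on success.
 */
static const char *demo_set_dump_codepage(const char *arg)
{
	int i;

	if (arg == NULL || *arg == '\0')
		return "Missing argument";
	i = demo_get_cp_index(arg);
	if (i == -1)
		return "Unknown codepage";
	demo_dump_codepage = i;
	return NULL;
}
```

With this sketch, demo_set_dump_codepage("koi8-r") returns NULL and selects index 2 of the hypothetical table, while an unknown name leaves the previous setting untouched.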



Patch attached with submission follows:

--- default.c.orig	2008-04-06 14:50:26.000000000 +0400
+++ default.c	2008-04-06 15:02:21.000000000 +0400
@@ -651,6 +651,19 @@
 	}
 }
 
+unsigned char *dump_codepage_rd(struct option *o, unsigned char *c)
+{
+	unsigned char *token;
+	int i;
+
+	if (!(token = get_token(&c))) return "Missing argument";
+	i = get_cp_index(token);
+	mem_free(token);
+	if (i == -1) return "Unknown codepage";
+	dump_codepage = i;
+	return NULL;
+}
+
 unsigned char *gen_cmd(struct option *o, unsigned char ***argv, int *argc)
 {
 	unsigned char *r;
@@ -783,6 +796,9 @@
   Write a plain-text version of the given HTML document to\n\
   stdout.\n\
 \n\
+ -dump-codepage <charset>\n\
+  Output codepage to be used for -dump\n\
+\n\
  -width <size>\n\
   Size of screen in characters, used in combination with -dump\n\
 \n\
@@ -840,6 +856,7 @@
 int base_session = 0;
 int dmp = 0;
 int force_html = 0;
+int dump_codepage = 0;
 
 int async_lookup = 1;
 int download_utime = 0;
@@ -896,6 +913,7 @@
 	1, force_html_cmd, NULL, NULL, 0, 0, NULL, NULL, "force-html",
 	1, dump_cmd, NULL, NULL, D_DUMP, 0, NULL, NULL, "dump",
 	1, dump_cmd, NULL, NULL, D_SOURCE, 0, NULL, NULL, "source",
+	1, gen_cmd, dump_codepage_rd, NULL, 0, 0, NULL, NULL, "dump-codepage",
 	1, gen_cmd, num_rd, num_wr, 0, 1, &async_lookup, "async_dns", "async-dns",
 	1, gen_cmd, num_rd, num_wr, 0, 1, &download_utime, "download_utime", "download-utime",
 	1, gen_cmd, num_rd, num_wr, 1, 16, &max_connections, "max_connections", "max-connections",
--- links.h.orig	2002-06-29 21:44:25.000000000 +0400
+++ links.h	2008-04-06 14:30:10.000000000 +0400
@@ -2003,6 +2003,7 @@
 extern int no_connect;
 extern int base_session;
 extern int force_html;
+extern int dump_codepage;
 
 #define D_DUMP		1
 #define D_SOURCE	2
--- main.c.orig	2002-06-29 21:44:25.000000000 +0400
+++ main.c	2008-04-06 14:48:55.000000000 +0400
@@ -201,7 +201,7 @@
 		o.xw = screen_width;
 		o.yw = 25;
 		o.col = 0;
-		o.cp = 0;
+		o.cp = dump_codepage;
 		ds2do(&dds, &o);
 		o.plain = 0;
 		o.frames = 0;


>Release-Note:
>Audit-Trail:
>Unformatted:


