bin/84992: gcc4.x cleanup of usr.bin/hexdump

Bruce Evans bde at zeta.org.au
Wed Aug 17 03:00:47 GMT 2005


The following reply was made to PR bin/84992; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Divacky Roman <xdivac02 at stud.fit.vutbr.cz>
Cc: FreeBSD-gnats-submit at freebsd.org, freebsd-bugs at freebsd.org
Subject: Re: bin/84992: gcc4.x cleanup of usr.bin/hexdump
Date: Wed, 17 Aug 2005 12:52:20 +1000 (EST)

 On Tue, 16 Aug 2005, Divacky Roman wrote:
 
 >> Description:
 > gcc4.x (tested with gcc41) cleanup of usr.bin/hexdump
 >> How-To-Repeat:
 > apply the patch
 >> Fix:
 >
 > Index: conv.c
 > ===================================================================
 > RCS file: /home/ncvs/src/usr.bin/hexdump/conv.c,v
 > retrieving revision 1.8
 > diff -u -r1.8 conv.c
 > --- conv.c	16 Jul 2004 11:07:07 -0000	1.8
 > +++ conv.c	11 Aug 2005 13:22:00 -0000
 > @@ -103,7 +103,7 @@
 > 	if (odmode && MB_CUR_MAX > 1) {
 > 		oclen = 0;
 > retry:
 > -		clen = mbrtowc(&wc, p, bufsize, &pr->mbstate);
 > +		clen = mbrtowc(&wc, (const char *)p, bufsize, &pr->mbstate);
 
 Adding casts is a cleandown.  p should have type "char *" to begin with.
 If p actually needs to be "unsigned char *" then it is unclear that
 mbrtowc() can handle it correctly and casting it to "char *" breaks the
 warning about this.
 
 The possible critical differences between unsigned chars and signed chars
 are especially clear in hexdump.  hexdump wants to print raw bytes so it
 should use unsigned chars just like it already does except without the
 type mismatches from passes pointers to string functions that want signed
 chars.  Signed chars may have any number of padding bits that wouldn't be
 printed right if the bytes are accessed as signed chars.  The padding
 bits may contain any number of trap representations which would abort the
 program before it prints anything.  Signed chars must have a single sign
 bit, but it may be anywhere and may be printed weirdly.  See section
 6.2.6.2 of the draft C99 standard.
 
 It is clear that string functions can't in general handle arrays of
 unsigned chars via type punning as above, since the unsigned chars may
 have trap representations when punned to signed chars -- explicit
 conversion at the level of chars is required, although it might not
 work right for hexdump since the conversion might lose bits (it _must_
 lose any trap bits).  What is unclear is whether the problems can
 actually occur, even in theory.  Note that the above code is only
 executed in the odmode case due to the "hd" mode wanting to be rawer.
 
 > 		if (clen == 0)
 > 			clen = 1;
 > 		else if (clen == (size_t)-1 || (clen == (size_t)-2 &&
 > @@ -118,7 +118,7 @@
 > 			 * can complete it.
 > 			 */
 > 			oclen = bufsize;
 > -			bufsize = peek(p = peekbuf, MB_CUR_MAX);
 > +			bufsize = peek(p = (u_char *)peekbuf, MB_CUR_MAX);
 > 			goto retry;
 > 		}
 > 		clen += oclen;
 
 Similarly, in the reverse direction.  peek() wants unsigned chars so
 using possibly-signed chars for its buffers is nonsense.  It is unclear
 if peek() actually needs unsigned chars.
 
 > Index: parse.c
 > ===================================================================
 > RCS file: /home/ncvs/src/usr.bin/hexdump/parse.c,v
 > retrieving revision 1.13
 > diff -u -r1.13 parse.c
 > --- parse.c	22 Jul 2004 13:14:42 -0000	1.13
 > +++ parse.c	11 Aug 2005 13:22:00 -0000
 > @@ -54,7 +54,7 @@
 > void
 > addfile(char *name)
 > {
 > -	unsigned char *p;
 > +	char *p;
 > 	FILE *fp;
 > 	int ch;
 > 	char buf[2048 + 1];
 > @@ -79,7 +79,7 @@
 > void
 > add(const char *fmt)
 > {
 > -	unsigned const char *p, *savep;
 > +	const char *p, *savep;
 > 	static FS **nextfs;
 > 	FS *tfs;
 > 	FU *tfu, **nextfu;
 
 Changing the basic type to plain char gives bugs even on normal machines
 (2's complement with 8-bit chars).  The data will have been read using
 read(2) or passed on the command line via char **argv  so it naturally
 has type char, but hexdump depends critically on the reverse of the
 bogus casts above -- it puns the plain chars to unsigned chars to get
 rid of the sign bit.  Here p points into the format string so the type
 pun is more bogus than for file data, but it is intentional -- p had
 type plain "char *" in rev.1.1, but it was changed to "unsigned char
 *" in rev.1.3 to work around many sign extension bugs that occur in
 practice.  The first one in add() is the well-known bug of using
 isspace() on a possibly-signed char.  This gives undefined behaviour
 if the value is negative but different from EOF, and bogus defined
 behaviour if the value is EOF.  The format string can be specified on
 the command line so it is easy for users who use more than the ASCII
 character set to put a negative char in it.
 
 [... Similarly to undo most of the work-arounds in rev.1.3].
 
 Bruce


More information about the freebsd-bugs mailing list