svn commit: r229794 - head/usr.bin/hexdump

Sun Jan 8 10:07:25 UTC 2012

> Log:
>  - Fix how hexdump parses escape strings
>  From the NetBSD bug:
>  The way how hexdump(1) parses escape sequences has some bugs.
>  It shows up when an escape sequence is used as the non-last character
>  of a format string.
>
>  PR:		bin/144722
>  Submitted by:	gcooper
>  Approved by:	rpaulo
>  Obtained from:	NetBSD
>  MFC after:	1 week

This has lots of style bugs.

> Modified: head/usr.bin/hexdump/parse.c
> ==============================================================================
> --- head/usr.bin/hexdump/parse.c	Sat Jan  7 22:29:46 2012	(r229793)
> +++ head/usr.bin/hexdump/parse.c	Sat Jan  7 23:15:21 2012	(r229794)
> @@ -255,7 +255,9 @@ rewrite(FS *fs)
> 					sokay = NOTOKAY;
> 			}
>
> -			p2 = p1 + 1;		/* Set end pointer. */
> +			p2 = *p1 ? p1 + 1 : p1;	/* Set end pointer -- make sure
> +						 * that it's non-NUL/-NULL first
> +						 * though. */
> 			cs[0] = *p1;		/* Set conversion string. */
> 			cs[1] = '\0';
>

Comments to the right of code should be used less than was already done here.
Large comments should never be to the right of code.  They cause formatting
problems, as the above shows.

The "it" in the comment doesn't match its subject.  Its subject is the end
pointer, but "it" is meant to refer to the first byte of the string.

Perhaps contractions like "it's" should be avoided in comments for the
same reason as in man pages (they are harder to parse for non-native
speakers).  And parsing this one shows that the "it" is wrong.

The comment doesn't match the code.  The code doesn't have any NULL pointer
check.  And a null pointer check makes no sense, since the old code
derefences p1 without checking for that, and the new code adds another
dereference.

The test of *p1 is a boolean test in non-boolean context.  But nearby
code has the same style bug, in the much more non-idiomatic form `!*p'.

Fixing these style bugs gives something like:

%%%
 			/*
 			 * Set end pointer.  First make sure that p1 is not
 			 * the empty string..
 			 */
 			p2 = *p1 == '\0' ? p1 + p1 + 1;

 			/* Set conversion string. */
 			cs[0] = *p1;
 			cs[1] = '\0';

%%%

Possible furthe improvements:
- some programmers can't would add unnecessary parentheses for the ?:
   operator.  Even I might add them.
- it might be clearer to set the conversion string first and then use
   cs[0] instead of *p1 twice.

> @@ -449,13 +451,21 @@ escape(char *p1)
> 	char *p2;
>
> 	/* alphabetic escape sequences have to be done in place */
> -	for (p2 = p1;; ++p1, ++p2) {
> -		if (!*p1) {
> -			*p2 = *p1;
> -			break;
> -		}
> -		if (*p1 == '\\')
> -			switch(*++p1) {
> +	for (p2 = p1; *p1; p1++, p2++) {
> +		/*
> +		 * Let's take a peak at the next item and see whether or not
> +		 * we need to escape the value...
> +		 */

This comment is very verbose, starting with the noise words "Let's".

This comment mispells "peek".  "Peek" is another chatty word.

The keyboard has excessive repeat on its "." key.  There is no reason
to use an ellipsis here.

> +		if (*p1 == '\\') {
> +
> +			p1++;
> +

The keyboard also dribbles newlines.  2 instances here.

> +			switch(*p1) {
> +			/* A standalone `\' */
> +			case '\0':

Previous comment is misplaced and misindented.  It should be here.  It
should be terminated by a ".".  Perhaps the backslash in it should be
quoted.

> +				*p2 = '\\';
> +				*++p2 = '\0';

It's not clear why this doesn't give a buffer overrun.  It writes past
the end of a string.  This is only safe if the caller allocated space
beyond the end of the string.  cs[] is allocated locally, and it is
clear that it has space for just 1 byte of expansion.

> +				break;
> 			case 'a':
> 			     /* *p2 = '\a'; */
> 				*p2 = '\007';
> @@ -482,7 +492,12 @@ escape(char *p1)
> 				*p2 = *p1;
> 				break;
> 			}
> +
> +		} else
> +			*p2 = *p1;
> +

2 more extra newline.

> 	}
> +

1 more extra newline.

> }
>
> void
>

Bruce