bin/140976: comm(1) mishandles lines with tabs
darcy at NetBSD.org
Sat Nov 28 17:00:04 UTC 2009
>Synopsis: comm(1) mishandles lines with tabs
>Arrival-Date: Sat Nov 28 17:00:03 UTC 2009
>Originator: D'Arcy Cain
FreeBSD shell.vex.net 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Fri May 1 08:49:13 UTC 2009 root at walker.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
If an input file contains tabs it may not be handled correctly. In fact, the problem would happen with any character that compares lower than newline.
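The ordering behind this can be checked from the shell. The sketch below assumes an ASCII-based codeset; char_code is a hypothetical helper introduced for illustration and is not part of comm(1) or of this report. It prints the numeric value of a tab and of a newline.

```shell
#!/bin/sh
# char_code is a hypothetical helper (not from the report).
# It prints the numeric code of one character using the POSIX
# printf convention that a leading quote yields the character value.
char_code() {
    printf '%d\n' "'$1"
}

tab=$(printf '\t')
# A literal newline; command substitution would strip it.
nl='
'

char_code "$tab"   # 9 in ASCII
char_code "$nl"    # 10 in ASCII
```

Since 9 is less than 10, a tab sorts below the line terminator whenever the terminator takes part in the comparison.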
Run this script. The two tests should print the same thing.
TEST=/tmp/comm_test.$$  # assumed scratch-file prefix; the report does not show how TEST was set
trap "rm -f $TEST.*" 0
cat << MOUSE > $TEST.1a
e f g
MOUSE
cat << MOUSE > $TEST.1b
e f g
MOUSE
tr ' ' '\t' < $TEST.1a > $TEST.2a
tr ' ' '\t' < $TEST.1b > $TEST.2b
echo "Test 1 (spaces) output:"
comm -12 $TEST.1a $TEST.1b
echo "Test 2 (tabs) output:"
comm -12 $TEST.2a $TEST.2b
http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/comm/comm.c.diff?r1=1.17&r2=1.18&only_with_tag=MAIN&f=h is how I fixed it on NetBSD, but you have a much different version of comm.c. The basic fix is to not read the newline. The newline is the separator between lines, not part of the line, and reading it causes it to be erroneously included in the comparisons. sort(1) gets this right, and that is where the problem arises: comm(1) does not agree with the sorting criteria, so input that sort(1) considers sorted can look out of order to comm(1).
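The effect of keeping or dropping the terminator can be sketched in the shell. This is only an illustration of the comparison, not the comm.c change itself. It assumes plain byte comparison in the C locale, and byte_order is a hypothetical helper that is not part of the report.

```shell
#!/bin/sh
# byte_order is a hypothetical helper (not from the report).
# It reports how two strings order under byte comparison in the C locale.
# The strings are passed through the environment so that awk neither
# reinterprets backslashes nor splits on the embedded newline.
byte_order() {
    A=$1 B=$2 LC_ALL=C awk 'BEGIN {
        # Appending "" forces string (not numeric) comparison.
        a = ENVIRON["A"] ""
        b = ENVIRON["B"] ""
        if (a < b)      print "first sorts lower"
        else if (a > b) print "second sorts lower"
        else            print "equal"
    }'
}

tab=$(printf '\t')
nl='
'

# Terminator stripped: "e" is a prefix of "e<TAB>f", so it sorts first.
byte_order "e" "e${tab}f"

# Terminator kept: newline (10) is compared against tab (9) and the order flips.
byte_order "e${nl}" "e${tab}f${nl}"
```

The first call prints "first sorts lower" and the second prints "second sorts lower". The first result matches comparing the lines without their terminator, which is what the fix described above does.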
In NetBSD current there is a library function called getline which more or less does what the getline included in comm.c does except that it doesn't return the newline. Perhaps you should pull that in and use it instead. Don't forget to change your printf statements to add the newline.