bin/143369: awk(1) doesn't handle RS as a regexp but as a single character
Mikolaj Golub
to.my.trociny at gmail.com
Sat Jan 30 11:30:01 UTC 2010
>Number: 143369
>Category: bin
>Synopsis: awk(1) doesn't handle RS as a regexp but as a single character
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Jan 30 11:30:00 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator: Mikolaj Golub
>Release: 8.0-STABLE, 7.2-STABLE
>Organization:
>Environment:
FreeBSD zhuzha.ua1 8.0-STABLE FreeBSD 8.0-STABLE #6: Sun Jan 24 21:36:17 EET 2010 root at zhuzha.ua1:/usr/obj/usr/src/sys/GENERIC i386
>Description:
This problem with awk(1) was reported to NetBSD by John Darrow, and it has been fixed there.
awk allows an arbitrary string to be assigned to the RS variable, but does not treat that string as a regular expression when splitting records; instead, it splits only on the first character of the string.
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30294
FreeBSD has the same problem, and it would be nice to fix it here as well.
>How-To-Repeat:
zhuzha:~% echo 'a b c d' | awk 'BEGIN {RS=" ";} {print $0}'
a
b
c
d
zhuzha:~% echo 'a b c d' | awk 'BEGIN {RS="[[:space:]]";} {print $0}'
a b c d
zhuzha:~% echo 'a[b[c[d' | awk 'BEGIN {RS="[[:space:]]";} {print $0}'
a
b
c
d
>Fix:
See the attached patch, adapted from NetBSD (PR/30294: John Darrow: nawk doesn't handle RS as a RE but as a single character).
Patch attached with submission follows:
diff -ru contrib/one-true-awk.orig/lib.c contrib/one-true-awk/lib.c
--- contrib/one-true-awk.orig/lib.c 2007-10-25 15:38:02.000000000 +0300
+++ contrib/one-true-awk/lib.c 2010-01-30 13:04:13.000000000 +0200
@@ -194,22 +194,62 @@
;
if (c != EOF)
ungetc(c, inf);
- }
- for (rr = buf; ; ) {
- for (; (c=getc(inf)) != sep && c != EOF; ) {
- if (rr-buf+1 > bufsize)
- if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 1"))
- FATAL("input record `%.30s...' too long", buf);
+ } else if ((*RS)[1]) {
+ fa *pfa = makedfa(*RS, 1);
+ int tempstat = pfa->initstat;
+ char *brr = buf;
+ char *rrr = NULL;
+ int x;
+ for (rr = buf; ; ) {
+ while ((c = getc(inf)) != EOF) {
+ if (rr-buf+3 > bufsize)
+ if (!adjbuf(&buf, &bufsize, 3+rr-buf,
+ recsize, &rr, "readrec 2"))
+ FATAL("input record `%.30s...'"
+ " too long", buf);
+ *rr++ = c;
+ *rr = '\0';
+ if (!(x = nematch(pfa, brr))) {
+ pfa->initstat = tempstat;
+ if (rrr) {
+ rr = rrr;
+ ungetc(c, inf);
+ break;
+ }
+ } else {
+ pfa->initstat = 2;
+ brr = rrr = rr = patbeg;
+ }
+ }
+ if (rrr || c == EOF)
+ break;
+ if ((c = getc(inf)) == '\n' || c == EOF)
+ /* 2 in a row */
+ break;
+ *rr++ = '\n';
+ *rr++ = c;
+ }
+ } else {
+ for (rr = buf; ; ) {
+ for (; (c=getc(inf)) != sep && c != EOF; ) {
+ if (rr-buf+1 > bufsize)
+ if (!adjbuf(&buf, &bufsize, 1+rr-buf,
+ recsize, &rr, "readrec 1"))
+ FATAL("input record `%.30s...'"
+ " too long", buf);
+ *rr++ = c;
+ }
+ if (**RS == sep || c == EOF)
+ break;
+ if ((c = getc(inf)) == '\n' || c == EOF)
+ /* 2 in a row */
+ break;
+ if (!adjbuf(&buf, &bufsize, 2+rr-buf, recsize, &rr,
+ "readrec 2"))
+ FATAL("input record `%.30s...' too long", buf);
+ *rr++ = '\n';
*rr++ = c;
}
- if (**RS == sep || c == EOF)
- break;
- if ((c = getc(inf)) == '\n' || c == EOF) /* 2 in a row */
- break;
- if (!adjbuf(&buf, &bufsize, 2+rr-buf, recsize, &rr, "readrec 2"))
- FATAL("input record `%.30s...' too long", buf);
- *rr++ = '\n';
- *rr++ = c;
}
if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3"))
FATAL("input record `%.30s...' too long", buf);
>Release-Note:
>Audit-Trail:
>Unformatted: