git: f32a6403d346 - main - Merge one true awk from 2024-01-22 for the Awk Second Edition support
Date: Thu, 29 Feb 2024 17:46:20 UTC
The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=f32a6403d34654ac6e61182d09abb5e85850e1ee

commit f32a6403d34654ac6e61182d09abb5e85850e1ee
Merge: 73157ce4982e e8a605e129c6
Author: Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-02-28 15:16:16 +0000
Commit: Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-02-29 17:42:06 +0000

    Merge one true awk from 2024-01-22 for the Awk Second Edition support

    This brings in Unicode support, CSV support and a number of bug fixes.
    They are described in _The AWK Programming Language_, Second Edition, by
    Al Aho, Brian Kernighan, and Peter Weinberger (Addison-Wesley, 2024,
    ISBN-13 978-0138269722, ISBN-10 0138269726).

    Sponsored by: Netflix

 contrib/one-true-awk/FIXES | 1429 ++------------------
 contrib/one-true-awk/FIXES.1e | 1429 ++++++++++++++++++++
 contrib/one-true-awk/README.md | 80 +-
 contrib/one-true-awk/awk.1 | 34 +-
 contrib/one-true-awk/awk.h | 23 +-
 contrib/one-true-awk/awkgram.y | 49 +-
 contrib/one-true-awk/b.c | 409 ++++--
 contrib/one-true-awk/bugs-fixed/REGRESS | 8 +-
 .../one-true-awk/bugs-fixed/getline-corruption.awk | 5 +
 .../one-true-awk/bugs-fixed/getline-corruption.in | 1 +
 .../one-true-awk/bugs-fixed/getline-corruption.ok | 1 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.awk | 11 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.bad | 2 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.in | 1 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.ok | 2 +
 .../one-true-awk/bugs-fixed/missing-precision.ok | 2 +
 contrib/one-true-awk/bugs-fixed/negative-nf.ok | 2 +
 contrib/one-true-awk/bugs-fixed/pfile-overflow.ok | 4 +
 contrib/one-true-awk/bugs-fixed/rstart-rlength.awk | 10 +
 contrib/one-true-awk/bugs-fixed/rstart-rlength.ok | 4 +
 contrib/one-true-awk/bugs-fixed/system-status.awk | 19 +
 contrib/one-true-awk/bugs-fixed/system-status.bad | 3 +
 contrib/one-true-awk/bugs-fixed/system-status.ok | 3 +
 contrib/one-true-awk/bugs-fixed/system-status.ok2 | 3 +
 .../one-true-awk/bugs-fixed/unicode-fs-rs-1.awk | 6 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in | 2 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok | 5 +
 .../one-true-awk/bugs-fixed/unicode-fs-rs-2.awk | 7 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in | 2 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok | 4 +
 .../one-true-awk/bugs-fixed/unicode-null-match.awk | 6 +
 .../one-true-awk/bugs-fixed/unicode-null-match.bad | 1 +
 .../one-true-awk/bugs-fixed/unicode-null-match.ok | 1 +
 contrib/one-true-awk/lex.c | 60 +-
 contrib/one-true-awk/lib.c | 158 ++-
 contrib/one-true-awk/main.c | 23 +-
 contrib/one-true-awk/makefile | 8 +-
 contrib/one-true-awk/maketab.c | 4 +-
 contrib/one-true-awk/parse.c | 2 +-
 contrib/one-true-awk/proto.h | 10 +-
 contrib/one-true-awk/run.c | 942 +++++++++----
 contrib/one-true-awk/testdir/Compare.tt | 2 +-
 contrib/one-true-awk/testdir/REGRESS | 2 +-
 contrib/one-true-awk/testdir/T.argv | 6 +
 contrib/one-true-awk/testdir/T.csv | 80 ++
 contrib/one-true-awk/testdir/T.flags | 5 +-
 contrib/one-true-awk/testdir/T.misc | 20 +
 contrib/one-true-awk/testdir/T.overflow | 2 +
 contrib/one-true-awk/testdir/T.split | 1 +
 contrib/one-true-awk/testdir/T.utf | 194 +++
 contrib/one-true-awk/testdir/T.utfre | 234 ++++
 contrib/one-true-awk/testdir/tt.15 | 2 +-
 contrib/one-true-awk/tran.c | 26 +-
 53 files changed, 3525 insertions(+), 1824 deletions(-)

diff --cc contrib/one-true-awk/FIXES.1e
index 000000000000,8cbd6ac1a097..8cbd6ac1a097
mode 000000,100644..100644
--- a/contrib/one-true-awk/FIXES.1e
+++ b/contrib/one-true-awk/FIXES.1e
diff --cc contrib/one-true-awk/README.md
index 76ae3d48c983,000000000000..a41fb3c3b128
mode 100644,000000..100644
--- a/contrib/one-true-awk/README.md
+++ b/contrib/one-true-awk/README.md
@@@ -1,123 -1,0 +1,149 @@@
+# The One True Awk
+
+This is the version of `awk` described in _The AWK Programming Language_,
- by Al Aho, Brian Kernighan, and Peter Weinberger
- (Addison-Wesley, 1988, ISBN 0-201-07981-X).
++Second Edition, by Al Aho, Brian Kernighan, and Peter Weinberger
++(Addison-Wesley, 2024, ISBN-13 978-0138269722, ISBN-10 0138269726).
++
++## What's New? ##
++
++This version of Awk handles UTF-8 and comma-separated values (CSV) input.
++
++### Strings ###
++
++Functions that process strings now count Unicode code points, not bytes;
++this affects `length`, `substr`, `index`, `match`, `split`,
++`sub`, `gsub`, and others. Note that code
++points are not necessarily characters.
++
++UTF-8 sequences may appear in literal strings and regular expressions.
++Aribtrary characters may be included with `\u` followed by 1 to 8 hexadecimal digits.
++
++### Regular expressions ###
++
++Regular expressions may include UTF-8 code points, including `\u`.
++
++### CSV ###
++
++The option `--csv` turns on CSV processing of input:
++fields are separated by commas, fields may be quoted with
++double-quote (`"`) characters, quoted fields may contain embedded newlines.
++Double-quotes in fields have to be doubled and enclosed in quoted fields.
++In CSV mode, `FS` is ignored.
++
++If no explicit separator argument is provided,
++field-splitting in `split` is determined by CSV mode.
+
+## Copyright
+
+Copyright (C) Lucent Technologies 1997<br/>
+All Rights Reserved
+
+Permission to use, copy, modify, and distribute this software and
+its documentation for any purpose and without fee is hereby
+granted, provided that the above copyright notice appear in all
+copies and that both that the copyright notice and this
+permission notice and warranty disclaimer appear in supporting
+documentation, and that the name Lucent Technologies or any of
+its entities not be used in advertising or publicity pertaining
+to distribution of the software without specific, written prior
+permission.
+
+LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
+INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
+IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
+SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
+IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
+ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
+THIS SOFTWARE.
+
+## Distribution and Reporting Problems
+
+Changes, mostly bug fixes and occasional enhancements, are listed
+in `FIXES`. If you distribute this code further, please please please
+distribute `FIXES` with it.
+
+If you find errors, please report them
- to bwk@cs.princeton.edu.
++to the current maintainer, ozan.yigit@gmail.com.
+Please _also_ open an issue in the GitHub issue tracker, to make
+it easy to track issues.
+Thanks.
+
+## Submitting Pull Requests
+
+Pull requests are welcome. Some guidelines:
+
+* Please do not use functions or facilities that are not standard (e.g.,
+`strlcpy()`, `fpurge()`).
+
+* Please run the test suite and make sure that your changes pass before
+posting the pull request. To do so:
+
+ 1. Save the previous version of `awk` somewhere in your path. Call it `nawk` (for example).
+ 1. Run `oldawk=nawk make check > check.out 2>&1`.
+ 1. Search for `BAD` or `error` in the result. In general, look over it manually to make sure there are no errors.
+
+* Please create the pull request with a request
+to merge into the `staging` branch instead of into the `master` branch.
+This allows us to do testing, and to make any additional edits or changes
+after the merge but before merging to `master`.
+
+## Building
+
+The program itself is created by
+
+ make
+
+which should produce a sequence of messages roughly like this:
+
- yacc -d awkgram.y
- conflicts: 43 shift/reduce, 85 reduce/reduce
- mv y.tab.c ytab.c
- mv y.tab.h ytab.h
- cc -c ytab.c
- cc -c b.c
- cc -c main.c
- cc -c parse.c
- cc maketab.c -o maketab
- ./maketab >proctab.c
- cc -c proctab.c
- cc -c tran.c
- cc -c lib.c
- cc -c run.c
- cc -c lex.c
- cc ytab.o b.o main.o parse.o proctab.o tran.o lib.o run.o lex.o -lm
++ bison -d awkgram.y
++ awkgram.y: warning: 44 shift/reduce conflicts [-Wconflicts-sr]
++ awkgram.y: warning: 85 reduce/reduce conflicts [-Wconflicts-rr]
++ awkgram.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o awkgram.tab.o awkgram.tab.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o b.o b.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o main.o main.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o parse.o parse.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 maketab.c -o maketab
++ ./maketab awkgram.tab.h >proctab.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o proctab.o proctab.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o tran.o tran.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o lib.o lib.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o run.o run.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 -c -o lex.o lex.c
++ gcc -g -Wall -pedantic -Wcast-qual -O2 awkgram.tab.o b.o main.o parse.o proctab.o tran.o lib.o run.o lex.o -lm
+
+This produces an executable `a.out`; you will eventually want to
+move this to some place like `/usr/bin/awk`.
+
+If your system does not have `yacc` or `bison` (the GNU
+equivalent), you need to install one of them first.
++The default in the `makefile` is `bison`; you will have
++to edit the `makefile` to use `yacc`.
+
- NOTE: This version uses ANSI C (C 99), as you should also. We have
++NOTE: This version uses ISO/IEC C99, as you should also. We have
+compiled this without any changes using `gcc -Wall` and/or local C
+compilers on a variety of systems, but new systems or compilers
+may raise some new complaint; reports of difficulties are
+welcome.
+
+This compiles without change on Macintosh OS X using `gcc` and
+the standard developer tools.
+
+You can also use `make CC=g++` to build with the GNU C++ compiler,
+should you choose to do so.
+
- The version of `malloc` that comes with some systems is sometimes
- astonishly slow. If `awk` seems slow, you might try fixing that.
- More generally, turning on optimization can significantly improve
- `awk`'s speed, perhaps by 1/3 for highest levels.
-
+## A Note About Releases
+
- We don't do releases.
++We don't usually do releases.
+
+## A Note About Maintenance
+
+NOTICE! Maintenance of this program is on a ''best effort''
+basis. We try to get to issues and pull requests as quickly
+as we can. Unfortunately, however, keeping this program going
+is not at the top of our priority list.
+
+#### Last Updated
+
- Sat Jul 25 14:00:07 EDT 2021
++Mon 05 Feb 2024 08:46:55 IST
diff --cc contrib/one-true-awk/bugs-fixed/getline-corruption.awk
index 000000000000,461e551cfff5..461e551cfff5
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/getline-corruption.awk
+++ b/contrib/one-true-awk/bugs-fixed/getline-corruption.awk
diff --cc contrib/one-true-awk/bugs-fixed/getline-corruption.in
index 000000000000,78981922613b..78981922613b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/getline-corruption.in
+++ b/contrib/one-true-awk/bugs-fixed/getline-corruption.in
diff --cc contrib/one-true-awk/bugs-fixed/getline-corruption.ok
index 000000000000,3efb54597c6d..3efb54597c6d
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/getline-corruption.ok
+++ b/contrib/one-true-awk/bugs-fixed/getline-corruption.ok
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.awk
index 000000000000,000000000000..6c066aad911d
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.awk
@@@ -1,0 -1,0 +1,11 @@@
++function foo() {
++ return "aaaaaab"
++}
++
++BEGIN {
++ print match(foo(), "b")
++}
++
++{
++ print match(substr($0, 1), "b")
++}
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.bad
index 000000000000,000000000000..343ee5c2f6cb
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.bad
@@@ -1,0 -1,0 +1,2 @@@
++-1
++-1
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.in
index 000000000000,000000000000..0d197e1b6a30
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.in
@@@ -1,0 -1,0 +1,1 @@@
++aaaaaab
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.ok
index 000000000000,000000000000..49019db80789
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.ok
@@@ -1,0 -1,0 +1,2 @@@
++7
++7
diff --cc contrib/one-true-awk/bugs-fixed/missing-precision.ok
index 000000000000,75e1e3d00446..75e1e3d00446
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/missing-precision.ok
+++ b/contrib/one-true-awk/bugs-fixed/missing-precision.ok
diff --cc contrib/one-true-awk/bugs-fixed/negative-nf.ok
index 000000000000,de97f8b27def..de97f8b27def
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/negative-nf.ok
+++ b/contrib/one-true-awk/bugs-fixed/negative-nf.ok
diff --cc contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
index 000000000000,a0de50f9007f..a0de50f9007f
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
+++ b/contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
diff --cc contrib/one-true-awk/bugs-fixed/rstart-rlength.awk
index 000000000000,f423f0168be3..f423f0168be3
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/rstart-rlength.awk
+++ b/contrib/one-true-awk/bugs-fixed/rstart-rlength.awk
diff --cc contrib/one-true-awk/bugs-fixed/rstart-rlength.ok
index 000000000000,961cb895b51b..961cb895b51b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/rstart-rlength.ok
+++ b/contrib/one-true-awk/bugs-fixed/rstart-rlength.ok
diff --cc contrib/one-true-awk/bugs-fixed/system-status.awk
index 000000000000,8daf563e6f4f..8daf563e6f4f
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/system-status.awk
+++ b/contrib/one-true-awk/bugs-fixed/system-status.awk
diff --cc contrib/one-true-awk/bugs-fixed/system-status.bad
index 000000000000,a1317dba54a8..a1317dba54a8
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/system-status.bad
+++ b/contrib/one-true-awk/bugs-fixed/system-status.bad
diff --cc contrib/one-true-awk/bugs-fixed/system-status.ok
index 000000000000,737828f5ed7a..737828f5ed7a
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/system-status.ok
+++ b/contrib/one-true-awk/bugs-fixed/system-status.ok
diff --cc contrib/one-true-awk/bugs-fixed/system-status.ok2
index 000000000000,000000000000..f1f631e1cb33
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/system-status.ok2
@@@ -1,0 -1,0 +1,3 @@@
++normal status 42
++death by signal status 257
++death by signal with core dump status 262
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.awk
index 000000000000,67366ec75070..67366ec75070
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.awk
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.awk
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in
index 000000000000,2e882af62a2c..2e882af62a2c
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok
index 000000000000,f337302be903..f337302be903
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.awk
index 000000000000,34d77bf2c95f..34d77bf2c95f
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.awk
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.awk
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in
index 000000000000,2de6e718fd3b..2de6e718fd3b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok
index 000000000000,2387001bc1b2..2387001bc1b2
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok
diff --cc contrib/one-true-awk/bugs-fixed/unicode-null-match.awk
index 000000000000,0c056126922b..0c056126922b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-null-match.awk
+++ b/contrib/one-true-awk/bugs-fixed/unicode-null-match.awk
diff --cc contrib/one-true-awk/bugs-fixed/unicode-null-match.bad
index 000000000000,7cd35ff2d932..7cd35ff2d932
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-null-match.bad
+++ b/contrib/one-true-awk/bugs-fixed/unicode-null-match.bad
diff --cc contrib/one-true-awk/bugs-fixed/unicode-null-match.ok
index 000000000000,1ac142f8a895..1ac142f8a895
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-null-match.ok
+++ b/contrib/one-true-awk/bugs-fixed/unicode-null-match.ok
diff --cc contrib/one-true-awk/testdir/T.csv
index 000000000000,e0f3d708edaf..e0f3d708edaf
mode 000000,100755..100755
--- a/contrib/one-true-awk/testdir/T.csv
+++ b/contrib/one-true-awk/testdir/T.csv
diff --cc contrib/one-true-awk/testdir/T.utf
index 000000000000,18f2b9c355cf..18f2b9c355cf
mode 000000,100755..100755
--- a/contrib/one-true-awk/testdir/T.utf
+++ b/contrib/one-true-awk/testdir/T.utf
diff --cc contrib/one-true-awk/testdir/T.utfre
index 000000000000,20e66cbde9a5..20e66cbde9a5
mode 000000,100755..100755
--- a/contrib/one-true-awk/testdir/T.utfre
+++ b/contrib/one-true-awk/testdir/T.utfre
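
Illustrative note on the README hunk: the two user-visible changes described under "What's New?" (string functions counting Unicode code points rather than bytes, and the field rules applied by `--csv`) can be sketched in a few lines. The sketch below is written in Python, not awk or C, and is not the code merged in `lib.c` or `run.c`; the function names `split_csv_record` and `code_points_and_bytes` are hypothetical. It assumes only the rules quoted in the README: fields are separated by commas, a field may be wrapped in double quotes, a quoted field may contain commas and newlines, and a literal double quote inside a quoted field is written doubled.

```python
from typing import List, Tuple


def split_csv_record(record: str) -> List[str]:
    """Split one complete CSV record into fields (hypothetical helper).

    Follows only the rules quoted in the README hunk; it makes no claim
    about how awk itself handles malformed input.
    """
    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(record):
        ch = record[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(record) and record[i + 1] == '"':
                    current.append('"')  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote of the field
            else:
                current.append(ch)  # commas and newlines are data here
        elif ch == '"':
            in_quotes = True  # opening quote of a quoted field
        elif ch == ',':
            fields.append("".join(current))  # unquoted comma ends the field
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))  # last field has no trailing comma
    return fields


def code_points_and_bytes(text: str) -> Tuple[int, int]:
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    print(split_csv_record('a,"b,c","say ""hi""","two\nlines"'))
    print(code_points_and_bytes("na\u00efve"))
```

The second helper shows why the change matters for `length` and friends: the string `na\u00efve` has five code points but occupies six bytes in UTF-8, so a byte-counting implementation and a code-point-counting one report different lengths. The CSV helper takes a whole record as one string; how awk assembles a record that spans several input lines is outside what this sketch covers.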