git: f32a6403d346 - main - Merge one true awk from 2024-01-22 for the Awk Second Edition support

From: Warner Losh <imp_at_FreeBSD.org>
Date: Thu, 29 Feb 2024 17:46:20 UTC

The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=f32a6403d34654ac6e61182d09abb5e85850e1ee

commit f32a6403d34654ac6e61182d09abb5e85850e1ee
Merge: 73157ce4982e e8a605e129c6
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-02-28 15:16:16 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-02-29 17:42:06 +0000

    Merge one true awk from 2024-01-22 for the Awk Second Edition support
    
    This brings in Unicode support, CSV support and a number of bug fixes.
    They are described in _The AWK Programming Language_, Second Edition, by
    Al Aho, Brian Kernighan, and Peter Weinberger (Addison-Wesley, 2024,
    ISBN-13 978-0138269722, ISBN-10 0138269726).
    
    Sponsored by:           Netflix

 contrib/one-true-awk/FIXES                         | 1429 ++------------------
 contrib/one-true-awk/FIXES.1e                      | 1429 ++++++++++++++++++++
 contrib/one-true-awk/README.md                     |   80 +-
 contrib/one-true-awk/awk.1                         |   34 +-
 contrib/one-true-awk/awk.h                         |   23 +-
 contrib/one-true-awk/awkgram.y                     |   49 +-
 contrib/one-true-awk/b.c                           |  409 ++++--
 contrib/one-true-awk/bugs-fixed/REGRESS            |    8 +-
 .../one-true-awk/bugs-fixed/getline-corruption.awk |    5 +
 .../one-true-awk/bugs-fixed/getline-corruption.in  |    1 +
 .../one-true-awk/bugs-fixed/getline-corruption.ok  |    1 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.awk  |   11 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.bad  |    2 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.in   |    1 +
 contrib/one-true-awk/bugs-fixed/matchop-deref.ok   |    2 +
 .../one-true-awk/bugs-fixed/missing-precision.ok   |    2 +
 contrib/one-true-awk/bugs-fixed/negative-nf.ok     |    2 +
 contrib/one-true-awk/bugs-fixed/pfile-overflow.ok  |    4 +
 contrib/one-true-awk/bugs-fixed/rstart-rlength.awk |   10 +
 contrib/one-true-awk/bugs-fixed/rstart-rlength.ok  |    4 +
 contrib/one-true-awk/bugs-fixed/system-status.awk  |   19 +
 contrib/one-true-awk/bugs-fixed/system-status.bad  |    3 +
 contrib/one-true-awk/bugs-fixed/system-status.ok   |    3 +
 contrib/one-true-awk/bugs-fixed/system-status.ok2  |    3 +
 .../one-true-awk/bugs-fixed/unicode-fs-rs-1.awk    |    6 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in |    2 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok |    5 +
 .../one-true-awk/bugs-fixed/unicode-fs-rs-2.awk    |    7 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in |    2 +
 contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok |    4 +
 .../one-true-awk/bugs-fixed/unicode-null-match.awk |    6 +
 .../one-true-awk/bugs-fixed/unicode-null-match.bad |    1 +
 .../one-true-awk/bugs-fixed/unicode-null-match.ok  |    1 +
 contrib/one-true-awk/lex.c                         |   60 +-
 contrib/one-true-awk/lib.c                         |  158 ++-
 contrib/one-true-awk/main.c                        |   23 +-
 contrib/one-true-awk/makefile                      |    8 +-
 contrib/one-true-awk/maketab.c                     |    4 +-
 contrib/one-true-awk/parse.c                       |    2 +-
 contrib/one-true-awk/proto.h                       |   10 +-
 contrib/one-true-awk/run.c                         |  942 +++++++++----
 contrib/one-true-awk/testdir/Compare.tt            |    2 +-
 contrib/one-true-awk/testdir/REGRESS               |    2 +-
 contrib/one-true-awk/testdir/T.argv                |    6 +
 contrib/one-true-awk/testdir/T.csv                 |   80 ++
 contrib/one-true-awk/testdir/T.flags               |    5 +-
 contrib/one-true-awk/testdir/T.misc                |   20 +
 contrib/one-true-awk/testdir/T.overflow            |    2 +
 contrib/one-true-awk/testdir/T.split               |    1 +
 contrib/one-true-awk/testdir/T.utf                 |  194 +++
 contrib/one-true-awk/testdir/T.utfre               |  234 ++++
 contrib/one-true-awk/testdir/tt.15                 |    2 +-
 contrib/one-true-awk/tran.c                        |   26 +-
 53 files changed, 3525 insertions(+), 1824 deletions(-)
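
Note on the string changes described in the README hunk below: the string
functions now count Unicode code points rather than bytes, and code points are
not necessarily characters. The following is a minimal sketch in Python, not
awk and not code from this commit, that shows how the three counts can differ
for the same text. The helper names `code_points` and `utf8_bytes` are
hypothetical and exist only for this illustration.

```python
# Illustrative only: contrasts byte counts with code point counts.
# This is NOT awk's implementation; the helper names are hypothetical.

def code_points(s: str) -> int:
    """Number of Unicode code points (what the updated awk is said to count)."""
    return len(s)

def utf8_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding (what a byte-oriented awk counts)."""
    return len(s.encode("utf-8"))

samples = [
    "awk",          # ASCII: code points == bytes
    "caf\u00e9",    # precomposed e-acute: one code point, two bytes
    "cafe\u0301",   # 'e' + combining acute: two code points, one visible character
    "\U0001F600",   # outside the BMP: one code point, four bytes
]

for s in samples:
    print(f"{s!r}: {code_points(s)} code points, {utf8_bytes(s)} bytes")
```

The third sample is the case the README warns about: it renders as four
characters but holds five code points, so a code point count can still differ
from what a reader perceives as the string's length.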

diff --cc contrib/one-true-awk/FIXES.1e
index 000000000000,8cbd6ac1a097..8cbd6ac1a097
mode 000000,100644..100644
--- a/contrib/one-true-awk/FIXES.1e
+++ b/contrib/one-true-awk/FIXES.1e
diff --cc contrib/one-true-awk/README.md
index 76ae3d48c983,000000000000..a41fb3c3b128
mode 100644,000000..100644
--- a/contrib/one-true-awk/README.md
+++ b/contrib/one-true-awk/README.md
@@@ -1,123 -1,0 +1,149 @@@
 +# The One True Awk
 +
 +This is the version of `awk` described in _The AWK Programming Language_,
- by Al Aho, Brian Kernighan, and Peter Weinberger
- (Addison-Wesley, 1988, ISBN 0-201-07981-X).
++Second Edition, by Al Aho, Brian Kernighan, and Peter Weinberger
++(Addison-Wesley, 2024, ISBN-13 978-0138269722, ISBN-10 0138269726).
++
++## What's New? ##
++
++This version of Awk handles UTF-8 and comma-separated values (CSV) input.
++
++### Strings ###
++
++Functions that process strings now count Unicode code points, not bytes;
++this affects `length`, `substr`, `index`, `match`, `split`,
++`sub`, `gsub`, and others.  Note that code
++points are not necessarily characters.
++
++UTF-8 sequences may appear in literal strings and regular expressions.
++Arbitrary characters may be included with `\u` followed by 1 to 8 hexadecimal digits.
++
++### Regular expressions ###
++
++Regular expressions may include UTF-8 code points, including `\u`.
++
++### CSV ###
++
++The option `--csv` turns on CSV processing of input:
++fields are separated by commas, fields may be quoted with
++double-quote (`"`) characters, quoted fields may contain embedded newlines.
++Double-quotes in fields have to be doubled and enclosed in quoted fields.
++In CSV mode, `FS` is ignored.
++
++If no explicit separator argument is provided,
++field-splitting in `split` is determined by CSV mode.
 +
 +## Copyright
 +
 +Copyright (C) Lucent Technologies 1997<br/>
 +All Rights Reserved
 +
 +Permission to use, copy, modify, and distribute this software and
 +its documentation for any purpose and without fee is hereby
 +granted, provided that the above copyright notice appear in all
 +copies and that both that the copyright notice and this
 +permission notice and warranty disclaimer appear in supporting
 +documentation, and that the name Lucent Technologies or any of
 +its entities not be used in advertising or publicity pertaining
 +to distribution of the software without specific, written prior
 +permission.
 +
 +LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
 +INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
 +IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
 +SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 +WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
 +IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
 +ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
 +THIS SOFTWARE.
 +
 +## Distribution and Reporting Problems
 +
 +Changes, mostly bug fixes and occasional enhancements, are listed
 +in `FIXES`.  If you distribute this code further, please please please
 +distribute `FIXES` with it.
 +
 +If you find errors, please report them
- to bwk@cs.princeton.edu.
++to the current maintainer, ozan.yigit@gmail.com.
 +Please _also_ open an issue in the GitHub issue tracker, to make
 +it easy to track issues.
 +Thanks.
 +
 +## Submitting Pull Requests
 +
 +Pull requests are welcome. Some guidelines:
 +
 +* Please do not use functions or facilities that are not standard (e.g.,
 +`strlcpy()`, `fpurge()`).
 +
 +* Please run the test suite and make sure that your changes pass before
 +posting the pull request. To do so:
 +
 +  1. Save the previous version of `awk` somewhere in your path. Call it `nawk` (for example).
 +  1. Run `oldawk=nawk make check > check.out 2>&1`.
 +  1. Search for `BAD` or `error` in the result. In general, look over it manually to make sure there are no errors.
 +
 +* Please create the pull request with a request
 +to merge into the `staging` branch instead of into the `master` branch.
 +This allows us to do testing, and to make any additional edits or changes
 +after the merge but before merging to `master`.
 +
 +## Building
 +
 +The program itself is created by
 +
 +	make
 +
 +which should produce a sequence of messages roughly like this:
 +
- 	yacc -d awkgram.y
- 	conflicts: 43 shift/reduce, 85 reduce/reduce
- 	mv y.tab.c ytab.c
- 	mv y.tab.h ytab.h
- 	cc -c ytab.c
- 	cc -c b.c
- 	cc -c main.c
- 	cc -c parse.c
- 	cc maketab.c -o maketab
- 	./maketab >proctab.c
- 	cc -c proctab.c
- 	cc -c tran.c
- 	cc -c lib.c
- 	cc -c run.c
- 	cc -c lex.c
- 	cc ytab.o b.o main.o parse.o proctab.o tran.o lib.o run.o lex.o -lm
++	bison -d  awkgram.y
++	awkgram.y: warning: 44 shift/reduce conflicts [-Wconflicts-sr]
++	awkgram.y: warning: 85 reduce/reduce conflicts [-Wconflicts-rr]
++	awkgram.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o awkgram.tab.o awkgram.tab.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o b.o b.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o main.o main.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o parse.o parse.c
++	gcc -g -Wall -pedantic -Wcast-qual -O2 maketab.c -o maketab
++	./maketab awkgram.tab.h >proctab.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o proctab.o proctab.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o tran.o tran.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o lib.o lib.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o run.o run.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2   -c -o lex.o lex.c
++	gcc -g -Wall -pedantic -Wcast-qual   -O2 awkgram.tab.o b.o main.o parse.o proctab.o tran.o lib.o run.o lex.o   -lm
 +
 +This produces an executable `a.out`; you will eventually want to
 +move this to some place like `/usr/bin/awk`.
 +
 +If your system does not have `yacc` or `bison` (the GNU
 +equivalent), you need to install one of them first.
++The default in the `makefile` is `bison`; you will have
++to edit the `makefile` to use `yacc`.
 +
- NOTE: This version uses ANSI C (C 99), as you should also.  We have
++NOTE: This version uses ISO/IEC C99, as you should also.  We have
 +compiled this without any changes using `gcc -Wall` and/or local C
 +compilers on a variety of systems, but new systems or compilers
 +may raise some new complaint; reports of difficulties are
 +welcome.
 +
 +This compiles without change on Macintosh OS X using `gcc` and
 +the standard developer tools.
 +
 +You can also use `make CC=g++` to build with the GNU C++ compiler,
 +should you choose to do so.
 +
- The version of `malloc` that comes with some systems is sometimes
- astonishly slow.  If `awk` seems slow, you might try fixing that.
- More generally, turning on optimization can significantly improve
- `awk`'s speed, perhaps by 1/3 for highest levels.
- 
 +## A Note About Releases
 +
- We don't do releases. 
++We don't usually do releases.
 +
 +## A Note About Maintenance
 +
 +NOTICE! Maintenance of this program is on a ''best effort''
 +basis.  We try to get to issues and pull requests as quickly
 +as we can.  Unfortunately, however, keeping this program going
 +is not at the top of our priority list.
 +
 +#### Last Updated
 +
- Sat Jul 25 14:00:07 EDT 2021
++Mon 05 Feb 2024 08:46:55 IST
diff --cc contrib/one-true-awk/bugs-fixed/getline-corruption.awk
index 000000000000,461e551cfff5..461e551cfff5
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/getline-corruption.awk
+++ b/contrib/one-true-awk/bugs-fixed/getline-corruption.awk
diff --cc contrib/one-true-awk/bugs-fixed/getline-corruption.in
index 000000000000,78981922613b..78981922613b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/getline-corruption.in
+++ b/contrib/one-true-awk/bugs-fixed/getline-corruption.in
diff --cc contrib/one-true-awk/bugs-fixed/getline-corruption.ok
index 000000000000,3efb54597c6d..3efb54597c6d
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/getline-corruption.ok
+++ b/contrib/one-true-awk/bugs-fixed/getline-corruption.ok
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.awk
index 000000000000,000000000000..6c066aad911d
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.awk
@@@ -1,0 -1,0 +1,11 @@@
++function foo() {
++	return "aaaaaab"
++}
++
++BEGIN { 
++	print match(foo(), "b")
++}
++
++{
++	print match(substr($0, 1), "b")     
++}
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.bad
index 000000000000,000000000000..343ee5c2f6cb
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.bad
@@@ -1,0 -1,0 +1,2 @@@
++-1
++-1
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.in
index 000000000000,000000000000..0d197e1b6a30
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.in
@@@ -1,0 -1,0 +1,1 @@@
++aaaaaab
diff --cc contrib/one-true-awk/bugs-fixed/matchop-deref.ok
index 000000000000,000000000000..49019db80789
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/matchop-deref.ok
@@@ -1,0 -1,0 +1,2 @@@
++7
++7
diff --cc contrib/one-true-awk/bugs-fixed/missing-precision.ok
index 000000000000,75e1e3d00446..75e1e3d00446
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/missing-precision.ok
+++ b/contrib/one-true-awk/bugs-fixed/missing-precision.ok
diff --cc contrib/one-true-awk/bugs-fixed/negative-nf.ok
index 000000000000,de97f8b27def..de97f8b27def
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/negative-nf.ok
+++ b/contrib/one-true-awk/bugs-fixed/negative-nf.ok
diff --cc contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
index 000000000000,a0de50f9007f..a0de50f9007f
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
+++ b/contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
diff --cc contrib/one-true-awk/bugs-fixed/rstart-rlength.awk
index 000000000000,f423f0168be3..f423f0168be3
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/rstart-rlength.awk
+++ b/contrib/one-true-awk/bugs-fixed/rstart-rlength.awk
diff --cc contrib/one-true-awk/bugs-fixed/rstart-rlength.ok
index 000000000000,961cb895b51b..961cb895b51b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/rstart-rlength.ok
+++ b/contrib/one-true-awk/bugs-fixed/rstart-rlength.ok
diff --cc contrib/one-true-awk/bugs-fixed/system-status.awk
index 000000000000,8daf563e6f4f..8daf563e6f4f
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/system-status.awk
+++ b/contrib/one-true-awk/bugs-fixed/system-status.awk
diff --cc contrib/one-true-awk/bugs-fixed/system-status.bad
index 000000000000,a1317dba54a8..a1317dba54a8
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/system-status.bad
+++ b/contrib/one-true-awk/bugs-fixed/system-status.bad
diff --cc contrib/one-true-awk/bugs-fixed/system-status.ok
index 000000000000,737828f5ed7a..737828f5ed7a
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/system-status.ok
+++ b/contrib/one-true-awk/bugs-fixed/system-status.ok
diff --cc contrib/one-true-awk/bugs-fixed/system-status.ok2
index 000000000000,000000000000..f1f631e1cb33
new file mode 100644
--- /dev/null
+++ b/contrib/one-true-awk/bugs-fixed/system-status.ok2
@@@ -1,0 -1,0 +1,3 @@@
++normal status 42
++death by signal status 257
++death by signal with core dump status 262
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.awk
index 000000000000,67366ec75070..67366ec75070
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.awk
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.awk
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in
index 000000000000,2e882af62a2c..2e882af62a2c
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.in
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok
index 000000000000,f337302be903..f337302be903
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-1.ok
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.awk
index 000000000000,34d77bf2c95f..34d77bf2c95f
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.awk
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.awk
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in
index 000000000000,2de6e718fd3b..2de6e718fd3b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.in
diff --cc contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok
index 000000000000,2387001bc1b2..2387001bc1b2
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok
+++ b/contrib/one-true-awk/bugs-fixed/unicode-fs-rs-2.ok
diff --cc contrib/one-true-awk/bugs-fixed/unicode-null-match.awk
index 000000000000,0c056126922b..0c056126922b
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-null-match.awk
+++ b/contrib/one-true-awk/bugs-fixed/unicode-null-match.awk
diff --cc contrib/one-true-awk/bugs-fixed/unicode-null-match.bad
index 000000000000,7cd35ff2d932..7cd35ff2d932
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-null-match.bad
+++ b/contrib/one-true-awk/bugs-fixed/unicode-null-match.bad
diff --cc contrib/one-true-awk/bugs-fixed/unicode-null-match.ok
index 000000000000,1ac142f8a895..1ac142f8a895
mode 000000,100644..100644
--- a/contrib/one-true-awk/bugs-fixed/unicode-null-match.ok
+++ b/contrib/one-true-awk/bugs-fixed/unicode-null-match.ok
diff --cc contrib/one-true-awk/testdir/T.csv
index 000000000000,e0f3d708edaf..e0f3d708edaf
mode 000000,100755..100755
--- a/contrib/one-true-awk/testdir/T.csv
+++ b/contrib/one-true-awk/testdir/T.csv
diff --cc contrib/one-true-awk/testdir/T.utf
index 000000000000,18f2b9c355cf..18f2b9c355cf
mode 000000,100755..100755
--- a/contrib/one-true-awk/testdir/T.utf
+++ b/contrib/one-true-awk/testdir/T.utf
diff --cc contrib/one-true-awk/testdir/T.utfre
index 000000000000,20e66cbde9a5..20e66cbde9a5
mode 000000,100755..100755
--- a/contrib/one-true-awk/testdir/T.utfre
+++ b/contrib/one-true-awk/testdir/T.utfre
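
Note on the CSV rules stated in the README hunk above (comma-separated fields,
optional double-quoted fields, embedded newlines inside quotes, and doubled
double-quotes for a literal quote): the sketch below restates those rules as a
small Python parser. It is an assumption-laden illustration, not the code in
lib.c, and the function name `parse_csv` is hypothetical. How the real awk
treats malformed input, such as an unterminated quote, is not covered here.

```python
# Illustrative only: a small parser for the CSV rules listed in the README.
# NOT the awk implementation; parse_csv is a hypothetical name.

def parse_csv(text: str) -> list[list[str]]:
    """Split text into records, each a list of field strings."""
    records: list[list[str]] = []
    fields: list[str] = []
    buf: list[str] = []
    in_quotes = False
    pending = False          # True once the current record has consumed input
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    buf.append('"')      # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                buf.append(ch)           # commas and newlines are data here
            pending = True
        elif ch == '"':
            in_quotes = True             # opening quote
            pending = True
        elif ch == ',':
            fields.append("".join(buf))  # field separator
            buf = []
            pending = True
        elif ch == '\n':
            fields.append("".join(buf))  # record separator
            records.append(fields)
            fields, buf, pending = [], [], False
        else:
            buf.append(ch)
            pending = True
        i += 1
    if pending:                          # last record without trailing newline
        fields.append("".join(buf))
        records.append(fields)
    return records

if __name__ == "__main__":
    sample = 'a,"b,c",d\n"say ""hi""",x\n"line1\nline2",z\n'
    for record in parse_csv(sample):
        print(record)
```

The sample input exercises each rule once: a comma inside quotes, a doubled
quote, and a newline inside a quoted field, none of which ends a field or a
record.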