FreeBSD awk behavior change proposal

From: Warner Losh <imp_at_bsdimp.com>
Date: Fri, 9 Jul 2021 02:27:48 -0600
Greetings,

I've posted https://reviews.freebsd.org/D31114 which eliminates the last
delta we have from upstream one-true-awk. This delta has basically been
rejected by upstream as being a really bad idea. Let me give some
background.

In 2005, FreeBSD changed one-true-awk to honor the locale's collating order.
https://svnweb.freebsd.org/base/head/usr.bin/awk/b.c.diff?annotate=146322&pathrev=201988
This was billed as a temporary patch. It was also compatible with
the then-current behavior of gawk. That temporary patch has lasted 16
years now.

However, IEEE Std 1003.1-2008 changed the behavior of ranges in regular
expressions outside of the "C" and "POSIX" locales to be undefined.

Starting in 2011, gawk 4.0 stopped using the locale for the range
regular expressions and used the traditional behavior only. The
maintainer had grown weary of answering why '[A-Z]' would sometimes
match lower-case letters. The details are explained here:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
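
To illustrate why a collation-based range surprises people, here is a
minimal sketch in Python (not awk code, and not taken from either
implementation). It assumes a hypothetical dictionary-style collation
order "aAbBcC...zZ"; real locale collation tables differ, and the
function names are made up for this example.

```python
import string

# Hypothetical dictionary-style collation order: aAbBcC...zZ.
# This is an assumption for illustration, not a real locale table.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range_codepoint(ch, lo, hi):
    """Traditional behavior: compare raw character codes."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collated(ch, lo, hi):
    """Locale-style behavior: compare positions in the collation order."""
    pos = COLLATION.index
    return pos(lo) <= pos(ch) <= pos(hi)


def matched(text, lo, hi, in_range):
    """Return the characters of text that fall inside the range lo-hi."""
    return "".join(ch for ch in text if in_range(ch, lo, hi))


sample = "aBcXyZ"
traditional = matched(sample, "A", "Z", in_range_codepoint)
collated = matched(sample, "A", "Z", in_range_collated)
print(traditional)  # BXZ
print(collated)     # BcXyZ
```

Under the assumed ordering, every lower-case letter except 'a' sorts
between 'A' and 'Z', so the range picks up 'c' and 'y' as well.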

To restore compatibility with other implementations of awk, I propose
reverting this patch. FreeBSD is the odd system out. Reverting also has the
nice side effect of eliminating the last of our differences with upstream
one-true-awk.

I'd like to commit the change at least to -current. Ideally, I'd like to MFC
the change. I believe better compatibility with gawk and other awk
implementations justifies this change in behavior because the current
behavior is outside the mainstream enough to be considered a bug.

I'd like to solicit input before I do this, however.

Warner
Received on Fri Jul 09 2021 - 08:27:48 UTC
