Uppercase RE matching problems in FreeBSD 11

Greg Rivers gcr+freebsd-stable at tharned.org
Sun Nov 6 01:23:35 UTC 2016


I happened to run an old script today that uses sed(1) to extract the 
system boot time from the kern.boottime sysctl MIB. On 11.0 this no longer 
works as expected:

$ sysctl kern.boottime
kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016
$ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
v  5 16:18:34 2016

sed skips 'S' and 'N' and starts the match at 'v', which it apparently 
considers uppercase. This is with LANG=en_US.UTF-8. If I set LANG=C, it 
works as expected:

$ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
Nov  5 16:18:34 2016

Testing every lowercase character separately gives even more inconsistent 
results:

$ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/'p
> a
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
> !
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

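Here sed thinks every lowercase character except for 'a' is uppercase! 
This does not actually conflict with the first test: the leading .* there 
is greedy, so sed matched [A-Z] at the last qualifying character, 'v', and 
the output says nothing about 'o'. Again, the above behaves as expected 
with LANG=C.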
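
The effect of the greedy .* in the first test can be reproduced without
touching the locale at all. The sketch below stays in the C locale and
widens the bracket expression by hand to [b-zA-Z]. That widened range is
only an assumption meant to imitate what the second test suggests [A-Z]
covers under en_US.UTF-8; it is not a confirmed diagnosis. LC_ALL is used
instead of LANG so that no other locale variable can override it.

```shell
#!/bin/sh
# Sample line copied from the sysctl output quoted above.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016'

# C locale: [A-Z] is plain ASCII uppercase. The greedy .* pushes the match
# to the LAST uppercase letter on the line, the 'N' of "Nov".
c_result=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([A-Z].*\)$/\1/')
echo "$c_result"

# Assumption: imitate a range that also takes in lowercase b-z. The last
# such letter on the line is the 'v' of "Nov", so the match starts there.
wide_result=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([b-zA-Z].*\)$/\1/')
echo "$wide_result"
```

The first command prints the date starting at "Nov", the second prints it
starting at "v", which is the same output the en_US.UTF-8 run produced.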

Does anyone have any insight into this? This is likely to break a lot of 
existing code.
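
A possible way to keep such scripts working, offered as a sketch rather
than as an explanation of the change in 11.0: POSIX only defines range
expressions such as [A-Z] for the POSIX (C) locale and leaves them
unspecified elsewhere, whereas the [[:upper:]] character class is defined
in every locale. The variable names below are made up for illustration,
and this has not been verified on 11.0 itself.

```shell
#!/bin/sh
# Sample line copied from the sysctl output quoted above.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016'

# Same substitution as the original script, but with a character class
# instead of a range, so it does not depend on collation order.
upper_result=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')
echo "$upper_result"

# Single-character check in the style of the second test: only lines that
# consist of one uppercase letter should be printed.
letters=$(printf '%s\n' a b y z A Z | sed -n -e '/^[[:upper:]]$/p')
echo "$letters"
```

Forcing LC_ALL=C (or LANG=C, as shown earlier) for the sed invocation is
the other option when the input is known to be plain ASCII.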

-- 
Greg

