[Bug 264275] sed complaining about trailing backslash when using Umlauts
Date: Thu, 27 Oct 2022 13:43:24 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264275
Daniel Tameling <tamelingdaniel@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tamelingdaniel@gmail.com
--- Comment #1 from Daniel Tameling <tamelingdaniel@gmail.com> ---
The error comes from compiling the umlaut as part of a regex. I managed to
create a small reproducer that only calls regcomp().
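The reproducer itself is not attached here, so what follows is only a sketch of that kind of test, not the reporter's program. compile_status() is a hypothetical helper name. It switches to a given locale, compiles a pattern as a basic regex with regcomp() (cflags = 0) and returns the status code, with -1 meaning the locale is not installed. Because the branch quoted below only fails for escaped characters, the pattern of interest is assumed to be a backslash followed by the ISO 8859-1 byte 0xe4.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the reporter's reproducer: switch to `locale`,
 * compile `pattern` as a basic regular expression (cflags = 0) and return
 * regcomp()'s status code (0 on success). Returns -1 if the locale is not
 * installed. On a compile error, the regerror() text is stored in `msg`.
 */
static int compile_status(const char *locale, const char *pattern,
                          char *msg, size_t msglen)
{
    regex_t re;
    int rc;

    if (msg != NULL && msglen > 0)
        msg[0] = '\0';
    if (setlocale(LC_ALL, locale) == NULL)
        return -1;
    rc = regcomp(&re, pattern, 0);
    if (rc == 0)
        regfree(&re);           /* success: release the compiled regex */
    else if (msg != NULL && msglen > 0)
        regerror(rc, &re, msg, msglen);
    return rc;
}
```

For example, compile_status("de_DE.ISO8859-1", "\\\xe4", buf, sizeof buf) passes a backslash followed by the byte 0xe4. Going by the analysis in this comment, on FreeBSD this should return REG_EESCAPE.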
The error seems to come from this snippet in the p_simp_re function in
lib/libc/regex/regcomp.c:
    if ((c & BACKSL) == 0 || may_escape(p, wc))
            ordinary(p, wc);
    else
            SETERROR(REG_EESCAPE);
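Both conditions in the if statement are false: the umlaut is preceded by a
backslash, so (c & BACKSL) == 0 does not hold, and may_escape() returns false.
We therefore reach SETERROR(REG_EESCAPE), which is reported as the "trailing
backslash" error. In may_escape() this is the return statement that gets taken: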
    if (isalpha(ch) || ch == '\'' || ch == '`')
            return (false);
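The two quoted snippets can be combined into a small standalone model of the decision. This is a simplified sketch, not libc code: model_may_escape() and model_accepts() are hypothetical names, only the quoted return statement of may_escape() is reproduced, and the escaped flag stands in for (c & BACKSL) != 0.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <wchar.h>

/*
 * Simplified model of the quoted return statement in may_escape():
 * letters and the characters ' and ` must not be escaped. Any other
 * checks in the real function are left out. Values above UCHAR_MAX are
 * treated as non-letters because passing them to isalpha() is undefined
 * in ISO C.
 */
static bool model_may_escape(wint_t ch)
{
    bool alpha = ch <= UCHAR_MAX && isalpha((int)ch);

    if (alpha || ch == '\'' || ch == '`')
        return false;
    return true;
}

/*
 * Simplified model of the quoted branch in p_simp_re(): `escaped` plays
 * the role of (c & BACKSL) != 0. Returns true where the real code calls
 * ordinary(), false where it calls SETERROR(REG_EESCAPE).
 */
static bool model_accepts(bool escaped, wint_t wc)
{
    return !escaped || model_may_escape(wc);
}
```

In the default "C" locale isalpha(0xe4) is false, so model_accepts(true, 0xe4) returns true. Under de_DE.ISO8859-1, where isalpha(0xe4) is true, the same call returns false, which corresponds to the reported error.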
ch is the wint_t representation of the umlaut, which is 0xe4. In
de_DE.ISO8859-1, the isalpha() call returns true. (If I use a UTF-8 encoded ä
in a UTF-8 locale, ch is also 0xe4, since the code point U+00E4 has the same
value, but isalpha() returns false there because 0xe4 is not a single-byte
character in UTF-8, so the trailing backslash error is not triggered.)
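The locale dependence of isalpha() on the value 0xe4 can be checked in isolation. This is a sketch: isalpha_in_locale() is a hypothetical helper, and it only changes LC_CTYPE, which is the category isalpha() depends on.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper: return 1 if isalpha() classifies `byte` as a
 * letter under `locale`, 0 if it does not, or -1 if the locale is not
 * installed. `byte` must be EOF or representable as unsigned char.
 */
static int isalpha_in_locale(const char *locale, int byte)
{
    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;
    return isalpha(byte) ? 1 : 0;
}
```

On a system with both locales installed, the observation above implies that isalpha_in_locale("de_DE.ISO8859-1", 0xe4) gives 1 and isalpha_in_locale("de_DE.UTF-8", 0xe4) gives 0. The UTF-8 locale name is an assumption and may differ between systems.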