[Bug 264275] sed complaining about trailing backslash when using Umlauts

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 27 Oct 2022 13:43:24 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264275

Daniel Tameling <tamelingdaniel@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tamelingdaniel@gmail.com

--- Comment #1 from Daniel Tameling <tamelingdaniel@gmail.com> ---
The error comes from trying to compile the umlaut as a regex. I managed to
create a small reproducer that just calls regcomp.
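
The reproducer itself is not attached to this comment. A minimal sketch of
what such a test could look like follows. The helper name compile_in_locale
and the choice of pattern are assumptions for illustration, not the code that
was actually used:

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile `pattern` as a basic regular expression
 * while `locale` is active.
 *
 * Returns regcomp()'s result (0 on success, an REG_* code on failure),
 * or -1 if the requested locale is not installed.
 */
static int
compile_in_locale(const char *locale, const char *pattern)
{
	regex_t re;
	int rc;

	if (setlocale(LC_ALL, locale) == NULL)
		return (-1);

	rc = regcomp(&re, pattern, 0);	/* cflags == 0 selects a basic RE */
	if (rc == 0)
		regfree(&re);
	return (rc);
}
```

Assuming the failing input is a backslash followed by the ISO 8859-1 byte for
the umlaut, a call such as compile_in_locale("de_DE.ISO8859-1", "\\\xe4")
would be expected to return REG_EESCAPE on an affected system. The result can
be turned into a message with regerror().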

The error seems to come from this snippet in the p_simp_re function in
lib/libc/regex/regcomp.c:

  if ((c & BACKSL) == 0 || may_escape(p, wc))
       ordinary(p, wc);
  else
       SETERROR(REG_EESCAPE);

Both conditions in the if statement are false, so we end up with the trailing
backslash error (REG_EESCAPE). In may_escape, this is the return statement
that gets taken:

  if (isalpha(ch) || ch == '\'' || ch == '`')
      return (false);

ch is the wint_t representation of the umlaut, which is 0xe4. In
de_DE.ISO8859-1, the isalpha call returns true. (With a UTF-8 ä in a UTF-8
locale, ch is also 0xe4, but the isalpha call returns false, so the trailing
backslash error is not triggered.)
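
The locale dependence of the isalpha call can be checked separately from the
regex code. The following is a rough sketch only; the helper name
isalpha_in_locale is made up for this example, and the locale names passed to
it have to be installed on the test system:

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: report how isalpha() classifies `ch` while
 * `locale` is the active LC_CTYPE locale.
 *
 * Returns 1 if alphabetic, 0 if not, -1 if the locale is unavailable.
 */
static int
isalpha_in_locale(const char *locale, int ch)
{
	if (setlocale(LC_CTYPE, locale) == NULL)
		return (-1);
	return (isalpha(ch) != 0);
}
```

Going by the observation above, isalpha_in_locale("de_DE.ISO8859-1", 0xe4)
should yield 1, while the same call with a UTF-8 locale such as
"de_DE.UTF-8" should yield 0.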
