sed not working

Rein Kadastik wigry at uninet.ee
Mon Sep 5 04:35:45 PDT 2005


Tim Robbins wrote:

> Rein Kadastik wrote:
>
>> Giorgos Keramidas wrote:
>>
>>> On 2005-09-03 14:17, Rein Kadastik <wigry at uninet.ee> wrote:
>>>  
>>>
>>>> Rein Kadastik wrote:
>>>>  
>>>>
>>>>> Well, I have one guess here. In the Estonian alphabet, z comes
>>>>> immediately after s and before t. So when the regex expands [a-z]
>>>>> in that order, the characters t, u, v, w, x, y are left out.
>>>>>
>>>>> How do I tell sed to use the English alphabet?
>>>>>     
>>>>
>>>>
>>>> Well, my guess was right. I have the following line in /etc/profile:
>>>>
>>>> export LANG=et_EE.ISO8859-15
>>>>
>>>> After I exported LANG=en_US.ISO8859-1, sed started to work.
>>>>
>>>> I did not think that the LANG setting would also alter the alphabet,
>>>> so that the expression [a-z] no longer covers the full alphabet.
>>>>   
>>>
>>>
>>>
>>> By using a character class:
>>>
>>>     [[:alpha:]]
>>>
>>> AFAIK, if you are using non-English locales, there's no guarantee that
>>> [a-z] will be the entire set of lowercase letters, or that it will only
>>> include lowercase letters, for that matter.
>>>
>> Yep, I know, but it does not matter. The form [a-z] is used all over 
>> the place in the FreeBSD source (1629 lines in 4.11-RELEASE-p11 and 
>> almost 1600 in 5-STABLE). Totally hopeless. It seems that no developer 
>> has ever heard of character classes, and it is VERY UNSAFE to try to 
>> compile (and actually even run) FreeBSD with a locale other than 
>> C/en_US.ISO8859-1.
>>
>> I actually searched for the existence of character classes in the 
>> source code. I found around 30 matches, mostly in manual pages. The 
>> Perl configure script checks whether tr supports them, but it never 
>> actually uses the feature (even if available).
>>
>> I am totally disappointed by this. I thought about reporting a 
>> bug, but as it is everywhere, there is no point in doing so.
>
>
> I think you're blowing things out of proportion. Providing that you 
> build world as root (which most people do), and that you don't change 
> the LANG setting for root (think single-user mode), the following 
> command will give you an approximate idea of which utilities are 
> affected:
> $ find /usr/src -name \*.c | xargs grep -e '".*a-z' -e '".*A-Z'
>      25
>
> Of these 25 hits, about half are in comments or test code that is 
> never built. The utilities that are genuinely affected are: kbdmap, 
> scon, ppp (when using ATM), m4 (in GNU compatibility mode), fdisk, 
> named, cvs, diff and vi.
>
> Tim
>
Well, not quite. For starters, the modules that fail in my buildworld 
are ncurses, csh/tcsh and gdb (interesting that there are so few, as the 
problem itself is much bigger). Secondly, there are not 25 results but 
quite a few more (most of the regexes are not in .c files). Third, I have 
already sent email to Ruslan and am waiting for a response. I am fully 
aware of the size of such a project and quite willing to try to make 
things better.
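
For illustration only, a wider search than the *.c-only one quoted above
could look roughly like the sketch below. It assumes a POSIX sh, find and
grep; count_ranges is a hypothetical helper name, and this is not
necessarily how any of the counts in this thread were produced. It only
counts lines containing a literal [a-z] or [A-Z], so ranges embedded in
longer bracket expressions are missed.

```shell
#!/bin/sh
# Sketch: count lines containing a literal [a-z] or [A-Z] bracket range
# in every regular file under a directory, not only in *.c files.
# count_ranges is a hypothetical name used for illustration.
count_ranges() {
    # LC_ALL=C keeps grep's own matching independent of the caller's LANG.
    # -h drops file names so that wc -l counts matching lines only.
    find "$1" -type f \
        -exec env LC_ALL=C grep -h -e '\[a-z\]' -e '\[A-Z\]' {} + |
        wc -l | tr -d ' '
}
```

It would be called as, for example, count_ranges /usr/src.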

And BTW my systemwide LANG is set to et_EE.ISO8859-15 which I personally 
like. As the system provides localization functionality, it must handle 
it appropriately in every situation (which is not the case right now).
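
The two workarounds discussed in this thread can be sketched as follows,
assuming a POSIX sh and sed. The sample string is made up for
illustration. The first form pins the locale for a single command, so
[a-z] means the 26 ASCII lowercase letters; the second uses a character
class, which asks for lowercase letters as the active locale defines them
instead of relying on collation order.

```shell
#!/bin/sh
# Sketch: make a lowercase match independent of the user's LANG setting.
# The input string is a hypothetical example.
input='status: ready'

# 1. Pin the locale for this one command only. In the C locale the range
#    [a-z] is exactly the ASCII letters a through z.
by_range=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[a-z]/X/g')

# 2. Use a POSIX character class instead of a range, so the result does
#    not depend on where z sorts in the current locale.
by_class=$(printf '%s\n' "$input" | sed 's/[[:lower:]]/X/g')

printf '%s\n%s\n' "$by_range" "$by_class"
```

Setting LC_ALL=C on the command line affects only that invocation, so the
systemwide LANG can stay as it is.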

Peace
-- Rein


