[patch] glib20, UTF-8 and string collation

Alexandre "Sunny" Kovalenko alex.kovalenko at verizon.net
Wed Jan 9 20:58:15 PST 2008


On Thu, 2008-01-10 at 13:18 +0900, Alexander Nedotsukov wrote:
> Alexandre,
> 
> The problem you exposed have its roots in Evo code. g_utf8_* stuff 
> defined to work on *utf-8* strings only and have undefined behaviour on 
> MBCS strings. It may sound stupid but crashes are allowed in this case 
> :-) Even we apply your latest patch the true problem solution will be 
> only postponed. We have to continue with Evo source. Find subject parser 
> part and ensure that it will be utf-8 encoded string at the end.
<rant>
While I see this as the noble goal, I fail to understand why asserts are
OK 10 lines above my latest patch, but not in this specific place. Nor
does this latest patch mask any issues in Evo -- you still get
Glib-CRITICAL assert. Hell, you even get assert without the patch if
your CHARSET is ASCII, but core dump if your CHARSET is xx_YY.UTF-8. If
we are talking noble goals here, how about some consistency in the error
handling?
</rant>

On the more productive note: I have picked up more glib programming
today than I had before (or cared to) -- it's easy to improve when you
started from the veritable zero ;) However, it might not be sufficient
to do what needs to be done to Evolution. One thing you can help me
with, is to suggest glib function, I can feed arbitrary string of bytes
(either counted or zero-terminated) and it will tell me whether this is
valid UTF-8. g_print() apparently does this somehow, so there must be a
way.

As promised, I'll crawl back under my rock and wait for rainy weekend to
read some Evolution code.
> 
> All the best,
> Alexander.
> 
BTW: What happened to "no top posting" rule?
> Alexandre "Sunny" Kovalenko wrote:
> > On Wed, 2008-01-09 at 20:16 -0500, Joe Marcus Clarke wrote: 
> >   
> >> On Wed, 2008-01-09 at 19:40 -0500, Alexandre "Sunny" Kovalenko wrote:
> >>     
> >>> On Wed, 2008-01-09 at 12:35 -0500, Joe Marcus Clarke wrote: 
> >>>       
> >>>> On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
> >>>>         
> >>>>> I have seen recent commit WRT string collation in devel/glib20 by
> >>>>> marcus, so I have decided to check if there is an interest to fix SEGV
> >>>>> in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to collate.
> >>>>>           
> >>>> Any commits I have made in the area of UTF-8 are completely accidental.
> >>>> I am not the UTF-8 guy.  Both bland and jylefort have expressed interest
> >>>> in this.  Perhaps one of them will comment.
> >>>>         
> >>> I hope so. Just in case, they would decide to, I have reduced the
> >>> situation to the small program below. I get 
> >>>
> >>> GLib-CRITICAL **: g_convert: assertion `str != NULL' failed
> >>>
> >>> and no core dump from this simple program, whereas Evolution manages to
> >>> pass NULL to strcoll further down in g_utf8_collate and get SEGV for its
> >>> pains.
> >>>       
> >> That sounds like a no-no for Evolution to be dereferencing a NULL
> >> pointer.  Hopefully they'd fix this to prevent the problem.
> >>     
> >
> > It's not Evolution, it is glib, specifically g_utf8_collate, which would
> > call strcoll(3) blindly on the return of g_utf8_normalize inside
> > gunicollate.c. And now, I can get core dumped out of this simple program
> > as well, merely by setting CHARSET=en_US.UTF-8 (I had it is ASCII in the
> > terminal window, which would trigger different path within
> > g_utf8_collate).
> >
> >   
> >>> Conversely, if the answer still is "Evolution should not have done
> >>> that", I will happily crawl back under my rock and keep my patch
> >>> locally.
> >>>       
> >> I can't imagine you're alone in this.  But then again, any Cyrillic mail
> >> that comes my way is always spam, so what do I know.
> >>     
> >
> > More importantly, it is UTF-8 spam -- in order to trigger this, you need
> > KOI8-R or CP1251, and in the sorted column to boot. I suspect that
> > Latin1 or ShiftJIS would do the trick too.
> >
> > Now, how about this: would you be amenable to this Really Harmless(tm)
> > patch, which merely adds error checking along the lines used in the same
> > function, about dozen lines up ;)
> >
> > --- glib/gunicollate.c.B 2008-01-09 20:48:25.000000000 -0500
> > +++ glib/gunicollate.c	2008-01-09 20:49:35.000000000 -0500
> > @@ -166,6 +166,9 @@
> >    str1_norm = g_utf8_normalize (str1, -1, G_NORMALIZE_ALL_COMPOSE);
> >    str2_norm = g_utf8_normalize (str2, -1, G_NORMALIZE_ALL_COMPOSE);
> >  
> > +  g_return_val_if_fail (str1_norm != NULL, 0);
> > +  g_return_val_if_fail (str2_norm != NULL, 0);
> > +
> >    if (g_get_charset (&charset))
> >      {
> >        result = strcoll (str1_norm, str2_norm);
> >
> > I can add it to your files/extra-patch-glib_gunicollate.c, or package 
> > it separately -- I really hate it when I start Evolution after portupgrade
> > to write some E-mails real quick, only to find out that I have forgotten
> > to patch glib... again.
> >
> >   
> 

-- 
Alexandre "Sunny" Kovalenko (Олександр Коваленко)



More information about the freebsd-gnome mailing list