Report #4: Unicode support

Dmitry Selyutin at
Mon Jul 21 11:24:15 UTC 2014

Hello everyone,

here is my report on the progress made during the last two weeks.
Pedro, David, please excuse the duplication: I should have included
you in this letter instead of sending you two separate ones. I've just
realized that I had forgotten to write the report. :-(

I've been intensively testing my normalization implementation and
discovered that it was working incorrectly. Moreover, its code seemed
to be completely cryptic, so I've rewritten it from scratch. Now
it seems to work correctly (at least it passes the Unicode tests). The
things that I've completely ignored are canonicalization and combining
character classes. I've decided to publish it in a git repo and
integrate it into head later, since it's a real pain to recompile the
entire system every few hours after changes in the source code
(especially if the changes are not large).

I've also thought about your message where you expressed doubts about
the project structure. We'll have a `uniext.h' header, which is
included if the UNICODE_ADDENDA macro is defined. This header declares
the following functions: strcanon, strcanon_l, wcscanon, strnorm,
strnorm_l, wcsnorm, wcclass. The last one was written as a helper
function which is used inside wcscanon and wcsnorm, but I thought that
it may also be useful as a standalone function.

I've rewritten the algorithms: now everything is performed using
binary search and hashes (before, the search was linear), so it works
really fast (e.g. decomposition runs 10 to 12 times faster than
Python's decomposition algorithm). I've also tested it on wide
strings, and it works as expected (at last!). So this part seems to be
finished. The last thing to do is to put everything in the right place
in the FreeBSD source tree.
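
To illustrate the binary search idea, here is a minimal sketch in C.
It is not the code from my repo: the names `decomp_entry',
`decomp_table' and `decomp_lookup' are made up for this example, and
the table holds only a few hand-picked canonical decompositions (a
real one would be generated from the Unicode Character Database). The
only assumption is that the table is sorted by code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a code point and its canonical decomposition. */
struct decomp_entry {
	uint32_t code;		/* precomposed code point */
	uint32_t mapping[2];	/* decomposition; second slot is 0 if unused */
};

/* Small illustrative sample; it MUST stay sorted by `code'. */
static const struct decomp_entry decomp_table[] = {
	{ 0x00C0, { 0x0041, 0x0300 } },	/* A with grave */
	{ 0x00C1, { 0x0041, 0x0301 } },	/* A with acute */
	{ 0x00E9, { 0x0065, 0x0301 } },	/* e with acute */
	{ 0x00F1, { 0x006E, 0x0303 } },	/* n with tilde */
	{ 0x0419, { 0x0418, 0x0306 } },	/* Cyrillic short I */
	{ 0x212B, { 0x00C5, 0x0000 } },	/* Angstrom sign (singleton) */
};

/*
 * Binary search over the sorted table: O(log n) comparisons instead
 * of the O(n) of a linear scan. Returns NULL if `code' has no entry.
 */
const struct decomp_entry *
decomp_lookup(uint32_t code)
{
	const size_t n = sizeof(decomp_table) / sizeof(decomp_table[0]);
	size_t lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		assert(mid < n);	/* mid always indexes a valid entry */
		if (decomp_table[mid].code == code)
			return (&decomp_table[mid]);
		if (decomp_table[mid].code < code)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (NULL);
}
```

For example, decomp_lookup(0x00E9) returns the entry mapping to
0x0065 0x0301, while decomp_lookup(0x0041) returns NULL, since a
plain `A' has no decomposition.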

Here is my testing repo: Just
use `git clone'.
P.S. You need to use gmake if you want to use my Makefile (I don't
know BSD Makefile syntax well). However, all you need to do is add the
`-Iinclude' flag to CFLAGS, compile everything in `src', compile
`main.c' and link it all together.

With best regards,
Dmitry Selyutin

More information about the soc-status mailing list