Report #4: Unicode support

Mon Jul 21 12:24:54 UTC 2014

On 7/21/14, Dmitry Selyutin <ghostman.sd at gmail.com> wrote:
> Hello everyone,
>
> here comes my report on progress during these two weeks. Pedro, David,
> excuse me for duplication, please: I should have just included you
> into this letter instead of sending you two letters. I've just
> realized that I've forgotten to write the report. :-(
>
> I've been intensively testing my normalization implementation and
> discovered that it was working incorrectly. Moreover, its code seemed
> to be completely cryptic, so I've rewritten it from scratch. Now
> it seems to work correctly (at least it passes the Unicode tests). The
> things that I've completely ignored are canonicalization and combining
> character classes. I've decided to publish it in a git repo and
> integrate it into head later, since it's a real pain to recompile the
> entire system every several hours after changes in the source code
> (especially if the changes are not large).

Dmitry, take a look at this build script:
http://svnweb.freebsd.org/socsvn/soc2014/op/tools/build_kernel_64bit_dirty.csh?revision=271052&view=co

It defines the DNO_CLEAN make property, so only the files you
modified will be rebuilt. This speeds up the build.
>
> I've also thought about your message where you raised doubts about the
> project structure. We'll have a `uniext.h' header, which is included if
> the UNICODE_ADDENDA macro is defined. This header defines the following
> functions: strcanon, strcanon_l, wcscanon, strnorm, strnorm_l,
> wcsnorm, wcclass. The last one was written as a helper function which
> is used inside wcscanon and wcsnorm, but I thought that it may also be
> useful as a standalone function.
>
> I've rewritten the algorithms: now everything is performed using binary
> search and hashes instead of the previous linear search, so it's
> really fast (e.g. decomposition works from 10 to 12 times faster than
> Python's decomposition algorithm). I've also
> tested it on wide strings, and it works as expected (at least!).
> So this part seems to be finished. The last thing to do is to put
> everything in the right place in the FreeBSD source tree.
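
For readers who have not seen the code, the binary-search lookup
described above could look roughly like the sketch below. This is not
taken from the uniext repository: the names decomp_entry, decomp_table,
decomp_lookup and decompose are made up for illustration, the table is
a six-entry excerpt rather than data generated from UnicodeData.txt,
and the hash part mentioned above is not shown. It also performs only
one decomposition step per code point and does no canonical reordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: a code point and its two-element
 * canonical decomposition. */
struct decomp_entry {
	uint32_t code;
	uint32_t mapping[2];
};

/* Small excerpt, sorted by code point so that bsearch(3) can be used.
 * A real table would be generated from the Unicode data files. */
static const struct decomp_entry decomp_table[] = {
	{ 0x00C0, { 0x0041, 0x0300 } },	/* A with grave */
	{ 0x00C5, { 0x0041, 0x030A } },	/* A with ring above */
	{ 0x00C9, { 0x0045, 0x0301 } },	/* E with acute */
	{ 0x00E9, { 0x0065, 0x0301 } },	/* e with acute */
	{ 0x00F1, { 0x006E, 0x0303 } },	/* n with tilde */
	{ 0x00FC, { 0x0075, 0x0308 } },	/* u with diaeresis */
};

/* Comparison callback: key is a code point, elem is a table entry. */
static int
decomp_cmp(const void *key, const void *elem)
{
	uint32_t code = *(const uint32_t *)key;
	const struct decomp_entry *entry = elem;

	if (code < entry->code)
		return (-1);
	if (code > entry->code)
		return (1);
	return (0);
}

/* O(log n) lookup; returns NULL if the code point has no entry. */
static const struct decomp_entry *
decomp_lookup(uint32_t code)
{
	return (bsearch(&code, decomp_table,
	    sizeof(decomp_table) / sizeof(decomp_table[0]),
	    sizeof(decomp_table[0]), decomp_cmp));
}

/* Copy src (len code points) into dst, expanding every code point that
 * has a table entry. Returns the number of code points written, or
 * (size_t)-1 if dst (capacity cap) is too small. */
static size_t
decompose(const uint32_t *src, size_t len, uint32_t *dst, size_t cap)
{
	size_t i, n = 0;

	assert(src != NULL && dst != NULL);
	for (i = 0; i < len; i++) {
		const struct decomp_entry *entry = decomp_lookup(src[i]);

		if (entry == NULL) {
			if (n + 1 > cap)
				return ((size_t)-1);
			dst[n++] = src[i];
		} else {
			if (n + 2 > cap)
				return ((size_t)-1);
			dst[n++] = entry->mapping[0];
			dst[n++] = entry->mapping[1];
		}
	}
	return (n);
}
```

Compared with scanning the table linearly for every input character,
each lookup here needs only a logarithmic number of comparisons, which
is presumably where a large part of the reported speedup comes from.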
>
> Here is my testing repo: https://github.com/ghostmansd/uniext. Just
> use `git clone https://github.com/ghostmansd/uniext'.
> P.S. You need to use gmake if you want to use my Makefile (I don't
> know BSD Makefile syntax well). However, all you need is to add the
> `-Iinclude' flag to CFLAGS, compile everything in `src', compile
> `main.c' and link it all together.
>
> --
> With best regards,
> Dmitry Selyutin