[OT?] write C program with UTF16LE

Adam Bozanich abozan01 at ccsf.edu
Mon Mar 15 02:28:18 PST 2004



On Mon, 15 Mar 2004, Zhang Weiwu wrote:

> Hello. Although I write some PHP/Perl scripts, I don't write C programs. Now
> I have a very large text file in UTF-16LE format; the rule is that strings
> are separated by numbers. Say
>
> 0300 6100 6200 6300 0400 6700 5400 9800 7400 0300 ....
>
> A leading 0300 means the following 3 characters (6 bytes) are a string, and
> the next 0400 means the following 4 characters make another string.
>

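Here's an example using the fgets function (see 'man fgets'). It works on
the hex text exactly as you posted it (ASCII digits separated by spaces),
not on raw UTF-16LE bytes. There are probably a bunch of ways to go about
this, but this one is nice and simple.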

#include <stdio.h>
#include <stdlib.h>   /* atoi() */

#define CHUNKSIZE 5  /* 4 characters and a space */

/* max number of encoded chars if you are using 2 decimal places for the count */
#define MAX_CHUNK_COUNT 99

int main(void) {

    char  delbuf[CHUNKSIZE + 1];                   /* +1 for the '\0' fgets adds */
    char  chunks[CHUNKSIZE * MAX_CHUNK_COUNT + 1];

    int  chunk_count;

    while(fgets(delbuf, sizeof delbuf, stdin) != NULL)
    {
        /* keep only "03" of "0300"; you may not want to destroy this */
        delbuf[2] = '\0';
        chunk_count = atoi(delbuf);

        if(chunk_count < 0 || chunk_count > MAX_CHUNK_COUNT){
            fprintf(stderr,"bad string length\n");
            break;
        }
        if(chunk_count == 0){    /* empty string (or a stray newline) */
            fputs("\n", stdout);
            continue;
        }

        if(fgets(chunks, (CHUNKSIZE * chunk_count) + 1, stdin) == NULL){
            fprintf(stderr,"can't read all of the string\n");
            break;
        }

        fprintf(stdout,"\n%s",chunks);
    }
    return 0;
}


This worked for the numbers you gave, but you will need to add better
error handling. fgets() also stops at a newline, so the hex text has to be
one long line. You probably also don't want to trash the buffer holding
the string length, as the delbuf[2] = '\0' line does.

Try running with this:

./a.out < inputfile > outputfile

> But I am the kind of newbie who doesn't know whether I am using glibc at
> all. When I just write
> #include <stdio.h>
> am I using the stdio.h from glibc?
>

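No. On FreeBSD, #include <stdio.h> picks up /usr/include/stdio.h from
FreeBSD's own libc, not GNU's glibc (which is the C library on most Linux
systems). For this program it doesn't matter, since fgets() and friends
are standard C and behave the same in both.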

I hope this gives you some ideas. Good luck!

-Adam



More information about the freebsd-questions mailing list