problems with java.util.zip and diacritical characters in file names

Palle Girgensohn girgen at pingpong.net
Thu Jun 10 10:18:03 GMT 2004


I've tried this on Linux, and it seems to behave the same way. One problem is
that Java converts the entry names to Unicode (jazzlib does NOT do this; it
seems to keep the name in a byte array instead of a String). Another problem
is that WinZip uses the character set cp850 (! I thought this had been dead
for ages...), so there really seems to be no hope unless I hack up jazzlib
and convert the file names somehow?
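
To make the charset mismatch concrete, here is a minimal sketch. It assumes a Java 7 or later runtime, where the java.util.zip stream constructors accept a Charset for entry names; that option is not available in the JDK discussed in this thread. The class and method names are hypothetical, and it assumes the runtime ships the IBM850 (cp850) charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from the thread.
public class ZipNameCharsetDemo {

    // Hex dump of a string's bytes in the given charset.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Build a one-entry zip in memory, encoding the entry name with nameCharset.
    // Uses the ZipOutputStream(OutputStream, Charset) constructor (Java 7+).
    static byte[] writeZip(String entryName, String content, Charset nameCharset)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Return the content of the entry whose name, decoded with nameCharset,
    // equals entryName; return null if no such entry is found.
    static String readEntry(byte[] zip, String entryName, Charset nameCharset)
            throws IOException {
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream data = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        data.write(buf, 0, n);
                    }
                    return new String(data.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the source independent of the compiler's encoding.
        String name = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";
        Charset cp850 = Charset.forName("IBM850"); // assumption: charset is installed

        // The same character (a with diaeresis) as bytes in three charsets.
        System.out.println("ISO-8859-1: " + hex("\u00e4", StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + hex("\u00e4", StandardCharsets.UTF_8));
        System.out.println("cp850:      " + hex("\u00e4", cp850));

        // Write names as cp850, then look the entry up with two different charsets.
        byte[] zip = writeZip(name, "hello", cp850);
        System.out.println("same charset:  " + readEntry(zip, name, cp850));
        System.out.println("wrong charset: "
                + readEntry(zip, name, StandardCharsets.ISO_8859_1));
    }
}
```

Looking the entry up with a charset other than the one used to write the name should find no match and return null, which is the same symptom as the null result described in this thread.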

/Palle

--On Thursday, June 10, 2004 02:25:28 +0200 Palle Girgensohn 
<girgen at pingpong.net> wrote:

> Hi,
>
> Well, the problem is about character sets. A zip file seems to have no
> attribute telling which charset it uses for representing file names. Not
> very surprising.
>
> Java seems to handle this by reading the file names correctly and
> converting them to Java Strings (in Unicode). But when fetching data, it
> uses the Unicode byte sequence to find and fetch the entry, and comes up
> empty-handed: getInputStream returns null. I know of no way to tell
> java.util.zip that it should use some other character set?
>
> Hexdumping the resulting zip file, it is obvious that it used Unicode
> for the file name entries when saving the zip file. I'm not sure how
> WinZip would react, but I assume it will show them as Latin-1, i.e. ä ->
> À. While this is really bad for me, since there is no standard, I'm not
> quite sure this is wrong?
>
> BTW, there is a plug-in pure Java implementation on SourceForge,
> <http://jazzlib.sourceforge.net/>. It seems to produce the same file names
> on input and output.
>
> In  (getName): z/
> Out (getName): z/
> In  (getName): z/åäöÅÄÖ.txt
> Out (getName): z/åäöÅÄÖ.txt
> in is null
>
> With java.util.zip, in is null, the file is renamed to the same name but
> in Unicode, and it is zero bytes in the zip file.
>
> With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
> is not empty.
>
>
> I'm running this in a shell with
> $ echo $LC_ALL
> sv_SE.ISO8859-1
>
> Regards,
> Palle
>
>
> --On Wednesday, June 09, 2004 11:56:26 -0600 Greg Lewis
> <glewis at eyesbeyond.com> wrote:
>
>> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>>> java.util.zip cannot inflate a zip archive that contains eight-bit
>>> characters in file names; it simply crashes. I haven't been able to try
>>> it on other platforms yet, but I'd like to hear from others who might
>>> have seen this problem. The odd thing is that there is no exception or
>>> anything; it just stops when the first such character comes up, and
>>> returns null.
>>>
>>> Anyone else seen this? Is it just FreeBSD?
>>
>> If you send a small test programme and zip I can quickly try it on
>> Linux to compare.
>>
>> --
>> Greg Lewis                          Email   : glewis at eyesbeyond.com
>> Eyes Beyond                         Web     : http://www.eyesbeyond.com
>> Information Technology              FreeBSD : glewis at FreeBSD.org
>
>
>





