problems with java.util.zip and diacritical characters in file names

Palle Girgensohn girgen at pingpong.net
Thu Jun 10 00:25:42 GMT 2004


Hi,

Well, the problem is about character sets. A zip file seems to have no 
attribute telling which charset it uses for representing file names. Not 
very surprising.

Java seems to handle this by reading the file names correctly and converting 
them to Java Strings (in Unicode). But when fetching data, it uses the 
Unicode (apparently UTF-8) byte sequence to find and fetch the entry, and 
comes up empty handed: getInputStream returns null. I know of no way to tell 
java.util.zip that it should use some other character set.
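
For reference, JDKs from Java 7 onwards (so not the ones discussed in this 
thread) have constructors on ZipFile and ZipOutputStream that take a 
java.nio.charset.Charset for the entry names. A minimal sketch, assuming 
such a JDK; the class ZipCharsetSketch and its roundTrip helper are made up 
for illustration and are not part of the test programme attached below:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipCharsetSketch {

  // Hypothetical helper: writes one entry to a temporary zip, encoding the
  // entry name with nameCharset, then reads it back with the same charset.
  // Returns the entry's bytes, or null if the entry could not be found.
  static byte[] roundTrip(String entryName, byte[] data, Charset nameCharset) {
    File tmp = null;
    try {
      tmp = File.createTempFile("zipcharset", ".zip");
      try (ZipOutputStream out =
               new ZipOutputStream(new FileOutputStream(tmp), nameCharset)) {
        out.putNextEntry(new ZipEntry(entryName));
        out.write(data);
        out.closeEntry();
      }
      try (ZipFile zip = new ZipFile(tmp, nameCharset)) {
        ZipEntry entry = zip.getEntry(entryName);
        if (entry == null) { return null; }
        try (InputStream in = zip.getInputStream(entry)) {
          ByteArrayOutputStream buf = new ByteArrayOutputStream();
          byte[] chunk = new byte[2048];
          int n;
          while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
          }
          return buf.toByteArray();
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (tmp != null) { tmp.delete(); }
    }
  }

  public static void main(String[] args) {
    // "z/åäö.txt", written with escapes so the source encoding does not matter.
    String name = "z/\u00e5\u00e4\u00f6.txt";
    byte[] back = roundTrip(name,
        "hello".getBytes(StandardCharsets.US_ASCII),
        StandardCharsets.ISO_8859_1);
    System.out.println(back == null
        ? "entry not found"
        : new String(back, StandardCharsets.US_ASCII));
  }
}
```

The point of the sketch is only that reader and writer agree on the charset 
of the names; which charset a given archive actually uses still has to be 
known or guessed.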

Hexdumping the resulting zip file, it is obvious that it has used Unicode 
(UTF-8) in the zip file when saving the file name entries. I'm not sure how 
WinZip would react, but I assume it will show them as latin1, i.e. ä -> Ã¤. 
While this is really bad for me, since there is no standard I'm not quite 
sure it is actually wrong.
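
The mismatch is easy to reproduce outside of any zip code. A small sketch 
(the class name NameBytes and its helpers are made up for illustration, and 
it uses java.nio.charset.StandardCharsets, which only exists in newer JDKs) 
that shows the bytes for the same character in both encodings, and what 
comes out when UTF-8 bytes are read as latin1:

```java
import java.nio.charset.StandardCharsets;

public class NameBytes {

  // Render a byte array as lowercase hex, two digits per byte.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  // Encode a string as UTF-8, then decode those bytes as ISO-8859-1,
  // which is what a latin1-only tool would effectively do.
  static String utf8ReadAsLatin1(String s) {
    return new String(s.getBytes(StandardCharsets.UTF_8),
                      StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    String name = "\u00e4"; // the character ä, escaped to avoid source encoding issues
    System.out.println(hex(name.getBytes(StandardCharsets.ISO_8859_1))); // e4
    System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));      // c3a4
    // Two characters, U+00C3 followed by U+00A4; how they display depends
    // on the terminal's own encoding.
    System.out.println(utf8ReadAsLatin1(name));
  }
}
```

One byte in latin1 becomes two bytes in UTF-8, so a lookup by the raw bytes 
of a name can only succeed if both sides use the same encoding.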

BTW, there is a plug-in replacement written in pure Java on SourceForge, 
<http://jazzlib.sourceforge.net/>. It seems to produce the same file names 
on input and output.

In  (getName): z/
Out (getName): z/
In  (getName): z/åäöÅÄÖ.txt
Out (getName): z/åäöÅÄÖ.txt
in is null

With java.util.zip, in is null, the file is renamed to the same name but in 
Unicode, and it is zero bytes long in the zip file.

With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file is 
not empty.


I'm running this in a shell with
$ echo $LC_ALL
sv_SE.ISO8859-1

Regards,
Palle


--On onsdag, juni 09, 2004 11.56.26 -0600 Greg Lewis 
<glewis at eyesbeyond.com> wrote:

> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>> java.util.zip cannot inflate a zip archive that contains eight bit
>> characters in file names, it simply crashes. I haven't been able to try
>> it on other platforms yet, but I'd like to hear from others who might
>> have seen this problem. Odd thing is there is no exception or anything,
>> it just stops when the first character comes up, and returns null.
>>
>> Anyone else seen this? Is it just FreeBSD?
>
> If you send a small test programme and zip I can quickly try it on
> Linux to compare.
>
> --
> Greg Lewis                          Email   : glewis at eyesbeyond.com
> Eyes Beyond                         Web     : http://www.eyesbeyond.com
> Information Technology              FreeBSD : glewis at FreeBSD.org



-------------- next part --------------
import java.io.*;
import java.util.*;
import java.util.zip.*;
//import net.sf.jazzlib.*;


/**
   Test a zip file. Run as "java ZipTest infile.zip filetocreate.zip"
*/

public class ZipTest {

  public static void main(String[] args) {
    try {
      ZipFile zipIn = new ZipFile(args[0]);
      ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(args[1]));

      Enumeration inFiles = zipIn.entries();

      while(inFiles.hasMoreElements()) {
        ZipEntry inEntry = (ZipEntry) inFiles.nextElement();
        System.out.print("In  (getName): ");
        System.out.println(inEntry.getName());

        // Create the output entry under the same name as the input entry.
        ZipEntry outEntry = new ZipEntry(inEntry.getName());
        System.out.print("Out (getName): ");
        System.out.println(outEntry.getName());
        zipOut.putNextEntry(outEntry);

        // Directory entries carry no data; the next putNextEntry() closes them.
        if (inEntry.isDirectory()) { continue; }

        // getInputStream() comes back null here when the entry is not found.
        copy(zipIn.getInputStream(inEntry), zipOut);
        zipOut.closeEntry();
      }
      zipOut.close();
      zipIn.close();

    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  private static void copy(InputStream in, OutputStream out) 
    throws IOException {
    if (in == null) { System.out.println("in is null"); return ; }
    synchronized (in) {
      synchronized (out) {
        byte[] buffer = new byte[2048];
        while(true) {
          int bytesRead = in.read(buffer);
          if (bytesRead == -1) break;
          out.write(buffer, 0, bytesRead);
        }
      }
    }
  }
}

