cvs commit: src/lib/libarchive archive_read_support_format_tar.c archive_write_set_format_pax.c src/lib/libarchive/test Makefile test_pax_filename_encoding.c test_pax_filename_encoding.tar.gz.uu

Tim Kientzle kientzle at FreeBSD.org
Fri Mar 14 18:43:59 PDT 2008


kientzle    2008-03-15 01:43:59 UTC

  FreeBSD src repository

  Modified files:
    lib/libarchive       archive_read_support_format_tar.c 
                         archive_write_set_format_pax.c 
    lib/libarchive/test  Makefile 
  Added files:
    lib/libarchive/test  test_pax_filename_encoding.c 
                         test_pax_filename_encoding.tar.gz.uu 
  Log:
  A subtle point: "pax interchange format" mandates that all strings
  (including pathname, gname, uname) be stored in UTF-8.  This usually
  doesn't cause problems on FreeBSD because the "C" locale on FreeBSD
  can convert any byte to Unicode/wchar_t and from there to UTF-8.  In
  other locales (including the "C" locale on Linux which is really
  ASCII), you can get into trouble with pathnames that cannot be
  converted to UTF-8.
  
  Libarchive's pax writer truncated pathnames and other strings at the
  first nonconvertible character.  (ouch!)  Other archivers have worked
  around this by storing unconvertible pathnames as raw binary, a
  practice which has been sanctioned by the Austin group.  However,
  libarchive's pax reader would segfault reading headers that weren't
  proper UTF-8.  (ouch!)  Since bsdtar defaults to pax format, this
  affects bsdtar rather heavily.
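
  The fallback described above (store valid strings as UTF-8, store the
  rest as raw bytes) can be sketched as follows.  This is only an
  illustration, not libarchive's code: is_valid_utf8() and
  choose_hdrcharset() are hypothetical names, and the sketch assumes the
  pathname bytes are checked directly for well-formed UTF-8, whereas the
  real writer converts from the current locale.  The "BINARY" value is
  what I understand the hdrcharset keyword to use for raw strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the n bytes at s are well-formed UTF-8. */
static int
is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t extra;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07;
        } else {
            return 0;                   /* stray continuation or invalid lead byte */
        }
        if (n - i <= extra)
            return 0;                   /* sequence runs past the end */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (extra == 1 && cp < 0x80)
            return 0;
        if (extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}

/*
 * Hypothetical policy: return NULL when the string can be stored as
 * UTF-8 (no hdrcharset override needed), or "BINARY" when it must be
 * stored as raw bytes.
 */
static const char *
choose_hdrcharset(const char *path)
{
    if (is_valid_utf8((const unsigned char *)path, strlen(path)))
        return NULL;
    return "BINARY";
}
```

  The point of the sketch is that a string which fails the check is kept
  whole and flagged, rather than being cut off at the first bad byte.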
  
  To correctly support the new "hdrcharset" header that is going into
  SUS and to handle conversion failures in general, libarchive's pax reader
  and writer have been overhauled fairly extensively.  They used to do
  most of the pax header processing using wchar_t (Unicode); they now do
  most of it using char so that common logic applies to either UTF-8 or
  "binary" strings.
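
  To illustrate why char-based processing lets one code path serve both
  cases, here is a minimal sketch of formatting a single pax extended
  header record.  It is not taken from libarchive; pax_record() is a
  hypothetical name.  It relies on the pax record layout as I understand
  it from POSIX: "LEN KEY=VALUE\n", where LEN is the decimal length of
  the whole record, including the digits of LEN itself.  The value is
  copied as opaque bytes, so it does not matter whether it is UTF-8 or
  "binary".

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one pax record into buf.
 * Returns the record length in bytes, or 0 if buf is too small.
 * The value is treated as raw bytes of length value_len.
 */
static size_t
pax_record(char *buf, size_t bufsize, const char *key,
    const char *value, size_t value_len)
{
    /* Everything except the length digits: ' ' key '=' value '\n'. */
    size_t base = 1 + strlen(key) + 1 + value_len + 1;
    size_t digits = 1;
    size_t total;
    int n;

    /* The length field counts itself, so iterate until it is stable. */
    for (;;) {
        size_t need = 0;
        size_t t;

        total = base + digits;
        for (t = total; t > 0; t /= 10)
            need++;
        if (need == digits)
            break;
        digits = need;
    }

    if (total + 1 > bufsize)        /* leave room for a trailing NUL */
        return 0;
    n = snprintf(buf, bufsize, "%zu %s=", total, key);
    if (n < 0)
        return 0;
    memcpy(buf + n, value, value_len);  /* bytes copied verbatim */
    buf[total - 1] = '\n';
    buf[total] = '\0';
    return total;
}
```

  For example, the key "hdrcharset" with the value "BINARY" yields the
  21-byte record "21 hdrcharset=BINARY\n", and a "path" record for a
  pathname containing a non-UTF-8 byte is formatted by exactly the same
  code.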
  
  As a bonus, a number of extraneous conversions to/from wchar_t have
  been eliminated, which should speed things up just a tad.
  
  Thanks to: Bjoern Jacke for originally reporting this to me
  Thanks to: Joerg Sonnenberger for noting a bad typo in my first draft of this
  Thanks to: Gunnar Ritter for getting the standard fixed
  MFC after: 5 days
  
  Revision  Changes    Path
  1.67      +240 -209  src/lib/libarchive/archive_read_support_format_tar.c
  1.43      +126 -50   src/lib/libarchive/archive_write_set_format_pax.c
  1.17      +1 -0      src/lib/libarchive/test/Makefile
  1.1       +161 -0    src/lib/libarchive/test/test_pax_filename_encoding.c (new)
  1.1       +10 -0     src/lib/libarchive/test/test_pax_filename_encoding.tar.gz.uu (new)

