How to get the deterministic result for FreeBSD tar(1)?
    Yuri 
    yuri at rawbw.com
       
    Tue Dec  8 10:59:53 UTC 2015
    
    
  
I have two identical directories (no content differences, identical mtime 
attributes), each archived with this command:
find dir -print0 | LC_ALL=C sort -z | tar cf archive.tgz --format=bsdtar --no-recursion --null -T -
The resulting archives differ: 3 files out of 10,000 carry a pax 
attribute whose value is not the same in the two archives:
-27 ctime=1449566560.642715
+27 ctime=1449566903.167521
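To see which archive members carry such a record, a minimal sketch using Python's standard tarfile module may help. It relies on tarfile exposing per-member pax records through TarInfo.pax_headers; the helper name members_with_pax_key is hypothetical and only for illustration.

```python
import io
import tarfile


def members_with_pax_key(fileobj, key="ctime"):
    """Return {member name: value} for members whose pax header contains `key`."""
    found = {}
    # "r:*" lets tarfile detect compression (if any) on a seekable file object.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            # pax_headers holds the raw key/value records of the extended header.
            if key in member.pax_headers:
                found[member.name] = member.pax_headers[key]
    return found
```

Running it on both archives (opened with open(path, "rb")) and comparing the two dictionaries should point at the members whose ctime records differ.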
src/contrib/libarchive/archive_write_set_format_by_name.c suggests that 
format=bsdtar selects the ARCHIVE_FORMAT_TAR_PAX_RESTRICTED format, which 
writes no pax attributes unless need_extension=1 is set for a given file 
in archive_write_set_format_pax.c.
need_extension=1 is triggered by these conditions:
* too long or non-ASCII path
* too long or non-ASCII link
* too large file
* too long GID or UID
* too long or non-ASCII group name or user name
* ACL entries and extended attributes
* sparse info
In my case the file hierarchy is indeed very deep, and these three files 
also carry the "path" attribute.
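As a rough way to guess which paths in a tree hit the "too long or non-ASCII path" condition, the sketch below checks whether a path fits the plain ustar header: 100 bytes for the name field, or a split at a '/' into a prefix of at most 155 bytes plus a name of at most 100 bytes. This is only an approximation of the ustar limits, not libarchive's actual need_extension logic, and the function name fits_ustar is hypothetical.

```python
USTAR_NAME_MAX = 100    # bytes available in the ustar "name" field
USTAR_PREFIX_MAX = 155  # bytes available in the ustar "prefix" field


def fits_ustar(path):
    """Approximate check: True if `path` is ASCII and fits name or prefix/name."""
    try:
        raw = path.encode("ascii")
    except UnicodeEncodeError:
        return False  # non-ASCII paths would need a pax "path" record
    if len(raw) <= USTAR_NAME_MAX:
        return True
    # Try every '/' as the split point between the prefix and name fields.
    for i, byte in enumerate(raw):
        if byte == ord("/"):
            prefix, name = raw[:i], raw[i + 1:]
            if 0 < len(prefix) <= USTAR_PREFIX_MAX and 0 < len(name) <= USTAR_NAME_MAX:
                return True
    return False
```

Paths for which this returns False are candidates for carrying a pax "path" attribute, and therefore, in the archives described here, a ctime record as well.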
I think it is a bug that archive_write_set_format_pax.c writes the ctime 
attribute whenever one of the above conditions is satisfied: ctime can't 
be controlled by the user, so it will always cause a difference.
So I have two questions:
1. How do I actually get deterministic output from tar(1)?
2. Is there agreement that it is a bug for a too long or non-ASCII 
path name to trigger the leakage of ctime into a tar file?
Yuri