pax misbehavior

Andriy Gapon avg at icyb.net.ua
Wed Oct 10 03:57:12 PDT 2007


Sorry for top-posting, but I am replying to myself and the context is
rather lengthy.

It seems the issue is that our pax has an internal heuristic that
applies -s transformations not only to file names, but also to hard link
and symlink targets.

On one hand this seems to be beneficial; on the other hand it can lead
to some confusion, because symlink targets can be relative and their
pathnames can match quite unexpected patterns as compared to normal file
pathnames. What makes this behavior even less obvious is that if a link
target is transformed into an empty string then the link is omitted
altogether. This, of course, makes a certain sense: there cannot be a
link without any target at all. On the other hand, POSIX explicitly
gives one and only one reason to omit a file: when its _name_ is
transformed to an empty string. So this looks like a POSIX violation
and unexpected behavior.
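
The effect can be modeled without pax at all. The sketch below is only
an approximation: sed and grep stand in for pax's -s handling, and
apply_rules is a name invented for this illustration, not anything from
the pax sources. It applies the two rules from the test quoted below in
order, stopping at the first one that matches:

```shell
#!/bin/sh
# Rough model of "first successful substitution wins" (not pax's real code).
# apply_rules is a hypothetical helper; sed/grep stand in for pax -s.
apply_rules() {
    # Rule 1: -s "#xxxxx#zzzzz#"
    if printf '%s\n' "$1" | grep -q 'xxxxx'; then
        printf '%s\n' "$1" | sed 's#xxxxx#zzzzz#'
        return
    fi
    # Rule 2: -s "#.*##" matches any string and replaces it with nothing
    printf '%s\n' "$1" | sed 's#.*##'
}

apply_rules 'xxxxx/yyyyy.0'   # entry name: rule 1 matches, prints zzzzz/yyyyy.0
apply_rules 'yyyyy'           # relative link target: only rule 2 matches, prints an empty line
```

The entry name is rescued by the first rule, but the target yyyyy never
contains xxxxx, so it falls through to the kill-all rule and comes out
empty.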

I have several proposals for fixing this situation:
1. Since POSIX is silent about modifying link targets, this behavior
seems to be an extension, and it would be nice to provide extended
options to turn it on/off (and maybe control some aspects of it).  AIX
pax, for instance, doesn't do that. Solaris and Linux seem to have the
same behavior.
2. I think that regardless of whether #1 is implemented, the pax man
page should describe this behavior and even warn about it.
3. The symlink target modification heuristic may be updated to exclude
the most trivial and probably most widespread case of symlinks into the
same directory, i.e. targets that don't contain any '/'.
4. The symlink target modification heuristic may be updated to leave a
link target alone if its substitution results in an empty string (rather
than throwing the symlink out, as is done now).
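
Proposals 3 and 4 could look roughly like the following. This is a
hypothetical sketch, not a patch against pax: transform_target is an
invented name, a sed expression stands in for the -s rules, and
combining both proposals in one function is just one possible choice.

```shell
#!/bin/sh
# Hypothetical sketch of proposals 3 and 4 combined (not pax's real code).
# transform_target TARGET SED_EXPR: SED_EXPR stands in for the -s rules.
transform_target() {
    target=$1
    rule=$2
    case $target in
        */*) ;;                       # has a directory part: may be rewritten
        *)   printf '%s\n' "$target"  # proposal 3: same-directory target, leave alone
             return ;;
    esac
    new=$(printf '%s\n' "$target" | sed "$rule")
    if [ -z "$new" ]; then
        printf '%s\n' "$target"       # proposal 4: empty result, keep the original target
    else
        printf '%s\n' "$new"
    fi
}

transform_target 'yyyyy' 's#.*##'                   # prints yyyyy
transform_target '../xxxxx/yyyyy' 's#.*##'          # prints ../xxxxx/yyyyy
transform_target '../xxxxx/yyyyy' 's#xxxxx#zzzzz#'  # prints ../zzzzz/yyyyy
```

With either rule in place the symlinks from the test quoted below would
stay in the archive with their targets intact.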

There is, of course, a workaround for my particular case, which is to
never use the kill-all substitution -s '#.*##', but instead to
explicitly list the roots of all archive hierarchies, like
-s '#^root1/.*##' -s '#^root2/.*##' ...
But even then there might be some unpleasant and hard-to-debug surprises
with other patterns being misapplied where no one expected them to be
applied.
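
A quick way to see what the anchored rules would and would not touch,
again only as a rough model with grep standing in for pax's matching
(matches_drop_rule and the sample strings are made up for illustration):

```shell
#!/bin/sh
# Rough model (not pax's real code): does a string match one of the
# anchored rules -s '#^root1/.*##' -s '#^root2/.*##' ?
matches_drop_rule() {
    printf '%s\n' "$1" | grep -q -e '^root1/.*' -e '^root2/.*'
}

for s in 'root1/etc/motd' 'yyyyy' 'root2/lib/libfoo.so'; do
    if matches_drop_rule "$s"; then
        echo "$s: rewritten to empty"
    else
        echo "$s: left alone"
    fi
done
```

A same-directory target like yyyyy is now left alone, but any link
target whose text happens to begin with root1/ or root2/ would still be
emptied.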

on 20/09/2007 19:09 Andriy Gapon said the following:
> Preparation first:
> $ mkdir xxxxx
> $ cd xxxxx/
> $ touch yyyyy
> $ ln -s yyyyy yyyyy.0
> $ ln -s yyyyy.0 yyyyy.0.0
> $ cd ..
> 
> Demonstration of expected behavior:
> $ pax -w -f xxxxx.tar -s "#xxxxx#zzzzz#" xxxxx
> $ pax -vf xxxxx.tar
> drwxr-xr-x  2 ...    0 20 Sep 18:51 zzzzz
> -rw-r--r--  1 ...    0 20 Sep 18:51 zzzzz/yyyyy
> lrwxr-xr-x  1 ...    0 20 Sep 18:51 zzzzz/yyyyy.0 => yyyyy
> lrwxr-xr-x  1 ...    0 20 Sep 18:51 zzzzz/yyyyy.0.0 => yyyyy.0
> pax: ustar vol 1, 4 files, 10240 bytes read, 0 bytes written.
> 
> Demonstration of misbehavior:
> $ pax -w -f xxxxx.tar -s "#xxxxx#zzzzz#" -s "#.*##" xxxxx
> $ pax -vf xxxxx.tar
> drwxr-xr-x  2 ...    0 20 Sep 18:51 zzzzz
> -rw-r--r--  1 ...    0 20 Sep 18:51 zzzzz/yyyyy
> pax: ustar vol 1, 2 files, 10240 bytes read, 0 bytes written.
> 
> 
> The only thing added in the second test is -s "#.*##" option _after_ the
> first -s option. Mysteriously it caused all symlinks to not be included
> into an archive. But this should not happen if the behavior in the first
> test is correct and pax follows POSIX specification: if an entry is
> handled by the first -s (which it was in the first test), then further
> -s options should not be applied to it. Our man page also says it:
> 
>    Multiple -s expressions can be specified.  The
>    expressions are applied in the order they are specified on the com-
>    mand line, terminating with the first successful substitution.
> 
> Of course, this synthetic test is a simplification of something done for
> a real task with a real purpose. -s "#.*##" is meant to exclude from an
> archive all "other" files and the side-effect of excluding symlinks as
> well is very unfortunate.
> 
> Should I file a PR ?
> 


-- 
Andriy Gapon


More information about the freebsd-stable mailing list