PERFORCE change 176831 for review

Wed Apr 14 08:00:20 UTC 2010

And yet one more clerical error after I reread this...

On Wed, Apr 14, 2010 at 12:56 AM, Garrett Cooper <gcooper at freebsd.org> wrote:
> On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper <gcooper at freebsd.org> wrote:
>> On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle <kientzle at freebsd.org> wrote:
>>>>> ... I would also put in a simple verification
>>>>> that the entry you found is a regular entry:
>>>>>  archive_entry_filetype(entry) == AE_IFREG
>>>>
>>>> Ok. What does a regular entry correspond to (I assume not a regular file)?
>>>
>>> Sorry, "regular entry" == "regular file" (or at least, what
>>> will become a regular file once you write it to disk, if you
>>> want to be really pedantic about it ;-).  Coincidentally,
>>> AE_IFREG == S_IFREG.  The separate AE_IFxxx constants
>>> are there mostly portability shims (e.g., Windows doesn't
>>> have block devices, so doesn't define S_IFBLK,
>>> but AE_IFBLK is always available).
>>
>> Ok. I think I got everything you were trying to say before, but just
>> to be clear:
>>
>> To prevent race conditions [with regular files only], I should use
>> open(2). For all other files I would use the standard extract method
>> (so I don't get too fancy?). It seems like the file creation times
>> should be sufficiently fast with other file types that this kind of
>> behavior of open(2), blah wouldn't be required.
>>
>>>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly into
>>>>>> exit
>>>>>> + * when doing piped tar commands for copying hierarchies *hint*,
>>>>>> *hint*.
>>>>>
>>>>> Why do you want to copy hierarchies?
>>>>> Seems a waste of disk bandwidth.  *hint* *hint* ;-)
>>>>
>>>> There's a fair amount of tar cpf - -C <dir_a> . | tar xpf - -C <dir_b>
>>>> in libinstall . This is being done to preserve file attributes,
>>>> fflags, modes, ownership, permissions, etc. ... Installing directly to
>>>> hierarchies works in theory as well, but it's
>>>> considerably more dangerous unless you do an information transfer
>>>> between the pkg contents and the install base, and unfortunately that
>>>> is tough to achieve with 100% clarity from what I've seen.
>>>
>>> That's exactly what I was getting to:  Someday, pkg_add
>>> should write the files directly to their final location
>>> and avoid the copy.  (You're exactly right that you
>>> don't want to build a "copy tree" facility.)
>>>
>>> Clearly, direct install is not something you want to
>>> tackle in this stage of the rewrite.  There's a lot of
>>> gains from exactly what you're doing.
>>
>> Yeah, I've been thinking about what you said... and there's ultimately
>> nothing to be gained through a staged install (in fact there's a lot
>> lost). So unless two files overwrite one another (which suggest broken
>> or conflicting packages, or not so clever users), people shouldn't be
>> installing conflicting packages or partially installing multiple
>> versions of the same package unless they're using -f, and if they're
>> doing that, they're basically damning the consequences of doing so.
>>
>> Until we have functional tests in place to actually test this though,
>> I need to stick with a conservative, functional analog to piped
>> tar(1)'ing though.
>>
>>> But I really do believe that single-pass direct
>>> install is feasible and is eventually where we want
>>> to be.  The key insight for me was when Florent recently
>>> pointed out that you could just read the +CONTENTS,
>>> then do a verify pass, then extract everything.
>>> (Any conflict during the extraction pass
>>> would be a fatal error:  delete everything
>>> extracted so far and scream loudly.)
>>
>> Agreed on the final point. I'm kind of interested though about the
>> first point -- the verify pass -- are you verifying that the contents
>> are sane on the disk or in the tarball? If so, why not just verify the
>> contents after the fact [with mtree -- it's has checksumming, mode and
>> permissions checking, you name it... everything minus some special
>> facilities like ACLs, fflags, etc -- I guess I'll need to add those],
>> then roll everything back if it isn't?
>>
>> One the entire operation is atomic and exclusive -- only one package
>> operation is allowed to run at any given time on a particular package,
>> running into this problem with the packaging software will be a
>> non-issue, but instead a problem with an external source (user, etc),
>> or a bad package.
>>
>>> Dealing with dependent packages is still
>>> a little tricky but I think it's all doable.
>>
>> Yeah, there's a crapload of overlap between multiple packages, like
>> cad-salome, and with some packages where you're need to build up a
>> hierarchy that's common between several packages, it becomes a royal
>> pain in the ass, but it's required otherwise stuff won't get setup and
>> torn down properly.
>>
>> So I'm guessing for right now we should just avoid squealing loudly
>> except with regular files; maybe whining a bit with non-regular files
>> would be a good idea though because it would force people to clean up
>> packages, and then things could be completely fixed later on so that
>> they're hard errors.
>>
>>>> I saw that in the original comments and while I think that you have a
>>>> point, I still am leery about someone specifying a package with
>>>> contents that start with + because it immediately breaks the
>>>> assumption. ...
>>>
>>> Unless I'm mistaken, pkg_add has used "+*" to identify
>>> metadata files for a long time, so I think your own
>>> argument from compatibility might be against you here.
>>> (Though I suspect that you're right in the long term that
>>> we should be obeying the +CONTENTS declarations instead of
>>> using filename conventions.)
>>
>> I still want to avoid this potential corner case. Users have an
>> incapability to surprise developers, and I don't want to have to deal
>
> s/in/unerring /
>
>> with this item in the future. A few less CPU cycles processing strings
>> is worth sacrificing over trying to be clever and producing a broken
>> program.
>>
>>>> unpack_file, blah with the appropriate metadata filenames considering
>>>> that it's a lot quicker than having to go through the entire tarball
>>>> end to end (especially if it's a large tarfile like openoffice),
>>>> unless someone unfortunately put the files at the end of the tarball
>>>> (in that case they do need to modify how the package is created).
>>>
>>> As for the speed argument:  Do you really need to
>>> extract the metadata files before you extract
>>> everything else?  Those files are going to a
>>> different place, but that's not an issue with a
>>> libarchive-driven extraction.  At one point, I had
>>> convinced myself that you could do it all in one
>>> pass, just writing the files to different places
>>> depending on whether they were metadata files
>>> (however identified) or not.  I recall, for example,
>>> that the +MTREE file needs to be used only at the
>>> end of the extraction (to fix-up permissions).
>>
>> Yes. pkg_info's primary uses (not -W) read the metadata if the
>> metadata isn't on the disk already in $PKGDBDIR; it's an optimization,
>> and I don't want to reduce performance in pkg_info.
>>
>> Other than that, pkg_install really doesn't care one iota about those
>> files (minus +CONTENTS).

+CONTENTS, +REQUIRE, and +REQUIRED_BY; the other files are used at
various times during the entire lifetime of installing a package
(+INSTALL, +MTREE_DIRS, +POST-DEINSTALL, etc) or inspecting a package
(+COMMENTS, +DESC, etc), but they're all fairly late in the process
and not nearly as critical as +CONTENTS. If +CONTENTS is gone, game
over... you're just dealing with a tarball.

>>>> Thanks again for the comments :)... it would be helpful BTW if the
>>>> manpages linked together because archive_read(3) doesn't reference the
>>>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the
>>>> right APIs to use, and I feel like I'm catching more boots than fish
>>>> sometimes... but it's a learning experience and I don't expect to get
>>>> it right the first time.
>>>
>>> Please let me know any doc shortcomings; the documentation
>>> could use a lot of work, I know.  I also have a growing
>>> Examples page on the libarchive Wiki and conversations
>>> like this help me a lot to better understand what needs
>>> to go there.
>>
>> Will definitely help whenever I can :).
>>
>> Also, getting back to a topic you mentioned earlier, the issue with
>> creating the decompressor multiple times can be alleviated like so:
>>
>> Make the basic extract API follow a format like so:
>>
>> struct package_archive;
>>
>> struct package_archive {
>>    /*
>>     * File descriptor for the archive. Set to NULL if you don't want
>> to reuse the
>>     * descriptor, i.e. want to only use the initializer once.
>>     */
>>    int *pkg_fd;
>>    /*
>>     * Compare a package filename vs an expression.
>>     *
>>     * Could be a small wrapper above fnmatch (for simple glob expressions),
>>     * strncmp (for exact matches -- would be set to PATH_MAX),
>> glob(3), or it could
>>     * always return 0 (for match all). Would return 0 if the files
>> matched, non-zero
>>     * if they didn't match.
>>     *
>>     * This would help us avoid hardcoding and at the small cost of a stack
>>     * push, pop, the additional overhead would be fairly negligible I would
>>     * think, and could potentially be optimized out using a smart compiler.
>>     */
>>    int (*pkg_file_comparator) (const struct *package_archive, const char *);
>> };
>>
>>    I'd need to develop new and free `methods', but that's the general
>> idea I had in mind.
>> Thanks!
>> -Garrett
>>
>