Finding slowdowns in pkg_install (continuations of previous threads)

Garrett Cooper youshi10 at
Fri Jul 13 16:02:27 UTC 2007

Garrett Cooper wrote:
> Tim Kientzle wrote:
>>>    -I tried ... buffering ...  the +CONTENTS file parsing function,
>>> and the majority of the time it yielded good results ....
>> One approach I prototyped sometime back was to use
>> libarchive in pkg_add as follows:
>>   * Open the archive
>>   * Read +CONTENTS directly into memory (it's
>> guaranteed to always be first in the archive)
>>   * Parse all of +CONTENTS at once
>>   * Continue scanning the archive, disposing
>> of each file as it appears in the archive.
>> Based on my experience with this, I would
>> suggest you just read all of +CONTENTS
>> directly into memory at once and parse
>> the whole thing in a single shot.
>> fopen(), then fstat() to get the size,
>> then allocate a buffer and read the whole
>> thing, then fclose().  You can then
>> parse it all at once.
>> As a bonus, your parser then becomes a nice
>> little bit of reusable code that reads
>> a block of memory and returns a structure describing
>> the package metadata.
>> Tim Kientzle
> I'm not 100% sure because I'm not comparing apples (virtual disk on 
> desktop via VMware) to apples (real disk on server), but I'm showing a 
> 2.5-fold speedup after adding the simple parser:
> Virtual disk:
>        4.42 real         1.37 user         1.47 sys
> Real disk:
>       10.26 real         5.36 user         0.99 sys
> I'll run a battery of tests just to ensure whether or not that's the 
> case.
> Be back with results in a few more days.
> -Garrett
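
    For anyone following along, the single-read approach Tim describes in
the quoted text (fopen(), fstat() for the size, one allocation, one read,
fclose(), then parse from memory) might look roughly like the sketch
below. This is not the code in my tree: slurp_file and count_plist_lines
are hypothetical names used only for illustration. The only +CONTENTS rule
it assumes is that lines starting with '@' are commands and every other
non-empty line is a file entry.

```c
#define _POSIX_C_SOURCE 200809L	/* for fileno() and fstat() */

#include <sys/types.h>
#include <sys/stat.h>

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read a whole file into a NUL-terminated heap buffer in one shot.
 * Returns NULL on failure. On success the caller must free() the buffer;
 * if len_out is non-NULL it receives the number of bytes read.
 * (Hypothetical helper, not part of pkg_install.)
 */
char *
slurp_file(const char *path, size_t *len_out)
{
	FILE *fp;
	struct stat sb;
	char *buf;
	size_t size, nread;

	if ((fp = fopen(path, "r")) == NULL)
		return (NULL);
	if (fstat(fileno(fp), &sb) == -1 || sb.st_size < 0) {
		fclose(fp);
		return (NULL);
	}
	size = (size_t)sb.st_size;
	if ((buf = malloc(size + 1)) == NULL) {
		fclose(fp);
		return (NULL);
	}
	nread = fread(buf, 1, size, fp);
	if (ferror(fp)) {
		free(buf);
		fclose(fp);
		return (NULL);
	}
	buf[nread] = '\0';
	fclose(fp);
	if (len_out != NULL)
		*len_out = nread;
	return (buf);
}

/*
 * Walk an in-memory packing list once, counting '@' command lines and
 * file-entry lines. Empty lines are skipped. The buffer is not modified.
 * (Hypothetical stand-in for a real parser that would build a structure.)
 */
void
count_plist_lines(const char *buf, size_t len, size_t *ncmds, size_t *nfiles)
{
	const char *p, *end, *nl;

	assert(buf != NULL && ncmds != NULL && nfiles != NULL);
	*ncmds = *nfiles = 0;
	p = buf;
	end = buf + len;
	while (p < end) {
		nl = memchr(p, '\n', (size_t)(end - p));
		if (nl == NULL)
			nl = end;	/* last line has no trailing newline */
		if (nl > p) {
			if (*p == '@')
				(*ncmds)++;
			else
				(*nfiles)++;
		}
		p = (nl < end) ? nl + 1 : end;
	}
}
```

    A caller would do something like buf = slurp_file("+CONTENTS", &len),
hand buf and len to the parser, then free(buf). Because the parser only
sees a block of memory, the same routine should work whether the bytes came
from a file on disk or straight out of the archive.
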
    As promised, here are some results from my work:

    By modifying the parser and heuristics in plist_cmd I appear to have 
decreased all figures (except plist_cmd, which I explain below) from 
their original values to much lower values. The only drawback is that I 
appear to have triggered a bug, either in malloc'ing memory, in 
printf/varargs, or in transferring large amounts of data via pipes, where 
some of my debug messages are making it into plist_cmd(..) from 
obtainbymatch(..). That accounts for the 3-fold increase in reported 
plist_cmd(..) iterations.

    I'm going to try replacing the debug commands with standard print 
statements wherever possible, then replace all tar commands with 
libarchive APIs, and see if the problem solves itself.

1. This sample is based on x11-libs/atk.
2. It isn't the final set of results.
3. Graphs coming soon (need to simulate values in Excel on work machine 
and convert to screenshots later on when I have a break -- thinking 
around noon). I'll repost when I have them available.
4. CSV files available at:

