Finding slowdowns in pkg_install (continuations of previous threads)

Garrett Cooper youshi10 at u.washington.edu
Fri Jul 13 16:02:27 UTC 2007


Garrett Cooper wrote:
> Tim Kientzle wrote:
>>>    -I tried ... buffering ...  the +CONTENTS file parsing function,
>>> and the majority of the time it yielded good results ....
>>
>> One approach I prototyped sometime back was to use
>> libarchive in pkg_add as follows:
>>   * Open the archive
>>   * Read +CONTENTS directly into memory (it's
>> guaranteed to always be first in the archive)
>>   * Parse all of +CONTENTS at once
>>   * Continue scanning the archive, disposing
>> of each file as it appears in the archive.
>>
>> Based on my experience with this, I would
>> suggest you just read all of +CONTENTS
>> directly into memory at once and parse
>> the whole thing in a single shot.
>> fopen(), then fstat() to get the size,
>> then allocate a buffer and read the whole
>> thing, then fclose().  You can then
>> parse it all at once.
>>
>> As a bonus, your parser then becomes a nice
>> little bit of reusable code that reads
>> a block of memory and returns a structure describing
>> the package metadata.
>>
>> Tim Kientzle
> I'm not 100% sure because I'm not comparing apples (virtual disk on 
> desktop via VMware) to apples (real disk on server), but I'm showing a 
> 2.5-fold speedup after adding the simple parser:
>
> Virtual disk:
>        4.42 real         1.37 user         1.47 sys
>
> Real disk:
>       10.26 real         5.36 user         0.99 sys
>
> I'll run a battery of tests just to ensure whether or not that's the 
> case.
>
> Be back with results in a few more days.
>
> -Garrett
Hello,
    As promised, here are some results from my work:

    By modifying the parser and heuristics in plist_cmd I appear to have 
decreased all figures (except plist_cmd, explained below) from their 
original values to much lower values. The only drawback is that I 
appear to have triggered a bug in either malloc'ing memory, 
printf/varargs, or transferring large amounts of data via pipes: some 
of my debug messages are making it into plist_cmd(..) from 
obtainbymatch(..), which accounts for the 3-fold increase in reported 
plist_cmd(..) iterations.
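
    For illustration, the single-shot read Tim suggests in the quoted 
text (open, fstat() for the size, allocate, read, close) could be 
sketched roughly as below. This is only a sketch and not code from 
pkg_install: slurp_file() is a hypothetical helper name, and fileno() 
is used because fstat() takes a descriptor rather than a FILE pointer.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() under strict -std modes */

#include <sys/types.h>
#include <sys/stat.h>

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of pkg_install): read an entire file
 * into a NUL-terminated heap buffer.  Returns NULL on failure; the
 * caller frees the result.  If len is non-NULL it receives the number
 * of bytes actually read.
 */
char *
slurp_file(const char *path, size_t *len)
{
    FILE *fp;
    struct stat sb;
    char *buf;
    size_t n;

    assert(path != NULL);

    if ((fp = fopen(path, "r")) == NULL)
        return (NULL);

    /* fstat() wants a descriptor, so go through fileno(). */
    if (fstat(fileno(fp), &sb) == -1 || sb.st_size < 0) {
        fclose(fp);
        return (NULL);
    }

    /* One extra byte so the buffer can be handled as a C string. */
    if ((buf = malloc((size_t)sb.st_size + 1)) == NULL) {
        fclose(fp);
        return (NULL);
    }

    n = fread(buf, 1, (size_t)sb.st_size, fp);
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return (NULL);
    }
    fclose(fp);

    buf[n] = '\0';
    if (len != NULL)
        *len = n;
    return (buf);
}
```

    A caller would do something like buf = slurp_file("+CONTENTS", &len), 
hand buf to the parser, and free() it once parsing is done.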

    I'm going to try replacing the debug commands with standard print 
statements wherever possible, then replace all tar commands with 
libarchive APIs, and see whether the problem goes away.
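
    Whichever way +CONTENTS ends up in memory (a plain read today, 
libarchive later), the parse can then be a single pass over one 
buffer. A minimal sketch, assuming the usual packing-list convention 
that lines starting with '@' are directives and everything else is a 
file entry. struct plist_summary and plist_summarize() are made-up 
names for illustration, and the real plist code tracks far more than 
these counts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary type; the real packing-list structures are richer. */
struct plist_summary {
    const char *name;   /* argument of the @name directive, or NULL */
    size_t ncommands;   /* lines beginning with '@' */
    size_t nfiles;      /* every other non-empty line */
};

/*
 * Walk a NUL-terminated +CONTENTS buffer once, in place.  Newlines are
 * overwritten with NULs, so out->name points into buf and stays valid
 * only as long as buf does.
 */
void
plist_summarize(char *buf, struct plist_summary *out)
{
    char *line, *next;

    assert(buf != NULL && out != NULL);
    out->name = NULL;
    out->ncommands = 0;
    out->nfiles = 0;

    for (line = buf; line != NULL && *line != '\0'; line = next) {
        /* Terminate this line and note where the next one starts. */
        next = strchr(line, '\n');
        if (next != NULL)
            *next++ = '\0';

        if (*line == '\0')
            continue;           /* skip blank lines */

        if (*line == '@') {
            out->ncommands++;
            if (strncmp(line, "@name ", 6) == 0)
                out->name = line + 6;
        } else {
            out->nfiles++;
        }
    }
}
```

    Because the function only touches memory, it can be exercised 
with a string literal copied into an array, with no package file or 
pipe involved, which also keeps any debug output out of the parser's 
input.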

Notes:
1. This sample is based on x11-libs/atk.
2. It isn't the final set of results.
3. Graphs coming soon (need to simulate values in Excel on work machine 
and convert to screenshots later on when I have a break -- thinking 
around noon). I'll repost when I have them available.
4. CSV files available at: 
http://students.washington.edu/youshi10/posted/atk-results.tgz.

