freebsd-update's install_verify routine excessive stating

Oliver Fromme olli at
Thu Jan 22 04:17:46 PST 2009


I've Cc'ed Colin Percival, the author of freebsd-update.

Bert JW Regeer wrote:
 > Recently I decided it would be time to upgrade an older laptop that  
 > was still running 6.2-RELEASE to a more recent release 7.1-RELEASE
 > [...]
 > Everything went well, including the kernel update. It was not until  
 > after I rebooted and ran:
 > ./ -f freebsd-update.conf install
 > That I started noticing something rather weird. It had been running  
 > for a good 30 minutes before I started wondering what was going on.  
 > top gave me nothing of value, other than that all of the time was  
 > spent running sh, and that it was spending all of its time in system,  
 > not user where I would have expected it. Thinking went wrong I stopped  
 > the process and started it again.
 > After a ktrace and kdump I found the culprit in install_verify in the  
 > freebsd-update utility, it would read one line, and then run stat on  
 > one of the many files that was listed in /usr/upgrade/files/.
 > install_verify () {
 >          # Generate a list of hashes
 >          cat $@ |

That should be "$@" with double quotes.  $@ doesn't make
sense without the quotes.  Apart from that, it's a typical
case of "useless use of cat".

 >              cut -f 2,7 -d '|' |
 >              grep -E '^f' |
 >              cut -f 2 -d '|' |
 >              sort -u > filelist

It's unclear why there are two "cut" commands.  The 7th
field isn't used at all.  Also, the -E option to grep
is superfluous here.  Finally, using awk might be more
efficient than cut + grep.

So I would suggest to replace the whole pipe with this:

   awk -F "|" '$2 ~ /^f/ {print $2}' "$@" |
   sort -u > filelist

 >          # Make sure all the hashes exist
 >          while read HASH; do
 >                  if ! [ -f files/${HASH}.gz ]; then
 >                          echo -n "Update files missing -- "
 >                          echo "this should never happen."
 >                          echo "Re-run '$0 fetch'."
 >                          return 1
 >                  fi
 >          done < filelist
 >          # Clean up
 >          rm filelist
 > }
 > running find /usr/upgrade/files/ | wc -l showed over 64000 files. So  
 > what was happening here is that is doing due  
 > diligence in checking that all the files exist, however it uses the  
 > built in test utility to do so. While in normal situations this may  
 > not be that big of a deal since the changeset is likely to be small,  
 > running stat on 64000 individual files in a loop is rather slow.
 > In my case I have just removed the offending code and hoped all went  
 > well, however this is off course not an acceptable solution. I have  
 > not yet come up with an acceptable way to fix the offending code,  
 > hence my post to hackers. I am also not sure as to how I would file a  
 > pr bug report for this situation, any guidance would be greatly  
 > appreciated.

You are right, that loop doesn't scale very well.  I'm not
surprised it is horribly slow with 64000 files on an old
laptop that probably has a disk that isn't too fast.

It would be much better to generate two lists:
 - The list of hashes, as already done ("filelist")
 - A list of gzipped files present, stripped to the hash:

   (cd files; echo *.gz) |
   tr ' ' '\n' |
   sed 's/\.gz$//' > filespresent

Note we use "echo" instead of "ls", in order to avoid the
kern.argmax limit.  64000 files would certainly exceed that
limit.  Also note that the output is already sorted because
the shell sorts wildcard expansions.

Now that we have those two files, we can use comm(1) to
find out whether there are any hashes in filelist that are
not in filespresent:

   if [ -n "$(comm -23 filelist filespresent)" ]; then
           echo -n "Update files missing -- "

That solution scales much better because no shell loop is
required at all.

Best regards

Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:

"Life is short (You need Python)"
        -- Bruce Eckel, ANSI C++ Comitee member, author
           of "Thinking in C++" and "Thinking in Java"

More information about the freebsd-hackers mailing list