Disparity between /etc/services and /var/db/services.db

Bruce Evans brde at optusnet.com.au
Wed Dec 3 10:39:46 UTC 2014


On Tue, 2 Dec 2014, John-Mark Gurney wrote:

> Rui Paulo wrote this message on Tue, Dec 02, 2014 at 11:51 -0800:
>> On Dec 2, 2014, at 08:13, Garrett Cooper <yaneurabeya at gmail.com> wrote:
>>>
>>> On Dec 1, 2014, at 11:28, Benjamin Kaduk <kaduk at MIT.EDU> wrote:
>>>
>>>> On Mon, 1 Dec 2014, Garrett Cooper wrote:
>>>>
>>>>> $ ls -l /scratch/2/etc/services /scratch/2/var/db/services.db
>>>>> -rw-r--r--  1 ngie  wheel    86802 Nov 27 02:23 /scratch/2/etc/services
>>>>> -rw-r--r--  1 ngie  wheel  2097920 Nov 27 02:23 /scratch/2/var/db/services.db
>>>>
>>>> One's a text file and the other a Berkeley DB file ... I wouldn't expect
>>>> them to be the same size.
>>>
>>> Shoot. I didn't mean for this message to get sent out without a lot of context. For that I apologize...
>>>
>>> Basically what I was going to comment on was the fact that the .db file was so large, and by adjusting the number of entries I was able to reduce the size of the file by 4 (it's bloated by a couple thousand):

4 bytes is not much smaller :-).

I wonder whether using a database is even faster at all.  Perhaps it was
faster in 1992, when disks were slow and CPUs were slower.  Now that CPUs
are relatively faster, they may be able to parse a whole large text file
in less time than a database lookup in a much larger database file takes,
depending on how many disk i/o ops are needed.
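One way to check is to time getservbyname(3) against each backend.  A
minimal sketch of such a harness (entirely mine, not services_mkdb code;
the service name is arbitrary, and you would run it once with
/var/db/services.db in place and once with it moved aside, whichever way
your resolver actually selects the backend):

#include <err.h>
#include <netdb.h>
#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	struct timeval t0, t1;
	double us;
	int i, n = 1000;

	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++)
		/* Any entry will do; "ssh" is just a name that exists. */
		if (getservbyname("ssh", "tcp") == NULL)
			errx(1, "lookup failed");
	gettimeofday(&t1, NULL);
	us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
	printf("%.2f usec per lookup\n", us / n);
	return (0);
}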

>>> From usr.sbin/services_mkdb/services_mkdb.c:
>>>
>>> HASHINFO hinfo = {
>>>         .bsize = 256,           /* bucket size, in bytes */
>>>         .ffactor = 4,           /* desired keys per bucket */
>>>         .nelem = 32768,         /* estimated number of elements */
>>>         .cachesize = 1024,      /* suggested cache size, in bytes */
>>>         .hash = NULL,           /* use the default hash function */
>>>         .lorder = 0             /* host byte order */
>>> };
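
If I read hash(3) right, those parameters explain the observed size
almost exactly: nelem / ffactor = 32768 / 4 = 8192 initial buckets, and
8192 buckets * 256 bytes = 2097152 bytes, which is the 2097920 seen
above minus a few header pages.  So nearly all of the file is buckets
preallocated for entries that don't exist.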
>>
>> I doubt you'll find any history without contacting the original author (ume@).  If I had to guess, I think this was a premature optimisation.  The database just needs to contain a two-level hash lookup: port number and service number.  If you can prove that reducing nelem size doesn't cause a performance regression, then we could change it.  4MB is way too much on an embedded system.
>
> I'd say we don't even need the proof...  Do you really look up service
> numbers in tight loops?  As long as the size is reasonable, it'll be
> fine..
>
> If anything, maybe services_mkdb.c should warn, or even preprocess the
> file to get the number of entries before creating it..
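
The pre-pass is cheap.  A rough sketch of counting the non-blank,
non-comment lines so that nelem can be sized to the real data (the
function name and skeleton are mine, not services_mkdb code):

#include <stdio.h>
#include <string.h>

/* Count lines in /etc/services that could become entries. */
static unsigned int
count_service_entries(const char *path)
{
	FILE *fp;
	char line[1024];
	unsigned int n = 0;

	if ((fp = fopen(path, "r")) == NULL)
		return (0);
	while (fgets(line, sizeof(line), fp) != NULL) {
		const char *p = line + strspn(line, " \t");

		if (*p != '\0' && *p != '\n' && *p != '#')
			n++;
	}
	fclose(fp);
	return (n);
}

services_mkdb would then set hinfo.nelem from the returned count (padded
a little) instead of the hardcoded 32768.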

I think none of the small databases for services, login, passwd or termcap
should exist, except possibly passwd on systems where it is not small
(/etc/passwd is ~1K on my systems and ~55K on freefall).  The lookups are
just too rare to be worth optimizing.

Quick test for termcap:

-r--r--r--  1 root  wheel   204798 Jun  7  2004 /usr/share/misc/termcap
-r--r--r--  1 root  wheel  1310720 Mar 21  2004 /usr/share/misc/termcap.db

1000 tgetent()'s of last entry in the text file:
     0.30 millisecs each using the database
     1.56 millisecs each using the text file

So the database actually is an optimization for the text file's worst
case.

1000 tgetent()'s of first entry in the text file:
     0.12 millisecs each using the database
     0.12 millisecs each using the text file

1000 tgetent()'s of middle entry in the text file:
     0.14 millisecs each using the database
     0.78 millisecs each using the text file

The performance for the text file is linear in the position of the entry.
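
A harness of roughly this shape reproduces the measurement (this
reconstruction is mine, not the one actually used; link with -ltermcap,
and switch backends by moving termcap.db aside or pointing TERMCAP at
the text file):

#include <stdio.h>
#include <sys/time.h>
#include <termcap.h>

int
main(int argc, char **argv)
{
	struct timeval t0, t1;
	static char buf[4096];
	double us;
	int i, n = 1000;

	if (argc != 2) {
		fprintf(stderr, "usage: %s termname\n", argv[0]);
		return (1);
	}
	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++)
		/* tgetent() returns 1 on success. */
		if (tgetent(buf, argv[1]) != 1)
			return (1);
	gettimeofday(&t1, NULL);
	us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
	printf("%.2f msec per tgetent\n", us / n / 1000.0);
	return (0);
}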

The performance is very low anyway -- so slow that it is slower than the
fork+exec of the program used to test it, unless that is pessimized too:

1000 fork-execs of program to test this (0 tgetents):
     0.18 millisecs each statically linked (-current is much slower)
     0.82 millisecs each dynamically linked (-current is much slower)

1000 fork-execs of program to test + 1 tgetent each (typical use):
     1.32 millisecs each dynamically linked, last entry, database lookup
     2.72 millisecs each dynamically linked, last entry, text lookup
     0.56 millisecs each statically linked, last entry, database lookup
     1.72 millisecs each statically linked, last entry, text lookup

If anyone wants to actually optimize this, then one method is to set TERMCAP
in their environment.  Then there is nothing to look up, and tgetent() takes
about 0.12 milliseconds for the initial lookup and 0.04 milliseconds for
repeated lookups.
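
That is the classic trick that tset -s used to arrange.  A minimal
sketch, assuming the classic termcap behaviour that tgetent() leaves the
raw entry in the caller's buffer (the wrapper and buffer size are mine):

#include <stdlib.h>
#include <termcap.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	static char buf[4096];
	const char *term = getenv("TERM");

	if (argc < 2)
		return (1);
	/* One real lookup, then export the entry itself so that any
	   children parse the environment string instead of touching
	   termcap or termcap.db again. */
	if (term == NULL || tgetent(buf, term) != 1)
		return (1);
	setenv("TERMCAP", buf, 1);
	execvp(argv[1], argv + 1);	/* run the real program */
	return (1);
}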

Password lookup is more interesting since programs like ls and tar sometimes
do thousands of id lookups.  A less exhaustive test gave:

1000 fork-execs of program to test + 1 user_from_uid() each:
     0.39 millisecs each statically linked, last entry, database lookup
     0.31 millisecs each statically linked, last entry, text lookup FAILED

The text lookup was faster, but also didn't work -- it returned a string
representation of the id.  The text database is apparently not supported
for pwd (except of course to convert from it).  user_from_uid() apparently
doesn't trust database lookup to be fast, since it uses a small cache
internally.  That should make repeated lookups fast no matter what the
database lookup does.
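
The cache is easy to see from the caller's side.  A small sketch (mine)
using the pwcache-style user_from_uid() interface declared in <pwd.h>:

#include <sys/types.h>
#include <pwd.h>
#include <stdio.h>

int
main(void)
{
	const char *name = NULL;
	int i;

	/* Only the first call pays for opening the passwd database;
	   the rest hit the internal cache.  With the second argument
	   0, an unknown uid comes back as a numeric string -- the
	   failure mode seen in the text-lookup test above. */
	for (i = 0; i < 1000; i++)
		name = user_from_uid(0, 0);
	printf("uid 0 -> %s\n", name);
	return (0);
}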

0.39 msec is still very slow.  That is longer for the database lookup
than for the fork-exec of the program to do it (0.18 for the statically
linked program and 0.21 for the database initialization and read of 1
entry).  Reading 2 entries instead of 1 takes the same 0.39 msec to
within the measurement accuracy.  To be faster, the text method basically
needs to read and parse the whole file in less than the 0.21 msec needed
to initialize the database.  This is very easy for my small /etc/passwd
file, since even reading it all using the slow getchar() method takes only
0.03 msec.
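
For scale, the "slow getchar() method" is just byte-at-a-time stdio;
a sketch of that read (getc() on a stream here rather than getchar() on
stdin, but the same slow path):

#include <stdio.h>

int
main(void)
{
	FILE *fp;
	long nbytes = 0;

	if ((fp = fopen("/etc/passwd", "r")) == NULL)
		return (1);
	while (getc(fp) != EOF)
		nbytes++;	/* parsing would go here; the read dominates */
	fclose(fp);
	printf("read %ld bytes\n", nbytes);
	return (0);
}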

Bruce

