BSDstats Project v2.0 ...

Wed Aug 9 18:30:26 UTC 2006

Marc G. Fournier wrote:
> On Wed, 9 Aug 2006, Howard Jones wrote:
> 
>> Marc G. Fournier wrote:
>>
>>> Right, and the bad thing is if you alias another IP on that device, the
>>> hash totally changes, so we see that one host now as being two different
>>> ones :)  That's why we disqualified using ifconfig right at the
>>> beginning ...
>>
>> But didn't you say that you effectively wipe the database once a month
>> (or expire entries over that age)? I can't find the post that mentioned
>> that now, naturally... :-) If you aren't using the 'key' as a database
>> key, then why do you care that it changes, as long as it uniquely
>> identifies the system (which it definitely would)?
>>
>> I don't know how typical I am, but I don't really remember the last time
>> I added an IP alias on a running server, for our few dozen production
>> systems. I would imagine that those types of changes might well be lost
>> in the noise of systems coming and going.
> 
> I add/remove IPs from our servers several times each week, as we add VPS
> and remove them, or move them between boxes ...

This problem is intractable: any scheme you can think of to generate a
unique identifying number on a random host out there on the net will either
fail to actually be unique, or suffer from mutating over time as machine
configuration changes.

How about the following.  Use bsdstats.hub.org to generate a random
token and hand it to the client.  128 bits of randomness gives a sufficiently
large domain (2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456
possible values) that, given a good RNG, collisions are not a problem.
You can generate that sort of token easily by, for example:

    % openssl rand -base64 16
    KSOWkPuK03Od99S5vaPGdQ==
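The size of the token space, and why collisions are a non-issue, can be
checked with a bit of arithmetic.  This Python sketch prints 2^128 and
applies the usual birthday-bound approximation p ~= n^2 / (2N) for n tokens
drawn uniformly from N possibilities.  The token counts tried are arbitrary
illustrations, not projected BSDstats numbers.

```python
# Back-of-envelope check of the 128-bit token space and of the collision
# risk, using the birthday-bound approximation p ~= n(n-1) / (2N) for
# n tokens drawn uniformly at random from N possible values.

TOKEN_BITS = 128
N = 2 ** TOKEN_BITS  # size of the token space


def collision_probability(n_tokens: int) -> float:
    """Approximate chance that any two of n_tokens random tokens collide."""
    return n_tokens * (n_tokens - 1) / (2 * N)


if __name__ == "__main__":
    print(f"{N:,}")  # 340,282,366,920,938,463,463,374,607,431,768,211,456
    # Illustrative population sizes only (hypothetical, not real figures).
    for n in (10**4, 10**6, 10**9):
        print(f"{n:>13,} tokens -> p(collision) ~ {collision_probability(n):.1e}")
```

Even a billion registered machines would leave the chance of any two of them
sharing a token at around 10^-21.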

Base64-encoded strings will have to be URL-escaped if they are passed as
parameters in an HTTP GET, since they may contain '+', '/' and '='.  Encoding
the token as a string of hex digits might be a better idea:

    % openssl rand 16 | hexdump -e '16/1 "%02x" "\n"'
    566fc9f2374a7e999d9587dc143373fc

Anyhow, that's just implementation detail.
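To make the encoding trade-off concrete, here is a small Python sketch that
draws 16 random bytes from the OS CSPRNG (via the standard secrets module,
standing in for `openssl rand`) and renders them both ways.  Each byte has to
be printed as exactly two hex digits ('%02x' in printf terms) for the hex form
to have a fixed length of 32 characters.

```python
import base64
import secrets
from urllib.parse import quote

raw = secrets.token_bytes(16)  # 128 bits from the OS's CSPRNG

b64_token = base64.b64encode(raw).decode("ascii")  # 24 chars, ending in "=="
hex_token = raw.hex()  # always 32 chars: two lowercase hex digits per byte

# The Base64 alphabet includes '+' and '/', plus '=' padding, all of which
# must be percent-escaped in a query string; hex digits never need escaping.
print(b64_token, "->", quote(b64_token, safe=""))
print(hex_token, "->", quote(hex_token, safe=""))

# Zero-padding matters: a byte such as 0x05 printed with "%01x" gives "5",
# not "05", so unpadded output would vary in length from token to token.
assert "%01x" % 0x05 == "5" and "%02x" % 0x05 == "05"
```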

So, the first time a client machine tries to report its configuration, the
transaction would go like this:

Client                                Server
-----------------------------------------------------------------------------
Check for cached ID token
Not found
Request new token from server ------> Generate token
                                      Record it in DB
                                      Return token to client
                              <------
Cache token in file
Generate OS version info
Send to server with ID token -------> If token is known, record data in DB

Generate Driver info
Send to server with ID token -------> If token is known, record data in DB

etc. etc.
-----------------------------------------------------------------------------
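The server half of that exchange can be sketched in Python as below.
TokenRegistry, get_token and submit are hypothetical names, and a dict stands
in for the real database; the point is only that tokens are issued by the
server, and that data arriving under a token it never issued is dropped.

```python
from __future__ import annotations

import secrets


class TokenRegistry:
    """Hypothetical in-memory stand-in for the server's token table.

    A real server would keep this in its database; here a dict maps each
    issued token to the list of (kind, payload) reports recorded under it.
    """

    def __init__(self) -> None:
        self._reports: dict[str, list[tuple[str, str]]] = {}

    def get_token(self) -> str:
        """Generate a fresh 128-bit token, record it, and return it as hex."""
        token = secrets.token_hex(16)
        while token in self._reports:  # astronomically unlikely, but cheap
            token = secrets.token_hex(16)
        self._reports[token] = []
        return token

    def submit(self, token: str, kind: str, payload: str) -> bool:
        """Record a report only if the token was issued by this server."""
        reports = self._reports.get(token)
        if reports is None:
            return False  # unknown token: discard the data
        reports.append((kind, payload))
        return True

    def reports_for(self, token: str) -> list[tuple[str, str]]:
        """Return a copy of everything recorded under a token."""
        return list(self._reports.get(token, []))
```

For example, `reg.submit(reg.get_token(), "os", "FreeBSD 6.1-RELEASE")`
returns True, while the same call with a made-up token returns False and
stores nothing.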

Because the server generates the tokens, it knows which ones are valid, and
can discard any data sent to it without a valid token.  That doesn't prevent
any vandal-minded person from requesting a metric butt-load of tokens to spam
the database with, but that's no worse than the current situation.  The neat
thing is that the number of available tokens is so huge that it is infeasible
to guess, or accidentally collide with, someone else's token.  E.g., even if
each guess cost only the 128 bits of the token itself, at 100 Mb/s it would
take about 4 x 10^32 seconds, or over 10^25 years, to exhaustively search the
whole token space.  Thus spammed data will just time out at the end of the
month without affecting anyone else's real data.  Stealing an existing ID
token by breaking into a machine or snooping on the net would be possible, but
presumably sufficiently difficult to do in a large enough quantity that it
wouldn't have a significant effect on the overall statistics.  If snooping
turns out to be a real problem, then using HTTPS is a possibility, but that
will ramp up the load on the server quite a bit.
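The brute-force estimate can be reproduced as follows.  It assumes, generously
to the attacker, that each guess costs only the 128 bits of the token on a
100 Mb/s link; real HTTP requests carry far more overhead, so an actual search
would take even longer.

```python
# Rough lower bound on the time to sweep the whole 128-bit token space,
# assuming each guess costs only the 128 bits of the token itself on a
# 100 Mb/s link, with no protocol overhead at all.

TOKEN_BITS = 128
LINK_BPS = 100 * 10**6                   # 100 Mb/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # Julian year

guesses = 2 ** TOKEN_BITS
seconds = guesses * TOKEN_BITS / LINK_BPS
years = seconds / SECONDS_PER_YEAR

print(f"{seconds:.1e} s, {years:.1e} years")  # 4.4e+32 s, 1.4e+25 years
```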

For subsequent updates, the client machine just reuses the same token out of
its cache file.  If the cached token gets deleted, then the client machine will
just have to request a new one and rely on the old data timing out at the end of
the month.
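Expiring abandoned tokens on the server might look something like this
sketch.  It assumes a hypothetical table mapping each token to the time it was
last heard from, and uses a rolling 31-day window instead of a wipe at a fixed
calendar boundary; either policy would let orphaned data time out.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def expire_stale(last_seen: dict[str, datetime], now: datetime,
                 max_age: timedelta = timedelta(days=31)) -> list[str]:
    """Drop tokens not heard from within max_age; return the dropped ones.

    last_seen is a hypothetical token -> last-report-time table.
    """
    stale = [tok for tok, seen in last_seen.items() if now - seen > max_age]
    for tok in stale:
        del last_seen[tok]
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    table = {"a" * 32: now - timedelta(days=40), "b" * 32: now}
    print(expire_stale(table, now))  # only the 40-day-old token is dropped
```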

Saving away the token should be simple -- just make the server return the
token in response to a 'get_token' query as MIME type text/plain, and have
fetch(1) dump it in a cache file somewhere, /var/db/bsdstats for example.  I
can code up the client side of this in about 5 minutes, but the server end of
things will take a little more work.
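For the client side, the cache-or-request logic is small.  This Python sketch
is illustrative only: request_token is a hypothetical callable standing in for
the HTTP GET of the 'get_token' query (which fetch(1) would do in a shell
version), and the cache path is whatever the caller chooses.  The token is
written to a temporary file and renamed into place, so an interrupted write
cannot leave a truncated token behind; tempfile.mkstemp also creates the file
readable only by its owner.

```python
from __future__ import annotations

import os
import re
import tempfile
from pathlib import Path
from typing import Callable

TOKEN_RE = re.compile(r"[0-9a-f]{32}")  # 128-bit token as lowercase hex


def load_or_request_token(cache_file: Path,
                          request_token: Callable[[], str]) -> str:
    """Return the cached token, asking the server for a new one if needed."""
    try:
        cached = cache_file.read_text(encoding="ascii").strip()
        if TOKEN_RE.fullmatch(cached):
            return cached
    except (FileNotFoundError, UnicodeDecodeError):
        pass  # no cache yet, or garbage in it: fall through and request one

    token = request_token().strip()
    if not TOKEN_RE.fullmatch(token):
        raise ValueError(f"server returned a malformed token: {token!r}")

    # Write to a temp file in the same directory, then atomically rename it
    # over the cache file.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_file.parent)
    with os.fdopen(fd, "w", encoding="ascii") as f:
        f.write(token + "\n")
    os.replace(tmp, cache_file)
    return token
```

On a second run the token comes straight from the cache file and the server is
not contacted; deleting the file simply causes a fresh token to be requested.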

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       7 Priory Courtyard
                                                      Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey         Ramsgate
                                                      Kent, CT11 9PW
