BSDStats - What is involved ... ?

Tue Aug 29 01:53:34 UTC 2006

On Tue, Aug 29, 2006 at 11:39:23AM +1000, Antony Mawer wrote:
> On 29/08/2006 6:07 AM, Marc G. Fournier wrote:
> >On Mon, 28 Aug 2006, Brooks Davis wrote:
> >
> >>While I understand (or think I understand) the motivations for this 
> >>design goal, it's contrary to allowing collection of statistics from 
> >>many people.  I'd love to be able to publish data from the FreeBSD 
> >>systems (300+) at work, but unless I can do it in an anonymized 
> >>aggregate form it's not going to happen.  I just can't justify leaking 
> >>that much internal configuration information given a policy of hiding 
> >>it (right or wrong and not subject to debate).  If I could run my own 
> >>stats server and publish from it that might be possible.
> >
> >Aggregate submissions will never be possible, as it will definitely 
> >break any attempts at keeping the data 'clean' :(  I do understand that 
> >we will never be able to get *everyone* reporting, but we will try as 
> >much as possible to make it easy for as many as possible to report 
> >*within* limits ...
> >
> >I'm going to work on an 'email submission' method in September, that 
> >would allow reporting to go *through* one mailbox, and will include a 
> >confirmation/challenge stage *per* server though ...
> 
> Brooks, what sort of information are you looking to "anonymise" before 
> sending it out? Aggregating to say that I have X of this kind of CPU, Y 
> of this IDE chipset, etc, rather than linking it specifically to each 
> machine? Where would you feel a comfortable balance lies? Obviously some 
> effort needs to be made to minimise fraudulent entries.
> 
> Perhaps aggregate submissions could be conducted using a registration 
> mechanism...
> 
> Other thoughts would be having a local stats aggregation server that 
> pushes summaries up to the master server... the aggregation server keeps 
> the individual details, and some sort of challenge mechanism could be 
> randomly selected by the master server to reduce the ease with which the 
> numbers can be 'faked'?
> 
> ... just rambling as I thought of potential ways around this ...

I'd prefer not to expose host names or IP addresses.  Hardware
information and OS version aren't really a problem if they can't be
traced to a host name.  The requirement to register an aggregation
server would be fine with me.  A challenge mechanism would be tricky
because it would have to occur during a push to the central server,
since connections back to our hosts are not really possible.
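
To make the kind of anonymized aggregate I have in mind concrete, here
is a rough sketch in Python.  Everything in it is an assumption for
illustration: the field names, the sample reports, and the aggregate()
helper are hypothetical and are not part of BSDStats.  It only shows
per-host reports being reduced to counts with the host name and IP
address dropped before anything leaves the site.

```python
from collections import Counter

# Hypothetical names of the fields that would tie a report to a machine.
IDENTIFYING_FIELDS = {"hostname", "ip"}


def aggregate(reports):
    """Count each (field, value) pair across reports, skipping identifying fields."""
    counts = Counter()
    for report in reports:
        for field, value in report.items():
            if field in IDENTIFYING_FIELDS:
                continue  # never forwarded to the central server
            counts[(field, value)] += 1
    return counts


# Made-up sample data standing in for what a local aggregation server holds.
reports = [
    {"hostname": "a.example.org", "ip": "192.0.2.1",
     "cpu": "Opteron 250", "os": "FreeBSD 6.1"},
    {"hostname": "b.example.org", "ip": "192.0.2.2",
     "cpu": "Opteron 250", "os": "FreeBSD 6.1"},
    {"hostname": "c.example.org", "ip": "192.0.2.3",
     "cpu": "Xeon 5150", "os": "FreeBSD 6.1"},
]

summary = aggregate(reports)
for (field, value), n in sorted(summary.items()):
    print(f"{field}={value}: {n}")
```

The local server would keep the individual reports, and only the
summary would be pushed upstream.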

-- Brooks