[CFR][CFT] counter(9): new API for faster and raceless counters
alfred at ixsystems.com
Wed Apr 3 17:37:05 UTC 2013
Hey folks, sorry for the top post here, but I just came into this thread.
Here at iXsystems we've just developed a set of scripts to scrape the
various FreeBSD user land utilities (sysctl, netstat, nfsstat, vmstat,
etc, etc) and put them into graphs based on time.
The goal is to be able to line up all these metrics with whatever
benchmark we are currently running and be able to see what may be
Potentially you should be able to scroll through the graphs and see
things like "ran out of mbufs @time", "vm system began paging at @time",
"buffer daemon went nuts @time"
Then we can take the information back and leverage it to make tuning
decisions, or potentially change kernel algorithms.
The only problem we have is that every user land tool has its own
format, so along with my team we have written some shell to coerce the
output from the various programs into pseudo-CSV (key/value pair) which
can then be post processed by tools to convert to CSV which can then be
put into something like open office, or put through an R program to
I'm hoping to have something shortly.
What I was hoping to do over the next few days was discuss with people
how we can (or should we even) fix the user land statistics tools to
output machine readable output that can be easily parsed.
Example: netstat -m (hard to parse) versus 'vmstat -z | grep mbuf' (easy
to parse).
The idea of outputting xml is good, CSV is OK, however CSV is
problematic as in the case of sysctl, if new nodes appear, then we can't
begin to emit them, we must either ignore them, or abort, or log them to
auxiliary files. Anything that makes life easier is good.
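
To make the CSV problem above concrete, here is a minimal sketch in C. It assumes the pseudo-CSV lines look like "key=value" (the exact format of our scripts is not shown here) and that the CSV header is fixed when a run starts. The function name csv_column_for and its parameters are hypothetical. A key that was not in the header gets -1, so the caller has to decide whether to ignore it, abort, or log it to an auxiliary file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the CSV column index for the key of a
 * "key=value" line, or -1 when the key is not part of the header that
 * was fixed at the start of the run (for example, a new sysctl node).
 * The "key=value" line format is an assumption made for illustration.
 */
static int
csv_column_for(const char *const *header, size_t ncols, const char *line)
{
    assert(header != NULL && line != NULL);

    const char *eq = strchr(line, '=');
    if (eq == NULL)
        return -1;              /* not a key/value line at all */

    size_t klen = (size_t)(eq - line);
    for (size_t i = 0; i < ncols; i++) {
        /* Compare whole keys, not prefixes. */
        if (strlen(header[i]) == klen &&
            strncmp(header[i], line, klen) == 0)
            return (int)i;
    }
    return -1;                  /* unknown key: no column to put it in */
}
```

A self-describing format such as XML or plain key/value pairs does not have this failure mode, since each record carries its own name.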
I should be able to share our scripts within the next couple of days.
On 4/3/13 3:04 AM, Pawel Jakub Dawidek wrote:
> On Wed, Apr 03, 2013 at 02:28:46AM +0200, Luigi Rizzo wrote:
>> On Wed, Apr 03, 2013 at 01:26:07AM +0200, Pawel Jakub Dawidek wrote:
>>> On Mon, Apr 01, 2013 at 03:51:28PM +0400, Gleb Smirnoff wrote:
>>>> Together with Konstantin Belousov (kib@) we developed a new API that is
>>>> initially purposed for (but not limited to) collecting statistical
>>>> data in kernel.
>>> Is there any plan to implement a universal way of exporting those
>>> statistics out of the kernel?
>>> Solaris has a framework for in-kernel statistics, which are exported via
>>> kstat tool. For ZFS I export them via sysctl. If you have ZFS loaded you
>>> can try 'sysctl kstat'.
>>> It would be nice for counter_u64_alloc() to take an additional argument
>>> 'name' and to create sysctl for the counter automatically. We could then
>>> slowly start migrating userland tools to use sysctls (or some wrapper
>>> userland API), but we immediately make those statistics available for
>>> use in scripts.
>> that is an interesting idea but i believe it can be effectively
>> built as a wrapper on top of the counter_u64_alloc() routine:
>> name_counter(counter_t c, const char *fmt, ...);
>> free_named_counter(counter_t c);
>> After all the name->counter mapping is unidirectional,
>> and possibly not even necessary on every single counter
>> (think of ipfw dynamic rules, created on packet arrivals, so
>> the counter alloc/dealloc needs to be fast).
> Right, although I'd optimize API naming and usage for the common case.
> Even though we do want to be able to alloc/free counters quickly sometimes,
> most of the time we don't care about alloc/free speed and we would like
> to have a name. Having a name argument that could be NULL for
> short-lived counters would let us call only one allocation function in
> the common case (actually in every case).
>> It might be useful for the name_counter() routine to support
>> a printf-style argument to make it easy to build names.
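
The naming wrapper discussed above could look roughly like the following userland sketch in C. Everything here is an assumption made for illustration: the counter_t stand-in type, the sketch_ function names, the return types, the 64-byte name limit, and the linked-list registry. It only shows the one-way name->counter mapping with printf-style names, and it is not code from the proposed patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t *counter_t;    /* hypothetical stand-in for the kernel type */

struct named_counter {
    char name[64];              /* longer names are truncated */
    counter_t c;
    struct named_counter *next;
};

static struct named_counter *registry;  /* name -> counter, one direction only */

/* Attach a printf-style name to an already allocated counter. */
static int
sketch_name_counter(counter_t c, const char *fmt, ...)
{
    assert(c != NULL && fmt != NULL);

    struct named_counter *n = calloc(1, sizeof(*n));
    if (n == NULL)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    vsnprintf(n->name, sizeof(n->name), fmt, ap);
    va_end(ap);

    n->c = c;
    n->next = registry;
    registry = n;
    return 0;
}

/* Drop the name; the counter itself is left alone in this sketch. */
static void
sketch_unname_counter(counter_t c)
{
    for (struct named_counter **pp = &registry; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->c == c) {
            struct named_counter *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Look a counter up by name, as an export path would. */
static counter_t
sketch_lookup_counter(const char *name)
{
    for (struct named_counter *n = registry; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n->c;
    return NULL;
}
```

Counters that are never named, such as the short-lived ipfw ones, never touch the registry, so their alloc/free path stays as fast as before.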
>>>> o Tiny API for counter(9):
>>>> counter_u64_alloc(int wait);
>>>> counter_u64_free(counter_u64_t cnt);
>>>> counter_u64_add(counter_u64_t cnt, uint64_t inc);
>>>> counter_u64_fetch(counter_u64_t cnt);
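
For readers who have not seen the patch, the four calls above can be modeled in userland roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: the sketch_ names, NSLOTS, and the explicit slot argument are hypothetical stand-ins for per-CPU storage, which the real API selects itself. The point is only that an add touches one private cell and a fetch sums all of them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NSLOTS 4    /* hypothetical stand-in for the number of CPUs */

typedef struct sketch_counter {
    uint64_t cell[NSLOTS];      /* one private cell per slot */
} *sketch_counter_t;

/* Allocate a zeroed counter; returns NULL on failure. */
static sketch_counter_t
sketch_counter_alloc(void)
{
    return calloc(1, sizeof(struct sketch_counter));
}

static void
sketch_counter_free(sketch_counter_t c)
{
    free(c);
}

/*
 * Add to the caller's own cell only. The slot argument is an assumption
 * of this sketch; it models "the CPU we are running on".
 */
static void
sketch_counter_add(sketch_counter_t c, unsigned slot, uint64_t inc)
{
    assert(c != NULL && slot < NSLOTS);
    c->cell[slot] += inc;
}

/* Fetch is the slow path: sum every cell. */
static uint64_t
sketch_counter_fetch(const struct sketch_counter *c)
{
    uint64_t sum = 0;

    assert(c != NULL);
    for (unsigned i = 0; i < NSLOTS; i++)
        sum += c->cell[i];
    return sum;
}
```

Because no two slots share a cell, the add path needs no lock in this model; a reader only pays the cost of the summation in fetch.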
>>> Do you really expect other types in the future? If so, could we at least
>>> create generic counter_t that internally keeps the type?
>> I read the u64 in the name mostly as a reminder to users
>> of the counter size.
> Should the users care? As a user of this KPI I'd prefer to have simpler
> name and just assume the counter is big enough.
>> It might actually make sense to change the type to s64.
>> This way we could have counters that go negative,
>> and also use them to accumulate sbintime_t values.
> Agreed, int64_t seems better.
>> But otherwise i am not sure that we want other types.
>> u32/s32 might save atomic/critical_enter ops on some archs,
>> but they saturate so quickly that they are probably a bad idea.
>> And 63/64 bits are quite large already.
> Right, I don't think 32bit counters are needed at all and I can't find
> any use for 128bit counters either.
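
A quick back-of-the-envelope check of the width argument above, as a small C helper. The event rates are assumptions picked for illustration, not measurements. At an assumed one million increments per second a 32-bit counter runs out in about 4295 seconds (roughly 72 minutes), while 63 bits at an assumed one billion increments per second last about 292 years.

```c
#include <assert.h>

/*
 * Seconds until a counter with `bits` usable bits has counted through
 * its whole range at `rate` increments per second. Both inputs are
 * whatever the caller assumes; nothing here is measured.
 */
static double
seconds_to_wrap(unsigned bits, double rate)
{
    assert(bits >= 1 && bits <= 64 && rate > 0.0);

    double range = 1.0;
    for (unsigned i = 0; i < bits; i++)
        range *= 2.0;           /* 2^bits, exact in a double */
    return range / rate;
}
```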