schedgraph.d experience, per-CPU buffers, pipes

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Fri, 24 Dec 2021 13:08:08 UTC
I would like to share some experience, or rather a warning, about using
DTrace for tracing scheduling events.  Unlike KTR, which has a single global
circular buffer, DTrace with bufpolicy=ring uses per-CPU circular buffers.  So,
if the load is spread unevenly across processors, the buffers will fill and
wrap around at different rates.  In the end, they might hold approximately
equal numbers of events, but those events may cover very different time
intervals.  So, some additional post-processing is required: take the first
(oldest) event in each per-CPU buffer and find the latest among those.  Any
traces from before that point would have information gaps ("missing"
processors) and would be very confusing.

Also, I noticed that processes passing a lot of data through pipes produce a
lot of scheduling events, as they seem to get blocked and unblocked every few
microseconds (on a modern, fast system with the default pipe sizing
configuration).  That contributes to a quick wrap-around of the circular
buffers.
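
To get a feel for how quickly such a workload can wrap a buffer, here is a
back-of-the-envelope estimate.  All numbers in it are hypothetical (buffer
size, record size and event rate are assumptions picked for illustration, not
measurements), and the helper name is made up for this sketch.

```python
def ring_coverage_seconds(bufsize_bytes, record_bytes, events_per_second):
    """Rough time span that one per-CPU ring buffer can hold."""
    # Number of whole records that fit in the buffer.
    records = bufsize_bytes // record_bytes
    return records / events_per_second


if __name__ == "__main__":
    # Assumed: 4 MiB buffer, 64-byte records, one event every 5 microseconds
    # (200,000 events per second) on the CPU running the pipe-heavy process.
    span = ring_coverage_seconds(4 * 1024 * 1024, 64, 200_000)
    print(f"{span:.3f} s")
```

Under these assumed numbers the busy CPU's buffer covers only about a third
of a second, while a mostly idle CPU's buffer could reach back much further.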

-- 
Andriy Gapon