Re: vminfo provider for FreeBSD

From: Mark Johnston <markj_at_freebsd.org>
Date: Tue, 07 Jun 2022 17:17:06 UTC
On Mon, Jun 06, 2022 at 12:21:42PM -0400, Peter Johnson wrote:
> Thanks for the detailed reply.
> 
> I'm coming at this from a bit of a weird angle: I use FreeBSD as the basis for
> my operating systems course, part of which asks students to write programs
> that exercise operating system functionalities like scheduling and virtual
> memory in specific ways and then use DTrace to confirm that their programs are
> doing what they're supposed to do vis-a-vis those functionalities (eg, "write
> a program that induces swapping and a D script that proves it works").

Cool!

> Currently, the best way I've figured out for them to do this in the context of
> virtual memory is to use vmstat(1), but it would be nice to have those
> statistics at a process granularity.

That makes sense.  Thirty minutes ago I was wishing I could check whether a
given process took the "optimized COW fault" path in vm_fault.c (see the
v_cow_optim counter).  vmstat -s doesn't give a reliable answer, since merely
running that command causes the counter to increment.

> The Illumos documentation on the vminfo provider itself suggests as a use case
> getting more fine-grained information than the Illumos implementation of
> vmstat makes available [1]---"more fine-grained" meaning both "per process
> statistics" and, eg, "more information about individual faults".
> 
> Given the reservations you note/confirm (potentially onerous overhead of SDT
> probes in VM code, lack of a clear mapping between Illumos probes and the FreeBSD
> codebase, scalability of arg1, especially on SMP systems, and complexity of
> per-domain NUMA stats), I propose the following as a first step:
> 
>     Implement probes that fire whenever values in the "page" category of
>     vmstat(1) output change; that is: a page fault occurs (flt), a page is
>     reactivated (re), a page is paged in (pi), a page is paged out (po), a
>     page is freed (fr), a page is scanned by the page daemon (sr).
> 
> I am unfamiliar with the codebase, but it seems likely to me that all of those
> use counter(9), and so we would be able to correctly populate arg1.

Most of them use counter(9), yes.  "sr" is a bit more complicated.
Basically, there is a counter in each page queue
(PQ_{ACTIVE,INACTIVE,LAUNDRY} times the number of NUMA domains) which is
updated once per "batch" of scanned pages.  The global "sr" value is
computed on demand by summing the per-pagequeue counters.
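
A small userland model of that arrangement, in case it helps when deciding
where a scan probe would go.  The names (pq_note_batch(),
vm_pages_scanned_total(), the QUEUE_* values, NDOMAINS) are hypothetical and
do not match the kernel's identifiers; the point is only that updates happen
per batch and the total is derived by summing every (domain, queue) pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes; the real values depend on the machine and config. */
#define NDOMAINS	2
enum scan_queue { QUEUE_ACTIVE, QUEUE_INACTIVE, QUEUE_LAUNDRY, NQUEUES };

/* One scan counter per (NUMA domain, page queue) pair. */
static uint64_t pq_scans[NDOMAINS][NQUEUES];

/* Called once per batch of scanned pages, not once per page. */
static void
pq_note_batch(int domain, enum scan_queue queue, uint64_t npages)
{
	assert(domain >= 0 && domain < NDOMAINS);
	assert(queue < NQUEUES);
	pq_scans[domain][queue] += npages;
}

/* The global total is computed on demand by summing every queue. */
static uint64_t
vm_pages_scanned_total(void)
{
	uint64_t sum = 0;

	for (int d = 0; d < NDOMAINS; d++)
		for (int q = 0; q < NQUEUES; q++)
			sum += pq_scans[d][q];
	return (sum);
}
```

One consequence: a probe placed next to the per-batch update would
naturally fire once per batch, with the batch size as arg0, rather than
once per scanned page.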

> This would be a very modest amount of work (at least relative to porting
> the entire vminfo provider as it exists in Illumos) and would give a starting
> point for measuring SDT overhead.

Yep, that sounds perfectly reasonable.

> Once those proposed probes are in place, we can decide whether to implement
> other probes from the Illumos set, add new probes that we determine useful,
> optimize the SDT implementation, address SMP or NUMA considerations, etc.
> 
> Thoughts?

This makes sense to me.  The other thing we might consider is whether
it's worth including additional arguments (e.g., the physical vm_page_t)
in some cases.  That could always be added later though.
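
To make the proposal concrete, here is a minimal userland sketch of pairing
each "page" statistic (flt, re, pi, po, fr; "sr" is left out because of the
per-pagequeue complication) with a probe that fires when it is incremented.
Everything in it (vminfo_inc(), vminfo_probe, the VMI_* IDs) is
hypothetical: a plain uint64_t array stands in for the real counter(9)
counters, and a function-pointer hook only imitates an SDT probe.  arg0 is
the increment and arg1 points at the statistic, loosely following the
illumos convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs for the vmstat(1) "page" columns in the proposal. */
enum vminfo_stat { VMI_FLT, VMI_RE, VMI_PI, VMI_PO, VMI_FR, VMI_NSTATS };

/* Plain counters standing in for the kernel's counter(9) counters. */
static uint64_t vm_stats[VMI_NSTATS];

/*
 * Probe hook, NULL while "disabled".  arg0 is the increment and arg1
 * points at the statistic, loosely following the illumos convention.
 */
typedef void (*vminfo_probe_fn)(enum vminfo_stat stat, uint64_t arg0,
    const uint64_t *arg1);
static vminfo_probe_fn vminfo_probe;

/* Bump a statistic, then fire the probe if a consumer is attached. */
static void
vminfo_inc(enum vminfo_stat stat, uint64_t inc)
{
	assert(stat < VMI_NSTATS);
	vm_stats[stat] += inc;
	/* In this sketch the probe fires after the update. */
	if (vminfo_probe != NULL)
		vminfo_probe(stat, inc, &vm_stats[stat]);
}

/* Example consumer: count how often the "fr" (pages freed) probe fires. */
static uint64_t fr_fires;

static void
count_fr(enum vminfo_stat stat, uint64_t arg0, const uint64_t *arg1)
{
	(void)arg0;
	(void)arg1;
	if (stat == VMI_FR)
		fr_fires++;
}
```

Attaching a consumer is just `vminfo_probe = count_fr;`; after that, each
`vminfo_inc(VMI_FR, n)` call bumps vm_stats[VMI_FR] and invokes the hook
once.  An extra argument such as the vm_page_t would simply be another
parameter on the hook.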

> pete
> 
> [1] https://illumos.org/books/dtrace/chp-vminfo.html#chp-vminfo-3
> 
> On Fri, Jun 03, 2022 at 03:47:31PM -0400, Mark Johnston wrote:
> > On Thu, Jun 02, 2022 at 01:08:27PM -0400, Peter Johnson wrote:
> > > Hi there --
> > > 
> > > I would find the probes in Illumos' vminfo provider [1] really handy to have
> > > in FreeBSD and I'm happy to do the work to make it happen.  The only
> > > FreeBSD-related mention of the vminfo provider I can find is an old mailing
> > > list post [2] that I interpret to mean that the existing fbt probes aren't a
> > > meaningful alternative (not to mention that using fbt probes effectively
> > > requires more understanding of the source code than is perhaps desirable given
> > > DTrace's intended purpose/audience).
> > > 
> > > My first question is: would such an addition be welcome?  I can make a more
> > > detailed case for its inclusion if that would be helpful/persuasive.
> > 
> > I think it'd be welcome.  My major reservation is that SDT probes have
> > non-zero overhead even when disabled, especially on FreeBSD as currently
> > implemented.  The vminfo provider effectively adds a probe to various VM
> > counter increments, which can occur very very frequently in some
> > workloads, so I think we'd also want to
> > 1) try to measure that overhead, perhaps using some micro-benchmarks,
> > 2) possibly use the results to help motivate some long-overdue
> >    improvements to the SDT implementation.
> > I'd be interested in helping with both of these.
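
For (1), a rough userland micro-benchmark along these lines might be a
starting point.  FAKE_PROBE(), struct fake_probe and bench_ns() are
hypothetical, and the macro is only loosely modeled on SDT_PROBE() in
sys/sdt.h, which as I understand it tests the probe's id and calls through
sdt_probe_func when it is nonzero.  Comparing bench_ns(0, n) against
bench_ns(1, n) with the probe disabled gives a crude per-site cost; real
numbers would have to come from the kernel, where cache and code-layout
effects matter more.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical stand-in for an SDT probe: a nonzero id means enabled.
 * volatile keeps the compiler from hoisting the check out of the loop,
 * so every iteration pays for a load and a branch.
 */
struct fake_probe {
	volatile uint32_t id;
};

static struct fake_probe vm_fault_probe;	/* id == 0: disabled */
static uint64_t probe_calls;

static void
fake_probe_func(uint32_t id, uintptr_t arg0)
{
	(void)id;
	(void)arg0;
	probe_calls++;
}

/* Called through a pointer, mimicking (as I understand it) sdt_probe_func. */
static void (*fake_probe_hook)(uint32_t, uintptr_t) = fake_probe_func;

/* Disabled cost: one load and one (predicted) branch per probe site. */
#define FAKE_PROBE(p, arg0) do {					\
	if (__builtin_expect((p)->id != 0, 0))				\
		fake_probe_hook((p)->id, (uintptr_t)(arg0));		\
} while (0)

static volatile uint64_t fault_count;

/* Time iters counter increments, with or without a probe site. */
static int64_t
bench_ns(int with_probe, long iters)
{
	struct timespec t0, t1;

	assert(iters >= 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (with_probe) {
		for (long i = 0; i < iters; i++) {
			fault_count++;
			FAKE_PROBE(&vm_fault_probe, 1);
		}
	} else {
		for (long i = 0; i < iters; i++)
			fault_count++;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 +
	    (t1.tv_nsec - t0.tv_nsec));
}
```

Each variant should be run several times and the medians compared.  Setting
vm_fault_probe.id to a nonzero value lets the same harness show the enabled
cost, including the indirect call.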
> > 
> > It'd be helpful to see an example or two demonstrating how the vminfo
> > provider would be useful in diagnosing a particular problem.
> > 
> > > If it is welcome, my plan would be to get very well-acquainted with FreeBSD's
> > > VM subsystem, identify where each of the vminfo probes described in the
> > > Illumos documentation should go, and then develop a patch to add those probes,
> > > seeking feedback from both freebsd-dtrace folks and whichever group has
> > > dominion over the VM stuff.
> > > 
> > > My second question is: does this sound like a reasonable plan?  It is,
> > > admittedly, almost uselessly high level, but I expect I will need more than a
> > > little familiarity with the codebase before I can get more specific.
> > 
> > Looking through the provider documentation, I suspect it'll be difficult
> > to implement some of the probes on FreeBSD, as you note below.  For
> > instance, I'm not sure that execfree can be implemented at all; FreeBSD
> > doesn't have any (cheap) way to determine whether a given physical page
> > belongs to an executable image.  At least, I can't think of one.
> > 
> > A second issue is in the description of "arg1" for vminfo probes.  In
> > FreeBSD, frequently-updated counters are implemented using counter(9),
> > which provides per-CPU counters.  To get the global value of such a
> > counter, one must iterate over all per-CPU elements, summing them up.
> > That's quite expensive and wasteful if you're doing it every time a
> > vminfo probe fires.  I'm not sure how best to deal with that problem.
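
A toy model of that asymmetry, assuming a fixed CPU count and a padded
per-CPU slot array; the real counter(9) storage layout is different.
pcpu_counter_add() and pcpu_counter_fetch() are hypothetical analogues of
counter_u64_add() and counter_u64_fetch(): the add touches one slot, while
the fetch walks all of them, which is what a probe would have to do on
every firing to supply an illumos-style arg1.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_MODEL	8	/* hypothetical CPU count */

/*
 * One slot per CPU, padded to 64 bytes so CPUs don't share cache lines.
 * This only illustrates the update/fetch cost asymmetry.
 */
struct pcpu_slot {
	uint64_t val;
	char pad[64 - sizeof(uint64_t)];
};

struct pcpu_counter {
	struct pcpu_slot slot[NCPU_MODEL];
};

/* Cheap: touches only the calling CPU's own slot. */
static void
pcpu_counter_add(struct pcpu_counter *c, int cpu, uint64_t inc)
{
	assert(cpu >= 0 && cpu < NCPU_MODEL);
	c->slot[cpu].val += inc;
}

/* Expensive: reads every CPU's slot, O(ncpu) work per call. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < NCPU_MODEL; i++)
		sum += c->slot[i].val;
	return (sum);
}
```

One possible way around it (just an idea): pass only the increment as arg0
and let D scripts accumulate it themselves, e.g. with a sum() aggregation
keyed by execname or pid.  For events that happen in the context of the
affected process, such as page faults, that would also give the
per-process view Pete is after.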
> > 
> > Yet another consideration is how one might expose per-NUMA domain
> > counters.  We could simply ignore that consideration and just provide
> > global values, but per-domain info can be very useful.
> > 
> > FreeBSD's VM system has a number of counters, exposed in various
> > subtrees of the "vm" sysctl node.  One might start by looking at the
> > existing counters to see how closely they match vminfo probes, or simply
> > define FreeBSD's vminfo provider in terms of the existing counters,
> > possibly adding new ones.
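
For reference, reading one of those counters from C is straightforward
with sysctlbyname(3).  read_vm_counter() is a hypothetical helper; names
like vm.stats.vm.v_vm_faults live in the same vm.stats.vm subtree that, as
far as I know, vmstat -s draws on.  Some of these values are 64-bit and
some 32-bit, so the sketch accepts both, and it compiles to a stub
returning -1 off FreeBSD.  Note these are system-wide totals, so they
don't give the per-process answer that motivates a vminfo provider.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif

/*
 * Read a VM counter such as "vm.stats.vm.v_vm_faults" by name.
 * Returns 0 on success, -1 on failure or on non-FreeBSD systems.
 */
static int
read_vm_counter(const char *name, uint64_t *out)
{
	assert(name != NULL && out != NULL);
#ifdef __FreeBSD__
	unsigned char buf[sizeof(uint64_t)];
	size_t len = sizeof(buf);

	if (sysctlbyname(name, buf, &len, NULL, 0) != 0)
		return (-1);
	if (len == sizeof(uint64_t)) {
		uint64_t v;
		memcpy(&v, buf, sizeof(v));
		*out = v;
	} else if (len == sizeof(uint32_t)) {
		uint32_t v;
		memcpy(&v, buf, sizeof(v));
		*out = v;
	} else
		return (-1);
	return (0);
#else
	(void)name;
	(void)out;
	return (-1);
#endif
}
```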
> > 
> > > Given the mailing list post I mentioned above, it seems possible that some of
> > > the vminfo probes described in the Illumos documentation don't make sense in
> > > the context of FreeBSD (eg, if FreeBSD doesn't have a distinct paging daemon,
> > > then the pgrrun, rev, and scan probes aren't suited for transfer).  On the
> > > other hand, there may be aspects on the FreeBSD side which would be beneficial
> > > to monitor, but for which Illumos does not define probes.
> > 
> > I agree.  FreeBSD does have a paging daemon, implemented in vm_pageout.c.
> > 
> > > Therefore, my third question is: how important is it for a vminfo provider
> > > implementation in FreeBSD to hew closely to the Illumos implementation?  Would
> > > it be acceptable to not transfer some probes that don't make sense and add
> > > some new probes that do?  Documentation is obviously vital for any deviations,
> > > and I will make darn sure to make it a central part of the work.
> > 
> > Having ported the ip/tcp/udp providers based on illumos documentation,
> > and having gone through some effort to make them compatible, I'm fairly
> > skeptical that it's important to maintain compatibility.  Most
> > non-trivial D scripts that I've seen and written which use these
> > providers will also make use of FBT probes here and there, so some
> > porting work is needed regardless.  Based on that, and on the
> > observations above, compatibility shouldn't be a priority IMHO.
> > 
> > > Any and all feedback is most appreciated.
> > > 
> > > Thanks.
> > > 
> > > pete
> > > 
> > > 
> > > [1] https://illumos.org/books/dtrace/chp-vminfo.html
> > > [2] https://lists.freebsd.org/pipermail/freebsd-dtrace/2014-April/000209.html