Partial kvm dumps

Mon Aug 24 07:46:04 UTC 2009

Hi,

I would like to discuss the idea of partial kvm dumps: the possibility of
creating dumps of selected parts of kernel memory on a live system, which
could later be read via the KVM interface.

Why could this be useful? I suppose many people here have set up scripts that
periodically run utilities like ps, vmstat, top, etc. to collect system
statistics and analyse system behaviour when problems occur. I did this so
often that I eventually wrote a Perl script, a wrapper around these utilities,
to make the setup and later data analysis easier.

http://code.google.com/p/gatherit/wiki/README

I currently run this script on most of my servers to collect various system
statistics.

But it bothers me that this approach is rather inefficient. Here is a typical
list of commands I use to collect data:

$ ./gather show utils
--------------------------------------------------------------------------------
name       cmd                               desc
--------------------------------------------------------------------------------
devstat    /usr/local/bin/devstat            devstat output
df         /bin/df                           df output
fstat      /usr/bin/fstat                    fstat output
netstat-La /usr/bin/netstat -nLa             netstat listening socket statistics
netstat-a  /usr/bin/netstat -na              netstat socket statistics
netstat-i  /usr/bin/netstat -ni              netstat interface statistics
netstat-m  /usr/bin/netstat -m               netstat mbuf statistics
netstat-r  /usr/bin/netstat -nr              netstat routing tables
netstat-rs /usr/bin/netstat -rs              netstat routing statistics
netstat-s  /usr/bin/netstat -s               netstat system wide statistics
nfsstat    /usr/bin/nfsstat                  nfsstat output
ps         /bin/ps auxww                     process statistics (-u flag)
ps-l       /bin/ps alxww                     process statistics (-l flag)
sockstat   /usr/bin/sockstat                 sockstat output
sysctl     /sbin/sysctl -a                   sysctl variables
top        /usr/bin/top -d1 -S -b 1000       top output (cpu mode)
top-mio    /usr/bin/top -d1 -S -mio -b 1000  top output (io mode)
uptime     /usr/bin/uptime                   system uptime
vmstat     /usr/bin/vmstat                   vmstat output
vmstat-i   /usr/bin/vmstat -ai               vmstat interrupt statistics

Note that many utilities are run several times with different parameters, and
some commands do almost the same thing (e.g. netstat -a and sockstat),
processing the same kernel structures. I want them all to run because I don't
know in advance which output will turn out to be more useful in a particular
situation.

It would be more efficient to have a single utility that would traverse
kernel structures extracting all necessary data, and to convert this data to
human-readable output later, only when it is needed. And in fact we already
have almost everything needed for this to work. Many system utilities can
read data not only from the live system but also from core dumps (e.g. ps,
vmstat and netstat accept the -M core and -N system options). So if we
periodically created dumps of the live system, we could later use them to
extract system statistics.

Of course, there is little sense in dumping the whole kernel memory. We could
extend our KVM interface to support creating, and later reading, dumps that
contain only the necessary parts of kernel memory.

As a proof of concept, I have written the pkvmdump utility, which creates
partial dumps containing some kernel statistics that can later be extracted
by the vmstat and ps utilities.

The details of the current implementation:

The generated dump has a simple format: a dump header (struct minidumphdr is
used, with PKVMDUMP_MAGIC) followed by data entries. Each data entry consists
of a header (the address of the extracted data in kernel virtual memory and
its length) followed by the data itself. So generating a dump is very simple:
kvm_open(3) /dev/mem, read the necessary regions of memory, and write each one
to the dump prefixed with an [addr, len] header.
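A minimal sketch of the entry format described above, in C. The header
layout, the SKETCH_MAGIC string and the sketch_* names are hypothetical:
pkvmdump itself reuses struct minidumphdr with PKVMDUMP_MAGIC, whose fields
are not reproduced here. The kvm_open(3)/kvm_read(3) step against /dev/mem is
also left out so that the example runs without a live kernel; the caller
passes a buffer it already holds instead.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout for illustration only: pkvmdump itself uses
 * struct minidumphdr with PKVMDUMP_MAGIC, which is not reproduced here. */
#define SKETCH_MAGIC "pkvmdump-sketch"

struct sketch_hdr {
    char magic[16];             /* SKETCH_MAGIC, NUL-padded */
};

struct sketch_entry_hdr {
    uint64_t addr;              /* kernel virtual address of the region */
    uint64_t len;               /* number of data bytes that follow */
};

/* Write the dump header once, at the start of the file. Returns 0 or -1. */
int
sketch_write_header(FILE *out)
{
    struct sketch_hdr h;

    memset(&h, 0, sizeof(h));
    snprintf(h.magic, sizeof(h.magic), "%s", SKETCH_MAGIC);
    return (fwrite(&h, sizeof(h), 1, out) == 1 ? 0 : -1);
}

/* Append one region: an [addr, len] header followed by the raw bytes.
 * In pkvmdump the bytes would come from kvm_read(3) on /dev/mem; here
 * the caller passes a buffer it already holds. Returns 0 or -1. */
int
sketch_write_entry(FILE *out, uint64_t addr, const void *data, uint64_t len)
{
    struct sketch_entry_hdr eh;

    eh.addr = addr;
    eh.len = len;
    if (fwrite(&eh, sizeof(eh), 1, out) != 1)
        return (-1);
    if (len > 0 && fwrite(data, (size_t)len, 1, out) != 1)
        return (-1);
    return (0);
}
```

A dump would be produced by writing the header once and then calling
sketch_write_entry() once per kernel region that vmstat or ps needs. Using
fixed-width uint64_t fields keeps the [addr, len] header the same size on
32-bit and 64-bit hosts (byte order is not addressed in this sketch).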

To read the dump, the libkvm interface has been extended. The following trick
(hack? :-) is used:

On kvm_open():

   1) create a temporary (unlinked) file;

   2) for every data entry in the dump, do lseek(addr, SEEK_SET) and then
   write(data, len) on the tempfile;

   3) close the dump file and set kd->pmfd to point to the tempfile.

On kvm_read(), the request is translated into a direct read from the tempfile
at offset addr.
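The kvm_open()/kvm_read() trick can be sketched in plain C roughly as
follows. The dump layout (SKETCH_MAGIC, struct sketch_hdr, struct
sketch_entry_hdr) is hypothetical and stands in for the real
minidumphdr-based format; sketch_load_dump() and sketch_kvm_read() are
illustrative names, not libkvm functions, and the temporary path template is
an assumption. The kernel address is used directly as the file offset, which
assumes it fits in off_t.

```c
#define _XOPEN_SOURCE 700       /* for mkstemp(3) and pread(2) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout, for illustration only; not the real
 * minidumphdr-based pkvmdump format. */
#define SKETCH_MAGIC "pkvmdump-sketch"

struct sketch_hdr { char magic[16]; };
struct sketch_entry_hdr { uint64_t addr; uint64_t len; };

/* kvm_open() side: copy each [addr, len] entry of the dump into an
 * unlinked temporary file at file offset addr. Returns the descriptor
 * (what kd->pmfd would be set to) or -1 on error. */
int
sketch_load_dump(FILE *dump)
{
    char path[] = "/tmp/pkvm.XXXXXX";   /* hypothetical template */
    struct sketch_hdr h;
    struct sketch_entry_hdr eh;
    char buf[4096];
    int fd;

    if (fread(&h, sizeof(h), 1, dump) != 1 ||
        strncmp(h.magic, SKETCH_MAGIC, sizeof(h.magic)) != 0)
        return (-1);

    /* 1) create a temporary file and unlink it right away, so it
     *    disappears once the descriptor is closed. */
    if ((fd = mkstemp(path)) == -1)
        return (-1);
    unlink(path);

    /* 2) for every entry: lseek(addr, SEEK_SET), then write the data.
     *    Assumes addr fits in off_t; gaps between entries stay holes. */
    while (fread(&eh, sizeof(eh), 1, dump) == 1) {
        uint64_t left = eh.len;

        if (lseek(fd, (off_t)eh.addr, SEEK_SET) == -1)
            goto fail;
        while (left > 0) {
            size_t n = left < sizeof(buf) ? (size_t)left : sizeof(buf);

            if (fread(buf, 1, n, dump) != n ||
                write(fd, buf, n) != (ssize_t)n)
                goto fail;
            left -= n;
        }
    }
    if (ferror(dump))
        goto fail;
    /* 3) the caller closes the dump file and keeps only this fd. */
    return (fd);
fail:
    close(fd);
    return (-1);
}

/* kvm_read() side: a read of len bytes at kernel address addr becomes
 * a direct pread(2) from the temporary file at that offset. */
ssize_t
sketch_kvm_read(int fd, uint64_t addr, void *buf, size_t len)
{
    return (pread(fd, buf, len, (off_t)addr));
}
```

Because the temporary file is sparse, a read from an address that was never
dumped does not fail: it returns zeros from a file hole, or a short read past
the end of the file. A real implementation would probably need to remember
the dumped [addr, len] ranges so that kvm_read() can report such reads as
errors instead of returning misleading data.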

This format and algorithm were chosen because they are simple to implement,
just to start experimenting with the idea.

You can find the source here:

http://code.google.com/p/trociny/downloads/list

I would like to hear what other people think about this. It looks very useful
to me. As a first step, at least, it would be nice to extend KVM to work with
partial dumps so that users could try this and see whether it turns out to be
useful.

P.S. The ultimate goal I would like to achieve is to take snapshots of the
system state that could be used for later analysis if necessary. Maybe the
approach I am trying here is wrong. For example, SNMP looks like a more proper
alternative: it is a standard, and snmpd is actually the very program that
would "traverse kernel structures extracting all necessary data". But SNMP has
its own limitations: the statistics it provides are rather limited, and
currently I don't see how I could use it effectively to achieve my goal,
although I haven't thought much in this direction yet...

-- 
Mikolaj Golub