RFC: hyperv disk i/o performance vs. data integrity

Aryeh Friedman aryeh.friedman at gmail.com
Sun Feb 2 05:25:00 UTC 2014


Disclaimer: This is more thinking out loud than a definitive set of
suggestions on the matter.   A cleaned-up version of this will likely
become PetiteCloud's white paper on storage and disaster recovery.
I make no promises as to when any of it might be implemented, or whether
it will be implemented in the manner described here.

Looking at the link Peter provided in another thread:

http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaat%2Fliaatbpkvmguestcache.htm

At first I had my standard "OpenStack got it wrong and PetiteCloud got it
right" reaction to it.   Then I read deeper and saw that every cache mode
that offers reasonable performance is also considered untrustworthy,
especially in the case of a power failure.   The one exception seems to be
going straight to a physical disk: there you can use "none" and get
reasonable performance without the problems caused by abrupt disconnects
such as a power failure or the sudden death of the hypervisor process.
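
To make the tradeoff concrete, here is a minimal sketch (plain Python,
nothing hypervisor-specific; the file name and sizes are arbitrary) of why
the "safe" modes hurt: the same data written through the page cache versus
forced to stable storage with fsync() after every write, which is roughly
what a writethrough-style mode has to do.

#!/usr/bin/env python3
# Rough illustration, not a benchmark of any hypervisor cache mode: write
# ~64 MiB through the page cache vs. fsync()ing after every write.
import os, time

PATH = "cache-test.bin"      # hypothetical scratch file
CHUNK = b"\0" * 65536        # 64 KiB per write
COUNT = 1024                 # ~64 MiB total

def timed_write(sync_each):
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.monotonic()
    for _ in range(COUNT):
        os.write(fd, CHUNK)
        if sync_each:
            os.fsync(fd)     # writethrough-style: data must reach the disk now
    os.fsync(fd)             # both runs end with everything durable
    os.close(fd)
    return time.monotonic() - start

buffered = timed_write(sync_each=False)   # writeback-style: page cache absorbs it
synced = timed_write(sync_each=True)
os.unlink(PATH)
print(f"buffered: {buffered:.2f}s   fsync-per-write: {synced:.2f}s")

The gap between the two numbers on any spinning disk is exactly the
performance we are being asked to give up for integrity.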

So it seems we are stuck with lousy disk performance.   That is, until we
make another interesting observation: TCP guarantees in-order delivery
(never more than a few packets out of sync) and is effectively 100%
reliable as long as the network is functioning properly.   At first it
might not seem that a network could ever be faster than a disk.  But
remember that we are talking about virtualization, not real networks, so
there is no reason we cannot form networks between instances on the same
host; no matter how inefficient the packet drivers are, they are surely
faster than any disk if we only consider transport on the host's
motherboard and not between hosts.
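
As a rough illustration of why a host-local "network" is not bound by any
physical link, here is a small Python sketch (the port number and transfer
size are arbitrary) that just pushes data through a TCP socket on
127.0.0.1 and times it; the result is bounded by memory copies, not media.

#!/usr/bin/env python3
# Loopback TCP throughput sanity check: send ~256 MiB to ourselves.
import socket, threading, time

PORT = 12345
CHUNK = b"\0" * 65536
COUNT = 4096                 # ~256 MiB total

def receiver(server):
    conn, _ = server.accept()
    remaining = len(CHUNK) * COUNT
    while remaining:
        remaining -= len(conn.recv(1 << 20))
    conn.close()

server = socket.create_server(("127.0.0.1", PORT))
t = threading.Thread(target=receiver, args=(server,))
t.start()

client = socket.create_connection(("127.0.0.1", PORT))
start = time.monotonic()
for _ in range(COUNT):
    client.sendall(CHUNK)
client.close()
t.join()
elapsed = time.monotonic() - start
print(f"{len(CHUNK) * COUNT / elapsed / 2**20:.0f} MiB/s over loopback")

A virtio or tap path between two guests on the same host has more overhead
than raw loopback, of course, but the same basic point applies.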

Craig Rodrigues and the FreeNAS team have already done a fantastic job (I
have not personally tried FreeNAS yet, but I have heard nothing but good
things about it) of making it run in a bhyve instance.   Given that, the
following local-machine-only architecture might make sense as a solution
to the hypervisor's performance vs. safety problem:


Host +------ Storage (both local and remote)
     |
     +------ FreeNAS instance (as little RAM and as few VCPUs as possible)
     |
     +------ Production instances

The FreeNAS instance would distribute its storage via iSCSI or the
equivalent.   Setting the rule that all "primary" iSCSI sessions/devices
be local (in the same chassis, as opposed to elsewhere on the rack or in
the data center) would eliminate the power-failure nightmare that
OpenStack seems to have
(http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#section_nova-disaster-recovery-process)
without killing performance (in many cases it would increase it).   The
reason is that by isolating all the remote disk sessions in one instance
we get to use the "blast wall" capability of virtualization.   Namely, if
FreeNAS blows up we just swap in another FreeNAS instance with the same
devices attached, and then, using normal OS (host and guest) facilities,
it should be trivial to reconnect the device to the guest (you will just
have to give up the idea of a session that outlives the device's power
cycle) and do basic recovery.   Now that we have offloaded storage from
the hypervisor, all the other aspects of backup/recovery can be done using
normal OS facilities instead of the cloud platform.
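
For example, the reconnect step might look roughly like the Python sketch
below.   The portal address and target name are made up, and the
iscsictl(8) invocation is the stock FreeBSD initiator command as I
understand it (verify the flags against your release); the point is only
that this is ordinary OS tooling, not cloud-platform machinery.

#!/usr/bin/env python3
# Hand-wavy sketch of the "swap in another FreeNAS instance" recovery path:
# wait for the replacement's iSCSI portal to answer on the host-only
# network, then add a fresh session with the stock FreeBSD initiator.
import socket, subprocess, time

PORTAL = "10.0.0.2"                          # hypothetical FreeNAS guest address
TARGET = "iqn.2014-02.org.example:guest0"    # hypothetical target name

def portal_up(host, port=3260, timeout=2):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

while not portal_up(PORTAL):
    time.sleep(5)                            # replacement instance still booting

# The old session does not survive the "power cycle"; just create a new one
# and leave fsck/zpool-import style recovery to the guest's normal tools.
subprocess.run(["iscsictl", "-A", "-p", PORTAL, "-t", TARGET], check=True)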

(Real) network storage will likely need a completely different model,
though, if you allow it to be passed through to the guest OS.

-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org

