Storage 'failover' largely kills FreeBSD 10.x under XenServer?
Karl Pielorz
kpielorz_lst at tdx.co.uk
Wed Sep 20 14:54:26 UTC 2017
--On 20 September 2017 at 12:44:18 +0100 Roger Pau Monné
<roger.pau at citrix.com> wrote:
>> Is there some 'tuneable' we can set to make the 10.3 boxes more tolerant
>> of the I/O delays that occur during a storage fail over?
>
> Do you know whether the VMs saw the disks disconnecting and then
> connecting again?
I can't see any evidence that the drives actually get 'disconnected' from the
VM's point of view. There are plenty of I/O errors, but nothing of the
"device destroyed" type.
I have seen that kind of error logged on our test kit when we deliberately
failed non-HA storage, but I don't see it this time.
> Hm, I have the feeling that part of the problem is that in-flight
> requests are basically lost when a disconnect/reconnect happens.
So if a disconnect isn't happening (and it appears it isn't), is there any
tunable to set the I/O timeout?
'sysctl -a | grep timeout' finds things like:
kern.cam.ada.default_timeout=30
I might see if that has any effect. From memory (I'm out of the office
now), it did seem to be about 30 seconds before the VMs started logging
I/O-related errors to the console.
As it's a pure test setup, I can try adjusting this without fear of
breaking anything :)
Though I'm open to other suggestions...
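For the test run, something like the following sysctl.conf entry is one way to raise that timeout. It is a sketch, not a recommendation: 120 seconds is an arbitrary test value, chosen only to be longer than a storage failover is assumed to take. It also assumes the guest's disks attach as ada(4) devices through CAM. Disks attached through the Xen PV block front-end show up as xbd(4), which does not go through CAM, so this knob would not affect them. The same value can be set on a running box with `sysctl kern.cam.ada.default_timeout=120`.

```
# /etc/sysctl.conf on the FreeBSD 10.3 test guest (applied at boot).
# Assumption: disks are ada(4) devices; xbd(4) PV disks ignore this.
# Default is 30 seconds; 120 is a hypothetical value for testing only.
kern.cam.ada.default_timeout=120
```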
fwiw - whose responsibility is it to re-send lost "in flight" data? E.g. if
a write is 'in flight' when an I/O error occurs in the lower layers of
XenServer, is it XenServer's responsibility to retry it before giving up,
or does it just push the error straight back to the VM and expect the VM
to retry it? [or a bit of both?] - just curious.
-Karl