problems with gmirror on ggate over slow link
petefrench at ticketswitch.com
Fri Oct 23 10:56:31 UTC 2009
[ originally sent to geom, but I am throwing it open to a wider
audience as I didn't get any replies there ]
I am using 7.2-STABLE from October 7th on all machines, but this
has been going on for a while. Very simply, I am mirroring together a pair
of discs, one local, one remote. The remote disc is accessed using ggate.
If the remote disc is actually on a very close machine - e.g. a server
plugged into the same ethernet - then all works fine. If I put
the remote disc somewhere substantially further away on the
network, however, then when I attach the disc it starts to rebuild the
mirror but then fails a fraction of a second later thus:
GEOM_MIRROR: Device mysql0: rebuilding provider ggate1a.
GEOM_MIRROR: Synchronization request failed (error=5). ggate1a[WRITE(offset=1310720, length=131072)]
GEOM_MIRROR: Device mysql0: provider ggate1a disconnected.
GEOM_MIRROR: Device mysql0: rebuilding provider ggate1a stopped.
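
For reference, the offset and length in the failed request can be decoded with a little arithmetic. The fragment below is only an illustration: the helper names are made up here, and it assumes the rebuild proceeds in equal-sized blocks starting from offset 0, which the log suggests but does not prove.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: index of the block that starts at `offset`,
 * assuming fixed-size blocks of `length` bytes starting at offset 0. */
static uint64_t
sync_block_index(uint64_t offset, uint64_t length)
{
	assert(length != 0);
	return (offset / length);
}

/* Print where the failed WRITE from the log sits on the provider. */
static void
describe_failed_write(void)
{
	const uint64_t offset = 1310720;	/* offset= value from the log */
	const uint64_t length = 131072;		/* length= value, 128 KiB */

	printf("block %ju, %ju KiB into the provider\n",
	    (uintmax_t)sync_block_index(offset, length),
	    (uintmax_t)(offset / 1024));
}
```

Under that assumption the failing write is block 10 (the eleventh 128 KiB block), 1280 KiB into the provider, so the rebuild stops very early rather than part-way through the disc.
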
The interesting thing is that the problem is only with gmirror, not with
the underlying ggate disc which remains attached and accessible. I tested
this by adding a second partition (ggate1b in the example above) and
mounting a UFS filesystem on that.
I've looked at the kernel code briefly, but it is not clear to me
what is causing that write to fail. My conjecture would be that a buffer
somewhere is filling up, causing a write to fail, and instead of
waiting and retrying, gmirror just fails the synchronisation.
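
To make the conjecture concrete, the "wait and retry" behaviour described above might look roughly like the following. This is a user-space sketch only, not gmirror code: `write_with_retry`, `write_fn`, `wait_hook` and `flaky_write` are all hypothetical names invented for the illustration, and treating EIO (the error=5 in the log) as retryable is itself an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical I/O callback: returns 0 on success or an errno value. */
typedef int (*write_fn)(void *ctx, size_t offset, size_t length);

/*
 * Retry a write up to max_attempts times while it fails with EIO.
 * Any other result (success or a different error) is returned at once.
 * A real implementation would sleep between attempts; the delay is
 * left to the optional wait_hook so the sketch stays portable.
 */
static int
write_with_retry(write_fn do_write, void *ctx, size_t offset, size_t length,
    int max_attempts, void (*wait_hook)(int attempt))
{
	int error = EIO;

	assert(do_write != NULL && max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		error = do_write(ctx, offset, length);
		if (error != EIO)
			return (error);	/* success or non-retryable error */
		if (wait_hook != NULL && attempt < max_attempts)
			wait_hook(attempt);
	}
	return (error);	/* still EIO after every attempt */
}

/* Hypothetical stand-in for a slow link: fails with EIO until the
 * counter pointed to by ctx reaches zero, then succeeds. */
static int
flaky_write(void *ctx, size_t offset, size_t length)
{
	int *failures_left = ctx;

	(void)offset;
	(void)length;
	if (*failures_left > 0) {
		(*failures_left)--;
		return (EIO);
	}
	return (0);
}
```

With a counter of 2 and five attempts allowed, `write_with_retry(flaky_write, &n, 1310720, 131072, 5, NULL)` succeeds on the third try. With too few attempts it still gives up with EIO, which is the only outcome the log above shows.
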
Any ideas? Is this actually a bug? I am wondering if it would also happen
when mirroring a very fast disc against a very slow one (i.e. maybe it is
independent of ggate).
More information about the freebsd-stable