HAST - detect failure and restore avoiding an outage?
Pawel Jakub Dawidek
pjd at FreeBSD.org
Sun Feb 24 14:41:47 UTC 2013
On Sun, Feb 24, 2013 at 12:05:06PM +0200, Mikolaj Golub wrote:
> On Sat, Feb 23, 2013 at 09:51:03PM +0100, Pawel Jakub Dawidek wrote:
>
> > I'm fine with the patch except for the missing breaks in the switch added to
> > hastd/primary.c.
>
> Oops. Fixed. Thanks!
>
> > I'm also wondering... You count all those errors separately just to
> > print them as one number. If we already count them separately, let's
> > print them separately too, e.g.
> >
> > local i/o errors: read(0), write(3), delete(5), flush(9)
>
> The idea was that hastd would provide all available counters and
> hastctl would show only an aggregated counter, just to save screen
> space. If one wanted to write their own utility to monitor hastd,
> talking to it directly via the socket, they would still be able to
> see all the counters separately.
>
> But your idea of printing the errors on one line looks better, as it
> saves screen space while providing more detailed info. I would
> prefer slightly different output, though:
>
> role: secondary
> provname: test
> localpath: /dev/md102
> extentsize: 2097152 (2.0MB)
> keepdirty: 0
> remoteaddr: kopusha:7771
> replication: memsync
> status: complete
> dirty: 0 (0B)
> statistics:
> reads: 13
> writes: 521
> deletes: 0
> flushes: 0
> activemap updates: 0
> local i/o errors:
> read: 13, write: 425, delete: 0, flush: 0
>
> but don't have a strong opinion and will be ok with yours if you don't
> like my version.

My only comment would be to keep that on one line so it is easier to
grep. Merging those two lines won't exceed 80 characters.
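A minimal C sketch of the one-line format suggested above, assuming a hypothetical struct errcounts and hypothetical helpers format_local_errors() and fits_80_columns(). This is not the code from hast.stat_error.2.patch, and hastctl's real field names and output code may differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-type local I/O error counters (not hastd's names). */
struct errcounts {
	uint64_t read_errors;
	uint64_t write_errors;
	uint64_t delete_errors;
	uint64_t flush_errors;
};

/*
 * Render all four counters on a single line, so one grep matches every
 * error type.  snprintf() truncates (but still NUL-terminates) the
 * output if buf is too small.
 */
const char *
format_local_errors(char *buf, size_t size, const struct errcounts *ec)
{
	(void)snprintf(buf, size,
	    "local i/o errors: read: %ju, write: %ju, delete: %ju, flush: %ju",
	    (uintmax_t)ec->read_errors, (uintmax_t)ec->write_errors,
	    (uintmax_t)ec->delete_errors, (uintmax_t)ec->flush_errors);
	return (buf);
}

/* Nonzero if the line fits on an 80-column terminal. */
int
fits_80_columns(const char *line)
{
	return (strlen(line) <= 80);
}
```

With the counter values from the sample output (13, 425, 0, 0) the line is 59 characters. With four 20-digit counters it would grow to 132 characters, so the caller's buffer should be sized for that worst case (e.g. 160 bytes).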
> > BTW, why not count activemap update errors as write and flush errors?
>
> I need (internally) separate counters for activemap errors because
> they are updated by a different thread and I didn't want to
> introduce locking for error counter updates. As hastctl was
> supposed to show an aggregated counter, I didn't bother much with
> making activemap update errors count as write and flush errors. I
> improved this too in the updated patch:
>
> http://people.freebsd.org/~trociny/hast.stat_error.2.patch

The patch looks good.
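The approach described above (separate counters for each updating thread, folded together only when reported) can be sketched in C as follows. The struct and function names are hypothetical, not hastd's, and the sketch uses C11 relaxed atomics so that the reporting thread's reads are well-defined; the patch itself may do this differently.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical error counters, split by the thread that updates them. */
struct hast_errstats {
	/* Updated only by the local I/O thread. */
	_Atomic uint64_t write_errors;
	_Atomic uint64_t flush_errors;
	/* Updated only by the thread that writes the activemap. */
	_Atomic uint64_t activemap_write_errors;
	_Atomic uint64_t activemap_flush_errors;
};

void
errstat_init(struct hast_errstats *st)
{
	atomic_init(&st->write_errors, 0);
	atomic_init(&st->flush_errors, 0);
	atomic_init(&st->activemap_write_errors, 0);
	atomic_init(&st->activemap_flush_errors, 0);
}

/*
 * Relaxed atomic add: no mutex is taken, and because of the per-thread
 * split the I/O and activemap threads never update the same counter.
 */
void
errstat_inc(_Atomic uint64_t *counter)
{
	atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Activemap update failures are folded into the totals at read time. */
uint64_t
errstat_total_write(struct hast_errstats *st)
{
	return (atomic_load_explicit(&st->write_errors, memory_order_relaxed) +
	    atomic_load_explicit(&st->activemap_write_errors,
	    memory_order_relaxed));
}

uint64_t
errstat_total_flush(struct hast_errstats *st)
{
	return (atomic_load_explicit(&st->flush_errors, memory_order_relaxed) +
	    atomic_load_explicit(&st->activemap_flush_errors,
	    memory_order_relaxed));
}
```

The cost of combining the activemap and regular counts is paid only when status is requested. The two loads in each total are not a single atomic snapshot, which is acceptable for monitoring counters.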
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
More information about the freebsd-questions mailing list