HAST - detect failure and restore avoiding an outage?

Pawel Jakub Dawidek pjd at FreeBSD.org
Sun Feb 24 14:41:47 UTC 2013


On Sun, Feb 24, 2013 at 12:05:06PM +0200, Mikolaj Golub wrote:
> On Sat, Feb 23, 2013 at 09:51:03PM +0100, Pawel Jakub Dawidek wrote:
> 
> > I'm fine with the patch except for the missing breaks in the switch
> > added to hastd/primary.c.
> 
> Oops. Fixed. Thanks!
> 
> > I'm also wondering... You count all those errors separately just to
> > print them as one number. If we already do that, let's print them
> > separately, e.g.
> > 
> > 	local i/o errors: read(0), write(3), delete(5), flush(9)
> 
> The idea was that hastd would provide all available counters and
> hastctl would show only an aggregated counter to save screen space.
> Anyone who wanted to write their own utility to monitor hastd, talking
> directly to hastd via the socket, would still be able to see all the
> counters separately.
> 
> But your idea of writing the errors in one string looks better, as it
> saves screen space and provides more detailed info. I would prefer a
> slightly different output though:
> 
>   role: secondary
>   provname: test
>   localpath: /dev/md102
>   extentsize: 2097152 (2.0MB)
>   keepdirty: 0
>   remoteaddr: kopusha:7771
>   replication: memsync
>   status: complete
>   dirty: 0 (0B)
>   statistics:
>     reads: 13
>     writes: 521
>     deletes: 0
>     flushes: 0
>     activemap updates: 0
>     local i/o errors:
>       read: 13, write: 425, delete: 0, flush: 0
> 
> but I don't have a strong opinion and will be OK with yours if you
> don't like my version.

My only comment would be to keep that on one line so it is easier to
grep. Merging those two lines won't exceed 80 chars.
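
To illustrate the point, here is a minimal sketch, assuming the two lines
from the sample above are simply joined with the same labels and field
order. The helper names (format_error_line, parse_error_line) and the
4-space indent are hypothetical and are not taken from hastctl; the sketch
only shows that a single line stays under 80 columns and can be matched
with one pattern.

```python
import re

# Hypothetical helper: build the merged "local i/o errors" line using the
# labels and field order from the sample output quoted above.
def format_error_line(read, write, delete, flush, indent=4):
    return "{}local i/o errors: read: {}, write: {}, delete: {}, flush: {}".format(
        " " * indent, read, write, delete, flush)

# One pattern is enough to pick the line out of the output and extract
# all four counters, which is what makes the one-line form easy to grep.
ERROR_LINE = re.compile(
    r"^\s*local i/o errors: "
    r"read: (\d+), write: (\d+), delete: (\d+), flush: (\d+)$")

def parse_error_line(line):
    """Return the four counters as a dict, or None if the line differs."""
    m = ERROR_LINE.match(line)
    if m is None:
        return None
    keys = ("read", "write", "delete", "flush")
    return dict(zip(keys, map(int, m.groups())))

# Values taken from the sample output in the quoted message.
line = format_error_line(13, 425, 0, 0)
print(line)
print(len(line) <= 80)
print(parse_error_line(line))
```

With the one-line form, something like grep 'local i/o errors' on the
command output would return every counter at once, rather than a label
line with the numbers left behind on the next line.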

> > BTW, why not count activemap update errors as write and flush errors?
> 
> Internally I need separate counters for activemap errors because they
> are updated by a different thread and I don't want to introduce
> locking for error counter updates. As hastctl was supposed to show an
> aggregated counter, I didn't bother much about making activemap update
> errors count as write and flush errors. I improved this too in the
> updated patch:
> 
> http://people.freebsd.org/~trociny/hast.stat_error.2.patch

The patch looks good.
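
For readers following along, the accounting described above can be
modelled roughly as follows. This is only a sketch of the idea, not the
code from the patch: the class and field names are hypothetical, and
Python is used just to show the bookkeeping (it says nothing about the C
memory model or how hastd actually stores its counters). Each counter has
a single writer thread, so no lock is needed on update, and the activemap
counters are folded into the write and flush totals only when a report is
produced.

```python
from dataclasses import dataclass

@dataclass
class ErrorCounters:
    # Assumed to be updated only by the thread doing regular local I/O.
    read: int = 0
    write: int = 0
    delete: int = 0
    flush: int = 0
    # Assumed to be updated only by the thread that updates the activemap,
    # so the two groups of counters never share a writer.
    activemap_write: int = 0
    activemap_flush: int = 0

def reported(c):
    """Fold activemap errors into write/flush when building the report."""
    return {
        "read": c.read,
        "write": c.write + c.activemap_write,
        "delete": c.delete,
        "flush": c.flush + c.activemap_flush,
    }

# Example: three write errors from regular I/O, plus two write errors and
# one flush error from activemap updates.
counters = ErrorCounters()
counters.write += 3
counters.activemap_write += 2
counters.activemap_flush += 1
print(reported(counters))
```

The separate fields stay available internally, while the reported view
shows activemap update errors as part of the write and flush numbers.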

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl


More information about the freebsd-questions mailing list