HAST and CARP
hiroshi at soupacific.com
hiroshi at soupacific.com
Fri Jul 2 05:16:11 UTC 2010
Thanks for your info.
> Doing 'hastctl create' on every switching is wrong. Note, after 'hastctl
> create' hast metadata on the disk are lost and synchronization of all
blocks
That's I was afraid of it !
> Split-brain means that you or your scripts did something wrong: both nodes
> acted as primary (either simultaneously or one then another but there was no
> communication between them so both made changes to the data that was not
> synced to another node).
>
> The easy way to get split-brain is to change the role on secondary to
primary
> without changing the role on the primary host, make some changes on the
> secondary (acting as a primary) and change back its role to secondary.
I checked that both node communication is established. And here is log
without hastctl create.
Seems ServerB once became MASTER, then back to BACKUP.
This situation cause unhappy split-brain happened.
hastctl dump shows prevrole: primary
error debug los is this
Jul 2 12:31:37 fw01B kernel: Clearing /tmp (X related).
Jul 2 12:31:37 fw01B kernel: Updating motd:
Jul 2 12:31:37 fw01B kernel: .
Jul 2 12:31:37 fw01B kernel: Configuring syscons:
Jul 2 12:31:37 fw01B kernel: blanktime
Jul 2 12:31:37 fw01B kernel: .
Jul 2 12:31:38 fw01B sm-mta[879]: gethostbyaddr(211.19.53.206) failed: 2
Jul 2 12:31:38 fw01B sm-mta[879]: gethostbyaddr(211.19.53.202) failed: 2
Jul 2 12:31:38 fw01B sm-mta[880]: starting daemon (8.14.4):
SMTP+queueing at 00:30:00
Jul 2 12:31:38 fw01B sm-msp-queue[884]: starting daemon (8.14.4):
queueing at 00:30:00
Jul 2 12:31:38 fw01B kernel: Starting cron.
Jul 2 12:31:38 fw01B kernel: Starting background file system checks in
60 seconds.
Jul 2 12:31:38 fw01B kernel:
Jul 2 12:31:38 fw01B kernel: Fri Jul 2 12:31:38 UTC 2010
Jul 2 12:31:40 fw01B kernel: carp0: INIT -> BACKUP
Jul 2 12:31:40 fw01B kernel: alc0: link state changed to UP
Jul 2 12:31:40 fw01B kernel: carp0: 2 link states coalesced
Jul 2 12:31:40 fw01B kernel: carp0: link state changed to DOWN
Jul 2 12:31:43 fw01B login: login on ttyv0 as root
Jul 2 12:31:43 fw01B login: ROOT LOGIN (root) ON ttyv0
Jul 2 12:31:43 fw01B kernel: Jul 2 12:31:43 fw01B login: ROOT LOGIN
(root) ON ttyv0
Jul 2 12:31:48 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:31:48 fw01B hastd: Connection from tcp4://211.19.53.201:20070
to tcp4://211.19.53.206:8457.
Jul 2 12:31:48 fw01B hastd: tcp4://211.19.53.201:20070: resource=zfshast
Jul 2 12:31:48 fw01B hastd: [zfshast] (init) We act as init for the
resource and not as secondary as requested by tcp4://211.19.53.201:20070.
Jul 2 12:31:48 fw01B kernel: Jul 2 12:31:48 fw01B hastd: [zfshast]
(init) We act as init for the resource and not as secondary as requested
by tcp4://211.19.53.201:20070.
Jul 2 12:31:53 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:31:53 fw01B hastd: Connection from tcp4://211.19.53.201:11542
to tcp4://211.19.53.206:8457.
Jul 2 12:31:53 fw01B hastd: tcp4://211.19.53.201:11542: resource=zfshast
Jul 2 12:31:53 fw01B hastd: [zfshast] (init) We act as init for the
resource and not as secondary as requested by tcp4://211.19.53.201:11542.
Jul 2 12:31:53 fw01B kernel: Jul 2 12:31:53 fw01B hastd: [zfshast]
(init) We act as init for the resource and not as secondary as requested
by tcp4://211.19.53.201:11542.
Jul 2 12:31:58 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:31:58 fw01B hastd: Connection from tcp4://211.19.53.201:49777
to tcp4://211.19.53.206:8457.
Jul 2 12:31:58 fw01B hastd: tcp4://211.19.53.201:49777: resource=zfshast
Jul 2 12:31:58 fw01B hastd: [zfshast] (init) We act as init for the
resource and not as secondary as requested by tcp4://211.19.53.201:49777.
Jul 2 12:31:58 fw01B kernel: Jul 2 12:31:58 fw01B hastd: [zfshast]
(init) We act as init for the resource and not as secondary as requested
by tcp4://211.19.53.201:49777.
Jul 2 12:31:58 fw01B hastd: [zfshast] (init) Role changed to secondary.
Jul 2 12:32:03 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:32:03 fw01B hastd: Connection from tcp4://211.19.53.201:17014
to tcp4://211.19.53.206:8457.
Jul 2 12:32:03 fw01B hastd: tcp4://211.19.53.201:17014: resource=zfshast
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Initial connection
from tcp4://211.19.53.201:17014.
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Incoming connection
from tcp4://211.19.53.201:17014 configured.
Jul 2 12:32:03 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:32:03 fw01B hastd: Connection from tcp4://211.19.53.201:42420
to tcp4://211.19.53.206:8457.
Jul 2 12:32:03 fw01B hastd: tcp4://211.19.53.201:42420: resource=zfshast
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Outgoing connection
to tcp4://211.19.53.201:42420 configured.
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) hastd_secondary
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) calling init_local()
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) init_local
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Obtained info about
/dev/ad4p4.
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Locked /dev/ad4p4.
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) inside metadata.c
res->hr_role !=HAST_ROLE_PRIMAR
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) inside mettadata
secondary_localcnt: 1 secondary_remotecnt: 0
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) calling init_remote()
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) init_remote()
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) humhum secondary
local 1: secondary remote 0
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) init
hr_secondary_remotecnt: 0 hr_primary_remotecnt: 0
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) secondary_remotecnt
0, primary_remotecnt 0
Jul 2 12:32:03 fw01B kernel: Jul 2 12:32:03 fw01B hastd: [zfshast]
(secondary) secondary_remotecnt 0, primary_remotecnt 0
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) secondary_localcnt 1,
primary_localcnt 1
Jul 2 12:32:03 fw01B kernel: Jul 2 12:32:03 fw01B hastd: [zfshast]
(secondary) secondary_localcnt 1, primary_localcnt 1
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Split-brain detected,
exiting.
Jul 2 12:32:03 fw01B kernel: Jul 2 12:32:03 fw01B hastd: [zfshast]
(secondary) Split-brain detected, exiting.
Jul 2 12:32:03 fw01B hastd: [zfshast] (secondary) Worker process exited
ungracefully (pid=979, exitcode=78).
Jul 2 12:32:03 fw01B kernel: Jul 2 12:32:03 fw01B hastd: [zfshast]
(secondary) Worker process exited ungracefully (pid=979, exitcode=78).
Jul 2 12:32:08 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:32:08 fw01B hastd: Connection from tcp4://211.19.53.201:53033
to tcp4://211.19.53.206:8457.
Jul 2 12:32:08 fw01B hastd: tcp4://211.19.53.201:53033: resource=zfshast
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Initial connection
from tcp4://211.19.53.201:53033.
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Incoming connection
from tcp4://211.19.53.201:53033 configured.
Jul 2 12:32:08 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 12:32:08 fw01B hastd: Connection from tcp4://211.19.53.201:50656
to tcp4://211.19.53.206:8457.
Jul 2 12:32:08 fw01B hastd: tcp4://211.19.53.201:50656: resource=zfshast
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Outgoing connection
to tcp4://211.19.53.201:50656 configured.
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) hastd_secondary
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) calling init_local()
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) init_local
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Obtained info about
/dev/ad4p4.
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Locked /dev/ad4p4.
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) inside metadata.c
res->hr_role !=HAST_ROLE_PRIMAR
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) inside mettadata
secondary_localcnt: 1 secondary_remotecnt: 0
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) calling init_remote()
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) init_remote()
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) humhum secondary
local 1: secondary remote 0
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) init
hr_secondary_remotecnt: 0 hr_primary_remotecnt: 0
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) secondary_remotecnt
0, primary_remotecnt 0
Jul 2 12:32:08 fw01B kernel: Jul 2 12:32:08 fw01B hastd: [zfshast]
(secondary) secondary_remotecnt 0, primary_remotecnt 0
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) secondary_localcnt 1,
primary_localcnt 1
Jul 2 12:32:08 fw01B kernel: Jul 2 12:32:08 fw01B hastd: [zfshast]
(secondary) secondary_localcnt 1, primary_localcnt 1
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Split-brain detected,
exiting.
Jul 2 12:32:08 fw01B kernel: Jul 2 12:32:08 fw01B hastd: [zfshast]
(secondary) Split-brain detected, exiting.
Jul 2 12:32:08 fw01B hastd: [zfshast] (secondary) Worker process exited
ungracefully (pid=980, exitcode=78).
Jul 2 12:32:08 fw01B kernel: Jul 2 12:32:08 fw01B hastd: [zfshast]
(secondary) Worker process exited ungracefully (pid=980, exitcode=78).
Jul 2 12:32:13 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
When debuglog working properly.
Jul 2 10:24:10 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 10:24:10 fw01B hastd: tcp4://211.19.53.201:26965: resource=zfshast
Jul 2 10:24:15 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 10:24:15 fw01B hastd: tcp4://211.19.53.201:50280: resource=zfshast
Jul 2 10:24:20 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 10:24:20 fw01B hastd: tcp4://211.19.53.201:27929: resource=zfshast
Jul 2 10:24:25 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 10:24:25 fw01B hastd: tcp4://211.19.53.201:24357: resource=zfshast
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) Initial connection
from tcp4://211.19.53.201:24357.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) Incoming connection
from tcp4://211.19.53.201:24357 configured.
Jul 2 10:24:25 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jul 2 10:24:25 fw01B hastd: tcp4://211.19.53.201:18217: resource=zfshast
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) Outgoing connection
to tcp4://211.19.53.201:18217 configured.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) hastd_secondary
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) calling init_local()
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) init_local
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) Obtained info about
/dev/ad4p4.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) Locked /dev/ad4p4.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) inside metadata.c
res->hr_role !=HAST_ROLE_PRIMAR
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) inside mettadata
secondary_localcnt: 0 secondary_remotecnt: 0
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) calling init_remote()
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) init_remote()
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) humhum secondary
local 0: secondary remote 0
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) init
hr_secondary_remotecnt: 0 hr_primary_remotecnt: 0
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) recv: Taking free
request.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) disk: Taking request.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) disk: No requests,
waiting.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) recv: (0x8013ea2e0)
Got request.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) send: Taking request.
Jul 2 10:24:25 fw01B hastd: [zfshast] (secondary) send: No requests,
waiting.
Jul 2 10:24:26 fw01B hastd: [zfshast] (secondary) disk: (0x8013ea2e0)
Moving request to the send queue.
Jul 2 10:24:26 fw01B hastd: [zfshast] (secondary) disk: Taking request.
Jul 2 10:24:26 fw01B hastd: [zfshast] (secondary) disk: No requests,
waiting.
hastctl role seconary xxx should reset some value of master to backup?
Hope this logs can help you ! If you need to make me debug bit more,
give me some idea to check!
Thanks
Hiroshi Katayama
More information about the freebsd-fs
mailing list