HAST instability
Mikolaj Golub
trociny at freebsd.org
Fri Jun 10 17:05:49 UTC 2011
On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote:
DK> Well, apparently my HAST joy was short. On a second run, I got stuck with
DK> Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive
DK> reply header: Operation timed out.
DK> on the primary. No messages on the secondary.
DK> On primary:
DK> # netstat -an | grep 8457
DK> tcp4 0 0 10.2.101.11.42659 10.2.101.12.8457 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.11.62058 10.2.101.12.8457 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.11.34646 10.2.101.12.8457 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.11.11419 10.2.101.12.8457 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.11.37773 10.2.101.12.8457 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.11.21911 10.2.101.12.8457 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.11.40169 10.2.101.12.8457 CLOSE_WAIT
DK> tcp4 0 97749 10.2.101.11.44360 10.2.101.12.8457 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.11.8457 *.* LISTEN
DK> on secondary
DK> # netstat -an | grep 8457
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.42659 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.62058 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.34646 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.11419 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.37773 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.21911 CLOSE_WAIT
DK> tcp4 0 0 10.2.101.12.8457 10.2.101.11.40169 FIN_WAIT_2
DK> tcp4 66415 0 10.2.101.12.8457 10.2.101.11.44360 FIN_WAIT_2
DK> tcp4 0 0 10.2.101.12.8457 *.* LISTEN
DK> on primary
DK> # hastctl status
DK> data0:
DK> role: primary
DK> provname: data0
DK> localpath: /dev/gpt/data0
DK> extentsize: 2097152 (2.0MB)
DK> keepdirty: 64
DK> remoteaddr: 10.2.101.12
DK> sourceaddr: 10.2.101.11
DK> replication: fullsync
DK> status: complete
DK> dirty: 0 (0B)
DK> data1:
DK> role: primary
DK> provname: data1
DK> localpath: /dev/gpt/data1
DK> extentsize: 2097152 (2.0MB)
DK> keepdirty: 64
DK> remoteaddr: 10.2.101.12
DK> sourceaddr: 10.2.101.11
DK> replication: fullsync
DK> status: complete
DK> dirty: 0 (0B)
DK> data2:
DK> role: primary
DK> provname: data2
DK> localpath: /dev/gpt/data2
DK> extentsize: 2097152 (2.0MB)
DK> keepdirty: 64
DK> remoteaddr: 10.2.101.12
DK> sourceaddr: 10.2.101.11
DK> replication: fullsync
DK> status: complete
DK> dirty: 6291456 (6.0MB)
DK> data3:
DK> role: primary
DK> provname: data3
DK> localpath: /dev/gpt/data3
DK> extentsize: 2097152 (2.0MB)
DK> keepdirty: 64
DK> remoteaddr: 10.2.101.12
DK> sourceaddr: 10.2.101.11
DK> replication: fullsync
DK> status: complete
DK> dirty: 0 (0B)
DK> Sits in this state for over 10 minutes.
DK> Unfortunately, no KDB in kernel. Any ideas what other to look for?
Could you please try this patch?
http://people.freebsd.org/~trociny/hastd.no_shutdown.patch
After patching, you need to rebuild hastd and restart it (I expect that doing
this only on the secondary is enough, but it is better to do it on both nodes).
No server reboot is needed.
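
The patch-and-rebuild steps could look roughly like the sketch below. It is
only an illustration under assumptions that are not stated in this message:
the source tree is taken to be in /usr/src, the patch is assumed to apply from
the top of that tree with -p0, and the /tmp file name is arbitrary. Adjust the
directory and strip level to match the paths inside the patch. By default the
script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. SRC_DIR, PATCH_STRIP and PATCH_FILE are assumptions,
# not values taken from the original message.
SRC_DIR="${SRC_DIR:-/usr/src}"            # assumed location of the source tree
PATCH_STRIP="${PATCH_STRIP:--p0}"         # assumed strip level for patch(1)
PATCH_FILE="${PATCH_FILE:-/tmp/hastd.no_shutdown.patch}"
PATCH_URL="http://people.freebsd.org/~trociny/hastd.no_shutdown.patch"
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands only, 0 = run them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild_hastd() {
    # Download the patch, apply it, rebuild and reinstall hastd,
    # then restart the daemon. Stops at the first failing step.
    run fetch -o "$PATCH_FILE" "$PATCH_URL" &&
    run patch -d "$SRC_DIR" "$PATCH_STRIP" -i "$PATCH_FILE" &&
    run make -C "$SRC_DIR/sbin/hastd" obj depend all install &&
    run /etc/rc.d/hastd restart
}
```

Calling rebuild_hastd prints the four commands for review; running it with
DRY_RUN=0 as root on each node executes them.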
--
Mikolaj Golub