Issue with hast replication

Tue Mar 13 20:43:23 UTC 2012

On Tue, 13 Mar 2012 00:22:23 +0100 Phil Regnauld wrote:

 PR> Mikolaj Golub (to.my.trociny) writes:
 >> 
 >> It looks like in the case of hastd it was send(2) that returned ENOMEM, but
 >> it would be good to check. Could you please start synchronization again,
 >> ktrace the primary worker process when the ENOMEM errors are observed, and
 >> show the output here?

 PR>     Ok, took a little while, as running ktrace on the hastd does slow it down
 PR>     significantly, and the error normally occurs at 30-90 sec intervals.

 PR>        0x0f90 b2f3 3ad5 e657 7f0f 3e50 698f 5deb 12af  |..:..W..>Pi.]...|
 PR>        0x0fa0 740d c343 6e80 75f3 e1a7 bfdf a4c1 f6a6  |t..Cn.u.........|
 PR>        0x0fb0 ea85 655d e423 bd5e 42f7 7e9a 05d2 363a  |..e].#.^B.~...6:|
 PR>        0x0fc0 025e a7b5 0956 417c f31c a6eb 2cd9 d073  |.^...VA|....,..s|
 PR>        0x0fd0 2589 e8c0 d76a 889f 8345 eeaf f2a0 c2d6  |%....j...E......|
 PR>        0x0fe0 b89e aaef fee2 6593 e515 7271 88aa cf66  |......e...rq...f|
 PR>        0x0ff0 d272 411a 7289 d6c9 6643 bdbe 3c8c 8ae8  |.rA.r...fC..<...|
 PR>  50959 hastd    RET   sendto 32768/0x8000
 PR>  50959 hastd    CALL  sendto(0x6,0x8024bf000,0x8000,0x20000<MSG_NOSIGNAL>,0,0)
 PR>  50959 hastd    RET   sendto -1 errno 12 Cannot allocate memory
 PR>  50959 hastd    CALL  clock_gettime(0xd,0x7fffff3f86f0)
 PR>  50959 hastd    RET   clock_gettime 0
 PR>  50959 hastd    CALL  getpid
 PR>  50959 hastd    RET   getpid 50959/0xc70f
 PR>  50959 hastd    CALL  sendto(0x3,0x7fffff3f8780,0x84,0,0,0)
 PR>  50959 hastd    GIO   fd 3 wrote 132 bytes
 PR>        "<27>Mar 12 23:42:43 hastd[50959]: [hvol] (primary) Unable to sen\
 PR>         d request (Cannot allocate memory): WRITE(8626634752, 131072)."  
 PR>  50959 hastd    RET   sendto 132/0x84
 PR>  50959 hastd    CALL  close(0x7)
 PR>  50959 hastd    RET   close 0

OK, so it is send(2) that fails. I suppose the network driver could be
generating the error. Did you mention which network adapter you have?
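
For a long trace it may be easier to filter the kdump text than to read it by
eye. Below is a minimal sketch, assuming the kdump output was saved as plain
text in the format shown above. The function name failed_syscalls and the
dictionary keys are my own choices for illustration, not part of ktrace or
kdump. It pairs every failing RET line with the most recent CALL of the same
syscall from the same pid.

```python
import re

# Matches e.g. "50959 hastd    CALL  sendto(0x6,0x8024bf000,...)"
CALL_RE = re.compile(r"(\d+)\s+(\S+)\s+CALL\s+(\w+)(\(.*\))?")
# Matches e.g. "50959 hastd    RET   sendto -1 errno 12 Cannot allocate memory"
FAIL_RE = re.compile(r"(\d+)\s+(\S+)\s+RET\s+(\w+)\s+-1 errno (\d+)\s*(.*)")


def failed_syscalls(lines):
    """Return one dict per failed syscall found in kdump text lines.

    re.search is used so that a mail quote prefix in front of the
    kdump columns does not prevent a match.
    """
    last_call = {}   # (pid, syscall) -> argument string of the latest CALL
    failures = []
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            pid, _proc, name, args = m.groups()
            last_call[(pid, name)] = args or ""
            continue
        m = FAIL_RE.search(line)
        if m:
            pid, proc, name, err, msg = m.groups()
            failures.append({
                "pid": int(pid),
                "process": proc,
                "syscall": name,
                "errno": int(err),
                "message": msg.strip(),
                "args": last_call.get((pid, name), ""),
            })
    return failures
```

Feeding it the lines of a saved dump and keeping only the entries with errno
12 gives the list of calls that returned ENOMEM, together with their
arguments (file descriptor, length, flags).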

 >> If it is send(2) that fails then monitoring netstat and network driver
 >> statistics might be helpful. Something like
 >> 
 >> netstat -nax
 >> netstat -naT
 >> netstat -m
 >> netstat -nid

 PR>     I could run this in a loop, but that would be a lot of data, and might
 PR>     not be appropriate to paste here.

 PR>     I didn't see any obvious errors, but I'm not sure what I'm looking for.
 PR>     netstat -m didn't show anything close to running out of buffers or
 PR>     clusters...

 >> sysctl -a dev.<nic>
 >>
 >> And maybe
 >> 
 >> vmstat -m
 >> vmstat -z

 PR>     No obvious errors there either, but again what should I look out for ?

I would look at the sysctl -a dev.<nic> statistics and check whether the
ENOMEM failures correlate with growth in any of the error counters.
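
One way to spot such growth without reading the whole output is to save the
sysctl text before and after a failure and compare the two. This is only a
rough sketch, assuming the usual "name: value" lines that sysctl prints. The
function names and the dev.nic.0.* names in the usage note are hypothetical
placeholders, since the real counter names depend on the driver.

```python
def parse_counters(text):
    """Parse 'name: value' lines, keeping only plain non-negative integers."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        value = value.strip()
        # Skip descriptions and other non-numeric values.
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def grown(before, after):
    """Return {name: delta} for counters that increased between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }
```

For example, with snapshots containing "dev.nic.0.tx_errors: 3" before and
"dev.nic.0.tx_errors: 7" after, grown(parse_counters(before_text),
parse_counters(after_text)) reports a delta of 4 for that counter. Packet and
byte counters will of course grow too, so it is the error and drop counters in
the result that are of interest.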

 PR>     In the meantime, I've also experimented with a few different scenarios, and
 PR>     I'm quite puzzled.

 PR>     For instance, I configured one of the other gigabit cards on each host to
 PR>     provide a dedicated replication network. The main difference is that up
 PR>     until now this has been running using tagged vlans. To be on the safe side,
 PR>     I decided to use an untagged interface (the second gigabit adapter in each
 PR>     machine).
 PR>     
 PR>     Here's what I observed, and it is very odd:
 PR>     
 PR>     - doing a dd ... | ssh dd fails in the same fashion as before

 PR>     - I created a second zvol + hast resource of just 1 GB, and it replicated
 PR>       without any problems, peaking at 75 MB / sec (!) - maybe 1GB is too small
 PR>       ?
 PR>     
 PR>       (side note: hastd doesn't pick up configuration changes even with SIGHUP,
 PR>        which makes it hard to provision new resources on the fly) 

 PR>     - I restarted replication on the 100 G hast resource, and it's currently
 PR>       replicating without any problems over the second ethernet, but it's
 PR>       dragging along at 9-10 MB/sec, peaking at 29 MB/sec occasionally.

Looking at the buffer usage in 'netstat -nax' output, captured on both hosts
during synchronization, could show where the bottleneck is. 'top -HS'
output might be useful too.

 PR>       Earlier, I was observing peaks at 65-70 MB/sec in between failures...

 PR>     So I don't really know what to conclude :-| 

-- 
Mikolaj Golub