Issue with hast replication
Mikolaj Golub
to.my.trociny at gmail.com
Mon Mar 12 18:33:09 UTC 2012
On Mon, 12 Mar 2012 15:31:27 +0100 Phil Regnauld wrote:
PR> Phil Regnauld (regnauld) writes:
>>
>> 7) ktrace on the destination dd:
>>
>> fstat(0,{ mode=p--------- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
>> lseek(0,0x0,SEEK_CUR) ERR#29 'Illegal seek'
PR> [...]
>> Illegal seek, eh ? Any clues ?
>>
>> The boxes are identical (HP DL380 G6), though the RAM config is different.
>>
>> Summary:
>>
>> - ssh works fine
>> - h1 zvol to h2 zvol over ssh fails
>> - h1 zvol to h2 /tmp/x over ssh is fine
>> - h2 /dev/zero locally to h2 zvol is fine
>> - h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...
PR> A few more data points: dd from a local zvol to a local zvol on either
PR> machine works fine.
PR> Using nc instead of ssh, this time it's the sender nc dying:
PR> ktrace on the sender:
PR> 47704 nc CALL write(0x3,0x7fffffff5450,0x800)
PR> 47704 nc RET write -1 errno 32 Broken pipe
PR> 47704 nc PSIG SIGPIPE SIG_DFL code=0x10006
PR> truss on the sender:
PR> poll({3/POLLIN 0/POLLIN},2,-1) = 2 (0x2)
PR> read(3,0x7fffffff5450,2048) ERR#54 'Connection reset by peer'
PR> close(3) = 0 (0x0)
PR> On tcpdump, I do see the receiver send a FIN when using nc.
PR> When using ssh, the sender is sending the FIN.
PR> Anything else I can look for ?
It looks like in the case of hastd it was send(2) that returned ENOMEM, but
it would be good to check. Could you please start synchronization again,
ktrace the primary worker process when the ENOMEM errors are observed, and
show the output here?
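One possible way to capture that trace is sketched below. The variable names (PATTERN, TRACEFILE, DURATION) and the trace_worker function are only illustrative, and the process-title pattern used to find the worker is an assumption: check it against ps output first, or pass the pid to ktrace by hand.

```shell
#!/bin/sh
# Sketch: attach ktrace to the hastd primary worker, wait, then dump the trace.
# PATTERN is an assumed process-title match; verify it with ps before use.
PATTERN="${PATTERN:-hastd: .*primary}"
TRACEFILE="${TRACEFILE:-/var/tmp/hastd-worker.ktrace}"
DURATION="${DURATION:-60}"

trace_worker() {
    # Take the first matching pid; adjust if several resources are configured.
    pid=$(pgrep -f "$PATTERN" 2>/dev/null | head -n 1)
    if [ -z "$pid" ]; then
        echo "no process matching '$PATTERN' found" >&2
        return 1
    fi
    ktrace -f "$TRACEFILE" -p "$pid" || return 1  # start tracing the worker
    sleep "$DURATION"                             # wait for ENOMEM to show up
    ktrace -c -f "$TRACEFILE" -p "$pid"           # stop tracing
    kdump -f "$TRACEFILE"                         # print the decoded trace
}

trace_worker || echo "tracing was not started" >&2
```

The kdump output can be large, so redirecting it to a file and posting only the part around the failing call is probably more practical.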
If it is send(2) that fails, then monitoring netstat and network driver
statistics might be helpful. Something like:
  netstat -nax
  netstat -naT
  netstat -m
  netstat -nid
  sysctl -a dev.<nic>
And maybe:
  vmstat -m
  vmstat -z
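
To compare the counters before and during the failure, the commands above could be saved periodically. The following is a rough sketch only: OUTDIR, INTERVAL, COUNT and NIC are made-up names, and NIC has to be set by hand to the driver and unit in use, otherwise the sysctl step is skipped.

```shell
#!/bin/sh
# Sketch: snapshot the statistics listed above into timestamped files.
# OUTDIR, INTERVAL, COUNT and NIC are illustrative names, not from the thread.
OUTDIR="${OUTDIR:-${TMPDIR:-/tmp}/hast-stats}"
INTERVAL="${INTERVAL:-10}"
COUNT="${COUNT:-1}"

snapshot() {
    # $1 = output file; each command's output is preceded by a header line.
    out="$1"
    for cmd in "netstat -nax" "netstat -naT" "netstat -m" "netstat -nid" \
               "vmstat -m" "vmstat -z"; do
        echo "=== $cmd ===" >> "$out"
        $cmd >> "$out" 2>&1 || echo "(command failed: $cmd)" >> "$out"
    done
    # Driver statistics only when NIC is set (driver name and unit number).
    if [ -n "${NIC:-}" ]; then
        echo "=== sysctl dev.$NIC ===" >> "$out"
        sysctl "dev.$NIC" >> "$out" 2>&1 || echo "(command failed)" >> "$out"
    fi
}

mkdir -p "$OUTDIR"
i=0
while [ "$i" -lt "$COUNT" ]; do
    snapshot "$OUTDIR/stats.$(date +%Y%m%d-%H%M%S).$i"
    i=$((i + 1))
    if [ "$i" -lt "$COUNT" ]; then
        sleep "$INTERVAL"
    fi
done
```

Running it with something like COUNT=30 INTERVAL=10 while the synchronization is in progress should give enough snapshots to diff against each other.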
--
Mikolaj Golub