HAST instability
Daniel Kalchev
daniel at digsys.bg
Tue May 31 15:09:12 UTC 2011
On 31.05.11 17:08, Mikolaj Golub wrote:
> As I wrote privately, it would be nice to see both netstat and hast logs (from both nodes) for the same rather long period, when several cases occurred. It would be good to place them somewhere on the web so other guys could access them too, as I will be offline for 7-10 days and will not be able to help you until I am back.
The test has finished after running for almost three hours, so here is the
collected data:
(for the duration of the test, on the secondary node)
systat -if

                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average

      Interface           Traffic               Peak                Total
            lo0  in      0.000 KB/s          0.000 KB/s            1.126 KB
                 out     0.000 KB/s          0.000 KB/s            1.126 KB

            ix1  in      0.003 KB/s        230.590 MB/s          614.688 GB
                 out     0.054 KB/s          7.425 MB/s           19.910 GB

           igb0  in      0.025 KB/s          3.636 KB/s          566.897 KB
                 out     0.072 KB/s          4.296 KB/s            1.091 MB

The primary node is b1a, the secondary node is b1b.
kernel (built just after csup update):
FreeBSD b1a 8.2-STABLE FreeBSD 8.2-STABLE #1: Mon May 30 14:17:50 EEST 2011
    root@b1a:/usr/obj/usr/src/sys/GENERIC amd64
from primary
messages: http://news.digsys.bg/~admin/hast/test31may/b1a-messages
netstat -in: http://news.digsys.bg/~admin/hast/test31may/b1a-netstat-in
netstat -s: http://news.digsys.bg/~admin/hast/test31may/b1a-netstat-s
from secondary
messages: http://news.digsys.bg/~admin/hast/test31may/b1b-messages
netstat -in: http://news.digsys.bg/~admin/hast/test31may/b1b-netstat-in
netstat -s: http://news.digsys.bg/~admin/hast/test31may/b1b-netstat-s
> DK> One additional note: while playing with this setup, I tried to
> DK> simulate local disk going away in the hope HAST will switch to using
> DK> the remote disk. Instead of asking someone at the site to pull out the
> DK> drive, I just issued on the primary
>
> DK> hastctl role init data0
>
> DK> which resulted in a kernel panic. Unfortunately, there was not enough
> DK> dump space for 48GB. I will re-run this with more drives for the
> DK> crash dump. Anything you want me to look for in particular? (kernels
> DK> have no KDB compiled in yet)
>
> Well, removing a physical disk (the device /dev/gpt/data0 consumed by hastd
> disappears) and switching a resource to the init role (the device /dev/hast/data0
> consumed by the FS disappears) are two different things. Normally you should
> not change the resource role (destroy the hast device) before unmounting
> (exporting) the FS.
Then how do I proceed with a failed drive? Or with a flaky drive that is
still visible to the OS, which I want to remove from HAST and replace
with a different one? How do I ask HAST to switch I/O to the secondary?
Is there another way to get a drive out of HAST? In any case, even if this
is not an allowed operation, it should not panic.
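For reference, the ordering suggested in the quoted reply (release the
file system first, only then change the resource role) might be sketched
roughly as below. This is only an illustration under assumptions: it
assumes a ZFS pool on top of /dev/hast/data0, and the pool name "tank" is
hypothetical (for UFS the export step would be an umount instead). By
default the script only prints the commands; it does not address the
failed-drive case asked about above.

```shell
#!/bin/sh
# Sketch: release a HAST-backed file system before changing the role.
# ASSUMPTIONS (not from the original report): a ZFS pool named "tank"
# lives on /dev/hast/data0. Adjust POOL/RESOURCE for a real setup.

POOL="${POOL:-tank}"            # hypothetical pool name
RESOURCE="${RESOURCE:-data0}"   # HAST resource name used in this thread
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

release_hast_resource() {
    # 1. Stop the consumer of /dev/hast/$RESOURCE; abort if that fails,
    #    so the hast device is never destroyed while still in use.
    run zpool export "$POOL" || return 1
    # 2. Only now switch the resource to the init role.
    run hastctl role init "$RESOURCE"
}

release_hast_resource
```

Running it with DRY_RUN=0 would execute the two commands for real, so it
is worth checking the printed output first.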
I am now going to reboot and run the same tests without checksums.
Daniel