proliant server lockups with freebsd-amd64-stable (2010-03-10)
Kai Gallasch
gallasch at free.de
Fri Mar 12 21:27:04 UTC 2010
On Fri, 12 Mar 2010 20:38:31 +0200,
Alexander Motin <mav at FreeBSD.org> wrote:
> Pawel Jakub Dawidek wrote:
> > On Thu, Mar 11, 2010 at 01:39:16PM +0100, Kai Gallasch wrote:
> >> I have some trouble with an Opteron server locking up
> >> spontaneously. It loses all network connectivity, and even
> >> through the console I can get no shell.
> >>
> >> Lockups occur mostly under disk load (periodic daily, bacula backup
> >> running, make buildworld/buildkernel) and I can provoke them
> >> easily.
> > [...]
> >> 4 0 0 0 LL *cissmtx 0xffffff04ed820c00
> >> [g_down]
> > [...]
> >> 100046 L *cissmtx 0xffffff04ed820c00
> >> [irq257: ciss0]
> > [...]
> >
> > I was analyzing a similar problem as a potential ZFS bug. It turned
> > out to be a bug in ciss(4), and I believe mav@ (CCed) has a fix for
> > it.
>
> That patch of mine has been in 8-STABLE since r204873 of 2010-03-08.
> Make sure you have it.
Updating collection src-all/cvs
..
..
Edit src/sys/dev/ciss/ciss.c
Edit src/sys/dev/ciss/cissvar.h
Didn't have it! I must have missed it by just a few hours when I
built the last kernel. I will rebuild the kernel and give it a spin
later on.
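For anyone else following along: a quick way to check whether a checked-out tree already carries the fix, without diffing the whole csup run, is to look at the version keyword embedded in the driver source. This is just a sketch; the path assumes a stock /usr/src, and the exact keyword contents depend on whether the tree was fetched via CVS or SVN.

```shell
# Print the $FreeBSD$ keyword recorded in the driver source.
# If the commit date shown is 2010-03-08 or later, the tree
# should already include mav@'s ciss(4) fix (r204873).
grep '\$FreeBSD' /usr/src/sys/dev/ciss/ciss.c
```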
> In this case the trap stopped the process at ciss_get_request(), which
> is indeed called with the cissmtx lock held. But there is no place to
> sleep or loop there, so maybe it was just coincidence. With the bugs I
> was fixing there was a chance to loop indefinitely between ciss and
> CAM on a resource constraint, which increases the chance of catching a
> thread in such a spot.
>
> You may also try looking at what's going on with `top -HS` and
> `systat -vm 1`.
FYI: without your patch to ciss.c and cissvar.h, a lockup with top -HS
and systat -vm running gives the output below. Do you see a pattern
there that relates to your fix in the ciss driver?
Kai.
------ systat -vm ------
3 users Load 0.21 0.36 0.45 Mar 12 21:47
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 2805456 40804 62699320 79936 12358k count
All 6182560 95796 1136820k 212452 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt cow 16018 total
491 533 28 576 19 179 174 zfod atkbd0 1
ozfod uart0 irq4
12.9%Sys 12.5%Intr 0.1%User 0.0%Nice 74.5%Idle %ozfod ata0 irq14
| | | | | | | | | | | daefr uhci0 45
======+++++++ prcfr 2000 cpu0: time
87 dtbuf totfr 19 bce0 256
Namei Name-cache Dir-cache 100000 desvn react ciss0 257
Calls hits % hits % 40811 numvn pdwak 2000 cpu1: time
17300 frevn pdpgs 2000 cpu4: time
intrn 2000 cpu5: time
Disks da0 da1 da2 da3 da4 pass0 pass1 4995516 wire 2000 cpu6: time
KB/t 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2593276 act 1999 cpu7: time
tps 0 0 0 0 0 0 0 369560 inact 2000 cpu3: time
MB/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 9568 cache 2000 cpu2: time
%busy 0 0 0 0 0 0 0 12348752 free
1252272 buf
----------- top -HS ------------
last pid: 42561; load averages: 0.35, 0.38, 0.46 up 0+11:24:36 21:53:39
658 processes: 13 running, 623 sleeping, 21 waiting, 1 lock
CPU: 0.6% user, 0.0% nice, 12.6% system, 25.0% interrupt, 61.8% idle
Mem: 2559M Active, 363M Inact, 4892M Wired, 9548K Cache, 1223M Buf, 12G Free
Swap: 21G Total, 21G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 128K CPU3 3 672:22 100.00% {idle: cpu3}
11 root 171 ki31 0K 128K CPU1 1 663:50 100.00% {idle: cpu1}
11 root 171 ki31 0K 128K RUN 4 649:18 100.00% {idle: cpu4}
12 root -32 - 0K 384K CPU7 0 6:38 100.00% {swi4: clock}
4 root -8 - 0K 16K CPU5 5 1:16 100.00% g_down
12 root -64 - 0K 384K CPU2 0 1:15 100.00% {swi2: cambio}
11 root 171 ki31 0K 128K CPU6 6 672:13 97.27% {idle: cpu6}
11 root 171 ki31 0K 128K CPU0 0 622:18 96.29% {idle: cpu0}
11 root 171 ki31 0K 128K RUN 7 676:57 10.99% {idle: cpu7}
2046 zope1 46 0 468M 379M ucond 6 7:30 1.76% {python}
11 root 171 ki31 0K 128K RUN 5 663:27 0.00% {idle: cpu5}
11 root 171 ki31 0K 128K RUN 2 656:03 0.00% {idle: cpu2}
12 root -68 - 0K 384K WAIT 0 14:54 0.00% {irq256: bce0}
2046 zope1 46 0 468M 379M ucond 4 6:02 0.00% {python}
1719 zope2 48 0 263M 178M ucond 3 2:00 0.00% {python2.4}
1724 zope2 46 0 321M 235M ucond 5 1:52 0.00% {python2.4}
6314 root 45 0 10432K 2564K ttyin 4 1:36 0.00% systat
3 root -8 - 0K 16K - 2 1:29 0.00% g_up
6301 zope4 46 0 311M 225M ucond 1 1:28 0.00% {python2.4}
2320 zope5 44 0 281M 197M ucond 1 1:22 0.00% {python2.4}
20 root 44 - 0K 16K zio->i 6 1:21 0.00% syncer
6301 zope4 46 0 311M 225M ucond 5 1:19 0.00% {python2.4}
1729 zope6 44 0 311M 228M ucond 4 1:15 0.00% {python2.4}
2320 zope5 45 0 281M 197M ucond 4 1:13 0.00% {python2.4}
2320 zope5 50 0 281M 197M ucond 7 1:08 0.00% {python2.4}
6301 zope4 44 0 311M 225M ucond 4 1:02 0.00% {python2.4}
5138 dave 44 0 341M 248M select 4 0:58 0.00% {initial thread}
1729 zope6 45 0 311M 228M ucond 6 0:56 0.00% {python2.4}
8158 19421 44 0 27644K 7560K select 4 0:52 0.00% centerim
1743 zope7 44 0 287M 201M zio->i 1 0:51 0.00% {python}
2046 zope1 44 0 468M 379M select 1 0:48 0.00% {initial thread}
19943 19420 44 0 31740K 13044K select 4 0:46 0.00% centerim
6301 zope4 44 0 311M 225M ucond 0 0:40 0.00% {python2.4}
1743 zope7 44 0 287M 201M select 0 0:38 0.00% {initial thread}
2052 zope8 44 0 300M 212M select 3 0:37 0.00% {initial thread}
1724 zope3 46 0 321M 235M ucond 4 0:33 0.00% {python2.4}
2320 zope5 44 0 281M 197M ucond 1 0:31 0.00% {python2.4}
2052 zope8 45 0 300M 212M ucond 7 0:30 0.00% {python}
2320 zope5 44 0 281M 197M select 6 0:29 0.00% {initial thread}
12 root -64 - 0K 384K *cissm 1 0:28 0.00% {irq257: ciss0}
6301 zope4 44 0 311M 225M select 6 0:28 0.00% {initial thread}
0 root -16 0 0K 816K - 0 0:25 0.00% {spa_zio}
2052 zope8 47 0 300M 212M ucond 2 0:24 0.00% {python}
1729 zope6 44 0 311M 228M select 3 0:24 0.00% {initial thread}
1719 zope2 44 0 263M 178M select 1 0:23 0.00% {initial thread}
1724 zope3 44 0 321M 235M select 1 0:23 0.00% {initial thread}
12 root -48 - 0K 384K WAIT 0 0:22 0.00% {swi0: uart uart}
1734 zope9 44 0 249M 163M select 4 0:21 0.00% {initial thread}
0 root -16 0 0K 816K - 4 0:16 0.00% {spa_zio_6}
0 root -16 0 0K 816K - 3 0:16 0.00% {spa_zio_4}
0 root -16 0 0K 816K - 0 0:16 0.00% {spa_zio_1}
0 root -16 0 0K 816K - 4 0:16 0.00% {spa_zio_2}
0 root -16 0 0K 816K - 1 0:16 0.00% {spa_zio_5}
0 root -16 0 0K 816K - 2 0:16 0.00% {spa_zio_0}
0 root -16 0 0K 816K - 1 0:16 0.00% {spa_zio_7}
0 root -16 0 0K 816K - 5 0:16 0.00% {spa_zio_3}
4288 mysql 44 0 62460K 21608K select 4 0:16 0.00% {mysqld}
59 root -8 - 0K 92K tx->tx 0 0:12 0.00% {txg_thread_enter}
6129 mysql 44 0 62788K 23420K select 0 0:12 0.00% {mysqld}
59 root -8 - 0K 92K vgeom: 5 0:09 0.00% {vdev da2}
59 root -8 - 0K 92K vgeom: 6 0:09 0.00% {vdev da3}
4288 mysql 44 0 62460K 21608K select 0 0:08 0.00% {mysqld}
6129 mysql 44 0 62788K 23420K select 0 0:08 0.00% {mysqld}
13 root 44 - 0K 16K - 0 0:08 0.00% yarrow
0 root -16 0 0K 816K - 0 0:08 0.00% {spa_zio_6}
0 root -16 0 0K 816K - 7 0:08 0.00% {spa_zio_5}
0 root -16 0 0K 816K - 5 0:08 0.00% {spa_zio_2}
0 root -16 0 0K 816K - 2 0:08 0.00% {spa_zio_4}
0 root -16 0 0K 816K - 3 0:08 0.00% {spa_zio_3}
0 root -16 0 0K 816K - 4 0:07 0.00% {spa_zio_1}
0 root -16 0 0K 816K - 0 0:07 0.00% {spa_zio_7}
0 root -16 0 0K 816K - 1 0:07 0.00% {spa_zio_0}
5099 root 44 0 11380K 4292K CPU4 4 0:07 0.00% top
12 root -44 - 0K 384K WAIT 0 0:07 0.00% {swi1: netisr 0}
4864 news 45 0 24964K 12108K nanslp 6 0:06 0.00% perl
5138 dave 76 0 341M 248M ucond 7 0:05 0.00% {python}
4863 news 76 0 8260K 1920K wait 5 0:05 0.00% sh
5155 root 44 0 442M 6592K kqread 1 0:04 0.00% {cache-kqueue}
1734 zope9 44 0 249M 163M ucond 4 0:04 0.00% {python2.4}
4862 news 44 0 139M 43180K select 6 0:04 0.00% innd
8173 19421 44 0 30928K 8496K select 4 0:04 0.00% irssi