Strange Behaviour with Jail
Patrick Tracanelli
eksffa at freebsdbrasil.com.br
Wed Jan 21 05:40:07 PST 2004
Hello people, good morning.
Over the past two weeks or so I have noticed some strange behaviour with
FreeBSD RELENG_5 and jails. I know we may be at fault here, because our
environment is not as recent as it could be: our jail systems are
actually built on 5.1-RELEASE-p10. We do plan to rebuild on top of
RELENG_5_2, but for now, please consider the environment as it is.
Both the jails and the host system were built on 5.1-RELEASE-p10. The
current testing environment is made up of 4 jails. My choice for
limiting disk space usage was to back each jail with a vnode-based
md(4) device; the mounts currently look like this:
/dev/md38 on /usr/jail/69.59.167.38 (ufs, local)
devfs on /usr/jail/69.59.167.38/dev (devfs, local)
procfs on /usr/jail/69.59.167.38/proc (procfs, local)
/dev/md34 on /usr/jail/69.59.167.34 (ufs, local)
devfs on /usr/jail/69.59.167.34/dev (devfs, local)
procfs on /usr/jail/69.59.167.34/proc (procfs, local)
/dev/md41 on /usr/jail/69.59.167.41 (ufs, local)
devfs on /usr/jail/69.59.167.41/dev (devfs, local)
procfs on /usr/jail/69.59.167.41/proc (procfs, local)
/dev/md33 on /usr/jail/69.59.167.33 (ufs, local)
devfs on /usr/jail/69.59.167.33/dev (devfs, local)
procfs on /usr/jail/69.59.167.33/proc (procfs, local)
The mount points, as well as the vnode image files, live under
/dev/ar0s1h on /usr/jail (ufs, local, soft-updates)
which is a UFS2 filesystem built with the default newfs arguments (16K
block size, 2K fragment size), and the vnode images were created with
dd(1) using a 1024k block size (dd if=/dev/zero of=default.vnode
bs=1024k ...).
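For reference, each per-jail filesystem described above can be set up
roughly like this. This is only a sketch: the md unit number and mount
point follow the mounts shown above, and the count= value is an
assumption, since the original dd command is truncated.

```shell
# Sketch of one vnode-backed jail filesystem (run as root on FreeBSD 5.x).
# The count=512 (512 MB image) is an assumed example; the original dd
# count was elided.
dd if=/dev/zero of=/usr/jail/default.vnode bs=1024k count=512

# Attach the image file to a memory disk, newfs it, and mount it where
# the jail tree lives:
mdconfig -a -t vnode -f /usr/jail/default.vnode -u 38
newfs /dev/md38
mount /dev/md38 /usr/jail/69.59.167.38
```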
The environment used to run better on 4.8-STABLE, at least with the same
configuration as above and about 4 to 12 jails. The annoying behaviour
is that, suddenly (i.e., when a jail has a bit of mail load, around 2k
simultaneous messages), the whole system becomes unavailable. It does
not crash, but it gets really slow; the load average in a known slow
situation, whose data I will paste now, is only about 0.09, 0.03, 0.01.
vmstat -w1 gives me:

 procs      memory      page                    disks     faults      cpu
 r b  w    avm    fre   flt  re  pi  po  fr  sr ad0 ad1   in   sy  cs us sy id
 1 95 0 586800  41540  2258   0   0   0 218  16   0   0  361    0 433  1  4 95
 0 95 0 586800  41540     2   0   0   0   2   0   0   0  340    0 256  0  3 97
 0 95 0 586800  41540     0   0   0   0   0   0   0   0  337    0 254  0  2 98
 0 95 0 587916  41396    29   0   0   0  13   0   0   0  340    0 261  0  3 97
 0 95 0 587916  41396     0   0   0   0   0   0   0   0  340    0 258  0  3 97
That is the most significant behaviour I have found. Under low load,
the "blocked for resources" column (b) already sits at about 20. Under
a little more load it easily climbs to 40 or 50, and when it reaches
~90 the system almost stops responding to user and application commands
and requests.
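Not from the original diagnostics, but a tiny filter I sketched to
catch these episodes (the threshold of 40 is arbitrary, and the column
position assumes the vmstat layout above):

```shell
# Print only vmstat samples whose second column ("b", processes blocked
# waiting on resources) is at or above 40.
# Live use:  vmstat -w1 | awk '$2 + 0 >= 40 { print "blocked:", $2, "--", $0 }'
awk '$2 + 0 >= 40 { print "blocked:", $2, "--", $0 }'
```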
systat -vm gives me (rewrapped here for readability):

    2 users    Load  0.08  0.03  0.00            Jan 21 11:06

Mem:KB    REAL            VIRTUAL      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free    in  out    in  out
Act   97212   15228   454132    39168   54964   count
All 1021520   29528   776344    64828           pages

Proc:r  p  d  s  w     Csw  Trp  Sys  Int  Sof  Flt
             25153     352   51  286  363    5   46

 0.3%Sys   0.2%Intr   0.2%User   0.0%Nice   99.4%Idl

VM page counters:  23 zfod,  12 cow,  21 prcfr,  128016 wire,
    120712 act,  720028 inact,  53156 cache,  1808 free
    (no daefr/react/pdwak/pdpgs activity shown)

Interrupts: 244 total;  100 clk 0,  128 rtc 8,  16 fxp0 9,  npx0 13,
    ata0 14,  ata1 15,  atapci1 1,  atkbd0 1,  ppc0 7;  stray 0/6/7

Namei     Name-cache   Dir-cache
  Calls   hits    %    hits    %
   1157   1148   99

Disks    ad0   ad1   ad4   ad6   ar0   fd0
KB/t    0.00  0.00  0.00  0.00 24.00  0.00
tps        0     0     0     0     0     0
MB/s    0.00  0.00  0.00  0.00  0.01  0.00
% busy     0     0     0     0     0     0

114160 buf,  87 dirtybuf,  70150 desiredvnodes,  15102 numvnodes
where I do not find anything that looks wrong (not even the
interrupts), and top(1) shows: CPU states: 0.0% user, 0.0% nice,
0.8% system, 5.6% interrupt, 93.7% idle.
There are about 277 jailed processes. Under what I would call
denial-of-service circumstances (where everything gets as slow as
possible), the disk access is still ridiculously low:
serv1001# iostat -w1
      tty             ad0              ad1              ad4             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   26 114.02  8  0.92  18.33   1  0.02   0.00   0  0.00   0  0  1  0 98
   0  230   0.00  0  0.00   0.00   0  0.00   0.00   0  0.00   1  0  1  0 98
   0   76   0.00  0  0.00   0.00   0  0.00   0.00   0  0.00   7  0  3  0 90
   0   76   0.00  0  0.00   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
About the hardware:
Disks:
ad0: 76319MB <WDC WD800BB-00DKA0> [155061/16/63] at ata0-master UDMA33
ad1: 76319MB <WDC WD800BB-00CAA1> [155061/16/63] at ata0-slave UDMA33
ad4: 76319MB <WDC WD800BB-32CCB0> [155061/16/63] at ata2-master UDMA100
ad6: 76319MB <WDC WD800BB-32CCB0> [155061/16/63] at ata3-master UDMA100
ar0: 152638MB <ATA SPAN array> [19458/255/63] status: READY subdisks:
(ad0 and ad1 are backup only)
The NIC is an Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet with
about 20 IP addresses (aliases) for jails (currently only 4 are used);
real memory = 1073741824 (1024 MB);
4 GB swap (2 GB per disk);
CPU: Intel(R) Pentium(R) III CPU family 1133MHz (1132.79-MHz
686-class CPU)
The custom kernel has device polling and an HZ frequency of 1000 (I
have already lowered it back to the default of 100, but the behaviour
did not change), no TCP/IP tuning; it is customized mostly for hardware
support (only what is needed), and the only performance customization
besides device polling is maxusers set to 1024. No debugging options.
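For completeness, the relevant kernel configuration lines would look
roughly like this (a sketch only; the option names are the stock
FreeBSD 5.x spellings, the values are the ones described above):

```
options         DEVICE_POLLING
options         HZ=1000         # also tried the default of 100
maxusers        1024
```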
The jail systems run essentially qmail/vpopmail, Apache 1.3 + SSL +
PHP, MySQL and pure-ftpd.
I have read, more than once, that backing a jail with a vnode-based
filesystem is not recommended. I am considering one partition per jail,
but I am not sure at the moment whether that will solve the main
problem. Sure, I still need to test under 5.2-RELEASE, which will soon
be done, but are those changes relevant to an environment with only 4
jails?
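If the one-partition-per-jail route were taken, the fstab entries would
look something like this (purely illustrative; the partition names are
invented, only the mount points come from the setup above):

```
# Hypothetical /etc/fstab lines, one dedicated partition per jail:
/dev/ad4s1e   /usr/jail/69.59.167.33   ufs   rw   2   2
/dev/ad4s1f   /usr/jail/69.59.167.34   ufs   rw   2   2
```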
Any ideas on what I could configure better, or on what might be wrong?
Any other data I should send about the system's behaviour?
BTW, the jailstatemod third-party kernel module by Bjoern A. Zeeb is
being used (it simply prevents the jail environment from "seeing" the
other disk devices, slices and such; only its own).
Thanks for yer attention.
--
Patrick Tracanelli