Swapfile problem in 6?

Don Lewis truckman at FreeBSD.org
Fri Jan 6 09:24:50 PST 2006


On  2 Jan, Lars Kristiansen wrote:
>> Attempting to catch up with my backlog of unread email, only 12K unread
>> messages to go ...
>>
>> On 24 Nov, Rob wrote:
>>
>>> I have cvsup'ed the sources to STABLE as of Nov. 23rd
>>> 2005.
>>> After recompiling/installing world and debug-kernel,
>>> I again get a kernel deadlock when using swapfile:
>>>    http://surfion.snu.ac.kr/~lahaye/swapfile2.txt
>>>
>>> Previous deadlocks are still documented here
>>>    http://surfion.snu.ac.kr/~lahaye/swapfile.txt
>>>
>>> I hope this is of use for fixing this bug in 6.
>>> If further investigation is needed, then please let me
>>> know.
>>
>> This is a deadlock caused by memory exhaustion.  The pagedaemon only has
>> a limited number of bufs that it uses for writing dirty pages to swap to
>> prevent it from saturating the I/O subsystem with large numbers of
>> writes.  In this case, pagedaemon is trying to free up memory by writing
>> dirty pages, and it has used up all of its bufs and is waiting for the
>> write requests to complete and the bufs to be returned to it.
>> This isn't happening because md0 is stuck waiting for memory.  This is a
>> little bit surprising to me because it looks like writes to vnode-backed
>> devices are done synchronously by default.
>>
>> If you have a chance to test this again, a stack trace of md0 in the
>> deadlock state would be interesting.  I'd like to know where md0 is
>> getting stuck.
>>
>> I wonder if pagedaemon should scan ahead and more aggressively discard
>> clean pages when it has run out of bufs to write dirty pages, especially
>> in low memory situations.  Preventing the creation of more dirty pages
>> would be nice, but I don't know how to do that ...
> 
> Just in case it can help. Do not have this machine available for testing
> at the moment but this is the last debuginfo I did get from it.
> Here is a trace from a situation where a possibly idle system got stuck
> during the night and db showed only one locked vnode:
> 
> db> show lockedvnods
> Locked vnodes
> 
> 0xc1309330: tag ufs, type VREG
>     usecount 1, writecount 1, refcount 154 mountedhere 0
>     flags ()
>     v_object 0xc12cb39c ref 0 pages 606
>      lock type ufs: EXCL (count 1) by thread 0xc126b900 (pid 178)
>         ino 8155, on dev ad0s1f
> db> trace 178
> Tracing pid 178 tid 100058 td 0xc126b900
> sched_switch(c126b900,0,1) at 0xc066a4db = sched_switch+0x17b
> mi_switch(1,0) at 0xc065f49e = mi_switch+0x27e
> sleepq_switch(c09b2a98,c484bacc,c065f0e3,c09b2a98,0) at 0xc0677f00 =
> sleepq_switch+0xe0
> sleepq_wait(c09b2a98,0,0,c08ad92d,37b) at 0xc0678100 = sleepq_wait+0x30
> msleep(c09b2a98,c09b2d00,244,c08adb6a,0) at 0xc065f0e3 = msleep+0x333
> vm_wait(c12cb39c,0,c08990f3,ad7,c06512a4) at 0xc07c6a71 = vm_wait+0x91
> allocbuf(c28fa9d8,4000,354000,0,354000) at 0xc06a2f89 = allocbuf+0x4e9
> getblk(c1309330,d5,0,4000,0) at 0xc06a29cb = getblk+0x4eb
> cluster_read(c1309330,10000000,0,d5,0) at 0xc06a5d65 = cluster_read+0xe5
> ffs_read(c484bc9c) at 0xc07a631f = ffs_read+0x28f
> VOP_READ_APV(c09309a0,c484bc9c) at 0xc0838aab = VOP_READ_APV+0x7b
> mdstart_vnode(c1310800,c1634294,c1310820,1,c0566e10) at 0xc056688c =
> mdstart_vnode+0xec
> md_kthread(c1310800,c484bd38,c1310800,c0566e10,0) at 0xc0566f7f =
> md_kthread+0x16f
> fork_exit(c0566e10,c1310800,c484bd38) at 0xc0645618 = fork_exit+0xa8
> fork_trampoline() at 0xc0816f3c = fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xc484bd6c, ebp = 0 ---

The md thread is stuck waiting for memory to be freed by pagedaemon.
Pagedaemon is stuck waiting for at least one of its pageout requests to
complete.  The pageout requests are probably all stuck waiting for md.
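
To make the cycle concrete, here is a rough userland model of the
dependency loop.  This is an illustration only, not kernel code; the
thread bodies, the fixed buf count, and the condition variable names
are all made up for the sketch.

/*
 * Userland model of the cycle described above (illustration only, not
 * kernel code).  The "pagedaemon" thread issues pageout requests from
 * a fixed pool of bufs and sleeps once the pool is empty; the "md"
 * thread must get a free page before it can complete the I/O that
 * would return a buf and free up memory.  With memory already
 * exhausted, both threads end up sleeping forever.
 */
#include <pthread.h>

#define	NPAGEOUT_BUFS	2

static pthread_mutex_t	lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t	buf_returned = PTHREAD_COND_INITIALIZER;
static pthread_cond_t	page_freed = PTHREAD_COND_INITIALIZER;
static int		bufs_free = NPAGEOUT_BUFS;
static int		pages_free = 0;		/* memory already exhausted */

static void *
pagedaemon_thread(void *arg)
{
	pthread_mutex_lock(&lock);
	for (;;) {
		/* All pageout bufs are in flight: wait for one to return. */
		while (bufs_free == 0)
			pthread_cond_wait(&buf_returned, &lock);
		/* Issue another pageout; it is queued to md and never completes. */
		bufs_free--;
	}
}

static void *
md_thread(void *arg)
{
	pthread_mutex_lock(&lock);
	for (;;) {
		/* vm_wait(): md needs a page to stage the vnode I/O. */
		while (pages_free == 0)
			pthread_cond_wait(&page_freed, &lock);
		/*
		 * Never reached: completing the write would return a buf
		 * to pagedaemon and let it free pages.
		 */
		bufs_free++;
		pthread_cond_signal(&buf_returned);
	}
}

int
main(void)
{
	pthread_t pd, md;

	pthread_create(&pd, NULL, pagedaemon_thread, NULL);
	pthread_create(&md, NULL, md_thread, NULL);
	pthread_join(pd, NULL);		/* never returns */
	return (0);
}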

I had expected the problem to be that, while pagedaemon is allowed to
dig deeper into the free page pool, the md thread is not, which would
let the md thread get wedged first.  That does not appear to be the
case, because the vm_page_alloc() call in
allocbuf() has the VM_ALLOC_SYSTEM flag set, which should match
vm_page_alloc()'s treatment of requests by pagedaemon.
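
For reference, the request-class check in vm_page_alloc() looks
roughly like this (paraphrased from memory of the RELENG_6 vm_page.c,
so the exact thresholds and counter names may be off):

	/*
	 * The pager is allowed to eat deeper into the free page list.
	 */
	if (curproc == pageproc && page_req != VM_ALLOC_INTERRUPT)
		page_req = VM_ALLOC_SYSTEM;

	/*
	 * VM_ALLOC_NORMAL requests are refused once the free count drops
	 * to v_free_reserved; VM_ALLOC_SYSTEM requests (pagedaemon, and
	 * allocbuf() in the trace above) may keep digging down to
	 * v_interrupt_free_min.
	 */
	if (cnt.v_free_count > cnt.v_free_reserved ||
	    (page_req == VM_ALLOC_SYSTEM &&
	    cnt.v_cache_count == 0 &&
	    cnt.v_free_count > cnt.v_interrupt_free_min) ||
	    (page_req == VM_ALLOC_INTERRUPT && cnt.v_free_count > 0)) {
		/* allocate from the free queue */
	} else {
		/* wake pagedaemon; the caller ends up in VM_WAIT */
	}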

I don't see how the md thread could be consuming a large number of
reserved pages, but it looks like that must be what is happening.
