NFS-related hang in 5.4?

Mon Jun 20 00:12:31 GMT 2005

On 19 Jun 2005, at 20:06, Robert Watson wrote:

>
> On Sun, 19 Jun 2005, Eirik Øverby wrote:
>
>
>> When doing large file transfers (backing up jails using tar+gzip
>> to a neighboring server), NFS has a tendency to lock up on me.
>> This usually happens after quite a while, typically a few hours.
>> Performance is also generally poor before the hang.
>>
>
> Hmm. This looks like a bug in dummynet. ipfw should not directly
> re-inject UDP traffic into the input path from an outbound path;
> doing so risks re-entering the stack, causing lock order problems,
> and so on. The packet should instead be placed on the netisr queue
> and processed from the netisr context.

Would this problem exist on all 5.4 installations, both i386 and
amd64? Would it depend on heavy load, or could it theoretically
happen at any time when there is traffic? All three of my FreeBSD 5
servers (dual Opteron, dual P3 1 GHz, dual P3 700 MHz) experience
random hangs a few weeks apart, and my impression is that they are
all stable when running in single-CPU mode. All of them use dummynet
in a comparable manner. Ideas?

> Is it possible to remove dummynet from your configuration and
> see whether the problem goes away?

I'm running a test right now and will let you know in the morning.

>
> Robert N M Watson
>
>
>>
>> KDB trace:
>>
>> db> trace
>> Tracing pid 56 tid 100064 td 0xc1a18600
>> kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
>> siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
>> siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
>> intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
>> lapic_handle_intr(34) at lapic_handle_intr+0x3a
>> Xapic_isr1() at Xapic_isr1+0x33
>> --- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
>> _mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
>> udp_input(c2d40000,14,c1a99000,1,0) at udp_input+0x257
>> ip_input(c2d40000,0,0,0,0) at ip_input+0x590
>> transmit_event(c1c64100,20940000,0,c1d58a80,7f4220) at transmit_event+0x107
>> ready_event_wfq(c1c64100,20940000,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
>> dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
>> ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
>> pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
>> ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
>> udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
>> udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
>> sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
>> nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
>> nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
>> nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
>> nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
>> nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
>> fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
>> fork_trampoline() at fork_trampoline+0x8
>> --- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---
>>
>> I cannot seem to kill process 56 (nfsiod), so I have to reset the  
>> box.
>>
>> Does anyone have a clue? What can I do to make debugging easier?
>> The next time it happens I can probably get a crash dump; at least
>> I will have a debug kernel running by then.
>>
>> /Eirik
>> _______________________________________________
>> freebsd-stable at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>