NFS-related hang in 5.4?

Sun Jun 19 18:06:07 GMT 2005

On Sun, 19 Jun 2005, Eirik Øverby wrote:

> when doing large file transfers (backing up jails using tar+gzip to a 
> neighboring server), NFS has a tendency to lock up on me. This usually 
> happens after quite a while - like a few hours or so. Also, before the 
> hang, performance is generally bad.

Hmm.  Looks like a bug in dummynet.  Per the trace, ipfw/dummynet is 
re-injecting UDP traffic directly into the input path (ip_input) from the 
outbound path (ip_output).  It should not do that, or it risks re-entering 
the stack, generating lock order problems, etc.  The packet should instead 
be dropped into the netisr queue and processed from the netisr context.
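
To illustrate the difference between direct re-injection and deferral, 
here is a toy model in Python.  It is not the dummynet code: proto_lock, 
deferred, input_path, output_path and drain_deferred are hypothetical names, 
and it assumes the send path holds a non-recursive lock that the receive 
path also needs.

```python
import threading
from collections import deque

# Hypothetical stand-in for a non-recursive lock held on the send path.
proto_lock = threading.Lock()
# Hypothetical stand-in for the netisr queue.
deferred = deque()


def input_path(pkt):
    """Receive side: needs proto_lock. Returns False if it would self-deadlock."""
    # A real kernel mutex would spin or sleep here; the toy model uses a
    # non-blocking attempt so the problem is reported instead of hanging.
    if not proto_lock.acquire(blocking=False):
        return False
    try:
        return True
    finally:
        proto_lock.release()


def output_path(pkt, direct):
    """Send side: holds proto_lock while the packet-delay hook runs."""
    with proto_lock:
        if direct:
            # Re-inject straight into the input path from the output path.
            return input_path(pkt)
        # Defer: queue the packet and let a later context handle it.
        deferred.append(pkt)
        return True


def drain_deferred():
    """Runs later with no send-path lock held (the role netisr would play)."""
    results = []
    while deferred:
        results.append(input_path(deferred.popleft()))
    return results


direct_ok = output_path("pkt1", direct=True)
queued_ok = output_path("pkt2", direct=False)
drained = drain_deferred()
print(direct_ok, queued_ok, drained)  # prints: False True [True]
```

The direct call fails because the same thread already holds the lock when 
it reaches the input path.  The queued packet is handled only after the 
send path has released it.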

Is it possible to take dummynet out of your configuration and see if the 
problem goes away?

Robert N M Watson

>
> KDB trace:
>
> db> trace
> Tracing pid 56 tid 100064 td 0xc1a18600
> kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
> siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
> siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
> intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
> lapic_handle_intr(34) at lapic_handle_intr+0x3a
> Xapic_isr1() at Xapic_isr1+0x33
> --- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
> _mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
> udp_input(c2d40000,14,c1a99000,1,0) at udp_input+0x257
> ip_input(c2d40000,0,0,0,0) at ip_input+0x590
> transmit_event(c1c64100,20940000,0,c1d58a80,7f4220) at transmit_event+0x107
> ready_event_wfq(c1c64100,20940000,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
> dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
> ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
> pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
> ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
> udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
> udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
> sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
> nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
> nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
> nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
> nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
> nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
> fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---
>
> I cannot seem to kill process 56 (nfsiod), so I have to reset the box.
>
> Anyone got a clue? What can I do to ease debugging here? Next time it happens 
> I can probably make a dump, at least I will have a debug kernel running then.
>
> /Eirik
> _______________________________________________
> freebsd-stable at freebsd.org mailing list