Panics on large writes to NFS mounted FAT filesystems

Sat May 31 07:41:40 PDT 2003

Hello,

	A 4.8-R server NFS-exports its FAT32 filesystem "/E" (mounted
under the same name on both hosts) to a 4.8-R client over a 100Mbps
network (FWIW, those are the only hosts on the network). Any large write
from the client (such as "cat /dev/zero > /E/zero") results in 10-30M
being written to the server, after which the server page faults. The
FAT32 filesystem is clean and has 10G free before the killer write, but
afterwards it is (naturally) damaged.
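
	For anyone trying to reproduce this, a bounded variant of the write
above may be more convenient, since it stops on its own if the server
survives. This is only a sketch: the helper name write_zeros and the 64M
default are arbitrary choices for illustration, picked to exceed the
10-30M at which the server faults.

```shell
# Hypothetical helper (not part of the original setup): write COUNT MiB of
# zeros to TARGET in 1 MiB blocks. The default target is the path from the
# report; the 64 MiB default is an assumed cap, not a measured threshold.
write_zeros() {
    target=${1:-/E/zero}
    count=${2:-64}
    # bs is given in bytes so the same line works with BSD and GNU dd.
    dd if=/dev/zero of="$target" bs=1048576 count="$count" 2>/dev/null
}
```

	Run on the client as "write_zeros /E/zero 64". Pointing the same
helper at a file on the UFS share gives the control case.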

	It does not seem to be a hardware problem: the same transfer
done with netcat (listening on the server and writing to /E/zero,
sending from the client) causes no problem, and neither does writing
to a UFS share on the same server.

	The kernel is custom, with NFS and FAT support compiled in (the
problem is also reproducible under a GENERIC kernel, and the results,
including the trace, are much the same). The fault message is as follows:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x1
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc015d36f
stack pointer           = 0x10:0xccf70d80
frame pointer           = 0x10:0xccf70d9c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 102 (nfsd)
interrupt mask          = net tty bio cam
trap number             = 12
panic: page fault
syncing disks... panic: lockmgr: non-zero exclusive count
Uptime: 3m0s

	This is the kernel backtrace (AFAICT the relevant part is from
#14 down):

#0  0xc0161c9a in dumpsys ()
#1  0xc0161a6b in boot ()
#2  0xc0161e90 in poweroff_wait ()
#3  0xc015c3c9 in lockmgr ()
#4  0xc018c934 in vop_stdlock ()
#5  0xc0217f65 in ufs_vnoperate ()
#6  0xc0196a89 in vn_lock ()
#7  0xc018f85b in vget ()
#8  0xc021016f in ffs_sync ()
#9  0xc0191887 in sync ()
#10 0xc0161806 in boot ()
#11 0xc0161e90 in poweroff_wait ()
#12 0xc028500a in trap_fatal ()
#13 0xc0284cdd in trap_pfault ()
#14 0xc02848c7 in trap ()
#15 0xc015d36f in malloc ()
#16 0xc01dd6aa in nfsrv_dorec ()
#17 0xc01e1bd0 in nfssvc_nfsd ()
#18 0xc01e1863 in nfssvc ()
#19 0xc028522e in syscall2 ()
#20 0xc0278da5 in Xint0x80_syscall ()
#21 0x804813e in ?? ()

	Is this a known problem? Comments? (Don't-do-it-then is all right,
but that's not interesting...)

				DoubleF