kern/58391: Trap 12 with heavy disk load on ide vinum mirror

Wed Oct 22 12:30:23 PDT 2003

>Number:         58391
>Category:       kern
>Synopsis:       Trap 12 with heavy disk load on ide vinum mirror
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Oct 22 12:30:20 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Haderer
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
Charite Hospital Berlin - Germany
>Environment:
System: FreeBSD ogava.str.charite.de 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Tue May 13 20:42:13 CEST 2003 root@ogava.str.charite.de:/usr/src/sys/compile/OGAVAD i386

>Description:
Setup: A single-CPU x86 machine running 4.8R with a GENERIC kernel. A SCSI disk
holds the OS, and two 160G Maxtor disks (ad0, ad2) are set up as a vinum mirror
to hold data (whole disk). ATA write cache is off, as is hw.ata.tags.
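
For reference, a minimal sketch of how these two settings might look, assuming
they are set as loader tunables in /boot/loader.conf (the report does not say
how they were set, so the file location is an assumption):

```shell
# /boot/loader.conf fragment (assumed location, not taken from the report)
hw.ata.wc="0"      # ATA write cache off
hw.ata.tags="0"    # ATA tagged queueing off
```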

Under heavy I/O load on the IDE disks, the kernel panics with trap 12:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x3f83c8c7
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0162d10
stack pointer           = 0x10:0xcd336cb4
frame pointer           = 0x10:0xcd336cc8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 273 (ls)
interrupt mask          = bio 
trap number             = 12
panic: page fault

We have seen this behaviour on P4 systems with the 845 chipset as well as on
an AMD CPU with a VIA chipset. Ask for details if necessary.

/var/log/messages and console are silent.

I now have 10 crash dumps; 'gdb -k ... where' gives different results for them.
What they have in common: every call chain starts with:

...
#43 0xc023e41d in syscall2 ()
#44 0xc0232575 in Xint0x80_syscall ()

Please ask for more details if you are unable to reproduce this panic.

>How-To-Repeat:

- Set up a machine that boots 4.8R from a SCSI disk (not tested whether this is necessary).
- Disable the ATA write cache (not tested whether this really matters).
- Attach two identical IDE disks as ad0 and ad2, then run dd/fdisk/disklabel on them.
- Set up a vinum mirror on these disks that covers the whole IDE disks:

    drive disk0 device /dev/ad0s1e
    drive disk2 device /dev/ad2s1e

    volume mirror setupstate
      plex org concat
       sd len 0s drive disk0
      plex org concat
       sd len 0s drive disk2

- newfs -m 0 -b 16384 -f 2048 -i 131072 -g 300000 -h 200 /dev/vinum/mirror
  (not tested whether the -m, -i, -g and -h options matter)
- mount /dev/vinum/mirror /mnt
- Now start the disk stress: run the jobs below in parallel.

  job 1
      cd /usr; find ports -depth -print | cpio -pdm /mnt
      (start once and wait a little bit)

  job 2
      ls -lRt /mnt (do it again when finished)

  job 3		(trash writer)
      dd if=/dev/zero of=/mnt/trash count=1000000

  job 4 	(diskreader 0): 
      dd if=/dev/ad0s1e of=/dev/null bs=1024k

  job 5 	(diskreader 2): 
      dd if=/dev/ad2s1e of=/dev/null bs=1024k

  Do some variations: stop (^C) and restart jobs 3 to 5.

  The final crash usually comes after these steps:
	stop job 3	(the trash writer)
	stop job 4 & 5  (the disk readers)
	start job 3 again and let it run for a few seconds, then stop
	start job 4 or 5 ----> trap 12

Although this procedure may look somewhat arbitrary, it is the way I found to
reproduce the crash within a few minutes.
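
The five jobs above can be sketched as a small sh script. This is only an
illustration under stated assumptions: the helper start_job, the variables
MNT, DISK0, DISK2 and DRY_RUN, and sending the ls output to /dev/null are all
hypothetical additions, not part of the original procedure. By default the
script only prints the commands; set DRY_RUN=0 to actually launch them in the
background. Stopping and restarting jobs 3 to 5 is still done by hand.

```shell
#!/bin/sh
# Sketch of the stress jobs from this report. Dry-run by default.
# All variable and function names here are illustrative assumptions.
MNT="${MNT:-/mnt}"
DISK0="${DISK0:-/dev/ad0s1e}"
DISK2="${DISK2:-/dev/ad2s1e}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise start it in the
# background and report its pid so it can be stopped with kill.
start_job() {
    name=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "job $name: $*"
    else
        sh -c "$*" &
        echo "job $name: pid $!"
    fi
}

# job 1: copy the ports tree onto the mirror (start once)
job1() { start_job 1 "cd /usr && find ports -depth -print | cpio -pdm $MNT"; }
# job 2: list the mirror recursively, again and again
job2() { start_job 2 "while :; do ls -lRt $MNT >/dev/null; done"; }
# job 3: trash writer
job3() { start_job 3 "dd if=/dev/zero of=$MNT/trash count=1000000"; }
# job 4 and 5: raw readers on the two mirror halves
job4() { start_job 4 "dd if=$DISK0 of=/dev/null bs=1024k"; }
job5() { start_job 5 "dd if=$DISK2 of=/dev/null bs=1024k"; }

job1; job2; job3; job4; job5
```

With DRY_RUN=0, each job prints its pid; the stop/restart sequence described
above then amounts to killing the pids of jobs 3 to 5 and calling job3, job4
or job5 again from the same shell.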

Note: I was unable to reproduce the crashes when using the IDE disks without vinum.

>Fix:
None.
>Release-Note:
>Audit-Trail:
>Unformatted: