kern/131796: Introducing new USB or eSATA disks results in atapci1+ irq storm until reboot

Colin Faber cfaber at gmail.com
Tue Feb 17 14:50:04 PST 2009


>Number:         131796
>Category:       kern
>Synopsis:       Introducing new USB or eSATA disks results in atapci1+ irq storm until reboot
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Feb 17 22:50:03 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Colin Faber
>Release:        7.1-RELENG-p2 amd64 kernel
>Organization:
n/a
>Environment:
System: FreeBSD hal.fpsn 7.1-RELEASE-p2 FreeBSD 7.1-RELEASE-p2 #2: Sun Feb 15 12:21:19 MST 2009 cfaber at hal.fpsn:/usr/obj/usr/src/sys/HAL amd64

>Description:
It appears that when any USB or eSATA disk drive is plugged in, atapci1+ gets stuck in a massive irq storm, running at about 13k irqs a second or more. This results in about 90% interrupt load on the system and makes everything unstable and slow.

Note - this load appears even with ZERO activity on the device in question and little or no geom activity.

Also note - this machine is running two geom_raid5 arrays made of geom-labeled providers.


Non-storm:
cfaber at hal:~$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                           6          0
irq14: ata0                         4316          0
irq16: mpt0                      1782986        243
irq21: ohci0                      213266         29
irq23: atapci1+                  2182108        298
cpu0: timer                     14628721       1999
irq256: nfe0                      215036         29
Total                           19026439       2601


Storm:
cfaber at hal:~$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                           6          0
irq14: ata0                         1534          1
irq16: mpt0                          674          0
irq21: ohci0                        1169          0
irq22: ehci0                         193          0
irq23: atapci1+                 20420307      14230
cpu0: timer                      2868072       1998
irq256: nfe0                        1171          0
Total                           23293126      16232
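
The comparison between the two listings above can be automated by anyone trying to confirm the same symptom. What follows is a minimal Python sketch and not part of the original report. It parses `vmstat -i` output and lists the sources whose rate is at or above a threshold. The names `parse_vmstat_i` and `storming`, the 1000/s threshold, and the decision to skip the timer and Total rows are all assumptions made for illustration.

```python
import re

# One data row of `vmstat -i`: "<source name>   <total>   <rate>".
# The header row does not match because its last two columns are not numeric.
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")

# "Storm" listing copied from this report, used here as sample input.
STORM_SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           6          0
irq14: ata0                         1534          1
irq16: mpt0                          674          0
irq21: ohci0                        1169          0
irq22: ehci0                         193          0
irq23: atapci1+                 20420307      14230
cpu0: timer                      2868072       1998
irq256: nfe0                        1171          0
Total                           23293126      16232
"""


def parse_vmstat_i(text):
    """Return {source name: (total, rate)} for every data row in the text."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header or blank line
        rows[match.group("name")] = (int(match.group("total")),
                                     int(match.group("rate")))
    return rows


def storming(rows, threshold=1000, ignore=("Total", "cpu0: timer")):
    """Return sorted source names whose rate is >= threshold.

    The threshold and the ignore list are arbitrary illustrative choices.
    """
    return sorted(name for name, (_, rate) in rows.items()
                  if name not in ignore and rate >= threshold)


if __name__ == "__main__":
    parsed = parse_vmstat_i(STORM_SAMPLE)
    total_rate = parsed["Total"][1]
    for name in storming(parsed):
        rate = parsed[name][1]
        print(f"{name}: {rate}/s ({rate / total_rate:.0%} of all interrupts)")
```

To check a live system, the captured output of `vmstat -i` could be passed to `parse_vmstat_i` in place of the sample text.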

>How-To-Repeat:
cvsup to RELENG_7_1

configure an amd64 kernel with the following parameters:

cpu		HAMMER
ident		HAL
options 	GEOM_JOURNAL
options 	SCHED_ULE		# ULE scheduler
options 	PREEMPTION		# Enable kernel thread preemption
options 	INET			# InterNETworking
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	UFS_GJOURNAL		# Enable gjournal-based UFS journaling
options 	MD_ROOT			# MD is a potential root device
options 	NFSCLIENT		# Network Filesystem Client
options 	NFSSERVER		# Network Filesystem Server
options 	NFSLOCKD		# Network Lock Manager
options 	NTFS			# NT File System
options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_PART_GPT		# GUID Partition Tables.
options 	GEOM_LABEL		# Provides labelization
options 	COMPAT_43TTY		# BSD 4.3 TTY compat [KEEP THIS!]
options 	COMPAT_IA32		# Compatible with i386 binaries
options 	COMPAT_FREEBSD4		# Compatible with FreeBSD4
options 	COMPAT_FREEBSD5		# Compatible with FreeBSD5
options 	COMPAT_FREEBSD6		# Compatible with FreeBSD6
options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	STACK			# stack(9) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	ADAPTIVE_GIANT		# Giant mutex is adaptive.
options 	STOP_NMI		# Stop CPUS using NMI instead of IPI
options 	AUDIT			# Security event auditing
options         COMPAT_LINUX32
options         LINPROCFS
options 	SMP			# Symmetric MultiProcessor Kernel
device		cpufreq
device		acpi
device		pci
device		ata
device		atadisk		# ATA disk drives
device		ataraid		# ATA RAID drives
device		atapicd		# ATAPI CDROM drives
device		atapifd		# ATAPI floppy drives
device		atapist		# ATAPI tape drives
options 	ATA_STATIC_ID	# Static device numbering
device		mpt		# LSI-Logic MPT-Fusion
device		scbus		# SCSI bus (required for SCSI)
device		ch		# SCSI media changers
device		da		# Direct Access (disks)
device		sa		# Sequential Access (tape etc)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)
device		ses		# SCSI Environmental Services (and SAF-TE)
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse
device		kbdmux		# keyboard multiplexer
device		vga		# VGA video card driver
device		splash		# Splash screen and screen saver support
device		sc
device		miibus		# MII bus support
device		nfe		# nVidia nForce MCP on-board Ethernet
device		loop		# Network loopback
device		random		# Entropy device
device		ether		# Ethernet support
device		pty		# Pseudo-ttys (telnet etc)
device		md		# Memory "disks"
device		firmware	# firmware assist module
device		bpf		# Berkeley packet filter
device		uhci		# UHCI PCI->USB interface
device		ohci		# OHCI PCI->USB interface
device		ehci		# EHCI PCI->USB interface (USB 2.0)
device		usb		# USB Bus (required)
device		ugen		# Generic
device		uhid		# "Human Interface Devices"
device		ukbd		# Keyboard
device		ulpt		# Printer
device		umass		# Disks/Mass storage - Requires scbus and da
device		ums		# Mouse
device		uscanner	# Scanners
device		ucom		# Generic com ttys
device		sound		# Provide general sound support


Next, connect any umass or SATA device. atapci1+ will go nuts the second the kernel sees the device come online.
>Fix:
So far, the only way to get the problem to stop is to reboot the system. If the device is already connected when the system boots, then there are no problems with irq storms on atapci1+.

I'm willing to try any debugging procedures provided, as well as provide additional information on request.

>Release-Note:
>Audit-Trail:
>Unformatted:

