Bizarre IO errors
Michael Conlen
m at obmail.net
Sun Apr 5 00:55:37 PDT 2009
First, the background:
FreeBSD nfs4.tarhost.com 7.1-RELEASE-p3 FreeBSD 7.1-RELEASE-p3 #0: Sat Mar 7 00:15:02 EST 2009 root@nfs4.tarhost.com:/usr/obj/usr/src/sys/GENERIC amd64
(Two of these processors)
CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.10-MHz K8-class CPU)
Memory
usable memory = 17165377536 (16370 MB)
avail memory = 16626044928 (15855 MB)
The Disk system
aac0: <Adaptec RAID 52445> mem 0xd8000000-0xd81fffff irq 48 at device 0.0 on pci1
aac0: Enabling 64-bit address support
aac0: Enable Raw I/O
aac0: Enable 64-bit array
aac0: New comm. interface enabled
aac0: [ITHREAD]
aac0: Adaptec 52445, aac driver 2.0.0-1
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 52445
Controller Serial Number : 8A321083874
Physical Slot : 5
Temperature : 60 C/ 140 F (Normal)
Installed memory : 512 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Performance Mode : Default/Dynamic
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 2/0/0
Logical device number 0
Logical device name : system
RAID level : 1
Status of logical device : Optimal
Size : 953334 MB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back)
Partitioned : Yes
Protected by Hot-Spare : Yes
Dedicated Hot-Spare : 0,4
Dedicated Hot-Spare : 0,12
Bootable : Yes
Failed stripes : No
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,0)
Segment 1 : Present (0,8)
Logical device number 1
Logical device name : data
RAID level : 10
Status of logical device : Optimal
Size : 5720064 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
Write-cache mode : Disabled (write-through)
Write-cache setting : Enabled (write-back) when protected by battery
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : No
Failed stripes : No
Filesystem Size Used Avail Capacity Mounted on
/dev/aacd0s1a 15G 417M 14G 3% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/aacd0s1e 15G 30K 14G 0% /tmp
/dev/aacd0s1f 62G 3.2G 54G 6% /usr
/dev/aacd0s1g 762G 354M 701G 0% /usr/local
/dev/aacd0s1d 15G 3.5G 11G 24% /var
/dev/aacd1p1 5.3T 874G 4.0T 18% /usr/local/export
This system is set up as an NFS file server. It handled several stress
tests over two weeks before going online. The files were transferred to
the system, it was placed online, and it ran fine for a few days. There
are 10 web servers which access the file server, but the file servers
have a cluster of caches in front of them, so the load isn't too bad. I
see peaks around 60 Mbit/sec of traffic from the NFS server when taking
backups and 30 Mbit/sec otherwise. IO is minimized due to the large
amount of available RAM. It takes about three hours before disk
caching fills the available memory, so there isn't a lot of really hot
data it's going to the disks for, mostly just writes. iostat generally
reports about 1 MB/sec, maybe 2 at most. During stress tests I'd seen
in excess of 800 MB/sec, though usually the 300-400 MB/sec range.
Now for the strange part. First notice the first two g_vfs_done lines
(error = 5 is EIO): the offset is negative. After the second line
repeats 290 times, we see the odd pattern of the error taking many
lines to display. After that the logs continue, occasionally showing a
full error line followed by a line broken up as below. The first 8 or
so lines after the first have the same offset. After that the offset
switches to offset=1666490991559323648. That's in excess of 1 exabyte;
I've got a lot of disk, but not that much. That seems to indicate that
it's not really an IO error, since the offset is way off the end of
anything that could possibly be disk (can someone confirm or deny
that?). The offset occasionally changes to a negative number or some
other value, but this particular offset is repeated over and over.
Between 1 and 2 AM over 2 GB of this log was generated.
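For scale, here's a quick sketch of the arithmetic; the only inputs are
the "Size : 5720064 MB" reported for the data array above and the
repeated offset from the log, nothing here touches the live system:

/* sketch: compare the repeated offset against the data array size;
 * both numbers are taken from the output quoted earlier in this mail */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	int64_t offset = 1666490991559323648LL;    /* repeated offset from syslog */
	int64_t volume = 5720064LL * 1024 * 1024;  /* "Size : 5720064 MB" in bytes */

	printf("volume size  : %jd bytes (~%.1f TB)\n",
	    (intmax_t)volume, (double)volume / 1e12);
	printf("logged offset: %jd bytes (~%.2f EB)\n",
	    (intmax_t)offset, (double)offset / 1e18);
	printf("offset is ~%.0fx past the end of the volume\n",
	    (double)offset / (double)volume);
	return (0);
}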
About two minutes after this started, NFS stopped responding to the
clients in a prompt manner.
Once the server is restarted it runs fine for some time, but this
pattern soon (within minutes) repeats.
I have noticed that the read errors seem to get logged in bursts,
several at a time, with occasional pauses in between; further below is
a sample of just the "last message repeated" entries that repeated 10
or more times.
At the moment I can only assume that the negative and exceedingly
large offsets are a symptom of something beyond simple "disk problems",
since the messages logged indicate the OS was attempting to do
something it shouldn't. Can this be confirmed or denied? The
controller reports no problems. Is there anything else you can point
me to?
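In case it's useful to anyone checking this, a minimal sketch of how
the logged offsets could be checked against the real size of the
provider using the standard FreeBSD DIOCGMEDIASIZE disk ioctl (the
/dev/aacd1p1 default is just the device from the df output above):

/* sketch: print the media size of a provider so logged offsets can be
 * compared against it */
#include <sys/types.h>
#include <sys/disk.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>

int
main(int argc, char **argv)
{
	const char *dev = (argc > 1) ? argv[1] : "/dev/aacd1p1";
	off_t mediasize;
	int fd;

	fd = open(dev, O_RDONLY);
	if (fd < 0) {
		perror(dev);
		return (1);
	}
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0) {
		perror("DIOCGMEDIASIZE");
		return (1);
	}
	printf("%s: %jd bytes\n", dev, (intmax_t)mediasize);
	return (0);
}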
Start of problems in syslog
Apr 4 00:00:00 nfs4 newsyslog[986]: logfile turned over due to size>100K
Apr 4 01:11:59 nfs4 rpc.statd: unmon request from localhost, no matching monitor
Apr 4 14:10:44 nfs4 rpc.statd: unmon request from localhost, no matching monitor
Apr 4 14:10:44 nfs4 rpc.statd: unmon request from localhost, no matching monitor
Apr 5 01:08:48 nfs4 kernel: g_vfs_done():aacd1p1[READ(offset=-6163487656308658176, length=32768)]error = 5
Apr 5 01:08:48 nfs4 kernel: g_vfs_done():aacd1p1[READ(offset=-2344660732015456256, length=32768)]error = 5
Apr 5 01:08:48 nfs4 last message repeated 290 times
Apr 5 01:08:48 nfs4 kernel: g_vfs_done():
Apr 5 01:08:48 nfs4 kernel: aacd1p
Apr 5 01:08:48 nfs4 kernel: 1[RE
Apr 5 01:08:48 nfs4 kernel: AD(
Apr 5 01:08:48 nfs4 kernel: off
Apr 5 01:08:48 nfs4 kernel: set
Apr 5 01:08:48 nfs4 kernel: =-
Apr 5 01:08:48 nfs4 kernel: 234
Apr 5 01:08:48 nfs4 kernel: 466
Apr 5 01:08:48 nfs4 kernel: 073
Apr 5 01:08:48 nfs4 kernel: 201
Apr 5 01:08:48 nfs4 kernel: 545
Apr 5 01:08:48 nfs4 kernel: 6256,
Apr 5 01:08:48 nfs4 kernel: len
Apr 5 01:08:48 nfs4 kernel: gth
Apr 5 01:08:48 nfs4 kernel: =32
Apr 5 01:08:48 nfs4 kernel: 76
Apr 5 01:08:48 nfs4 kernel: 8)]
Apr 5 01:08:48 nfs4 kernel: err
Apr 5 01:08:48 nfs4 kernel: or
Apr 5 01:08:48 nfs4 kernel: = 5
Apr 5 01:08:48 nfs4 kernel:
Log of "last message repeated" entries with 10 or more repeats:
Apr 5 01:57:27 nfs4 last message repeated 75 times
Apr 5 01:57:28 nfs4 last message repeated 434 times
Apr 5 01:57:38 nfs4 last message repeated 18848 times
Apr 5 01:57:43 nfs4 last message repeated 9894 times
Apr 5 01:57:45 nfs4 last message repeated 435 times
Apr 5 01:57:45 nfs4 last message repeated 105 times
Apr 5 01:57:45 nfs4 last message repeated 433 times
Apr 5 01:57:45 nfs4 last message repeated 303 times
Apr 5 01:57:46 nfs4 last message repeated 421 times