ZFS hang status update
Dennis Glatting
freebsd at pki2.com
Sat Oct 20 05:38:38 UTC 2012
On Fri, 2012-10-19 at 19:08 -0700, Dennis Glatting wrote:
> I applied your debugging patch and that system has been running under
> load for 43 hours. I have no idea why.
>
> That said, some of my prior batch jobs have run for over a month. There
> was a time when ZFS was fairly stable but took a dive some months ago.
>
Boom. It hung at roughly 49 hours, after adding an SFTP transfer (60GB
off the disk-1 pool) and an ls of a directory in the disk-1 pool running
in a while loop.
My pools:
mc# zpool status
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 0h38m with 0 errors on Tue Oct 16 16:47:51 2012
config:

        NAME        STATE     READ WRITE CKSUM
        disk-1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
        cache
          da0       ONLINE       0     0     0

errors: No known data errors

  pool: disk-2
 state: ONLINE
  scan: scrub repaired 0 in 0h6m with 0 errors on Tue Oct 16 17:05:43 2012
config:

        NAME        STATE     READ WRITE CKSUM
        disk-2      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0

errors: No known data errors
camcontrol output (the binary is statically linked and stored on an md):
mc# /mnt/camcontrol tags da0 -v
(no output. session hung.)
mc# /mnt/camcontrol tags da1 -v (** swap disk **)
(pass1:mps0:0:5:0): dev_openings 255
(pass1:mps0:0:5:0): dev_active 0
(pass1:mps0:0:5:0): devq_openings 255
(pass1:mps0:0:5:0): devq_queued 0
(pass1:mps0:0:5:0): held 0
(pass1:mps0:0:5:0): mintags 2
(pass1:mps0:0:5:0): maxtags 255
mc# /mnt/camcontrol tags da2 -v
(pass2:mps0:0:6:0): dev_openings 245
(pass2:mps0:0:6:0): dev_active 10
(pass2:mps0:0:6:0): devq_openings 245
(pass2:mps0:0:6:0): devq_queued 0
(pass2:mps0:0:6:0): held 0
(pass2:mps0:0:6:0): mintags 2
(pass2:mps0:0:6:0): maxtags 255
mc# /mnt/camcontrol tags da3 -v
(pass3:mps0:0:7:0): dev_openings 245
(pass3:mps0:0:7:0): dev_active 10
(pass3:mps0:0:7:0): devq_openings 245
(pass3:mps0:0:7:0): devq_queued 0
(pass3:mps0:0:7:0): held 0
(pass3:mps0:0:7:0): mintags 2
(pass3:mps0:0:7:0): maxtags 255
mc# /mnt/camcontrol tags da4 -v
(pass4:mps0:0:8:0): dev_openings 245
(pass4:mps0:0:8:0): dev_active 10
(pass4:mps0:0:8:0): devq_openings 245
(pass4:mps0:0:8:0): devq_queued 0
(pass4:mps0:0:8:0): held 0
(pass4:mps0:0:8:0): mintags 2
(pass4:mps0:0:8:0): maxtags 255
mc# /mnt/camcontrol tags da5 -v
(pass5:mps0:0:9:0): dev_openings 245
(pass5:mps0:0:9:0): dev_active 10
(pass5:mps0:0:9:0): devq_openings 245
(pass5:mps0:0:9:0): devq_queued 0
(pass5:mps0:0:9:0): held 0
(pass5:mps0:0:9:0): mintags 2
(pass5:mps0:0:9:0): maxtags 255
mc# /mnt/camcontrol tags da6 -v
(pass6:mps0:0:10:0): dev_openings 245
(pass6:mps0:0:10:0): dev_active 10
(pass6:mps0:0:10:0): devq_openings 245
(pass6:mps0:0:10:0): devq_queued 0
(pass6:mps0:0:10:0): held 0
(pass6:mps0:0:10:0): mintags 2
(pass6:mps0:0:10:0): maxtags 255
mc# /mnt/camcontrol tags da7 -v
(pass7:mps0:0:11:0): dev_openings 245
(pass7:mps0:0:11:0): dev_active 10
(pass7:mps0:0:11:0): devq_openings 245
(pass7:mps0:0:11:0): devq_queued 0
(pass7:mps0:0:11:0): held 0
(pass7:mps0:0:11:0): mintags 2
(pass7:mps0:0:11:0): maxtags 255
mc# /mnt/camcontrol tags da8 -v (** OS hdw RAID1 **)
(pass8:mps1:0:0:0): dev_openings 245
(pass8:mps1:0:0:0): dev_active 10
(pass8:mps1:0:0:0): devq_openings 245
(pass8:mps1:0:0:0): devq_queued 0
(pass8:mps1:0:0:0): held 0
(pass8:mps1:0:0:0): mintags 2
(pass8:mps1:0:0:0): maxtags 255
mc# /mnt/camcontrol tags da9 -v
(pass9:mps1:0:9:0): dev_openings 251
(pass9:mps1:0:9:0): dev_active 4
(pass9:mps1:0:9:0): devq_openings 251
(pass9:mps1:0:9:0): devq_queued 0
(pass9:mps1:0:9:0): held 0
(pass9:mps1:0:9:0): mintags 2
(pass9:mps1:0:9:0): maxtags 255
mc# /mnt/camcontrol tags da10 -v
(pass10:mps1:0:11:0): dev_openings 251
(pass10:mps1:0:11:0): dev_active 4
(pass10:mps1:0:11:0): devq_openings 251
(pass10:mps1:0:11:0): devq_queued 0
(pass10:mps1:0:11:0): held 0
(pass10:mps1:0:11:0): mintags 2
(pass10:mps1:0:11:0): maxtags 255
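
For anyone comparing the per-device numbers above, here is a minimal Python sketch that parses pasted "camcontrol tags -v" output and lists the devices that still report outstanding commands (dev_active greater than zero). The function names and the SAMPLE text are illustrative assumptions, not part of camcontrol; the line format is taken only from the output shown in this message. In the posted numbers, dev_openings plus dev_active equals 255 on every device that answered.

```python
import re
from collections import defaultdict

# Matches lines such as "(pass2:mps0:0:6:0): dev_active 10".
LINE_RE = re.compile(r"^\((pass\d+):(\w+):\d+:\d+:\d+\):\s+(\w+)\s+(\d+)$")

# Abbreviated sample copied from the output in this message.
SAMPLE = """\
mc# /mnt/camcontrol tags da1 -v (** swap disk **)
(pass1:mps0:0:5:0): dev_openings 255
(pass1:mps0:0:5:0): dev_active 0
mc# /mnt/camcontrol tags da2 -v
(pass2:mps0:0:6:0): dev_openings 245
(pass2:mps0:0:6:0): dev_active 10
mc# /mnt/camcontrol tags da10 -v
(pass10:mps1:0:11:0): dev_openings 251
(pass10:mps1:0:11:0): dev_active 4
"""


def parse_tags(text):
    """Return {"passN": {"controller": str, field: int, ...}} from pasted output."""
    devices = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # shell prompts, notes, and blank lines are skipped
        name, controller, field, value = match.groups()
        devices[name]["controller"] = controller
        devices[name][field] = int(value)
    return dict(devices)


def busy_devices(devices):
    """Names of devices with dev_active > 0, in numeric passN order."""
    busy = [name for name, fields in devices.items()
            if fields.get("dev_active", 0) > 0]
    return sorted(busy, key=lambda name: int(name[len("pass"):]))
```

Calling busy_devices(parse_tags(SAMPLE)) on the abbreviated sample returns the pass2 and pass10 entries. A device that hangs the way da0 did produces no lines at all, so it is simply absent from the result.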
I did not run procstat before rebooting. I wasn't sure whether it would
just repeat the information in my prior email.
This is da0 (the cache SSD), on which camcontrol hung. It is on the
same controller (mps0) as the disk-1 raidz2 disks.
da0 at mps0 bus 0 scbus0 target 3 lun 0
da0: <ATA M4-CT256M4SSD2 000F> Fixed Direct Access SCSI-6 device
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 244198MB (500118192 512 byte sectors: 255H 63S/T 31130C)
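
Because the da0 query took the whole session down with it, a wrapper that gives up after a deadline may make the next capture easier. The following is a rough Python sketch under stated assumptions: the helper names run_with_deadline and probe_all and the 10 second default are hypothetical, and the /mnt/camcontrol path and argument order are copied from the commands above. A process blocked inside the kernel may ignore the kill, so the sketch waits only briefly for it to exit and then moves on.

```python
import subprocess


def run_with_deadline(argv, seconds=10.0):
    """Run argv and return (status, output).

    status is "ok", "failed" (non-zero exit), or "timeout".
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        output, _ = proc.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # may have no effect if the process is stuck in the kernel
        try:
            proc.wait(timeout=1)  # reap it if it did die, but never block for long
        except subprocess.TimeoutExpired:
            pass
        return "timeout", ""
    return ("ok" if proc.returncode == 0 else "failed"), output


def probe_all(devices, camcontrol="/mnt/camcontrol", seconds=10.0):
    """Query each device in turn; a hung query is recorded as "timeout"."""
    results = {}
    for dev in devices:
        results[dev] = run_with_deadline([camcontrol, "tags", dev, "-v"], seconds)
    return results
```

For the eleven disks in this message the call would be probe_all([f"da{i}" for i in range(11)]). Each device gets its own child process, so one stuck query does not prevent the remaining devices from being sampled.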