UFS Filesystem issues, and the loss of my hair...

Panagiotis Christias p.christias at noc.ntua.gr
Wed Aug 12 12:47:26 UTC 2009


On Mon, Aug 10, 2009 at 05:20:44PM -0500, Hearn, Trevor wrote:
> Yes, it does seem like it was part of one of the other messages.  The isp(4)
> driver was just recently updated in HEAD by mjacob@ who has maintained that
> driver in the past.  He may have some insight if there is an isp(4)-specific
> problem.
> 
> --
> John Baldwin
> 
> Heh. Ok, I just watched the same error message scroll across the screen
> for about 5 minutes now, with a different offset, same length. The fun
> part is that it is not touching the device, /dev/da1p7, at all. From the
> systat -vmstat display, I see all of the traffic coming from the
> /dev/mfid0 drives. It ran for a while, then stopped. So, no access to
> the drive in question, da1p7, but on the root drive, mfid0. Odd. The
> partition is mapped to the root drive. I wonder if the driver lost
> itself and tried to access the file in the empty folder on the root
> drive. Sigh. Anyone?

Hello,

We faced a similar problem here (a major Greek university) about a year ago
[1]. Our setup consists of Dell 2950 servers, QLogic 2462 HBAs (PCI-E)
and an EMC CLARiiON CX3-40. As soon as we tried a simple "tar zxf
ports.tgz" on a SAN volume, the system would freeze and/or panic (same error
messages as yours). Oleg Sharoiko suggested that we decrease the number of
tag openings (tag queue depth). Decreasing it made the system a bit more
stable but did not eliminate the problem.
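
In case it saves someone a search, the tag queue depth can be inspected and
lowered per device at runtime with camcontrol(8); the device name below is
only an example, adjust it to your own setup:

    # show the current number of tagged openings for da1
    camcontrol tags da1 -v

    # lower the tag queue depth to 32 (takes effect immediately)
    camcontrol tags da1 -N 32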

Then I contacted Matthew Jacob and tested his latest isp code [2] along
with alternative solutions like ZFS and gjournal. Matthew was kind enough
to offer his support, but eventually I ran out of time and patience, so I
moved a couple of servers to CentOS in order to put the storage into
production. That was around December last year.

About a month ago Kenneth Merry announced that a new version of isp was
available [3], which fixed bugs and added new functionality. I thought it
was worth trying, so I set up FreeBSD 7-stable on two Dell boxes, added
the isp patches, recompiled the kernel and started the stress tests. I
also looked around for more information and hints regarding QLogic HBAs.
The Linux driver (qla2xxx) defaults to a maximum queue depth of 32 (see
the ql2xmaxqdepth module parameter), which is also the value recommended
by EMC. There are similar references for Solaris (see sd:sd_max_throttle),
and some recommend even smaller values depending on the storage.
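
For reference, this is roughly how those limits are set on the other
platforms (the value 32 is just the example from above, the syntax is the
standard one for each OS):

    # Linux: qla2xxx module option, e.g. in /etc/modprobe.conf
    options qla2xxx ql2xmaxqdepth=32

    # Solaris: sd driver throttle in /etc/system (needs a reboot)
    set sd:sd_max_throttle=32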

Currently I am running stress tests using fsx, ffsb, postmark, iozone,
bonnie++, blogbench and other home-made scripts (any other suggestions?) on
two 7-stable-amd64 + isp_diffs.releng7.20090629 boxes. So far, at 32 maximum
tag openings, everything looks good: I have not seen any panics and the
subsequent fsck runs came back clean. I will keep running tests for a week
or two, hoping that they will help draw a conclusion.
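
To give an idea of the kind of runs, the invocations are roughly along
these lines (paths, sizes and counts here are only illustrative, not my
exact settings):

    # fsx: random read/write/truncate/mmap operations on a single test file
    fsx -N 100000 /san/test/fsx.dat

    # iozone: automatic mode, files up to 2 GB, on the SAN volume
    iozone -a -g 2g -f /san/test/iozone.tmp

    # bonnie++: sequential/random I/O plus file create/delete
    bonnie++ -d /san/test -u nobody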

Regards,
Panagiotis

P.S. Cc'ed to Kenneth Merry; I think he would be interested.


[1] http://lists.freebsd.org/pipermail/freebsd-scsi/2008-October/003686.html
[2] http://feral.com/isp.html
[3] http://lists.freebsd.org/pipermail/freebsd-scsi/2009-June/003916.html

-- 
Panagiotis J. Christias    Network Management Center
P.Christias at noc.ntua.gr    National Technical Univ. of Athens, GREECE

