iscsi initiator - system hang

Dennis R. Kolpanen kolpanen at kearfott.com
Tue Sep 9 21:28:18 UTC 2008


On a FreeBSD 7.0 system, certain commands issued against any of the
three mounted iscsi drives cause the system to "hang".  In this
context, hang means:

  the system continues to respond to pings
  sendmail stops accepting connections
  imapd stops accepting connections
  sshd stops accepting connections
  shell sessions already established stop accepting commands
  login at the system console is not possible
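
Since the symptoms above build up over minutes, it may help to record from
a second machine exactly when each service stops answering.  What follows
is only a minimal sketch, not a tested diagnostic tool.  It assumes that
sendmail, imapd, and sshd listen on their usual ports (25, 143, 22) and
each send a greeting banner on connect.  The host name
"mailhost.example.com" and the helper names are hypothetical.  A banner is
checked rather than a bare connect because the kernel may still complete
the TCP handshake while the daemon itself is stuck.

```python
import socket
import time

# Assumed default ports for sendmail, imapd, and sshd.
SERVICES = {"smtp": 25, "imap": 143, "ssh": 22}


def sends_banner(host, port, timeout=5.0):
    """Return True if host:port accepts a connection and sends at least
    one byte of greeting within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # An empty read means the peer closed without greeting us.
            return bool(sock.recv(1))
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def probe(host, services=SERVICES, timeout=5.0):
    """Map each service name to whether it currently sends a banner."""
    return {name: sends_banner(host, port, timeout)
            for name, port in services.items()}


if __name__ == "__main__":
    HOST = "mailhost.example.com"  # hypothetical name of the hanging server
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, probe(HOST), flush=True)
        time.sleep(30)
```

Run alongside a dump, restore, or pax, the timestamps would show whether
all services stop at once or one after another.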

The problem has been caused, so far, by dump, restore, and pax.  These
commands work perfectly if they are directed against one of the
internal drives and not the iscsi drives.  The failures noted above do
not normally happen immediately after issuing one of these commands.
The problems seem to build over a period of minutes or tens of
minutes.  Note that the dump/restore/pax commands can take hours to
run.

Nothing is written to the system console or any of the log files
indicating a problem.  The only way to recover from the hang is by
means of the hardware reset button.

When the system was first being set up and no other users were on it,
shutting down all but one of the CPUs by means of sysctl and
"machdep.hlt_cpus" allowed restoring about 150 GB to the three iscsi
drives.  The machine then had to be placed into production immediately
because of massive hardware problems on an old server, and since then
this trick no longer works.
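
For anyone wanting to reproduce the single-CPU setup, here is a minimal
sketch of how the mask could be computed.  It assumes that
"machdep.hlt_cpus" is a bitmap in which bit n halts CPU n, and that the
machine presents eight logical CPUs (two quad-core processors, no
hyper-threading).  The helper name is hypothetical, and the value printed
is an illustration, not necessarily the one used on this system.

```python
def hlt_cpus_mask(ncpus, keep=(0,)):
    """Return a bitmask with one bit set for every CPU to be halted.

    `ncpus` is the number of logical CPUs; `keep` lists the CPU ids
    that should stay running (CPU 0 by default).
    """
    mask = 0
    for cpu in range(ncpus):
        if cpu not in keep:
            mask |= 1 << cpu
    return mask


# Assumption: eight logical CPUs, only CPU 0 left running.
mask = hlt_cpus_mask(8)
print(f"sysctl machdep.hlt_cpus={mask}")  # prints: sysctl machdep.hlt_cpus=254
```

Setting the value back to 0 would, under the same assumption, bring all
CPUs back.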

An overview of the hardware involved:

  dual quad-core Intel Xeon processors
  16 GB RAM
  FreeBSD 7.0 amd64 release
  NetworkAppliance FAS2020 SAN
  generic kernel

A complete dmesg output can be provided if desired.

By default, iscontrol creates the iscsi drives with the number of tags
set to one.  The performance of the iscsi drives with this default
setting was quite poor.  Based on a recommendation made on a mailing
list some time ago, /etc/iscsi.conf was changed to set the tags to
128.  This dramatically improved the iscsi performance.

Testing on the system that was rushed into production is not really
possible.  However, within the next week or so, a nearly identical
system should become available, and that system could be used for testing.

Any ideas on what could be wrong?  Any solutions?

Thanks for your help.

Dennis R. Kolpanen

