6.2-R on Dell PowerEdge 2950 with Dell PERC 5/i [mfi(4)]

Thu May 10 23:15:15 UTC 2007

David Wolfskill wrote:
> From a quick look in the lists, I get the impression that the Dell PERC
> 5/i may be a bit problematic.  Since I hadn't any plans on using that
> hardware, though, I've paid more attention to other things.
> 

Not sure that this impression is entirely accurate.  The biggest problem
with MFI machines is online RAID management.  The storage driver itself
matured very quickly and has been very reliable.

> Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg
> says the controller is:
> 
> mfi0: <Dell PERC 5/i> mem 0xd80f0000-0xd80fffff,0xfc4e0000-0xfc4fffff irq 78 at device 14.0 on pci2
> mfi0: 817 (224963336s/0x0020/0) - Shutdown command received from host
> mfi0: 818 (4278190080s/0x0020/0) - PCI 0x041028 0x0415 0x041028 0x041f03: Firmware initialization started (PCI ID 0015/1028/1f03/1028)
> mfi0: 819 (4278190080s/0x0020/0) - Type 18: Firmware version 1.00.02-0157
> mfi0: 820 (4278190096s/0x0008/0) - Battery Present
> mfi0: 821 (4278190124s/0x0004/0) - PD 08(e1/s255) event: Enclosure (SES) discovered on PD 08(e1/s255)
> mfi0: 822 (4278190124s/0x0002/0) - PD 08(e1/s255) event: Inserted: PD 08(e1/s255)
> mfi0: 823 (4278190124s/0x0002/0) - Type 29: Inserted: PD 08(e1/s255) Info: enclPd=08, scsiType=d, portMap=00, sasAddr=500180b04413ce00,0000000000000000
> mfi0: 824 (4278190124s/0x0002/0) - PD 00(e1/s0) event: Inserted: PD 00(e1/s0)
> mfi0: 825 (4278190124s/0x0002/0) - Type 29: Inserted: PD 00(e1/s0) Info: enclPd=08, scsiType=0, portMap=01, sasAddr=50010b900046038e,0000000000000000
> mfi0: 826 (4278190124s/0x0002/0) - PD 01(e1/s1) event: Inserted: PD 01(e1/s1)
> mfi0: 827 (4278190124s/0x0002/0) - Type 29: Inserted: PD 01(e1/s1) Info: enclPd=08, scsiType=0, portMap=02, sasAddr=50010b9000460376,0000000000000000
> mfi0: 828 (4278190124s/0x0002/0) - PD 02(e1/s2) event: Inserted: PD 02(e1/s2)
> mfi0: 829 (4278190124s/0x0002/0) - Type 29: Inserted: PD 02(e1/s2) Info: enclPd=08, scsiType=0, portMap=04, sasAddr=50010b900046035a,0000000000000000
> mfi0: 830 (4278190124s/0x0002/0) - PD 03(e1/s3) event: Inserted: PD 03(e1/s3)
> mfi0: 831 (4278190124s/0x0002/0) - Type 29: Inserted: PD 03(e1/s3) Info: enclPd=08, scsiType=0, portMap=08, sasAddr=50010b90004603be,0000000000000000
> mfi0: 832 (4278190124s/0x0002/0) - PD 04(e1/s4) event: Inserted: PD 04(e1/s4)
> mfi0: 833 (4278190124s/0x0002/0) - Type 29: Inserted: PD 04(e1/s4) Info: enclPd=08, scsiType=0, portMap=10, sasAddr=50010b900045f6d6,0000000000000000
> mfi0: 834 (4278190124s/0x0002/0) - PD 05(e1/s5) event: Inserted: PD 05(e1/s5)
> mfi0: 835 (4278190124s/0x0002/0) - Type 29: Inserted: PD 05(e1/s5) Info: enclPd=08, scsiType=0, portMap=20, sasAddr=50010b9000460246,0000000000000000
> mfi0: 836 (224964238s/0x0020/0) - Adapter ticks 224964238 elapsed 45s: Time established as 02/16/07 18:03:58; (45 seconds since power on)
> 
> and the disk looks like:
> 
> mfid0: <MFI Logical Disk> on mfi0
> mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal
> 

Looks A-OK to me.
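Each event line in the dmesg above has the same shape: a sequence number, a parenthesized timestamp/locale/class triple, then a description. Below is a minimal Python sketch that pulls those fields apart. It assumes the layout holds for every mfi event line; the layout is inferred from the output quoted here, not taken from the driver source. It also assumes that a timestamp whose top byte is 0xFF carries seconds since power-on in its low 24 bits. The log supports this reading: 4278190124 is 0xFF000000 + 44, which matches the "45 seconds since power on" reported once the adapter established the time. `parse_event` and `MfiEvent` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the dmesg output quoted above (an assumption):
#   mfiN: SEQ (TIMEs/0xLOCALE/CLASS) - DESCRIPTION
EVENT_RE = re.compile(
    r"^(?P<dev>mfi\d+): (?P<seq>\d+) "
    r"\((?P<time>\d+)s/0x(?P<locale>[0-9a-fA-F]+)/(?P<cls>-?\d+)\) - "
    r"(?P<desc>.*)$"
)

# Assumption: a top byte of 0xFF marks "seconds since power-on" in the low 24 bits.
BOOT_RELATIVE = 0xFF000000


class MfiEvent(NamedTuple):
    dev: str
    seq: int
    raw_time: int
    since_power_on: Optional[int]  # None when the timestamp is not boot-relative
    locale: int
    cls: int
    desc: str


def parse_event(line: str) -> Optional[MfiEvent]:
    """Split one mfi(4) event line into fields; return None for other lines."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None  # e.g. the probe line or the mfid0 volume lines
    raw = int(m.group("time"))
    since = (raw & 0x00FFFFFF) if (raw & BOOT_RELATIVE) == BOOT_RELATIVE else None
    return MfiEvent(
        dev=m.group("dev"),
        seq=int(m.group("seq")),
        raw_time=raw,
        since_power_on=since,
        locale=int(m.group("locale"), 16),
        cls=int(m.group("cls")),
        desc=m.group("desc"),
    )
```

For example, feeding it `mfi0: 820 (4278190096s/0x0008/0) - Battery Present` yields seq 820, locale 0x0008 and since_power_on 16. Filtering on `since_power_on` separates events logged before the adapter clock was set from those stamped with real adapter time.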

> 
> The intended production workload involves creation and deletion of
> a large number of files rather rapidly.
> 
> I recalled that for the first year or two with Soft Updates, there
> were problems with that kind of workload, such that there was enough
> hysteresis in making free blocks actually available for subsequent
> allocation that processes that were trying to write to new blocks
> on such file systems would often fail, reporting ENOSPC.  Un-mounting
> and re-mounting the file system would clean things up, but that
> doesn't tend to be a viable approach for keeping a long-running
> application happy.  :-}
> 

sysctl vfs.ffs.doasyncfree=0 might help.  Running the syncer more 
frequently might also help, but I don't recall the sysctl node for
that.
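To reproduce the create/delete churn described above on a test box, for example before and after changing vfs.ffs.doasyncfree or Soft Updates, the following Python sketch could be a starting point. `churn` and `free_bytes` are hypothetical helpers, not part of any FreeBSD tool. The file size and iteration count are placeholders to tune toward the real workload.

```python
import contextlib
import errno
import os
import tempfile


def free_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def churn(directory: str, iterations: int, size: int = 64 * 1024) -> int:
    """Create, fill, and delete `iterations` files, one at a time.

    Returns how many create/delete cycles completed. Stops early, removing
    the partial file, if a write or close fails with ENOSPC; any other
    error propagates.
    """
    payload = b"\0" * size
    for i in range(iterations):
        path = os.path.join(directory, f"churn-{i}.tmp")
        try:
            with open(path, "wb") as f:  # ENOSPC may surface on write or on close
                f.write(payload)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            with contextlib.suppress(FileNotFoundError):
                os.unlink(path)
            return i
        os.unlink(path)
    return iterations


if __name__ == "__main__":
    # Placeholder run in a scratch directory; point it at the real filesystem instead.
    with tempfile.TemporaryDirectory() as d:
        before = free_bytes(d)
        done = churn(d, 1000)
        print(f"{done} cycles; free bytes before={before} after={free_bytes(d)}")
```

Sampling `free_bytes()` before a long run, after it, and again after an unmount/remount would show whether space is being released late rather than actually lost.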

> I reminded my colleague of this, since she also reported that an
> un-mount/re-mount sequence caused a lot of free space to show up
> on the file system in question, and she responded that she had been
> aware of this, and had been turning off Soft Updates on the file
> systems for the application in question, but she had forgotten that
> Soft Updates was on by default when she set up this (test) system.
> 
> She then turned off Soft Updates and started the test workload again.
> And instead of failing with ENOSPC after 3 days, it only took 2.

Very strange.  Is there any chance it was due to files that were deleted but
still held open by running applications?
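The effect behind that question is easy to demonstrate. Unlinking a file removes its name, but the inode and its data blocks stay allocated for as long as some process holds the file open, so df keeps counting that space as used until the last close. Below is a minimal Python sketch of this behaviour. The function name is hypothetical, and the sketch uses only `os` and `tempfile` from the standard library.

```python
import os
import tempfile
from typing import Tuple


def unlinked_but_open_demo(size: int = 1 << 20) -> Tuple[int, int, bool]:
    """Write `size` bytes to a temp file, unlink it, and inspect it via the fd.

    Returns (link count, size, data still readable) as seen after unlink.
    """
    fd, path = tempfile.mkstemp()
    try:
        data = b"x" * size
        written = 0
        while written < size:            # os.write may write less than asked
            written += os.write(fd, data[written:])
        os.unlink(path)                  # the directory entry is gone...
        st = os.fstat(fd)                # ...but the inode is still live
        os.lseek(fd, 0, os.SEEK_SET)
        readable = os.read(fd, 1) == b"x"
        return st.st_nlink, st.st_size, readable
    finally:
        os.close(fd)                     # last reference dropped: blocks can now be freed
```

On the test box, fstat(1) lists the files each process has open, which can help spot a long-running process that is still holding deleted files.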

> 
> Hmmm... well; that wasn't exactly what I had expected.
> 
> Any hints, here?  The machine is running the i386 arch, with a pair of
> dual-core 2.33GHz Xeons.
> 
> I have a recent dmesg.boot, but I'd rather keep list messages fairly
> short.
> 
> We have a local private mirror of the FreeBSD CVS repository, so we have
> some flexibility in what we can do for testing, but the objective is to
> put the box in production -- and I'd rather not run CURRENT as part of a
> customer-visible production workload.  :-}  [My laptop is a different
> matter, of course....]
> 

This sounds purely like a filesystem issue, not an MFI driver issue.

Scott