LSI SAS 3008 card - 35 out of 36 disks detected
asomers at freebsd.org
Mon Dec 8 21:54:22 UTC 2014
On Mon, Dec 8, 2014 at 1:44 PM, Jason Wolfe <nitroboost at gmail.com> wrote:
> On Mon, Dec 8, 2014 at 11:20 AM, Alan Somers <asomers at freebsd.org> wrote:
>> On Mon, Dec 8, 2014 at 8:58 AM, Justin O'Conor
>> <oconnor at crystal.harvard.edu> wrote:
>>> Hi All,
>>> Thanks, this is encouraging. smp_discover ses0|1 see 36 sata disks. This is from the 10.1 install.
>> There are certainly some inconsistencies in the smp_discover
>> responses. For example, the SEP on ses0 (phy identifier: 36) has
>> "connector type: SAS virtual connector" and "connector element index:
>> 24". But the SEP on ses1 (phy identifier: 28) has "connector type: No
>> information" and "connector element index: 0". Also note that "phy
>> identifier: 12" on ses1 has "connector element index: 0". That would
>> be the first slot on the rear expander, if the slots and phys are
>> numbered the same way. My best guess is that phy 12 and phy 28 mapped
>> to the same map_idx in mpr_mapping.c:1168. So the information for the
>> SEP overwrote the information for the first disk slot. If my guess is
>> true, then recabling your chassis as you suggested wouldn't help.
>> However, you might try the attached but untested patch. It will
>> prevent the SEP from being added to the mapping table while printing a
>> useful error message. If I'm correct, then the patch will let you use
>> all 36 disk slots, but you won't have ses1 anymore.
>> In the meantime, I'll try to reproduce your problem. I have all the
>> required equipment in my lab.
> I believe this is the same issue we ran into a few years ago on the
> LSI2008, where the ses0 device would map over the boot disk. It's
> long and spans over multiple months, so just the relevant bits:
> Initial report:
> The issue seems to be a shortcoming in the detection method, which
> has no problem assigning the ses over an already mapped disk. LSI's
> initial response was to use the LSI config utility and map the drives.
> This was not an option for us as we had 2000 of these devices in the
> field, and entering the LSI BIOS would be a large undertaking. In the
> end after an internal dialogue with LSI guys, Kashyap was kind enough
> to write a one-off for us that never made it upstream. It simply
> assigns the ses device to max target + 1 when a conflict is found.
> The core issue seems to be with the way LSI detects and assigns
> devices on FreeBSD, so this is by no means 'proper', but it's sound
> enough to resolve the issue for us on the LSI2008. In case it's
> interesting to anyone:
I've reproduced the problem, and it's the same one that I saw before.
It's also the same one that Jason described, but the full problem is a
little more general than just the SEP device's mapping. First, some
background. LSI controllers have two methods for mapping phys to SCSI Bus and
Target IDs. One method is called Device Persistence mapping. It is
based on the SAS WWN attached to each phy. The other method is called
Enclosure/Slot mapping. That method uses the Connector Element Index
or the Device Slot Number field of the expander's SMP DISCOVER
response for the given phy. It seems that all of LSI's SAS2 HBAs used
Device Persistence mapping by default, but the SAS3 HBAs use
Enclosure/Slot mapping. That's why this problem rarely or never shows
up with SAS2 HBAs.
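To make the difference between the two methods concrete, here is a minimal sketch in Python. It is not the mpr(4) or firmware code; the `Phy` type, the function names, and the WWN strings are all hypothetical. Only the phy identifiers 12 and 28 and their shared connector element index of 0 come from the smp_discover output discussed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phy:
    phy_id: int
    wwn: str                      # hypothetical SAS address of the attached device
    connector_element_index: int  # as reported in the SMP DISCOVER response
    is_sep: bool = False


def map_by_slot(phys):
    """Enclosure/Slot style: the table is indexed by the reported slot."""
    table = {}
    for phy in phys:
        # A later phy reporting the same index silently replaces the earlier one.
        table[phy.connector_element_index] = phy
    return table


def map_by_wwn(phys):
    """Device Persistence style: the table is keyed by SAS WWN."""
    return {phy.wwn: phy for phy in phys}


phys = [
    Phy(12, "wwn-disk-0", 0),            # first disk slot
    Phy(13, "wwn-disk-1", 1),
    Phy(28, "wwn-sep", 0, is_sep=True),  # SEP reporting index 0
]

by_slot = map_by_slot(phys)
by_wwn = map_by_wwn(phys)
print(len(by_slot), len(by_wwn))
```

In this toy model the slot-indexed table ends up with two entries because the SEP takes over index 0, while the WWN-keyed table keeps all three devices.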
The SAS Protocol Layer 3 rev 6g spec, section 18.104.22.168, says that the
CONNECTOR ELEMENT INDEX field shall be ignored if the CONNECTOR TYPE
field is set to 0. Clearly, the HBA firmware isn't ignoring that
field. That's a bug with the HBA firmware.
But the expander firmware could be doing better. If it reported a
unique CONNECTOR ELEMENT INDEX for the SEP phy, then we wouldn't have
this problem. I'll take that up with the expander vendor.
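As an illustration of the rule from the spec, here is a small Python sketch of a mapper that ignores the CONNECTOR ELEMENT INDEX whenever the CONNECTOR TYPE is 0. It is an assumption about how conforming logic could look, not a description of what the HBA firmware does; the function names and the nonzero connector type used for the disk slot are hypothetical.

```python
CONNECTOR_TYPE_NO_INFO = 0          # "No information" in the DISCOVER response
EXAMPLE_DISK_CONNECTOR_TYPE = 0x20  # hypothetical nonzero value for illustration


def effective_slot(connector_type, connector_element_index):
    """Return the slot index to use, or None if the field must be ignored."""
    if connector_type == CONNECTOR_TYPE_NO_INFO:
        return None
    return connector_element_index


def build_slot_table(responses):
    """responses: iterable of (phy_id, connector_type, connector_element_index)."""
    table = {}
    unmapped = []
    for phy_id, connector_type, index in responses:
        slot = effective_slot(connector_type, index)
        if slot is None or slot in table:
            # No usable index, or the index is already taken: map it elsewhere.
            unmapped.append(phy_id)
        else:
            table[slot] = phy_id
    return table, unmapped


table, unmapped = build_slot_table([
    (12, EXAMPLE_DISK_CONNECTOR_TYPE, 0),  # first disk slot
    (28, CONNECTOR_TYPE_NO_INFO, 0),       # SEP with "No information"
])
print(table, unmapped)
```

With the index ignored for phy 28, the disk on phy 12 keeps slot 0 and the SEP is left for some other placement, which is the behavior the spec wording implies.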
In the meantime, there is a workaround. Don't use the patch I sent
you; it doesn't work. The workaround is to configure your HBA to use
Device Persistence mapping. You can do that from FreeBSD using a tool
called lsiutil. Unfortunately, it isn't publicly distributed, but you
can ask Steve McConnell (cc'ed) for a copy. Here are the steps:
1) Ensure that the HBA of concern is named "mpr0".
2) Start lsiutil
3) Select "mps0" [sic]
4) (Optionally) enter e for expert mode
5) Enter 9 for "Read/change configuration pages"
6) Enter 1 for Page Type (that means the IOC pages, FYI)
7) Enter 8 for Page Number
8) Enter 0 for NVRAM values
9) Enter "yes" to make changes
10) Offset is "c"
11) Change "00000002" to "00000001".
12) Enter "yes" to save changes.
13) Either reboot, or unload and reload mpr(4).
That change will put you in Device Persistence mapping but with
persistent mapping disabled. All of the slots should work again. At
least I hope so.
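For anyone wondering what the edit from "00000002" to "00000001" means, here is a hedged Python sketch that decodes the two values. The bit layout is an assumption based on my reading of the MPI 2.x IOC Page 8 flag definitions (mapping mode in bits 1-3, bit 0 disabling persistent mapping); check it against the headers shipped with your driver before relying on it. The constant and function names are made up for this example.

```python
# Assumed bit layout of the IOC Page 8 Flags word (the dword at offset 0xc).
MAPPING_MODE_MASK = 0x0000000E
MODE_DEVICE_PERSISTENCE = 0x00000000
MODE_ENCLOSURE_SLOT = 0x00000002
DISABLE_PERSISTENT_MAPPING = 0x00000001


def describe_flags(flags):
    """Return (mapping mode name, whether persistent mapping is enabled)."""
    mode_bits = flags & MAPPING_MODE_MASK
    if mode_bits == MODE_DEVICE_PERSISTENCE:
        mode = "device persistence"
    elif mode_bits == MODE_ENCLOSURE_SLOT:
        mode = "enclosure/slot"
    else:
        mode = "unknown"
    persistent = not (flags & DISABLE_PERSISTENT_MAPPING)
    return mode, persistent


before = describe_flags(0x00000002)  # value before the lsiutil edit
after = describe_flags(0x00000001)   # value after the lsiutil edit
print(before, after)
```

Under that assumed layout, the old value selects Enclosure/Slot mapping and the new value selects Device Persistence mapping with persistent mapping disabled, which matches the description above.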