From bugmaster at FreeBSD.org Mon May 5 11:07:13 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon May 5 11:07:27 2008
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
Message-ID: <200805051107.m45B7CcC070836@freefall.freebsd.org>

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/40895   scsi       wierd kernel / device driver bug
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/60598   scsi       wire down of scsi devices conflicts with config
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/94838   scsi       Kernel panic while mounting SD card with lock switch o
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s

14 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce
o kern/38828   scsi       [dpt] [request] DPT PM2012B/90 doesn't work
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/119668  scsi       [cam] [patch] certain errors are too verbose comparing
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc

8 problems total.

From sbruno at miralink.com Wed May 7 22:32:25 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Wed May 7 22:32:29 2008
Subject: USB drive serial numbers
Message-ID: <48222930.2010808@miralink.com>

Can someone send me the output of "camcontrol inquiry daX" where daX is
a usb attached hard drive? I'm interested to know if the USB external
hard drives report a serial number as the USB flash drives I currently
have in my possession do not report one.

eg: SCSI hard drive:
sudo camcontrol inquiry da0
pass0: Fixed Direct Access SCSI-3 device
pass0: Serial Number 3KT17YJL
pass0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
Queueing Enabled

eg: USB flash drive:
sudo camcontrol inquiry da3
pass4: Removable Direct Access SCSI-0 device
pass4: Serial Number
pass4: 40.000MB/s transfers

Sean

From scottl at samsco.org Wed May 7 23:20:36 2008
From: scottl at samsco.org (Scott Long)
Date: Wed May 7 23:20:40 2008
Subject: USB drive serial numbers
In-Reply-To: <48222930.2010808@miralink.com>
References: <48222930.2010808@miralink.com>
Message-ID: <48223940.5090109@samsco.org>

Sean Bruno wrote:
> Can someone send me the output of "camcontrol inquiry daX" where daX is
> a usb attached hard drive? I'm interested to know if the USB external
> hard drives report a serial number as the USB flash drives I currently
> have in my possession do not report one.
>
> eg: SCSI hard drive:
> sudo camcontrol inquiry da0
> pass0: Fixed Direct Access SCSI-3 device
> pass0: Serial Number 3KT17YJL
> pass0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
> Queueing Enabled
>
>
> eg: USB flash drive:
> sudo camcontrol inquiry da3
> pass4: Removable Direct Access SCSI-0 device
> pass4: Serial Number
> pass4: 40.000MB/s transfers
>

CAM used to assume that all DA devices supported the serial number EVPD
page. I recently changed it to query the device for the list of pages
it does support, and only ask for the serial number page if it does
(which in turn cuts down on a whole lot of kernel printf noise). My
experience is that some devices do, but most devices don't. If you want
to check your devices manually, do:

camcontrol cmd pass0 -v -c "12 01 00 00 255 00" -i 255 "-" | hd

If 0x80 appears after the 4th byte, the device claims support for
querying the serial number. The serial number can then be fetched with

camcontrol cmd pass0 -v -c "12 01 80 00 255 00" -i 255 "-" | hd

Or via

camcontrol inq pass0 -S

Scott

From sbruno at miralink.com Wed May 7 23:53:38 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Wed May 7 23:53:42 2008
Subject: USB drive serial numbers
In-Reply-To: <48223940.5090109@samsco.org>
References: <48222930.2010808@miralink.com> <48223940.5090109@samsco.org>
Message-ID: <482240FF.4030704@miralink.com>

Scott Long wrote:
> Sean Bruno wrote:
>> Can someone send me the output of "camcontrol inquiry daX" where daX
>> is a usb attached hard drive? I'm interested to know if the USB
>> external hard drives report a serial number as the USB flash drives I
>> currently have in my possession do not report one.
>>
>> eg: SCSI hard drive:
>> sudo camcontrol inquiry da0
>> pass0: Fixed Direct Access SCSI-3 device
>> pass0: Serial Number 3KT17YJL
>> pass0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
>> Queueing Enabled
>>
>>
>> eg: USB flash drive:
>> sudo camcontrol inquiry da3
>> pass4: Removable Direct Access SCSI-0 device
>> pass4: Serial Number
>> pass4: 40.000MB/s transfers
>>
>
> CAM used to assume that all DA devices supported the serial number EVPD
> page. I recently changed it to query the device for the list of pages
> it does support, and only ask for the serial number page if it does
> (which in turn cuts down on a whole lot of kernel printf noise). My
> experience is that some devices do, but most devices don't. If you want
> to check your devices manually, do:
>
> camcontrol cmd pass0 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
>
> If 0x80 appears after the 4th byte, the device claims support for
> querying the serial number. The serial number can then be fetched with
>
> camcontrol cmd pass0 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
>
> Or via
>
> camcontrol inq pass0 -S
>
>
> Scott

Hrm....it looks like asking for page 0x80 directly is returning the same
as requesting page 0x80 or asking for all pages:

sudo camcontrol devlist
at scbus0 target 0 lun 0 (da0,pass0)
at scbus0 target 1 lun 0 (da1,pass1)
at scbus0 target 6 lun 0 (ses0,pass2)
at scbus1 target 1 lun 0 (da2,pass3)
at scbus2 target 0 lun 0 (pass4,da3)

sudo camcontrol cmd pass4 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20  |........TOSHIBA |
00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20  |TransMemory     |
00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00  |5.00PMAP1234....|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000f0

sudo camcontrol cmd pass4 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20  |........TOSHIBA |
00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20  |TransMemory     |
00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00  |5.00PMAP1234....|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000f0

sean

From scottl at samsco.org Thu May 8 00:03:50 2008
From: scottl at samsco.org (Scott Long)
Date: Thu May 8 00:03:54 2008
Subject: USB drive serial numbers
In-Reply-To: <482240FF.4030704@miralink.com>
References: <48222930.2010808@miralink.com> <48223940.5090109@samsco.org> <482240FF.4030704@miralink.com>
Message-ID: <48224361.10809@samsco.org>

Sean Bruno wrote:
> Scott Long wrote:
>> Sean Bruno wrote:
>>> Can someone send me the output of "camcontrol inquiry daX" where daX
>>> is a usb attached hard drive? I'm interested to know if the USB
>>> external hard drives report a serial number as the USB flash drives I
>>> currently have in my possession do not report one.
>>>
>>> eg: SCSI hard drive:
>>> sudo camcontrol inquiry da0
>>> pass0: Fixed Direct Access SCSI-3 device
>>> pass0: Serial Number 3KT17YJL
>>> pass0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
>>> Queueing Enabled
>>>
>>>
>>> eg: USB flash drive:
>>> sudo camcontrol inquiry da3
>>> pass4: Removable Direct Access SCSI-0 device
>>> pass4: Serial Number
>>> pass4: 40.000MB/s transfers
>>>
>>
>> CAM used to assume that all DA devices supported the serial number EVPD
>> page. I recently changed it to query the device for the list of pages
>> it does support, and only ask for the serial number page if it does
>> (which in turn cuts down on a whole lot of kernel printf noise). My
>> experience is that some devices do, but most devices don't. If you want
>> to check your devices manually, do:
>>
>> camcontrol cmd pass0 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
>>
>> If 0x80 appears after the 4th byte, the device claims support for
>> querying the serial number. The serial number can then be fetched with
>>
>> camcontrol cmd pass0 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
>>
>> Or via
>>
>> camcontrol inq pass0 -S
>>
>>
>> Scott
>
> Hrm....it looks like asking for page 0x80 directly is returning the same
> as requesting page 0x80 or asking for all pages:
>
> sudo camcontrol devlist
> at scbus0 target 0 lun 0 (da0,pass0)
> at scbus0 target 1 lun 0 (da1,pass1)
> at scbus0 target 6 lun 0 (ses0,pass2)
> at scbus1 target 1 lun 0 (da2,pass3)
> at scbus2 target 0 lun 0 (pass4,da3)
>
> sudo camcontrol cmd pass4 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
> 00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20
> |........TOSHIBA |
> 00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20
> |TransMemory     |
> 00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00
> |5.00PMAP1234....|
> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 000000f0
>
> sudo camcontrol cmd pass4 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
> 00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20
> |........TOSHIBA |
> 00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20
> |TransMemory     |
> 00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00
> |5.00PMAP1234....|
> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *

Wow, that's just fantastically broken. It's completely ignoring the
EVPD bit in the request and simply reporting standard inq data. I
guess the only thing that keeps CAM from exploding on this is that it
sees the length field in byte 4 as 0x01, so it doesn't search too far
into what it thinks is the response. I'll have to read the spec some
more to see if there's a standard way to report that the device supports
the EVPD bit that FreeBSD should be checking.
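
[Illustrative sketch, not part of the original message and not taken
from the CAM sources.] The misreading described above can be reproduced
offline from the bytes posted in this thread. The Python below assumes
the usual SPC layout for a VPD reply (byte 1 = page code, byte 3 = page
length, payload from byte 4). The function names, the 36-byte comparison
span, and the well-formed GOOD_PAGE_80 sample are hypothetical and exist
only for illustration.

```python
"""Sketch: spot a device that answers EVPD requests with standard INQUIRY data."""
from typing import List

# First 48 bytes of the thumb drive's reply as posted in this thread. The
# device sent the same bytes for EVPD page 0x00 and for EVPD page 0x80.
THUMB_REPLY = bytes.fromhex(
    "008000011f000000544f534849424120"
    "5472616e734d656d6f72792020202020"
    "352e3030504d41503132333400000000"
)

# Hypothetical well-formed Unit Serial Number page (0x80), built by hand
# for contrast; the serial string is the one shown for da0 in the thread.
GOOD_PAGE_80 = bytes([0x00, 0x80, 0x00, 0x08]) + b"3KT17YJL"


def naive_vpd_page_list(reply: bytes) -> List[int]:
    """Read a reply as a Supported VPD Pages list without sanity checks.

    Assumed layout: byte 3 holds the page length and the supported page
    codes start at byte 4. Nothing here verifies that the reply really
    is a VPD page, which is the point of the demonstration.
    """
    if len(reply) < 4:
        return []
    return list(reply[4:4 + reply[3]])


def ignores_evpd(std_inq: bytes, evpd_reply: bytes, span: int = 36) -> bool:
    """True if the EVPD reply matches the standard INQUIRY data byte for byte.

    Only the first `span` bytes are compared (36 is an assumption: the
    customary minimum size of standard INQUIRY data).
    """
    n = min(span, len(std_inq), len(evpd_reply))
    return n > 0 and std_inq[:n] == evpd_reply[:n]


if __name__ == "__main__":
    # Byte 3 is 0x01, so the naive reader sees a one-entry list holding 0x1f.
    print([hex(p) for p in naive_vpd_page_list(THUMB_REPLY)])
    # Treating the posted bytes as the standard INQUIRY data (as described
    # in this thread), the comparison flags the device.
    print(ignores_evpd(THUMB_REPLY, THUMB_REPLY))
    print(ignores_evpd(THUMB_REPLY, GOOD_PAGE_80))
```

One thing the dump suggests: byte 1 of this reply is 0x80 because the
removable-media bit of standard INQUIRY data lives there, so it happens
to equal the serial number page code. A check that only looks at the
page-code echo in byte 1 would therefore not catch this device when page
0x80 is requested, which is one reason to compare whole replies instead.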

Scott

From sbruno at miralink.com Thu May 8 00:49:07 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Thu May 8 00:49:13 2008
Subject: USB drive serial numbers
In-Reply-To: <48224361.10809@samsco.org>
References: <48222930.2010808@miralink.com> <48223940.5090109@samsco.org> <482240FF.4030704@miralink.com> <48224361.10809@samsco.org>
Message-ID: <48224E01.4030907@miralink.com>

Scott Long wrote:
> Sean Bruno wrote:
>> Scott Long wrote:
>>> Sean Bruno wrote:
>>>> Can someone send me the output of "camcontrol inquiry daX" where
>>>> daX is a usb attached hard drive? I'm interested to know if the
>>>> USB external hard drives report a serial number as the USB flash
>>>> drives I currently have in my possession do not report one.
>>>>
>>>> eg: SCSI hard drive:
>>>> sudo camcontrol inquiry da0
>>>> pass0: Fixed Direct Access SCSI-3 device
>>>> pass0: Serial Number 3KT17YJL
>>>> pass0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
>>>> Queueing Enabled
>>>>
>>>>
>>>> eg: USB flash drive:
>>>> sudo camcontrol inquiry da3
>>>> pass4: Removable Direct Access SCSI-0
>>>> device
>>>> pass4: Serial Number
>>>> pass4: 40.000MB/s transfers
>>>>
>>>
>>> CAM used to assume that all DA devices supported the serial number EVPD
>>> page. I recently changed it to query the device for the list of pages
>>> it does support, and only ask for the serial number page if it does
>>> (which in turn cuts down on a whole lot of kernel printf noise). My
>>> experience is that some devices do, but most devices don't. If you
>>> want
>>> to check your devices manually, do:
>>>
>>> camcontrol cmd pass0 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
>>>
>>> If 0x80 appears after the 4th byte, the device claims support for
>>> querying the serial number. The serial number can then be fetched with
>>>
>>> camcontrol cmd pass0 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
>>>
>>> Or via
>>>
>>> camcontrol inq pass0 -S
>>>
>>>
>>> Scott
>>
>> Hrm....it looks like asking for page 0x80 directly is returning the same
>> as requesting page 0x80 or asking for all pages:
>>
>> sudo camcontrol devlist
>> at scbus0 target 0 lun 0 (da0,pass0)
>> at scbus0 target 1 lun 0 (da1,pass1)
>> at scbus0 target 6 lun 0 (ses0,pass2)
>> at scbus1 target 1 lun 0 (da2,pass3)
>> at scbus2 target 0 lun 0 (pass4,da3)
>>
>> sudo camcontrol cmd pass4 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
>> 00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20
>> |........TOSHIBA |
>> 00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20
>> |TransMemory     |
>> 00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00
>> |5.00PMAP1234....|
>> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>> |................|
>> *
>> 000000f0
>>
>> sudo camcontrol cmd pass4 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
>> 00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20
>> |........TOSHIBA |
>> 00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20
>> |TransMemory     |
>> 00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00
>> |5.00PMAP1234....|
>> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>> |................|
>> *
>
> Wow, that's just fantastically broken. It's completely ignoring the
> EVPD bit in the request and simply reporting standard inq data. I
> guess the only thing that keeps CAM from exploding on this is that it
> sees the length field in byte 4 as 0x01, so it doesn't search too far
> into what it thinks is the response. I'll have to read the spec some
> more to see if there's a standard way to report that the device supports
> the EVPD bit that FreeBSD should be checking.
>
> Scott

Any chance you have a USB hard drive lying around that you could run a
"camcontrol inq"?

Sean

From scottl at samsco.org Thu May 8 06:36:51 2008
From: scottl at samsco.org (Scott Long)
Date: Thu May 8 06:36:55 2008
Subject: USB drive serial numbers
In-Reply-To: <48224E01.4030907@miralink.com>
References: <48222930.2010808@miralink.com> <48223940.5090109@samsco.org> <482240FF.4030704@miralink.com> <48224361.10809@samsco.org> <48224E01.4030907@miralink.com>
Message-ID: <48229F7F.6040602@samsco.org>

Sean Bruno wrote:
> Scott Long wrote:
>> Sean Bruno wrote:
>>> Scott Long wrote:
>>>> Sean Bruno wrote:
>>>>> Can someone send me the output of "camcontrol inquiry daX" where
>>>>> daX is a usb attached hard drive? I'm interested to know if the
>>>>> USB external hard drives report a serial number as the USB flash
>>>>> drives I currently have in my possession do not report one.
>>>>>
>>>>> eg: SCSI hard drive:
>>>>> sudo camcontrol inquiry da0
>>>>> pass0: Fixed Direct Access SCSI-3 device
>>>>> pass0: Serial Number 3KT17YJL
>>>>> pass0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
>>>>> Queueing Enabled
>>>>>
>>>>>
>>>>> eg: USB flash drive:
>>>>> sudo camcontrol inquiry da3
>>>>> pass4: Removable Direct Access SCSI-0
>>>>> device
>>>>> pass4: Serial Number
>>>>> pass4: 40.000MB/s transfers
>>>>>
>>>>
>>>> CAM used to assume that all DA devices supported the serial number EVPD
>>>> page. I recently changed it to query the device for the list of pages
>>>> it does support, and only ask for the serial number page if it does
>>>> (which in turn cuts down on a whole lot of kernel printf noise). My
>>>> experience is that some devices do, but most devices don't. If you
>>>> want
>>>> to check your devices manually, do:
>>>>
>>>> camcontrol cmd pass0 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
>>>>
>>>> If 0x80 appears after the 4th byte, the device claims support for
>>>> querying the serial number. The serial number can then be fetched with
>>>>
>>>> camcontrol cmd pass0 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
>>>>
>>>> Or via
>>>>
>>>> camcontrol inq pass0 -S
>>>>
>>>>
>>>> Scott
>>>
>>> Hrm....it looks like asking for page 0x80 directly is returning the same
>>> as requesting page 0x80 or asking for all pages:
>>>
>>> sudo camcontrol devlist
>>> at scbus0 target 0 lun 0 (da0,pass0)
>>> at scbus0 target 1 lun 0 (da1,pass1)
>>> at scbus0 target 6 lun 0 (ses0,pass2)
>>> at scbus1 target 1 lun 0 (da2,pass3)
>>> at scbus2 target 0 lun 0 (pass4,da3)
>>>
>>> sudo camcontrol cmd pass4 -v -c "12 01 00 00 255 00" -i 255 "-" | hd
>>> 00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20
>>> |........TOSHIBA |
>>> 00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20
>>> |TransMemory     |
>>> 00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00
>>> |5.00PMAP1234....|
>>> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>> 000000f0
>>>
>>> sudo camcontrol cmd pass4 -v -c "12 01 80 00 255 00" -i 255 "-" | hd
>>> 00000000  00 80 00 01 1f 00 00 00  54 4f 53 48 49 42 41 20
>>> |........TOSHIBA |
>>> 00000010  54 72 61 6e 73 4d 65 6d  6f 72 79 20 20 20 20 20
>>> |TransMemory     |
>>> 00000020  35 2e 30 30 50 4d 41 50  31 32 33 34 00 00 00 00
>>> |5.00PMAP1234....|
>>> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>
>> Wow, that's just fantastically broken. It's completely ignoring the
>> EVPD bit in the request and simply reporting standard inq data. I
>> guess the only thing that keeps CAM from exploding on this is that it
>> sees the length field in byte 4 as 0x01, so it doesn't search too far
>> into what it thinks is the response. I'll have to read the spec some
>> more to see if there's a standard way to report that the device supports
>> the EVPD bit that FreeBSD should be checking.
>>
>> Scott
> Any chance you have a USB hard drive lying around that you could run a
> "camcontrol inq"?
>
> Sean

A thumb drive that I have lying around has the same problem, it ignores
the EVPD bit and happily returns std inq data for all requests. An
ATA->USB+Firewire enclosure works correctly; it doesn't support any
VPD pages, but returns a sense error in response, as it should. Given
that Firewire is more explicit in its implementation of SBP/SPC, I'm not
too surprised that it worked correctly (I tested this over the USB port,
though). I have another ATA->USB enclosure lying around, but I can't
find the power cord for it right now.

Scott

From sbruno at miralink.com Thu May 8 14:58:25 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Thu May 8 14:58:30 2008
Subject: USB drive serial numbers
In-Reply-To: <48229F7F.6040602@samsco.org>
References: <48222930.2010808@miralink.com> <48223940.5090109@samsco.org> <482240FF.4030704@miralink.com> <48224361.10809@samsco.org> <48224E01.4030907@miralink.com> <48229F7F.6040602@samsco.org>
Message-ID: <4823150F.30909@miralink.com>

>
> A thumb drive that I have lying around has the same problem, it ignores
> the EVPD bit and happily returns std inq data for all requests. An
> ATA->USB+Firewire enclosure works correctly; it doesn't support any
> VPD pages, but returns a sense error in response, as it should. Given
> that Firewire is more explicit in its implementation of SBP/SPC, I'm not
> too surprised that it worked correctly (I tested this over the USB port,
> though). I have another ATA->USB enclosure lying around, but I can't
> find the power cord for it right now.
>
> Scott
>
Interesting. Is there some other method that could be used to "identify"
drives in the system?

Sean

From scottl at samsco.org Thu May 8 15:01:40 2008
From: scottl at samsco.org (Scott Long)
Date: Thu May 8 15:01:44 2008
Subject: USB drive serial numbers
In-Reply-To: <4823150F.30909@miralink.com>
References: <48222930.2010808@miralink.com> <48223940.5090109@samsco.org> <482240FF.4030704@miralink.com> <48224361.10809@samsco.org> <48224E01.4030907@miralink.com> <48229F7F.6040602@samsco.org> <4823150F.30909@miralink.com>
Message-ID: <482315D1.4040401@samsco.org>

Sean Bruno wrote:
>
>>
>> A thumb drive that I have lying around has the same problem, it ignores
>> the EVPD bit and happily returns std inq data for all requests. An
>> ATA->USB+Firewire enclosure works correctly; it doesn't support any
>> VPD pages, but returns a sense error in response, as it should. Given
>> that Firewire is more explicit in its implementation of SBP/SPC, I'm not
>> too surprised that it worked correctly (I tested this over the USB port,
>> though). I have another ATA->USB enclosure lying around, but I can't
>> find the power cord for it right now.
>>
>> Scott
>>
>
> Interesting. Is there some other method that could be used to
> "identify" drives
> in the system?
> Sean

I think the only option is to do a std inq followed by an EVPD inquiry
and bcmp the results.

Scott

From allan at physics.umn.edu Fri May 9 01:32:05 2008
From: allan at physics.umn.edu (Graham Allan)
Date: Fri May 9 01:32:09 2008
Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3
Message-ID: <20080509011028.GV25577@physics.umn.edu>

Hi,

I've been trying to figure out a problem on a system which I just
upgraded from FreeBSD 6.1 to 6.3. It's a Dell 1750 with a QLA2342
(isp2312) HBA, connected to an EMC DS-16B2 (aka Brocade 3800) switch,
and from there to a couple of AC&NC Jetstor arrays. It's all been
working fine like this for some time under 6.1.

When I boot the system under 6.3, though I get a hang like this:

isp0: Interrupting Mailbox Command (0x6f) Timeout (500000us)
isp0: PLOGI 0x031a00 failed
isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
isp0: isp_pdb_sync: isp_scan_fabric failed
isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
isp0: isp_pdb_sync: isp_scan_fabric failed
isp0: Mailbox Command 'SEND SNS' failed (COMMAND ERROR)
isp0: isp_pdb_sync: isp_scan_fabric failed

after which the system stays hung.

I do have ispfw_load="YES" set (it reports loading F/W revision 3.3.19):

isp0: port 0xdc00-0xdcff mem 0xfcf01000-0xfcf01fff irq 20 at device 4.0 on pci1
isp0: [GIANT-LOCKED]
isp0: Board Type 2312, Chip Revision 0x2, loaded F/W Revision 3.3.19
isp1: port 0xd800-0xd8ff mem 0xfcf00000-0xfcf00fff irq 21 at device 4.1 on pci1
isp1: [GIANT-LOCKED]
isp1: Board Type 2312, Chip Revision 0x2, loaded F/W Revision 3.3.19

I initially suspected a hardware problem, but I've retested with a
couple of different QLA2342 cards in a couple of different 1750 systems.

I did find one or two interesting things though.

1) Reinstalling 6.1 has everything working again (as long as
ispfw_load="YES" is set).

2) if, under 6.3, I connect the HBA directly to a storage device (eg a
tape drive/loader) instead of the SAN switch, it works fine. So possibly
something related to point-to-point mode rather than fabric?

3) If I connect one port of the HBA to the tape loader, and the other to
the SAN switch, it also boots up successfully, although we get the same
errors reported as above, without the hang...

isp0: Interrupting Mailbox Command (0x6f) Timeout (500000us)
isp0: PLOGI 0x031a00 failed
isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
isp0: isp_pdb_sync: isp_scan_fabric failed
isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
isp0: isp_pdb_sync: isp_scan_fabric failed
isp0: Mailbox Command 'SEND SNS' failed (COMMAND ERROR)
isp0: isp_pdb_sync: isp_scan_fabric failed
sa0 at isp1 bus 0 target 0 lun 0
sa0: Removable Sequential Access SCSI-4 device
sa0: 200.000MB/s transfers
da0 at isp0 bus 0 target 1 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 200.000MB/s transfers, Tagged Queueing Enabled
da0: 953MB (1952256 512 byte sectors: 64H 32S/T 953C)
...

Finally I booted with "hint.isp.0.debug=0x1F" in /boot/device.hints,
while connected only to the switch, and... unfortunately I wasn't able
to capture the extensive output, but the final text was:

isp0: target 496 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 497 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 498 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 499 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 500 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 501 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 502 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 503 lun 0 CAM status 0xa SCSI status 0x0
isp0: target 504 lun 0 CAM status 0xa SCSI status 0x0

I will work on hooking up a serial console so I can capture the entire
output, though I'm also wondering if anyone might have some advice on
what to try next at this point?

Graham
--
-------------------------------------------------------------------------
Graham Allan - I.T. Manager - allan@physics.umn.edu - (612) 624-5040
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------

From allan at physics.umn.edu Fri May 9 21:56:22 2008
From: allan at physics.umn.edu (Graham Allan)
Date: Fri May 9 21:56:27 2008
Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3
In-Reply-To: <20080509011028.GV25577@physics.umn.edu>
References: <20080509011028.GV25577@physics.umn.edu>
Message-ID: <20080509215621.GX25577@physics.umn.edu>

On Thu, May 08, 2008 at 08:10:28PM -0500, Graham Allan wrote:
> Hi,
>
> I've been trying to figure out a problem on a system which I just
> upgraded from FreeBSD 6.1 to 6.3. It's a Dell 1750 with a QLA2342
> (isp2312) HBA, connected to an EMC DS-16B2 (aka Brocade 3800) switch,
> and from there to a couple of AC&NC Jetstor arrays. It's all been
> working fine like this for some time under 6.1.
>
> When I boot the system under 6.3, though I get a hang like this:
>
> isp0: Interrupting Mailbox Command (0x6f) Timeout (500000us)
> isp0: PLOGI 0x031a00 failed
> isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
> isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
> isp0: isp_pdb_sync: isp_scan_fabric failed
> isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
> isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
> isp0: isp_pdb_sync: isp_scan_fabric failed
> isp0: Mailbox Command 'SEND SNS' failed (COMMAND ERROR)
> isp0: isp_pdb_sync: isp_scan_fabric failed
>
> after which the system stays hung.

I've done some more testing on this and am left with a headache because
nothing makes sense! Would certainly be grateful if any fiber channel
gurus might comment.

I tested again with FreeBSD 7.0 and got the same result as above.

Next I tried swapping components in and out of the SAN to see if
different combinations have different results. These tests were with
7.0 since that was what I still had installed.
The SAN switch has the following devices connected:

1) AC&NC Jetstor 416F
2) AC&NC Jetstor 516F
3) Alphaserver ES40, Tru64 5.1B-6, KGPSA-CA (Emulex) HBA
4) Dell 1750, (the system I'm writing about), Qlogic 2342 HBA
5) Dell 1750, Windows 2003, LSI 7202P HBA

There's no zoning on the switch since all three servers connect to the
two Jetstors. Access to the RAID volumes is controlled by host filters
on the Jetstors.

So I tested with different combinations of devices connected to the
switch, with perplexing results:

FreeBSD + 416F only              - boots fine
FreeBSD + 516F only              - boots fine
FreeBSD + 416F + 516F            - boots fine
FreeBSD + 416F + 516F + windows  - boots fine
FreeBSD + 416F + 516F + ES40     - hangs with the above error
but, continuing...
FreeBSD + 516F + ES40            - boots fine
FreeBSD + 416F + ES40            - boots fine
FreeBSD + 416F + 516F + ES40     - hangs again

I can't make any sense of this... there are so many different systems
involved that there's no way to know where the problem really lies.
Although as it did work with FreeBSD 6.1, it feels to me like something
is wrong in the newer isp driver, but I have no solid knowledge to base
that on.

Thanks for any ideas,

Graham
--
-------------------------------------------------------------------------
Graham Allan - I.T. Manager - allan@physics.umn.edu - (612) 624-5040
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------

From lydianconcepts at gmail.com Sat May 10 04:41:33 2008
From: lydianconcepts at gmail.com (lydianconcepts@gmail.com)
Date: Sat May 10 04:41:43 2008
Subject: updated freebsd-isp patches
Message-ID: <20080509211554.J14314@ns1.feral.com>

No particular testing, but newer 2400 f/w and bunches of other stuff,
including some n-port virtualization stuff. For whomever feels like
integrating it into -current.

ftp://ftp.feral.com/pub/isp/freebsd.isp.diffs

MD5 (freebsd.isp.diffs) = f9aaa921ddb8982587088d703200686f

From sbruno at miralink.com Sun May 11 01:07:04 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Sun May 11 01:07:07 2008
Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3
In-Reply-To: <20080509215621.GX25577@physics.umn.edu>
References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu>
Message-ID: <482646B5.807@miralink.com>

Graham Allan wrote:
> On Thu, May 08, 2008 at 08:10:28PM -0500, Graham Allan wrote:
>
>> Hi,
>>
>> I've been trying to figure out a problem on a system which I just
>> upgraded from FreeBSD 6.1 to 6.3. It's a Dell 1750 with a QLA2342
>> (isp2312) HBA, connected to an EMC DS-16B2 (aka Brocade 3800) switch,
>> and from there to a couple of AC&NC Jetstor arrays. It's all been
>> working fine like this for some time under 6.1.
>>
>> When I boot the system under 6.3, though I get a hang like this:
>>
>> isp0: Interrupting Mailbox Command (0x6f) Timeout (500000us)
>> isp0: PLOGI 0x031a00 failed
>> isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
>> isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
>> isp0: isp_pdb_sync: isp_scan_fabric failed
>> isp0: Interrupting Mailbox Command (0x6e) Timeout (10000000us)
>> isp0: Mailbox Command 'SEND SNS' failed (TIMEOUT)
>> isp0: isp_pdb_sync: isp_scan_fabric failed
>> isp0: Mailbox Command 'SEND SNS' failed (COMMAND ERROR)
>> isp0: isp_pdb_sync: isp_scan_fabric failed
>>
>> after which the system stays hung.
>>
>
> I've done some more testing on this and am left with a headache because
> nothing makes sense! Would certainly be grateful if any fiber channel
> gurus might comment.
>
> I tested again with FreeBSD 7.0 and got the same result as above.
>
> Next I tried swapping components in and out of the SAN to see if
> different combinations have different results. These tests were with
> 7.0 since that was what I still had installed. The SAN switch has the
> following devices connected:
>
> 1) AC&NC Jetstor 416F
> 2) AC&NC Jetstor 516F
> 3) Alphaserver ES40, Tru64 5.1B-6, KGPSA-CA (Emulex) HBA
> 4) Dell 1750, (the system I'm writing about), Qlogic 2342 HBA
> 5) Dell 1750, Windows 2003, LSI 7202P HBA
>
> There's no zoning on the switch since all three servers connect to the
> two Jetstors. Access to the RAID volumes is controlled by host filters
> on the Jetstors.
>
> So I tested with different combinations of devices connected to the
> switch, with perplexing results:
>
> FreeBSD + 416F only              - boots fine
> FreeBSD + 516F only              - boots fine
> FreeBSD + 416F + 516F            - boots fine
> FreeBSD + 416F + 516F + windows  - boots fine
> FreeBSD + 416F + 516F + ES40     - hangs with the above error
> but, continuing...
> FreeBSD + 516F + ES40            - boots fine
> FreeBSD + 416F + ES40            - boots fine
> FreeBSD + 416F + 516F + ES40     - hangs again
>
> I can't make any sense of this... there are so many different systems
> involved that there's no way to know where the problem really lies.
> Although as it did work with FreeBSD 6.1, it feels to me like something
> is wrong in the newer isp driver, but I have no solid knowledge to base
> that on.
>
> Thanks for any ideas,
>
> Graham
>
I see that you tested 6.1 but not 6.2 ... if you could, can you check 6.2?

I'm trying to limit the code searching and that would help a bit.

Sean

From grafan at gmail.com Sun May 11 12:37:06 2008
From: grafan at gmail.com (Rong-en Fan)
Date: Sun May 11 12:37:09 2008
Subject: default setting of WCE
Message-ID: <6eb82e0805110511l2e814258p4ec1a22145da9477@mail.gmail.com>

Hi,

I'm wondering what's the default of our da(4) regarding the WCE
(write cache) bit. I see some of my disks have this bit on by default
and some are off. I guess it depends on the capabilities returned by
the underlying device when probing?

In addition, on the same hardware, I found that Linux has write
cache enabled by default... (yes, I have read da(4) and know the
pros/cons of enabling this).

Thanks,
Rong-En Fan

From allan at physics.umn.edu Sun May 11 21:10:47 2008
From: allan at physics.umn.edu (Graham Allan)
Date: Sun May 11 21:10:51 2008
Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3
In-Reply-To: <482646B5.807@miralink.com>
References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com>
Message-ID: <482760D0.1070106@physics.umn.edu>

Sean Bruno wrote:
> I see that you tested 6.1 but not 6.2 ... if you could, can you check 6.2?
>
> I'm trying to limit the code searching and that would help a bit.

I'm downloading 6.2 now - should be able to test it tomorrow morning.

Thanks!

Graham

From sbruno at miralink.com Sun May 11 21:30:11 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Sun May 11 21:30:14 2008
Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3
In-Reply-To: <482760D0.1070106@physics.umn.edu>
References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu>
Message-ID: <48276560.30302@miralink.com>

Graham Allan wrote:
> Sean Bruno wrote:
> > I see that you tested 6.1 but not 6.2 ... if you could, can you
> check 6.2?
> >
> > I'm trying to limit the code searching and that would help a bit.
>
> I'm downloading 6.2 now - should be able to test it tomorrow morning.
>
> Thanks!
>
> Graham

Also, have you tried cvsup'ing your kernel to RELENG_6 and recompiling?
Sean From allan at physics.umn.edu Mon May 12 02:38:25 2008 From: allan at physics.umn.edu (Graham Allan) Date: Mon May 12 02:38:29 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <48276560.30302@miralink.com> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> Message-ID: <4827AD9F.50202@physics.umn.edu> Sean Bruno wrote: > Graham Allan wrote: >> Sean Bruno wrote: >> > I see that you tested 6.1 but not 6.2 ... if you could, can you >> check 6.2? >> > >> > I'm trying to limit the code searching and that would help a bit. >> >> I'm downloading 6.2 now - should be able to test it tomorrow morning. >> > Also, have you tried cvsup'ing your kernel to RELENG_6 and recompiling? No, I only used RELENG_6_3 initially - I can try RELENG_6 after base 6.2, I guess. I did also try one or two other crazy things - like I noticed the isp2300 firmware in ispfw changed versions between 6.1 and 6.3 so I tried rebuilding 6.3 but using the 6.1 firmware (basically copying asm_2300.h from one to the other). Probably a stupid idea to anyone who knows more about the driver than me - didn't help anyway! Graham From bugmaster at FreeBSD.org Mon May 12 11:07:05 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 12 11:07:17 2008 Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org Message-ID: <200805121107.m4CB74sS038140@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/39388 scsi ncr/sym drivers fail with 53c810 and more than 256MB m o kern/40895 scsi wierd kernel / device driver bug o kern/52638 scsi [panic] SCSI U320 on SMP server won't run faster than s kern/57398 scsi [mly] Current fails to install on mly(4) based RAID di o kern/60598 scsi wire down of scsi devices conflicts with config o kern/60641 scsi [sym] Sporadic SCSI bus resets with 53C810 under load s kern/61165 scsi [panic] kernel page fault after calling cam_send_ccb o kern/74627 scsi [ahc] [hang] Adaptec 2940U2W Can't boot 5.3 o kern/90282 scsi [sym] SCSI bus resets cause loss of ch device o kern/92798 scsi [ahc] SCSI problem with timeouts o kern/94838 scsi Kernel panic while mounting SD card with lock switch o o kern/99954 scsi [ahc] reading from DVD failes on 6.x [regression] o kern/110847 scsi [ahd] Tyan U320 onboard problem with more than 3 disks o kern/120247 scsi [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 14 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/35234 scsi World access to /dev/pass? (for scanner) requires acce o kern/38828 scsi [dpt] [request] DPT PM2012B/90 doesn't work o kern/44587 scsi dev/dpt/dpt.h is missing defines required for DPT_HAND o kern/76178 scsi [ahd] Problem with ahd and large SCSI Raid system o kern/114597 scsi [sym] System hangs at SCSI bus reset with dual HBAs o kern/119668 scsi [cam] [patch] certain errors are too verbose comparing o kern/120487 scsi [sg] scsi_sg incompatible with scanners o sparc/121676 scsi [iscsi] iscontrol do not connect iscsi-target on sparc 8 problems total. 
From scottl at samsco.org Mon May 12 13:58:52 2008 From: scottl at samsco.org (Scott Long) Date: Mon May 12 13:58:56 2008 Subject: default setting of WCE In-Reply-To: <6eb82e0805110511l2e814258p4ec1a22145da9477@mail.gmail.com> References: <6eb82e0805110511l2e814258p4ec1a22145da9477@mail.gmail.com> Message-ID: <48284D18.5010801@samsco.org> Rong-en Fan wrote: > Hi, > > I'm wondering what's the default of our da(4) regarding > the WCE (write cache) bit. I see some of my disks has this > bit on by default and some are off. I guess it depends on > the capabilities returned by the underlying device when > probing? In addition, on the same hardware, I found that > Linux has write cache enabled by default... (yes, I have read > da(4) and knows the pros/cons of enabling this). > The 'da' driver leaves the WCE setting alone by default. The MPT driver does enable the write cache on SATA drives that it finds on SAS controllers, but this is specific to this driver and it happens unknown to the da driver. Scott From grafan at gmail.com Mon May 12 15:58:03 2008 From: grafan at gmail.com (Rong-en Fan) Date: Mon May 12 15:58:07 2008 Subject: default setting of WCE In-Reply-To: <48284D18.5010801@samsco.org> References: <6eb82e0805110511l2e814258p4ec1a22145da9477@mail.gmail.com> <48284D18.5010801@samsco.org> Message-ID: <6eb82e0805120857r2e7ff540ub44b3f8661aeba91@mail.gmail.com> On Mon, May 12, 2008 at 9:58 PM, Scott Long wrote: > Rong-en Fan wrote: >> >> Hi, >> >> I'm wondering what's the default of our da(4) regarding >> the WCE (write cache) bit. I see some of my disks has this >> bit on by default and some are off. I guess it depends on >> the capabilities returned by the underlying device when >> probing? In addition, on the same hardware, I found that >> Linux has write cache enabled by default... (yes, I have read >> da(4) and knows the pros/cons of enabling this). >> > > The 'da' driver leaves the WCE setting alone by default. 
The > MPT driver does enable the write cache on SATA drives that it > finds on SAS controllers, but this is specific to this driver > and it happens unknown to the da driver. Ok, so it depends on the driver that the disk attached to. Thanks for clarifying. Regards, Rong-En Fan > > Scott > From pisymbol at gmail.com Mon May 12 16:47:36 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Mon May 12 16:47:41 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <4827AD9F.50202@physics.umn.edu> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <4827AD9F.50202@physics.umn.edu> Message-ID: <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> On Sun, May 11, 2008 at 10:38 PM, Graham Allan wrote: > Sean Bruno wrote: > > Graham Allan wrote: > >> Sean Bruno wrote: > >> > I see that you tested 6.1 but not 6.2 ... if you could, can you > >> check 6.2? > >> > > >> > I'm trying to limit the code searching and that would help a bit. > >> > >> I'm downloading 6.2 now - should be able to test it tomorrow morning. > >> > > Also, have you tried cvsup'ing your kernel to RELENG_6 and recompiling? > > No, I only used RELENG_6_3 initially - I can try RELENG_6 after base 6.2, I > guess. > > I did also try one or two other crazy things - like I noticed the isp2300 > firmware in ispfw changed versions between 6.1 and 6.3 so I tried rebuilding > 6.3 but using the 6.1 firmware (basically copying asm_2300.h from one to the > other). Probably a stupid idea to anyone who knows more about the driver > than me - didn't help anyway! Graham, from the driver error messages it seems that the card believes you are on a switched fabric and that it most likely is logging into the SNS server to lookup names/addresses for your devices. Are you sure that your switched fabric is setup correctly? 
I missed part of this thread so I apologize if this topic has already been hashed out. If for some reason the host can not log into the SNS server and retrieve entries from the database, then you are going to be hosed (I agree the OS shouldn't be hung unless you are booting off the disk connected to the failed controller, etc.). I am very familiar with the ISP23/4xx chipset and I go digging more but I was wondering if you have verified that your topology is valid. -aps From allan at physics.umn.edu Mon May 12 17:14:06 2008 From: allan at physics.umn.edu (Graham Allan) Date: Mon May 12 17:14:11 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <4827AD9F.50202@physics.umn.edu> <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> Message-ID: <20080512171404.GE25577@physics.umn.edu> On Mon, May 12, 2008 at 12:19:49PM -0400, Alexander Sack wrote: > > Graham, from the driver error messages it seems that the card believes > you are on a switched fabric and that it most likely is logging into > the SNS server to lookup names/addresses for your devices. Are you > sure that your switched fabric is setup correctly? I missed part of > this thread so I apologize if this topic has already been hashed out. > If for some reason the host can not log into the SNS server and > retrieve entries from the database, then you are going to be hosed (I > agree the OS shouldn't be hung unless you are booting off the disk > connected to the failed controller, etc.). > > I am very familiar with the ISP23/4xx chipset and I go digging more > but I was wondering if you have verified that your topology is valid. 
I'm happy to confess to being a SAN novice, so I'm not quite sure how I would verify that, other than that it "seems to work" ok on the older OS release, and also in specific circumstances on the current one - for example, if one port of the HBA is connected directly to a device, and the other to the fabric, it doesn't have a problem - so in that situation it is able to log in to the fabric ok and retrieve database information. Even when it does hang, it does appear to have logged in to the fabric ok, according to my interpretation of the switch output: fcswitch_s43_2:admin> portshow 8 portName: portHealth: No License Authentication: None portFlags: 0x223805b portLbMod: 0x0 PRESENT ACTIVE F_PORT G_PORT U_PORT LOGIN NOELP LED ACCEPT WAS_EPORT portType: 4.1 portState: 1 Online portPhys: 6 In_Sync portScn: 6 F_Port portRegs: 0x81100000 portData: 0x11deb230 portId: 031800 portWwn: 20:08:00:60:69:51:4a:20 portWwn of device(s) connected: 21:00:00:e0:8b:08:06:d2 Distance: normal Speed: N2Gbps Interrupts: 20487 Link_failure: 18 Frjt: 0 Unknown: 404 Loss_of_sync: 12295 Fbsy: 0 Lli: 13715 Loss_of_sig: 93 Proc_rqrd: 6646 Protocol_err: 0 Timed_out: 0 Invalid_word: 0 Rx_flushed: 0 Invalid_crc: 0 Tx_unavail: 0 Delim_err: 0 Free_buffer: 0 Address_err: 0 Overrun: 0 Lr_in: 36 Suspended: 0 Lr_out: 73 Parity_err: 0 Ols_in: 73 and it's listed in the switch name server (third entry down, 031800): fcswitch_s43_2:admin> nsshow { Type Pid COS PortName NodeName TTL(sec) N 031300; 3;21:00:00:04:d9:60:17:6e;20:00:00:04:d9:60:17:6d; na FC4s: FCP PortSymb: [39] "UNKNOWN A.0 UNKNOWN FW:01.02 Port 1 " Fabric Port Name: 20:03:00:60:69:51:4a:20 N 031500; 3;21:00:00:1b:4d:00:83:ed;20:00:00:1b:4d:00:83:ec; na FC4s: FCP [JetStor FreeBSD mark R4 R001] Fabric Port Name: 20:05:00:60:69:51:4a:20 N 031800; 3;21:00:00:e0:8b:08:06:d2;20:00:00:e0:8b:08:06:d2; na FC4s: FCP Fabric Port Name: 20:08:00:60:69:51:4a:20 N 031900; 3;10:00:00:06:2b:09:4f:d8;20:00:00:06:2b:09:4f:d8; na FC4s: FCIP FCP PortSymb: [47] 
"LSI7202P B.0 03-01001-02A FW:1.00.06 Port 0 " Fabric Port Name: 20:09:00:60:69:51:4a:20 N 031a00; 2,3;10:00:00:00:c9:24:5b:04;20:00:00:00:c9:24:5b:04; na FC4s: FCP PortSymb: [49] "UNIX (emx2) KGPSA-CA S/W Rev 2.25: F/W Rev 3.93a0" Fabric Port Name: 20:0a:00:60:69:51:4a:20 The Local Name Server has 5 entries } It has been pointed out to me that this kind of weird interaction isn't exactly unknown in the SAN world, and setting up zoning on the switch would probably make it go away. So I will also try that (it's probably a giveway of a SAN novice that I hadn't already done so - it certainly does sound like it would help). But if the hang does point to a problem in the driver, I'm also happy to keep trying different things in the hope of revealing where the problem actually lies. Graham From allan at physics.umn.edu Mon May 12 17:26:13 2008 From: allan at physics.umn.edu (Graham Allan) Date: Mon May 12 17:26:16 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <48276560.30302@miralink.com> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> Message-ID: <20080512172612.GF25577@physics.umn.edu> On Sun, May 11, 2008 at 02:30:08PM -0700, Sean Bruno wrote: > >> I see that you tested 6.1 but not 6.2 ... if you could, can you > >check 6.2? > >> > >> I'm trying to limit the code searching and that would help a bit. > > > >I'm downloading 6.2 now - should be able to test it tomorrow morning. > > > Also, have you tried cvsup'ing your kernel to RELENG_6 and recompiling? I didn't get our 6.2 network install set up all the way, but during the installation boot (which identifies itself as 6.2-RELEASE-p0) it identifies all available devices on the SAN without hanging, so I think that version works fine. Next I will revert to 6.3, and cvsup it to RELENG_6... 
Graham -- ------------------------------------------------------------------------- Graham Allan - I.T. Manager - allan@physics.umn.edu - (612) 624-5040 School of Physics and Astronomy - University of Minnesota ------------------------------------------------------------------------- From sbruno at miralink.com Mon May 12 17:29:39 2008 From: sbruno at miralink.com (Sean Bruno) Date: Mon May 12 17:29:43 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <20080512172612.GF25577@physics.umn.edu> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <20080512172612.GF25577@physics.umn.edu> Message-ID: <48287E81.5070004@miralink.com> Graham Allan wrote: > On Sun, May 11, 2008 at 02:30:08PM -0700, Sean Bruno wrote: > >>>> I see that you tested 6.1 but not 6.2 ... if you could, can you >>>> >>> check 6.2? >>> >>>> I'm trying to limit the code searching and that would help a bit. >>>> >>> I'm downloading 6.2 now - should be able to test it tomorrow morning. >>> >>> >> Also, have you tried cvsup'ing your kernel to RELENG_6 and recompiling? >> > > I didn't get our 6.2 network install set up all the way, but during the > installation boot (which identifies itself as 6.2-RELEASE-p0) it > identifies all available devices on the SAN without hanging, so I think > that version works fine. > > Next I will revert to 6.3, and cvsup it to RELENG_6... > > Graham > Interesting. I'm intrigued. Let me know. 
Sean From allan at physics.umn.edu Mon May 12 21:09:30 2008 From: allan at physics.umn.edu (Graham Allan) Date: Mon May 12 21:09:34 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <48287E81.5070004@miralink.com> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <20080512172612.GF25577@physics.umn.edu> <48287E81.5070004@miralink.com> Message-ID: <20080512210929.GJ25577@physics.umn.edu> On Mon, May 12, 2008 at 10:29:37AM -0700, Sean Bruno wrote: > Graham Allan wrote: > > > >I didn't get our 6.2 network install set up all the way, but during the > >installation boot (which identifies itself as 6.2-RELEASE-p0) it > >identifies all available devices on the SAN without hanging, so I think > >that version works fine. > > > >Next I will revert to 6.3, and cvsup it to RELENG_6... > > > >Graham > > > Interesting. I'm intrigued. > > Let me know. I get the same behaviour on RELENG_6 (kernel reports itself as 6.3-STABLE #4). Do you think I should try with -CURRENT? Would that even work with a 6.3 userland, or would I need to build world as well? I can also re-enable debug output in the isp driver and report what is going on there. Unfortunately last time I tried this the two results were not that useful: (1) capturing it from the screen (or Dell RAC output) you only get the last 20-odd lines; or (2) redirecting to a serial console slightly changes the behaviour (it still never boots but remains forever in some kind of retry loop), perhaps because the kernel execution is throttled to the 9600 baud output speed. Could try 115kbaud, but that is still pretty slow, relative to no serial. 
Graham From allan at physics.umn.edu Wed May 14 01:43:08 2008 From: allan at physics.umn.edu (Graham Allan) Date: Wed May 14 01:43:13 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <20080512171404.GE25577@physics.umn.edu> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <4827AD9F.50202@physics.umn.edu> <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> <20080512171404.GE25577@physics.umn.edu> Message-ID: <20080514014307.GV25577@physics.umn.edu> On Mon, May 12, 2008 at 12:14:04PM -0500, Graham Allan wrote: > > It has been pointed out to me that this kind of weird interaction isn't > exactly unknown in the SAN world, and setting up zoning on the switch > would probably make it go away. So I will also try that (it's probably > a giveway of a SAN novice that I hadn't already done so - it certainly > does sound like it would help). But if the hang does point to a problem > in the driver, I'm also happy to keep trying different things in the > hope of revealing where the problem actually lies. Replying to my own message here. The good news for me is that setting up zoning in the switch does fix (or at least hide) the problem on this server for me. The bad news is, I believe I'm seeing a similar kind of behaviour on a completely different 6.3 setup. Haven't had time to fully characterise it yet, but in short... Dell 1950 with QLA2342, connected directly to an EMC CX300 array. Very often (lets say unpredictably 50% of time) hangs during boot at exactly the same point as the first system, right around the time it would be probing for drives. 
Graham

From vwe at FreeBSD.org Wed May 14 11:34:07 2008
From: vwe at FreeBSD.org (vwe@FreeBSD.org)
Date: Wed May 14 11:34:11 2008
Subject: kern/123666: [aac] aac(4) will not work with Adaptec SAS RAID 3805 controller
Message-ID: <200805141134.m4EBY6CZ065724@freefall.freebsd.org>

Synopsis: [aac] aac(4) will not work with Adaptec SAS RAID 3805 controller

Responsible-Changed-From-To: freebsd-bugs->freebsd-scsi
Responsible-Changed-By: vwe
Responsible-Changed-When: Wed May 14 11:33:35 UTC 2008
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=123666

From vwe at FreeBSD.org Wed May 14 21:20:14 2008
From: vwe at FreeBSD.org (vwe@FreeBSD.org)
Date: Wed May 14 21:20:16 2008
Subject: kern/123674: [ahc] ahc driver dumping
Message-ID: <200805142120.m4ELKDj0019052@freefall.freebsd.org>

Old Synopsis: ahc driver dumping
New Synopsis: [ahc] ahc driver dumping

Responsible-Changed-From-To: freebsd-i386->freebsd-scsi
Responsible-Changed-By: vwe
Responsible-Changed-When: Wed May 14 21:18:20 UTC 2008
Responsible-Changed-Why:
doesn't sound i386 specific - reclassify

often these messages are a sign of bad media, faulty hardware, bad
firmware, bad cables or a drive cleaning might be needed

Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=123674 From allan at physics.umn.edu Thu May 15 13:02:03 2008 From: allan at physics.umn.edu (Graham Allan) Date: Thu May 15 13:02:08 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <20080514014307.GV25577@physics.umn.edu> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <4827AD9F.50202@physics.umn.edu> <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> <20080512171404.GE25577@physics.umn.edu> <20080514014307.GV25577@physics.umn.edu> Message-ID: <482C3446.8010203@physics.umn.edu> Graham Allan wrote: > On Mon, May 12, 2008 at 12:14:04PM -0500, Graham Allan wrote: >> It has been pointed out to me that this kind of weird interaction isn't >> exactly unknown in the SAN world, and setting up zoning on the switch >> would probably make it go away. So I will also try that (it's probably >> a giveway of a SAN novice that I hadn't already done so - it certainly >> does sound like it would help). But if the hang does point to a problem >> in the driver, I'm also happy to keep trying different things in the >> hope of revealing where the problem actually lies. > > Replying to my own message here. > > The good news for me is that setting up zoning in the switch does fix > (or at least hide) the problem on this server for me. > > The bad news is, I believe I'm seeing a similar kind of behaviour on a > completely different 6.3 setup. Haven't had time to fully characterise > it yet, but in short... Dell 1950 with QLA2342, connected directly to > an EMC CX300 array. Very often (lets say unpredictably 50% of time) > hangs during boot at exactly the same point as the first system, right > around the time it would be probing for drives. 
So I guess one thing I could do is build a kernel with debugging support
(and possibly the "deadlock recipe" from the FreeBSD Handbook), and
force it to the debugger when it hangs. Then I could at least get some
tracebacks and other information - though as it never actually panics
I'm not sure how useful the information will be - I guess it's likely
stuck in a loop somehow. It should give some clue.

Does that sound like a reasonable idea? Does the kernel version matter
(e.g. standard 6.3 vs RELENG_6)? Is this list the most appropriate place
for me to talk about the issue?

(I also think I should double-check 6.2 again, as its release notes
indicate it was where isp was synced from CURRENT - I'd think it should
have the same issue).

Thanks for everyone's interest,

Graham

From sbruno at miralink.com Thu May 15 17:31:59 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Thu May 15 17:32:01 2008
Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3
In-Reply-To: <482C3446.8010203@physics.umn.edu>
References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <4827AD9F.50202@physics.umn.edu> <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> <20080512171404.GE25577@physics.umn.edu> <20080514014307.GV25577@physics.umn.edu> <482C3446.8010203@physics.umn.edu>
Message-ID: <482C738B.9040209@miralink.com>

Graham Allan wrote:
> Graham Allan wrote:
>> On Mon, May 12, 2008 at 12:14:04PM -0500, Graham Allan wrote:
>>> It has been pointed out to me that this kind of weird interaction isn't
>>> exactly unknown in the SAN world, and setting up zoning on the switch
>>> would probably make it go away. So I will also try that (it's probably
>>> a giveway of a SAN novice that I hadn't already done so - it certainly
>>> does sound like it would help).
But if the hang does point to a problem >>> in the driver, I'm also happy to keep trying different things in the >>> hope of revealing where the problem actually lies. >> >> Replying to my own message here. >> >> The good news for me is that setting up zoning in the switch does fix >> (or at least hide) the problem on this server for me. >> >> The bad news is, I believe I'm seeing a similar kind of behaviour on a >> completely different 6.3 setup. Haven't had time to fully characterise >> it yet, but in short... Dell 1950 with QLA2342, connected directly to >> an EMC CX300 array. Very often (lets say unpredictably 50% of time) >> hangs during boot at exactly the same point as the first system, right >> around the time it would be probing for drives. > > So I guess one thing I could do is build a kernal with debugging > support (and possibly the "deadlock recipe" from the freebsd > handbook), and force it to the debugger when it hangs. Then I could at > least get some tracebacks and other information - though as it never > actually panics I'm not sure how useful the information will be - I > guess it's likely stuck in a loop somehow. It should give some clue. > > Does that sound like a reasonable idea? Does the kernel version matter > (eg standard 6.3 vs RELENG_6)? Is this list the most appropriate place > for me to talk about the issue? > > (I also think I should double-check 6.2 again, as its release notes > indicate it was where isp was synced from CURRENT - I'd think it > should have the same issue). > > Thanks for everyones interest, > > Graham > _______________________________________________ If you can, I'd do the testing with RELENG_7. 
Sean From scottl at samsco.org Thu May 15 23:58:26 2008 From: scottl at samsco.org (Scott Long) Date: Thu May 15 23:58:29 2008 Subject: Hang on boot in isp with QLA2342 after upgrading to 6.3 In-Reply-To: <482C3446.8010203@physics.umn.edu> References: <20080509011028.GV25577@physics.umn.edu> <20080509215621.GX25577@physics.umn.edu> <482646B5.807@miralink.com> <482760D0.1070106@physics.umn.edu> <48276560.30302@miralink.com> <4827AD9F.50202@physics.umn.edu> <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com> <20080512171404.GE25577@physics.umn.edu> <20080514014307.GV25577@physics.umn.edu> <482C3446.8010203@physics.umn.edu> Message-ID: <482CCE1D.70703@samsco.org> Graham Allan wrote: > Graham Allan wrote: >> On Mon, May 12, 2008 at 12:14:04PM -0500, Graham Allan wrote: >>> It has been pointed out to me that this kind of weird interaction isn't >>> exactly unknown in the SAN world, and setting up zoning on the switch >>> would probably make it go away. So I will also try that (it's probably >>> a giveway of a SAN novice that I hadn't already done so - it certainly >>> does sound like it would help). But if the hang does point to a problem >>> in the driver, I'm also happy to keep trying different things in the >>> hope of revealing where the problem actually lies. >> >> Replying to my own message here. >> >> The good news for me is that setting up zoning in the switch does fix >> (or at least hide) the problem on this server for me. >> >> The bad news is, I believe I'm seeing a similar kind of behaviour on a >> completely different 6.3 setup. Haven't had time to fully characterise >> it yet, but in short... Dell 1950 with QLA2342, connected directly to >> an EMC CX300 array. Very often (lets say unpredictably 50% of time) >> hangs during boot at exactly the same point as the first system, right >> around the time it would be probing for drives. 
> > So I guess one thing I could do is build a kernal with debugging support > (and possibly the "deadlock recipe" from the freebsd handbook), and > force it to the debugger when it hangs. Then I could at least get some > tracebacks and other information - though as it never actually panics > I'm not sure how useful the information will be - I guess it's likely > stuck in a loop somehow. It should give some clue. > > Does that sound like a reasonable idea? Does the kernel version matter > (eg standard 6.3 vs RELENG_6)? Is this list the most appropriate place > for me to talk about the issue? > > (I also think I should double-check 6.2 again, as its release notes > indicate it was where isp was synced from CURRENT - I'd think it should > have the same issue). > > Thanks for everyones interest, > > Graham Well, is it actually deadlocking, or just holding up the boot while it tries to individually probe many thousands of target and lun ID's? I'd bet it's the latter. Compiling in the debugger is the correct first step. You can then compile in CAMDEBUG, CAM_DEBUG_LUN=-1, and CAM_DEBUG_FLAGS=CAM_DEBUG_INFO. Scott From bugmaster at FreeBSD.org Mon May 19 11:06:59 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 19 11:07:56 2008 Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org Message-ID: <200805191106.m4JB6x9i011711@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/39388 scsi ncr/sym drivers fail with 53c810 and more than 256MB m o kern/40895 scsi wierd kernel / device driver bug o kern/52638 scsi [panic] SCSI U320 on SMP server won't run faster than s kern/57398 scsi [mly] Current fails to install on mly(4) based RAID di o kern/60598 scsi wire down of scsi devices conflicts with config o kern/60641 scsi [sym] Sporadic SCSI bus resets with 53C810 under load s kern/61165 scsi [panic] kernel page fault after calling cam_send_ccb o kern/74627 scsi [ahc] [hang] Adaptec 2940U2W Can't boot 5.3 o kern/90282 scsi [sym] SCSI bus resets cause loss of ch device o kern/92798 scsi [ahc] SCSI problem with timeouts o kern/94838 scsi Kernel panic while mounting SD card with lock switch o o kern/99954 scsi [ahc] reading from DVD failes on 6.x [regression] o kern/110847 scsi [ahd] Tyan U320 onboard problem with more than 3 disks o kern/120247 scsi [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 14 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/35234 scsi World access to /dev/pass? (for scanner) requires acce o kern/38828 scsi [dpt] [request] DPT PM2012B/90 doesn't work o kern/44587 scsi dev/dpt/dpt.h is missing defines required for DPT_HAND o kern/76178 scsi [ahd] Problem with ahd and large SCSI Raid system o kern/114597 scsi [sym] System hangs at SCSI bus reset with dual HBAs o kern/119668 scsi [cam] [patch] certain errors are too verbose comparing o kern/120487 scsi [sg] scsi_sg incompatible with scanners o sparc/121676 scsi [iscsi] iscontrol do not connect iscsi-target on sparc o kern/123666 scsi [aac] aac(4) will not work with Adaptec SAS RAID 3805 o kern/123674 scsi [ahc] ahc driver dumping 10 problems total. 
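
For readers following the isp boot-hang thread above: Scott Long's reply
suggests compiling in the debugger and then CAMDEBUG, CAM_DEBUG_LUN=-1 and
CAM_DEBUG_FLAGS=CAM_DEBUG_INFO. A minimal sketch of kernel configuration
lines that could express that is given here. Only the three CAM option
names come from that message. The KDB/DDB lines are an assumption about
what "compiling in the debugger" means on 6.x, and the BUS and TARGET lines
are an assumption too, added because CAM of that era is believed to want
all three of BUS, TARGET and LUN defined together. Check sys/conf/NOTES in
your source tree before relying on any of it.

```
# Kernel debugger (assumed meaning of "compiling in the debugger")
options 	KDB
options 	DDB

# CAM debugging; -1 is taken to mean "match any"
options 	CAMDEBUG
options 	CAM_DEBUG_BUS=-1	# assumption, not named in the thread
options 	CAM_DEBUG_TARGET=-1	# assumption, not named in the thread
options 	CAM_DEBUG_LUN=-1
options 	CAM_DEBUG_FLAGS=CAM_DEBUG_INFO
```

These lines would go into a copy of the kernel config file (GENERIC, for
example) before rebuilding. Expect a large volume of console output during
probing, which may itself slow the boot on a serial console.
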
From warren.guy at calorieking.com Tue May 20 12:42:46 2008 From: warren.guy at calorieking.com (Warren Guy) Date: Tue May 20 12:42:49 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 Message-ID: <4832C397.3090004@calorieking.com> Hi, We have a few recently acquired Dell Poweredge SC440 machines in our office running FreeBSD 6.3 that are experiencing very poor disk performance. The RAID controller they are supplied with appears to be an "LSI Logic SAS 3000 Series". There are two 250GB SATA disks in RAID 1 configuration. 6-7MB/s seems to be the best I can get out of them. A Linux 2.6 machine with identical hardware and disk/controller configuration exhibits no such performance impediments. Any information or pointers are greatly appreciated. bonnie++ output: Version 1.93c ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP eva.internal 300M 306 99 6722 1 5591 1 533 99 +++++ +++ 2359 37 Latency 26715us 157ms 2193ms 25385us 808us 3852us Version 1.93c ------Sequential Create------ --------Random Create-------- eva.internal -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 4572 11 +++++ +++ +++++ +++ 3019 7 +++++ +++ 28367 40 Latency 1277ms 23us 33us 1396ms 92us 317ms 1.93c,1.93c,eva.internal,1,1211281869,300M,,306,99,6722,1,5591,1,533,99,+++++,+++,2359,37,16,,,,,4572,11,+++++,+++,+++++,+++,3019,7,+++++,+++,28367,40,26715us,157ms,2193ms,25385us,808us,3852us,1277ms,23us,33us,1396ms,92us,317ms and bonnie++ from the Linux 2.6 machine: Version 1.03 ------Sequential Output------ --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP webvms 300M 41255 78 114551 26 565402 88 52867 94 +++++ 
+++ +++++ +++ ------Sequential Create------ --------Random Create-------- -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 3571 95 +++++ +++ +++++ +++ 3543 92 +++++ +++ 10787 82 webvms,300M,41255,78,114551,26,565402,88,52867,94,+++++,+++,+++++,+++,16,3571,95,+++++,+++,+++++,+++,3543,92,+++++,+++,10787,82 uname: FreeBSD eva.internal 6.3-RELEASE-p1 FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 13 02:56:56 UTC 2008 root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/SMP i386 from dmesg: CPU: Intel(R) Xeon(R) CPU 3050 @ 2.13GHz (2128.01-MHz 686-class CPU) real memory = 2145894400 (2046 MB) avail memory = 2090598400 (1993 MB) ... mpt0: port 0xdc00-0xdcff mem 0xefdec000-0xefdeffff,0xefdf0000-0xefdfffff irq 16 at device 8.0 on pci2 mpt0: [GIANT-LOCKED] mpt0: MPI Version=1.5.13.0 mpt0: mpt_cam_event: 0x16 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required). mpt0: mpt_cam_event: 0x12 mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required). mpt0: mpt_cam_event: 0x12 mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required). mpt0: mpt_cam_event: 0x16 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required). ... da0 at mpt0 bus 0 target 0 lun 0 da0: Fixed Direct Access SCSI-5 device da0: 300.000MB/s transfers, Tagged Queueing Enabled da0: 237464MB (486326272 512 byte sectors: 255H 63S/T 30272C) pciconf: mpt0@pci2:8:0: class=0x010000 card=0x1f091028 chip=0x00541000 rev=0x01 hdr=0x00 vendor = 'LSI Logic (Was: Symbios Logic, NCR)' device = 'SAS 3000 series, 8-port with 1068 -StorPort' class = mass storage subclass = SCSI -------------- next part -------------- A non-text attachment was scrubbed... 
From scottl at samsco.org Tue May 20 14:32:20 2008 From: scottl at samsco.org (Scott Long) Date: Tue May 20 14:32:23 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <4832C397.3090004@calorieking.com> References: <4832C397.3090004@calorieking.com> Message-ID: <4832E0EE.3030402@samsco.org> Add the following to /boot/loader.conf: hw.mpt.enable_sata_wc=1 Scott Warren Guy wrote: > Hi, > > We have a few recently acquired Dell Poweredge SC440 machines in our office > running FreeBSD 6.3 that are experiencing very poor disk performance. The RAID > controller they are supplied with appears to be an "LSI Logic SAS 3000 Series". > There are two 250GB SATA disks in RAID 1 configuration. 6-7MB/s seems to be the > best I can get out of them. A Linux 2.6 machine with identical hardware and > disk/controller configuration exhibits no such performance impediments. > > Any information or pointers are greatly appreciated. 
> > bonnie++ output: > > Version 1.93c ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > eva.internal 300M 306 99 6722 1 5591 1 533 99 +++++ +++ 2359 37 > Latency 26715us 157ms 2193ms 25385us 808us 3852us > Version 1.93c ------Sequential Create------ --------Random Create-------- > eva.internal -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 4572 11 +++++ +++ +++++ +++ 3019 7 +++++ +++ 28367 40 > Latency 1277ms 23us 33us 1396ms 92us 317ms > 1.93c,1.93c,eva.internal,1,1211281869,300M,,306,99,6722,1,5591,1,533,99,+++++,+++,2359,37,16,,,,,4572,11,+++++,+++,+++++,+++,3019,7,+++++,+++,28367,40,26715us,157ms,2193ms,25385us,808us,3852us,1277ms,23us,33us,1396ms,92us,317ms > > and bonnie++ from the Linux 2.6 machine: > > Version 1.03 ------Sequential Output------ --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > webvms 300M 41255 78 114551 26 565402 88 52867 94 +++++ +++ +++++ +++ > ------Sequential Create------ --------Random Create-------- > -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 3571 95 +++++ +++ +++++ +++ 3543 92 +++++ +++ 10787 82 > webvms,300M,41255,78,114551,26,565402,88,52867,94,+++++,+++,+++++,+++,16,3571,95,+++++,+++,+++++,+++,3543,92,+++++,+++,10787,82 > > > > uname: > FreeBSD eva.internal 6.3-RELEASE-p1 FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 13 > 02:56:56 UTC 2008 root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/SMP > i386 > > from dmesg: > CPU: Intel(R) Xeon(R) CPU 3050 @ 2.13GHz (2128.01-MHz 686-class CPU) > real memory = 2145894400 (2046 MB) > avail memory = 2090598400 (1993 MB) > ... 
> mpt0: port 0xdc00-0xdcff mem > 0xefdec000-0xefdeffff,0xefdf0000-0xefdfffff irq 16 at device 8.0 on pci2 > mpt0: [GIANT-LOCKED] > mpt0: MPI Version=1.5.13.0 > mpt0: mpt_cam_event: 0x16 > mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required). > mpt0: mpt_cam_event: 0x12 > mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required). > mpt0: mpt_cam_event: 0x12 > mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required). > mpt0: mpt_cam_event: 0x16 > mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required). > ... > da0 at mpt0 bus 0 target 0 lun 0 > da0: Fixed Direct Access SCSI-5 device > da0: 300.000MB/s transfers, Tagged Queueing Enabled > da0: 237464MB (486326272 512 byte sectors: 255H 63S/T 30272C) > > pciconf: > mpt0@pci2:8:0: class=0x010000 card=0x1f091028 chip=0x00541000 rev=0x01 hdr=0x00 > vendor = 'LSI Logic (Was: Symbios Logic, NCR)' > device = 'SAS 3000 series, 8-port with 1068 -StorPort' > class = mass storage > subclass = SCSI > From warren.guy at calorieking.com Tue May 20 14:56:49 2008 From: warren.guy at calorieking.com (Warren Guy) Date: Tue May 20 14:56:54 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <4832E0EE.3030402@samsco.org> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> Message-ID: <4832E6C2.7040205@calorieking.com> Scott, Thanks a lot for that. This seems to have alleviated the problem, I'm seeing decent performance now in my limited benchmark. It seems quite odd to me that the write cache is not enabled by default, but oh well. Thanks again for your help! Warren Scott Long wrote: > Add the following to /boot/loader.conf: > > hw.mpt.enable_sata_wc=1 > > > Scott > > > Warren Guy wrote: >> Hi, >> >> We have a few recently acquired Dell Poweredge SC440 machines in our >> office >> running FreeBSD 6.3 that are experiencing very poor disk performance. 
>> The RAID >> controller they are supplied with appears to be an "LSI Logic SAS 3000 >> Series". >> There are two 250GB SATA disks in RAID 1 configuration. 6-7MB/s seems >> to be the >> best I can get out of them. A Linux 2.6 machine with identical >> hardware and >> disk/controller configuration exhibits no such performance impediments. >> >> Any information or pointers are greatly appreciated. >> >> bonnie++ output: >> >> Version 1.93c ------Sequential Output------ --Sequential Input- >> --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >> --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >> /sec %CP >> eva.internal 300M 306 99 6722 1 5591 1 533 99 +++++ +++ >> 2359 37 >> Latency 26715us 157ms 2193ms 25385us 808us >> 3852us >> Version 1.93c ------Sequential Create------ --------Random >> Create-------- >> eva.internal -Create-- --Read--- -Delete-- -Create-- --Read--- >> -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> /sec %CP >> 16 4572 11 +++++ +++ +++++ +++ 3019 7 +++++ +++ >> 28367 40 >> Latency 1277ms 23us 33us 1396ms >> 92us 317ms >> 1.93c,1.93c,eva.internal,1,1211281869,300M,,306,99,6722,1,5591,1,533,99,+++++,+++,2359,37,16,,,,,4572,11,+++++,+++,+++++,+++,3019,7,+++++,+++,28367,40,26715us,157ms,2193ms,25385us,808us,3852us,1277ms,23us,33us,1396ms,92us,317ms >> >> >> and bonnie++ from the Linux 2.6 machine: >> >> Version 1.03 ------Sequential Output------ --Sequential Input- >> --Random- >> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >> --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >> /sec %CP >> webvms 300M 41255 78 114551 26 565402 88 52867 94 +++++ >> +++ +++++ +++ >> ------Sequential Create------ --------Random >> Create-------- >> -Create-- --Read--- -Delete-- -Create-- --Read--- >> -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> /sec %CP >> 16 3571 95 +++++ +++ +++++ +++ 3543 92 +++++ +++ >> 10787 82 >> 
webvms,300M,41255,78,114551,26,565402,88,52867,94,+++++,+++,+++++,+++,16,3571,95,+++++,+++,+++++,+++,3543,92,+++++,+++,10787,82 >> >> >> >> >> uname: >> FreeBSD eva.internal 6.3-RELEASE-p1 FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 13 >> 02:56:56 UTC 2008 >> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/SMP >> i386 >> >> from dmesg: >> CPU: Intel(R) Xeon(R) CPU 3050 @ 2.13GHz (2128.01-MHz >> 686-class CPU) >> real memory = 2145894400 (2046 MB) >> avail memory = 2090598400 (1993 MB) >> ... >> mpt0: port 0xdc00-0xdcff mem >> 0xefdec000-0xefdeffff,0xefdf0000-0xefdfffff irq 16 at device 8.0 on pci2 >> mpt0: [GIANT-LOCKED] >> mpt0: MPI Version=1.5.13.0 >> mpt0: mpt_cam_event: 0x16 >> mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required). >> mpt0: mpt_cam_event: 0x12 >> mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required). >> mpt0: mpt_cam_event: 0x12 >> mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required). >> mpt0: mpt_cam_event: 0x16 >> mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required). >> ... >> da0 at mpt0 bus 0 target 0 lun 0 >> da0: Fixed Direct Access SCSI-5 device >> da0: 300.000MB/s transfers, Tagged Queueing Enabled >> da0: 237464MB (486326272 512 byte sectors: 255H 63S/T 30272C) >> >> pciconf: >> mpt0@pci2:8:0: class=0x010000 card=0x1f091028 chip=0x00541000 >> rev=0x01 hdr=0x00 >> vendor = 'LSI Logic (Was: Symbios Logic, NCR)' >> device = 'SAS 3000 series, 8-port with 1068 -StorPort' >> class = mass storage >> subclass = SCSI >> > -------------- next part -------------- A non-text attachment was scrubbed... 
From andrew at modulus.org Tue May 20 22:09:49 2008 From: andrew at modulus.org (Andrew Snow) Date: Tue May 20 22:09:53 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <4832E6C2.7040205@calorieking.com> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> Message-ID: <4833483B.4030208@modulus.org> Warren Guy wrote: > Thanks a lot for that. This seems to have alleviated the problem, I'm seeing > decent performance now in my limited benchmark. It seems quite odd to me that > the write cache is not enabled by default, but oh well. Technically with UFS, it can lead to filesystem or database corruption - when the power goes off suddenly, the OS+controller thinks the data has been written, but the drive has it in cache that is not battery-backed.
From delphij at delphij.net Tue May 20 23:06:15 2008 From: delphij at delphij.net (Xin LI) Date: Tue May 20 23:06:17 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <4833483B.4030208@modulus.org> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> <4833483B.4030208@modulus.org> Message-ID: <4833595D.1070409@delphij.net> Andrew Snow wrote: | Warren Guy wrote: |> Thanks a lot for that. This seems to have alleviated the problem, I'm |> seeing |> decent performance now in my limited benchmark. It seems quite odd to |> me that |> the write cache is not enabled by default, but oh well. | | Technically with UFS, it can lead to filesystem or database corruption - | when the power goes off suddenly, the OS+controller thinks the data has | been written, but the drive has it in cache that is not battery-backed. I believe that data corruption can happen anywhere if the write cache is lost; it's not a UFS-specific feature :) Cheers, -- ** Help China's quake relief at http://www.redcross.org.cn/ |>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve!
From andrew at modulus.org Tue May 20 23:33:53 2008 From: andrew at modulus.org (Andrew Snow) Date: Tue May 20 23:33:56 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <4833595D.1070409@delphij.net> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> <4833483B.4030208@modulus.org> <4833595D.1070409@delphij.net> Message-ID: <48335FD5.3000800@modulus.org> Xin LI wrote: > I believe that data corruption can happen anywhere if the write > cache is lost; it's not a UFS-specific feature :) ZFS has the promise of not requiring safe hardware write-caches due to its parity data and intent log. From my understanding, it can promise that a data block has been written completely and safely, and that filesystem metadata is not corrupted, but out-of-order writes could still happen. 
- Andrew
From delphij at delphij.net Tue May 20 23:38:38 2008 From: delphij at delphij.net (Xin LI) Date: Tue May 20 23:38:42 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <48335FD5.3000800@modulus.org> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> <4833483B.4030208@modulus.org> <4833595D.1070409@delphij.net> <48335FD5.3000800@modulus.org> Message-ID: <483360F5.6090404@delphij.net> Andrew Snow wrote: | Xin LI wrote: |> I believe that data corruption can happen anywhere if the write |> cache is lost; it's not a UFS-specific feature :) | | ZFS has the promise of not requiring safe hardware write-caches due to | its parity data and intent log. | | From my understanding, it can promise that a data block has been | written completely and safely, and that filesystem metadata is not | corrupted, but out-of-order writes could still happen. ZFS's promise is based on the fact that it does not overwrite data. Oops, I think I should not use the term 'corruption', I meant 'loss'. Cheers, -- ** Help China's quake relief at http://www.redcross.org.cn/ |>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve!
From scottl at samsco.org Wed May 21 00:36:57 2008 From: scottl at samsco.org (Scott Long) Date: Wed May 21 00:37:01 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <4832E6C2.7040205@calorieking.com> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> Message-ID: <48336EA0.3050109@samsco.org> Warren Guy wrote: > Scott, > > Thanks a lot for that. This seems to have alleviated the problem, I'm seeing > decent performance now in my limited benchmark. It seems quite odd to me that > the write cache is not enabled by default, but oh well. > > Thanks again for your help! > > Warren For data reliability, you really don't want it enabled by default. The problem is that SATA/ATA performs so poorly without it that everyone turns it on and lives with the consequences. The tweak that I recommended puts it in line with what the FreeBSD ATA driver has been doing for years. According to your original benchmark, Linux performs better on the sequential tests, but those simply aren't representative of most people's workloads. Linux indeed has some tricks to make sequential benchmarks perform well, but they aren't tricks that I'm all that interested in implementing in FreeBSD (though increasing the maxio size for 64-bit platforms would help and has few detrimental effects). The same benchmark shows that FreeBSD performs just as well, if not better, than Linux in random tests, even without the write cache enabled. Those tests are more representative of typical workloads. So, it's up to you to analyze what kind of workload you expect, and make the appropriate tradeoffs. 
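For reference, the tweak mentioned above is the loader tunable hw.mpt.enable_sata_wc=1 given earlier in this thread. A minimal sketch of adding it without creating duplicate entries follows, assuming a POSIX shell. The LOADER_CONF variable and the add_tunable helper are hypothetical names used only for illustration, and the script writes to a scratch file unless LOADER_CONF is pointed at /boot/loader.conf explicitly. Whether to enable the write cache at all remains the reliability tradeoff described above.

```sh
#!/bin/sh
# Sketch only: LOADER_CONF and add_tunable are hypothetical names for illustration.
# The default target is a scratch file, so a trial run never touches /boot/loader.conf.
LOADER_CONF="${LOADER_CONF:-${TMPDIR:-/tmp}/loader.conf.example}"

# Append "name=value" unless a line setting that name is already present.
add_tunable() {
    name="$1"
    value="$2"
    touch "$LOADER_CONF"
    if grep -q "^${name}=" "$LOADER_CONF"; then
        echo "${name} is already set in ${LOADER_CONF}; leaving it unchanged"
    else
        echo "${name}=${value}" >> "$LOADER_CONF"
        echo "added ${name}=${value} to ${LOADER_CONF}"
    fi
}

# Tunable name and value as given in the thread (enables the SATA write cache).
add_tunable hw.mpt.enable_sata_wc 1
```

To apply it for real, one would presumably run this as root with LOADER_CONF=/boot/loader.conf and then reboot, since loader tunables are read at boot time.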
 Scott
From sbruno at miralink.com Wed May 21 17:29:17 2008 From: sbruno at miralink.com (Sean Bruno) Date: Wed May 21 17:29:20 2008 Subject: ISCSI Patch Message-ID: <48345BEC.8020700@miralink.com> This small change allows the iSCSI initiator to compile and load under RELENG_6.

isc_cam.c
398a399
> #if __FreeBSD_version >= 700000
399a401,403
> #else
> if(xpt_bus_register(sim, 0/*bus_number*/) != CAM_SUCCESS)
> #endif

-- Sean Bruno MiraLink Corporation 6015 NE 80th Ave, Ste 100 Portland, OR 97218 Phone 503-621-5143 Fax 503-621-5199
From cdillon at wolves.k12.mo.us Wed May 21 19:46:26 2008 From: cdillon at wolves.k12.mo.us (Chris Dillon) Date: Wed May 21 19:46:29 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <48336EA0.3050109@samsco.org> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> <48336EA0.3050109@samsco.org> Message-ID: <20080521143051.17771kseoxrlhy7f@www.wolves.k12.mo.us> Quoting Scott Long: > For data reliability, you really don't want it enabled by default. The > problem is that SATA/ATA performs so poorly without it that everyone > turns it on and lives with the consequences. The tweak that I > recommended puts it in line with what the FreeBSD ATA driver has been > doing for years. Doesn't SATA NCQ solve this particular performance vs. reliability problem since it safely allows multiple outstanding write requests? Of course that means the SATA RAID controller would have to use NCQ on the drives and would probably also need its own non-volatile cache. I've always assumed this is how SCSI/SAS drives (with TCQ) perform as well as they do without sacrificing data integrity. We recently bought a new HP DL380G5 server with a P800 SAS RAID controller, MSA60 external drive shelf with 12 750GB SATA drives, 11-drive RAID5 array w/ hot-spare (a few too many drives in a single RAID5 array, I know, but I'm experimenting). 
The system is running Windows Server 2K3 R2. Without telling the P800 to enable the SATA WC (it has an option to do so, off by default), when doing a drag and drop file copy of several very large files from the internal SAS array to the external SATA array it writes 300MB/sec. I briefly enabled the "Physical Drive Write Cache" on the controller just a few minutes ago and ran another test and didn't notice any difference in write speed. I can only assume from this that the P800 is using NCQ on the SATA drives. -- Chris Dillon - NetEng/SysAdm Reeds Spring R-IV School District Technology Department 175 Elementary Rd. Reeds Spring, MO 65737 Voice: 417-272-8266 Fax: 417-272-0015 From scottl at samsco.org Wed May 21 20:08:27 2008 From: scottl at samsco.org (Scott Long) Date: Wed May 21 20:08:30 2008 Subject: Very poor performance from Dell/LSI Logic SAS 3000 series SATA/SAS RAID controller FreeBSD 6.3 In-Reply-To: <20080521143051.17771kseoxrlhy7f@www.wolves.k12.mo.us> References: <4832C397.3090004@calorieking.com> <4832E0EE.3030402@samsco.org> <4832E6C2.7040205@calorieking.com> <48336EA0.3050109@samsco.org> <20080521143051.17771kseoxrlhy7f@www.wolves.k12.mo.us> Message-ID: <48348131.3040602@samsco.org> Chris Dillon wrote: > Quoting Scott Long : > >> For data reliability, you really don't want it enabled by default. The >> problem is that SATA/ATA performs so poorly without it that everyone >> turns it on and lives with the consequences. The tweak that I >> recommended puts it in line with what the FreeBSD ATA driver has been >> doing for years. > > Doesn't SATA NCQ solve this particular performance vs. reliability > problem since it safely allows multiple outstanding write requests? Of > course that means the SATA RAID controller would have to use NCQ on the > drives and would probably also need its own non-volatile cache. I've > always assumed this is how SCSI/SAS drives (with TCQ) perform as well as > they do without sacrificing data integrity. Yes and no. 
NCQ gets you 90% the way there, but the lack of an ordered tag operation in the NCQ protocol means that i/o streams can be starved, forcing you to do unpleasant i/o scheduling hacks. But yes, it helps quite a bit, and I have a prototype driver already working that supports NCQ and performs very well with write cache turned off. > > We recently bought a new HP DL380G5 server with a P800 SAS RAID > controller, MSA60 external drive shelf with 12 750GB SATA drives, > 11-drive RAID5 array w/ hot-spare (a few too many drives in a single > RAID5 array, I know, but I'm experimenting). The system is running > Windows Server 2K3 R2. Without telling the P800 to enable the SATA WC > (it has an option to do so, off by default), when doing a drag and drop > file copy of several very large files from the internal SAS array to the > external SATA array it writes 300MB/sec. I briefly enabled the > "Physical Drive Write Cache" on the controller just a few minutes ago > and ran another test and didn't notice any difference in write speed. I > can only assume from this that the P800 is using NCQ on the SATA drives. > The cache and queueing mechanism on most IOP raid cards will smooth over the performance problems with ATA/SATA, so your results aren't too surprising. Scott From bugmaster at FreeBSD.org Mon May 26 11:06:55 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 26 11:07:46 2008 Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org Message-ID: <200805261106.m4QB6sc6065030@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/39388 scsi ncr/sym drivers fail with 53c810 and more than 256MB m o kern/40895 scsi wierd kernel / device driver bug o kern/52638 scsi [panic] SCSI U320 on SMP server won't run faster than s kern/57398 scsi [mly] Current fails to install on mly(4) based RAID di o kern/60598 scsi wire down of scsi devices conflicts with config o kern/60641 scsi [sym] Sporadic SCSI bus resets with 53C810 under load s kern/61165 scsi [panic] kernel page fault after calling cam_send_ccb o kern/74627 scsi [ahc] [hang] Adaptec 2940U2W Can't boot 5.3 o kern/90282 scsi [sym] SCSI bus resets cause loss of ch device o kern/92798 scsi [ahc] SCSI problem with timeouts o kern/94838 scsi Kernel panic while mounting SD card with lock switch o o kern/99954 scsi [ahc] reading from DVD failes on 6.x [regression] o kern/110847 scsi [ahd] Tyan U320 onboard problem with more than 3 disks o kern/120247 scsi [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 14 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/35234 scsi World access to /dev/pass? (for scanner) requires acce o kern/38828 scsi [dpt] [request] DPT PM2012B/90 doesn't work o kern/44587 scsi dev/dpt/dpt.h is missing defines required for DPT_HAND o kern/76178 scsi [ahd] Problem with ahd and large SCSI Raid system o kern/114597 scsi [sym] System hangs at SCSI bus reset with dual HBAs o kern/119668 scsi [cam] [patch] certain errors are too verbose comparing o kern/120487 scsi [sg] scsi_sg incompatible with scanners o sparc/121676 scsi [iscsi] iscontrol do not connect iscsi-target on sparc o kern/123666 scsi [aac] aac(4) will not work with Adaptec SAS RAID 3805 o kern/123674 scsi [ahc] ahc driver dumping 10 problems total. 
From alexander.gsander at gmail.com Mon May 26 13:09:57 2008 From: alexander.gsander at gmail.com (Alexander Goncharov) Date: Mon May 26 13:10:04 2008 Subject: Strange behavior of SCSI RAID 10 (FreeBSD, LSILogic MegaRAID) Message-ID: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com>

Hello world, I have run into the following issue on my dedicated server:

8x Opteron 885, 32 GB RAM, 8x 36 GB 15k rpm SCSI drives in RAID 10
FreeBSD 7.0-generic, 64-bit version

I/O performance and behavior are very strange:

1) No other processes are running. Memory stat:

Mem: 8796K Active, 9372K Inact, 80M Wired, 36K Cache, 12M Buf, 31G Free

Copy a 3 GB file for the first time:

dd if=/home/3gb_file of=/home/3gb_file2
6291456+0 records in
6291456+0 records out
3221225472 bytes transferred in 138.842926 secs (23200501 bytes/sec)

About 20 MB/s, which is very poor. Memory stat now:

Mem: 8940K Active, 5951M Inact, 287M Wired, 36K Cache, 214M Buf, 25G Free

2) Copy the same file again:

dd if=/home/3gb_file of=/home/3gb_file2
6291456+0 records in
6291456+0 records out
3221225472 bytes transferred in 30.433515 secs (105844674 bytes/sec)

About 100 MB/s, much better.

Mem: 9048K Active, 5951M Inact, 287M Wired, 32K Cache, 214M Buf, 25G Free

Further attempts with this file also show about 100 MB/s.

3) Copy another file for the first time:

dd if=/home/test2 of=/home/test2_2
6144000+0 records in
6144000+0 records out
3145728000 bytes transferred in 141.870921 secs (22173170 bytes/sec)

About 20 MB/s again. Copy the same file again:

dd if=/home/test2 of=/home/test2_2
6144000+0 records in
6144000+0 records out
3145728000 bytes transferred in 29.560267 secs (106417441 bytes/sec)

About 100 MB/s, much better. Memory stat:

Mem: 8964K Active, 12G Inact, 287M Wired, 28K Cache, 214M Buf, 19G Free

Copy the first file again:

dd if=/home/3gb_file of=/home/3gb_file2
6291456+0 records in
6291456+0 records out
3221225472 bytes transferred in 34.310753 secs (93883847 bytes/sec)

Good speed.

So the first copy of any file is extremely slow, while the second and all later attempts are much faster.
Please help me with this issue. Dmesg some info: ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (May 22 2008 00:49:14) acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0 amr0: mem 0xdc400000-0xdc40ffff,0xdc100000-0xdc13ffff irq 16 at device 14.0 on pci140 amr0: Using 64-bit DMA amr0: [ITHREAD] amr0: delete logical drives supported by controller amr0: Firmware 514L, BIOS H430, 128MB RAM hptrr: no controller detected. acd0: DVDROM at ata1-slave UDMA33 amr0: delete logical drives supported by controller amrd0: on amr0 amrd0: 137328MB (281247744 sectors) RAID 1 (optimal) Best regards, Alexander. From gpalmer at freebsd.org Mon May 26 14:10:07 2008 From: gpalmer at freebsd.org (Gary Palmer) Date: Mon May 26 14:10:09 2008 Subject: Strange behavior of SCSI RAID 10 (FreeBSD, LSILogic MegaRAID) In-Reply-To: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com> References: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com> Message-ID: <20080526141005.GC1142@in-addr.com> On Mon, May 26, 2008 at 11:41:56PM +1100, Alexander Goncharov wrote: > Hello world, I have faced with following issue on my dedicated server: > > 8x Opteron 885, 32gm RAM, 8x36 GM 15k rpm SCSI with RAID 10 > > FreeBSD 7.0-generic, 64 bit version > > > > IO performance and behavior is very strange: > > 1) No other process are running: > > Memory stat: > > Mem: 8796K Active, 9372K Inact, 80M Wired, 36K Cache, 12M Buf, 31G Free > > Copy 3GB file first time > > dd if=/home/3gb_file of=/home/3gb_file2 > 6291456+0 records in > 6291456+0 records out > 3221225472 bytes transferred in 138.842926 secs (23200501 bytes/sec) > > 20MBS is very poor? 
> > Memory stat now: > > Mem: 8940K Active, 5951M Inact, 287M Wired, 36K Cache, 214M Buf, 25G Free > > 2) Copy the same file again: > > dd if=/home/3gb_file of=/home/3gb_file2 > 6291456+0 records in > 6291456+0 records out > 3221225472 bytes transferred in 30.433515 secs (105844674 bytes/sec) > > 100MBs ? much better > The "Inact" (Inactive) went up - the 3GiB file is now cached in memory. So the second (and subsequent) runs are reading from cached memory, so your 100MiB/sec transfer is actually just testing write speed, not read/write speed. This is the same for your other tests too. "Inact" is memory that has been used and is being kept around in case it is used again; in other words, it's caching file data in the "Inact" region in top. It'll be reused if something else needs the memory, but until then it sticks around. Remember - you are copying the file from and to the same filesystem - this is always going to appear slow relative to pure read or pure write tests. A pure write test is effectively what you have when you're getting your 100MiB/sec test result, since it's just writing out from cache memory. 
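One way to separate the two cases is to test writes and reads on their own. The following is a minimal sketch, assuming a POSIX shell and a dd that accepts a numeric block size; TESTFILE, BS and COUNT are illustrative values, and the file here is deliberately tiny. For a meaningful number the file would need to be larger than RAM (32 GB on the machine in question), otherwise the read pass is likely to be served from the cache as described above.

```sh
#!/bin/sh
# Sketch only: TESTFILE, BS and COUNT are illustrative values, not recommendations.
TESTFILE="${TESTFILE:-${TMPDIR:-/tmp}/dd_testfile}"
BS=1048576          # 1 MiB blocks, given in bytes so both BSD and GNU dd accept it
COUNT="${COUNT:-8}" # tiny on purpose; scale up for a real measurement

# Write-only test: the source is /dev/zero, so no disk reads are involved.
dd if=/dev/zero of="$TESTFILE" bs="$BS" count="$COUNT"

# Read-only test: the destination is /dev/null, so no disk writes are involved.
# Right after the write above, this read is likely to come from the cache.
dd if="$TESTFILE" of=/dev/null bs="$BS"
```

Comparing these two figures with the combined copy should show how much of the slowdown comes from reading and writing the same filesystem at once.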
Regards, Gary From alexander.gsander at gmail.com Mon May 26 15:36:45 2008 From: alexander.gsander at gmail.com (Alexander Goncharov) Date: Mon May 26 15:36:51 2008 Subject: Strange behavior of SCSI RAID 10 (FreeBSD, LSILogic MegaRAID) In-Reply-To: <20080526141005.GC1142@in-addr.com> References: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com> <20080526141005.GC1142@in-addr.com> Message-ID: <5c29cc10805260836rb8ccac0ic722c255e60286f8@mail.gmail.com> On Tue, May 27, 2008 at 1:10 AM, Gary Palmer wrote: > On Mon, May 26, 2008 at 11:41:56PM +1100, Alexander Goncharov wrote: > > Hello world, I have faced with following issue on my dedicated server: > > > > 8x Opteron 885, 32gm RAM, 8x36 GM 15k rpm SCSI with RAID 10 > > > > FreeBSD 7.0-generic, 64 bit version > > > > > > > > IO performance and behavior is very strange: > > > > 1) No other process are running: > > > > Memory stat: > > > > Mem: 8796K Active, 9372K Inact, 80M Wired, 36K Cache, 12M Buf, 31G Free > > > > Copy 3GB file first time > > > > dd if=/home/3gb_file of=/home/3gb_file2 > > 6291456+0 records in > > 6291456+0 records out > > 3221225472 bytes transferred in 138.842926 secs (23200501 bytes/sec) > > > > 20MBS is very poor? > > > > Memory stat now: > > > > Mem: 8940K Active, 5951M Inact, 287M Wired, 36K Cache, 214M Buf, 25G Free > > > > 2) Copy the same file again: > > > > dd if=/home/3gb_file of=/home/3gb_file2 > > 6291456+0 records in > > 6291456+0 records out > > 3221225472 bytes transferred in 30.433515 secs (105844674 bytes/sec) > > > > 100MBs ? much better > > > > The "Inact" (Inactive) went up - the 3GiB file is now cached in > memory. So the second (and subsequent) runs are going from cached > memory so your 100MiB/sec transfer is actually just testing write > speed, not read/write speed. 
This is the same for your other > tests too > > "Inact" is memory that has been used and is being kept around incase > it is used again, in other words its caching file data in the > "Inact" region in top. It'll be reused if something else needs > the memory, but until then it sticks around. > > Remember - you are copying the file from and to the same > filesystem - this is always going to appear slow relative to > pure read or pure write tests. A pure write test is effectively > what you have when you're getting your 100MiB/sec test result since > its just writing out from cache memory. > > Regards, > > Gary > Hi Gary, Many thanks for your quick reply. Read-only and write-only speeds are good, about 100 MB/s, but I am not sure that is really good for a hardware RAID 10 of 8x 15k rpm drives. I am worried about the mixed read/write use case, which is the most common one in real tasks (databases). I was surprised by the 20 MB/s speed; that is roughly what a single drive delivers, but I have 8 high-speed drives. Something isn't right here. I am waiting for help from the FreeBSD community. Thanks in advance. Alexander From os at rsu.ru Mon May 26 18:26:23 2008 From: os at rsu.ru (Oleg Sharoiko) Date: Mon May 26 18:26:29 2008 Subject: Strange behavior of SCSI RAID 10 (FreeBSD, LSILogic MegaRAID) In-Reply-To: <5c29cc10805260836rb8ccac0ic722c255e60286f8@mail.gmail.com> References: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com> <20080526141005.GC1142@in-addr.com> <5c29cc10805260836rb8ccac0ic722c255e60286f8@mail.gmail.com> Message-ID: <483AFB0A.6050401@rsu.ru> Hi! 
Alexander Goncharov wrote: > On Tue, May 27, 2008 at 1:10 AM, Gary Palmer wrote: > > >> On Mon, May 26, 2008 at 11:41:56PM +1100, Alexander Goncharov wrote: >> >>> Hello world, I have faced with following issue on my dedicated server: >>> >>> 8x Opteron 885, 32gm RAM, 8x36 GM 15k rpm SCSI with RAID 10 >>> >>> FreeBSD 7.0-generic, 64 bit version >>> >>> >>> >>> IO performance and behavior is very strange: >>> >>> 1) No other process are running: >>> >>> Memory stat: >>> >>> Mem: 8796K Active, 9372K Inact, 80M Wired, 36K Cache, 12M Buf, 31G Free >>> >>> Copy 3GB file first time >>> >>> dd if=/home/3gb_file of=/home/3gb_file2 >>> 6291456+0 records in >>> 6291456+0 records out >>> 3221225472 bytes transferred in 138.842926 secs (23200501 bytes/sec) >>> >>> 20MBS is very poor? >>> >>>

Alexander, you're using dd's default block size of 512 bytes. In this case your hardware has to:

    while (!eof(source_file)) {
        locate next 512-byte block
        read it
        locate writing position
        write block
    }

So it does 6291456*2 searches. Moreover, you're working with files, so dd has to go through the filesystem layer, which involves additional overhead. Please try increasing the block size to, for example, 8 MB (add bs=8m to the arguments of dd).

>>> Memory stat now: >>> >>> Mem: 8940K Active, 5951M Inact, 287M Wired, 36K Cache, 214M Buf, 25G Free >>> >>> 2) Copy the same file again: >>> >>> dd if=/home/3gb_file of=/home/3gb_file2 >>> 6291456+0 records in >>> 6291456+0 records out >>> 3221225472 bytes transferred in 30.433515 secs (105844674 bytes/sec) >>> >>> 100MBs ? much better >>> >>> >> The "Inact" (Inactive) went up - the 3GiB file is now cached in >> memory. So the second (and subsequent) runs are going from cached >> memory so your 100MiB/sec transfer is actually just testing write >> speed, not read/write speed.
This is the same for your other >> tests too >> >> "Inact" is memory that has been used and is being kept around incase >> it is used again, in other words its caching file data in the >> "Inact" region in top. It'll be reused if something else needs >> the memory, but until then it sticks around. >> >> Remember - you are copying the file from and to the same >> filesystem - this is always going to appear slow relative to >> pure read or pure write tests. A pure write test is effectively >> what you have when you're getting your 100MiB/sec test result since >> its just writing out from cache memory. >> >> Regards, >> >> Gary >> >> > > Hi Gary, > > Big thanks for your quick reply. Read only and write only speed is good > ~100MBs. But I am not sure if it really good for hw RAID 10 8x 15k rpm > drives. > I am worried about read/write speed usecase which is most used at real tasks > (data base). I was surprised 20 MBs speed, this value is likely to one drive > speed. But I have 8 high speed drives. Something isn't right here. > > I am waiting for freebsd community help. > > Thanks in advance. > Alexander > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > From tit at ispserver.com Wed May 28 03:40:05 2008 From: tit at ispserver.com (Alexander Titaev) Date: Wed May 28 03:40:12 2008 Subject: kern/123666: [aac] aac(4) will not work with Adaptec SAS RAID 3805 controller Message-ID: <200805280340.m4S3e4m5091134@freefall.freebsd.org> The following reply was made to PR kern/123666; it has been noted by GNATS. 
From: Alexander Titaev To: bug-followup@FreeBSD.org, romain@blogreen.org Cc: Subject: Re: kern/123666: [aac] aac(4) will not work with Adaptec SAS RAID 3805 controller Date: Wed, 28 May 2008 11:52:54 +0900 Hi it find type volume (single drive) only msk-srv# dmesg | grep ^aac aac0: mem 0xd8400000-0xd85fffff irq 18 at device 14.0 on pci11 aac0: Enabling 64-bit address support aac0: New comm. interface enabled aac0: [ITHREAD] aac0: Adaptec 3805, aac driver 2.0.0-1 aacp0: on aac0 aacp1: on aac0 aacp2: on aac0 aacd0: on aac0 aacd0: 953690MB (1953157120 sectors) msk-srv# uname -mr 7.0-RELEASE-p1 amd64 -- regards, Alexander mailto:tit@ispserver.com From netslists at gmail.com Wed May 28 06:31:35 2008 From: netslists at gmail.com (Sten Daniel Soersdal) Date: Wed May 28 06:31:42 2008 Subject: Strange behavior of SCSI RAID 10 (FreeBSD, LSILogic MegaRAID) In-Reply-To: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com> References: <5c29cc10805260541r1f90f516q8ac7d45d2bb15702@mail.gmail.com> Message-ID: <483CF5AE.3090105@gmail.com> Alexander Goncharov wrote: > Hello world, I have faced with following issue on my dedicated server: > > 8x Opteron 885, 32gm RAM, 8x36 GM 15k rpm SCSI with RAID 10 > > FreeBSD 7.0-generic, 64 bit version > > > > IO performance and behavior is very strange: > > 1) No other process are running: > > Memory stat: > > Mem: 8796K Active, 9372K Inact, 80M Wired, 36K Cache, 12M Buf, 31G Free > > Copy 3GB file first time > > dd if=/home/3gb_file of=/home/3gb_file2 > 6291456+0 records in > 6291456+0 records out > 3221225472 bytes transferred in 138.842926 secs (23200501 bytes/sec) > > 20MBS is very poor? > > Memory stat now: > > Mem: 8940K Active, 5951M Inact, 287M Wired, 36K Cache, 214M Buf, 25G Free > > 2) Copy the same file again: > > dd if=/home/3gb_file of=/home/3gb_file2 > 6291456+0 records in > 6291456+0 records out > 3221225472 bytes transferred in 30.433515 secs (105844674 bytes/sec) > > 100MBs ? 
much better > > Mem: 9048K Active, 5951M Inact, 287M Wired, 32K Cache, 214M Buf, 25G Free > > Next attempts with this file show 100MBs spped > > 3)Copy other file first time > dd if=/home/test2 of=/home/test2_2 > 6144000+0 records in > 6144000+0 records out > 3145728000 bytes transferred in 141.870921 secs (22173170 bytes/sec) > > 20MBs again > > Copy the same file again: > > dd if=/home/test2 of=/home/test2_2 > 6144000+0 records in > 6144000+0 records out > 3145728000 bytes transferred in 29.560267 secs (106417441 bytes/sec) > > 100MBs ? much better. Memory stat: > > Mem: 8964K Active, 12G Inact, 287M Wired, 28K Cache, 214M Buf, 19G Free > > Copy first file again: > > dd if=/home/3gb_file of=/home/3gb_file2 > 6291456+0 records in > 6291456+0 records out > 3221225472 bytes transferred in 34.310753 secs (93883847 bytes/sec) > > good speed > > .... > > So, first copying of any file is extremely slow, second and all next > attempts are much better. Please help me with this issue. > > Dmesg some info: > > > ioapic0 irqs 0-23 on motherboard > kbd1 at kbdmux0 > ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) > hptrr: HPT RocketRAID controller driver v1.1 (May 22 2008 00:49:14) > acpi0: on motherboard > acpi0: [ITHREAD] > acpi0: Power Button (fixed) > Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0 > > > amr0: mem > 0xdc400000-0xdc40ffff,0xdc100000-0xdc13ffff irq 16 at device 14.0 on pci140 > amr0: Using 64-bit DMA > amr0: [ITHREAD] > amr0: delete logical drives supported by controller > amr0: Firmware 514L, BIOS H430, 128MB RAM I'm not sure if it's relevant but the LSILogic MegaRAID cards i have are not able to do true RAID 1+0. Instead it mirrors and then concatenates which will leave you relatively poor read/write performance (no striping). To do RAID 1+0 on my controllers i had to do mirror in BIOS and then stripe the mirror's using gstripe. 
Also, once you do RAID you have to pay close attention to block sizes, stripe widths, etc., and to their alignment. Improper alignment will ruin any performance benefits your RAID levels may provide. Without a battery-backed cache your write performance will be poor. Also, 'dd' defaults to 512-byte blocks (a poor test). Just my $0.02 -- Sten Daniel Soersdal From tit at ispserver.com Thu May 29 05:30:08 2008 From: tit at ispserver.com (Alexander Titaev) Date: Thu May 29 05:30:15 2008 Subject: kern/123666: [aac] aac(4) will not work with Adaptec SAS RAID 3805 controller Message-ID: <200805290530.m4T5U869051120@freefall.freebsd.org> The following reply was made to PR kern/123666; it has been noted by GNATS. From: Alexander Titaev To: bug-followup@FreeBSD.org, romain@blogreen.org Cc: Subject: Re: kern/123666: [aac] aac(4) will not work with Adaptec SAS RAID 3805 controller Date: Thu, 29 May 2008 14:28:58 +0900 Hello, bug-followup. But in 7.0-STABLE everything works fine: msk-srv# uname -mr 7.0-STABLE amd64 msk-srv# dmesg | grep ^aa aac0: mem 0xd8400000-0xd85fffff irq 18 at device 14.0 on pci11 aac0: Enabling 64-bit address support aac0: Enable Raw I/O aac0: Enable 64-bit array aac0: New comm. interface enabled aac0: [ITHREAD] aac0: Adaptec 3805, aac driver 2.0.0-1 aacp0: on aac0 aacp1: on aac0 aacp2: on aac0 aacd0: on aac0 aacd0: 953690MB (1953157120 sectors) aacd1: on aac0 aacd1: 5722190MB (11719045120 sectors) -- Regards, Alexander mailto:tit@ispserver.com
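
As a rough check on the dd figures quoted in the RAID 10 thread above, the sketch below recomputes the record counts and throughput from the numbers dd printed. It is only arithmetic on the reported output, not a benchmark of the array. The helper names (records, throughput) are made up for illustration, and it assumes that bs=8m means 8 MiB.

```python
# Illustrative arithmetic only. The helper names are hypothetical;
# the byte counts and timings are the ones dd reported in the thread.

def records(total_bytes: int, block_size: int) -> int:
    """Number of blocks dd has to move (the last one may be partial)."""
    return -(-total_bytes // block_size)  # ceiling division


def throughput(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in bytes per second."""
    return total_bytes / seconds


TOTAL = 3221225472          # size of /home/3gb_file as reported by dd
DEFAULT_BS = 512            # dd's default block size
LARGE_BS = 8 * 1024 * 1024  # assumption: bs=8m is 8 MiB

if __name__ == "__main__":
    print(records(TOTAL, DEFAULT_BS))   # 6291456, matching "records in"
    print(records(TOTAL, LARGE_BS))     # 384
    print(throughput(TOTAL, 138.842926))  # first (uncached) copy
    print(throughput(TOTAL, 30.433515))   # second (cached) copy
```

With 512-byte blocks dd issues about 6.3 million reads and as many writes for this file; with 8 MiB blocks it needs 384 of each, which is why the block size is worth changing before drawing conclusions about the controller.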