kern/157397: [ada] ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability

Matthias Andree mandree at FreeBSD.org
Wed Apr 3 22:10:01 UTC 2013


The following reply was made to PR kern/157397; it has been noted by GNATS.

From: Matthias Andree <mandree at FreeBSD.org>
To: bug-followup at FreeBSD.org, Alexander Motin <mav at FreeBSD.org>
Cc:  
Subject: Re: kern/157397: [ada] ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability
Date: Thu, 04 Apr 2013 00:08:12 +0200

 Further information:
 
 - I have /usr (and only /usr) on the drive in question.
 
 # tunefs -p /dev/label/usr
 tunefs: POSIX.1e ACLs: (-a)                                disabled
 tunefs: NFSv4 ACLs: (-N)                                   enabled
 tunefs: MAC multilabel: (-l)                               disabled
 tunefs: soft updates: (-n)                                 enabled
 tunefs: soft update journaling: (-j)                       enabled
 tunefs: gjournal: (-J)                                     disabled
 tunefs: trim: (-t)                                         disabled
 tunefs: maximum blocks per file in a cylinder group: (-e)  2048
 tunefs: average file size: (-f)                            16384
 tunefs: average number of files in a directory: (-s)       64
 tunefs: minimum percentage of free space: (-m)             8%
 tunefs: optimization preference: (-o)                      time
 tunefs: volume label: (-L)                                 usr
 
 
 - I am running with kern.cam.ada.default_timeout=5, which makes the
 computer recover faster after a stall.
 
 
 - Whether the stalls occur on reads or on writes is unclear to me, but
 the kernel only ever logs WRITE_FPDMA_QUEUED, so I guess the answer is
 "write".
 
 Either "rm -rf /usr/obj" or logging in to GNOME and trying to start
 gnome-terminal is sufficient to trigger a stall.
 
 
 - Reducing the number of tags to 31 does not appear to help.  Linux's
 libata limits itself to 31 tags only so that the bit mask 0xffffffff,
 which it might get with all 32 tags in use, cannot be confused with a
 "fatal error" indication.
 
 
 - Disabling NCQ through "camcontrol negotiate ada1 -T disable" appears
 to help, at the cost of a massive slowdown (which is expected, as I run
 with ATA caches disabled), but I need much longer testing before I can
 really confirm that it helps.
 
 
 # camcontrol identify ada1
 pass1: <SAMSUNG HD103SI 1AG01118> ATA-7 SATA 2.x device
 pass1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
 
 protocol              ATA/ATAPI-7 SATA 2.x
 device model          SAMSUNG HD103SI
 firmware revision     1AG01118
 serial number         (elided)
 WWN                   (elided)
 cylinders             16383
 heads                 16
 sectors/track         63
 sector size           logical 512, physical 512, offset 0
 LBA supported         268435455 sectors
 LBA48 supported       1953525168 sectors
 PIO supported         PIO4
 DMA supported         WDMA2 UDMA6
 
 Feature                      Support  Enabled   Value           Vendor
 read ahead                     yes      yes
 write cache                    yes      no
 flush cache                    yes      yes
 overlap                        no
 Tagged Command Queuing (TCQ)   no       no
 Native Command Queuing (NCQ)   yes              32 tags
 SMART                          yes      yes
 microcode download             yes      yes
 security                       yes      no
 power management               yes      yes
 advanced power management      yes      yes     254/0xFE
 automatic acoustic management  yes      no      0/0x00  254/0xFE
 media status notification      no       no
 power-up in Standby            yes      no
 write-read-verify              no       no
 unload                         no       no
 free-fall                      no       no
 data set management (TRIM)     no
 
 
 # camcontrol tags ada1 -N31
 (pass1:ahcich1:0:0:0): tagged openings now 31
 (pass1:ahcich1:0:0:0): device openings: 31
 
 
 Logs filtered through "egrep ahcich1\|ada1\|pass1\|ahci0" are available
 from <http://people.freebsd.org/~mandree/PR157397-logs.txt>, with serial
 numbers removed.
 
 OBSERVE that the timeouts only ever affect odd-numbered slots, never
 even-numbered slots.
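 
 The odd/even observation can be re-checked mechanically.  The following
 is a minimal sketch, not a tested tool: it ASSUMES that the relevant
 kernel messages contain the text "Timeout on slot N", and the sample
 lines in it are made up for illustration and are NOT taken from the
 logs linked above.  The names SLOT_RE and slot_parity are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: timeout messages contain "Timeout on slot <number>".
# Adjust the pattern if the real log lines look different.
SLOT_RE = re.compile(r"Timeout on slot (\d+)")


def slot_parity(lines):
    """Count how many timeout messages name an odd or an even slot."""
    counts = Counter()
    for line in lines:
        match = SLOT_RE.search(line)
        if match:
            slot = int(match.group(1))
            counts["odd" if slot % 2 else "even"] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "ahcich1: Timeout on slot 5 port 0",
    "ahcich1: Timeout on slot 17 port 0",
    "ada1: some unrelated message",
]

print(slot_parity(sample))
```

 To run it over a saved log instead of the sample, pass the open file,
 for example slot_parity(open("PR157397-logs.txt")).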
 
 
 Linux findings:
 
 - Linux uses 31 out of 32 slots so it can distinguish a fatal error from
 "all bits set in 32-bit bitmask", see:
 
 <https://ata.wiki.kernel.org/index.php/Main_Page>
 
 - The Linux sources can be browsed at
 <https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/ata>;
 checking ata_device_blacklist in libata-core.c shows no Samsung stuff.
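 
 The 31-versus-32 tag reasoning above can be illustrated with a few lines
 of arithmetic.  This is only a sketch of the bit mask argument as
 described in this message, not code from libata or from the FreeBSD
 ahci driver; the names ALL_ONES, active_mask and is_ambiguous are made
 up for the example.

```python
# Value of a 32-bit bit mask with every bit set.
ALL_ONES = 0xFFFFFFFF


def active_mask(tags_in_use):
    """Build a 32-bit bit mask with one bit set per outstanding tag."""
    mask = 0
    for tag in tags_in_use:
        if not 0 <= tag < 32:
            raise ValueError(f"tag {tag} out of range")
        mask |= 1 << tag
    return mask


def is_ambiguous(max_tags):
    """True if a completely full queue of max_tags yields all ones."""
    return active_mask(range(max_tags)) == ALL_ONES


print(is_ambiguous(32))             # True: full queue looks like all ones
print(is_ambiguous(31))             # False: bit 31 is never set
print(hex(active_mask(range(31))))  # 0x7fffffff
```

 With 31 tags the top bit stays clear, so an all-ones value can never be
 the result of legitimately outstanding commands.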
 
 Regarding the ATI/AMD SB7x0 that I am using, it might be worthwhile
 investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag.  Linux sets it on
 the SB700 that my computer is using; see ahci_error_intr() in
 libahci.c.  I am not going to interpret that for lack of expertise.

