Second SATA device lost after ZFS root is mounted
Alexander Motin
mav at FreeBSD.org
Mon Nov 14 23:43:23 UTC 2011
On 15.11.2011 01:00, Sebastian Chmielewski wrote:
> On Tue, 15 Nov 2011 00:39:52 +0200
> Alexander Motin<mav at FreeBSD.org> wrote:
>
>> A SATA device can be dropped because of an error during the reset/probe/
>> initialization sequence or because the controller reported a disconnection.
>> Verbose boot messages (boot -v from the loader prompt) should give more
>> information about what happened there. Please show the full verbose dmesg.
> Using rc_debug="YES" in rc.conf I've found that my device is dropped during
> sysctl_start. With an empty sysctl.conf my device is not lost. The contents of
> the file seem quite innocent:
>
> # Uncomment this to prevent users from seeing information about processes that
> # are being run under another UID.
> security.bsd.see_other_uids=1
>
> # Enable/disable coredump
> kern.coredump=1
>
> # Up the maxfiles to 4x default
> kern.maxfiles=49312
>
> kern.ipc.shmmax=67108864
> kern.ipc.shmall=32768
>
> # Allow users to mount CD's
> vfs.usermount=1
> vfs.hirunningspace=8388608
> vfs.lorunningspace=1048576
>
> kern.corefile="/var/coredumps/%U/%N.core"
>
> # Do not truncate command line arguments in ps(1) listing
> kern.ps_arg_cache_limit=10000
>
> # Tune for desktop usage
> kern.sched.preempt_thresh=224
>
> # Increase default setting - recommended for 2 GB of RAM
> kern.maxvnodes=400000
>
> dev.acpi_ibm.0.lcd_brightness=6
> dev.acpi_ibm.0.lcd_brightness=3
> net.link.tap.user_open=1
> net.link.tap.up_on_open=1
>
> The device is lost even when sysctl is started with the new file after
> booting has finished (I ran service sysctl restart from an X session).
> # sysctl debug.bootverbose=1
> # service sysctl restart
> # dmesg
>
> ahcich1: DISCONNECT requested
> ahcich1: AHCI reset...
> ahcich1: SATA connect timeout time=10000us status=00000000
> ahcich1: AHCI reset: device not found
> (ada1:ahcich1:0:0:0): lost device
> (pass1:ahcich1:0:0:0): lost device
> (pass1:ahcich1:0:0:0): removing device entry
>
> Crazy, isn't it?
It is. I've never heard of such a thing.
The reset status looks as if the device was indeed disconnected or
powered down. I don't even know how one could do that, at least on Intel
chipsets. My laptop's BIOS has a bug that disables the SATA port after
suspend/resume, but there the reset status shows that the port was
explicitly disabled. I have only one crazy idea: when setting the screen
brightness you are calling ACPI code, which is a black box by definition
and can do whatever it wants with the hardware, including using any
custom power control interfaces.
Was the second disk originally planned for this laptop? Laptop vendors
tend to hardcode things more than desktop vendors do.
I would try two things:
- bisect the list of sysctls to find the one that causes this;
- try enabling SATA interface power management for the device. If
power management was somehow enabled on the device behind the OS's
back, it may cause false DISCONNECT messages, though it still should not
cause such a reset status. Setting hint.ahcich.1.pm_level=1 in
loader.conf will make the ahci(4) driver ignore link loss events. If the
device is indeed lost, you should see command timeouts and only then the
device loss.
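
The bisection from the first item could be organised roughly as below.
This is only a sketch, not something from ahci(4) or rc(8): the names
parse_sysctl_conf, find_culprit and the triggers callback are made up
for illustration. It assumes that exactly one setting is responsible and
that the settings act independently of each other. How triggers decides
that the disk was dropped is left to the caller, for example by applying
the subset by hand and looking for "lost device" in dmesg.

```python
def parse_sysctl_conf(text):
    """Return the name=value lines of a sysctl.conf, skipping comments and blanks."""
    settings = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            settings.append(line)
    return settings


def find_culprit(settings, triggers):
    """Bisect settings down to the one entry for which triggers() is true.

    settings: list of "name=value" strings.
    triggers: hypothetical callback; takes a list of settings and returns
              True if applying that list makes the disk disappear.
    Returns the offending setting, or None if the full list is harmless.
    Assumes a single, independent culprit.
    """
    if not settings or not triggers(settings):
        return None
    candidates = list(settings)
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if triggers(first_half):
            # The culprit is in the first half; discard the rest.
            candidates = first_half
        else:
            # The first half is clean, so the culprit is in the second half.
            candidates = candidates[middle:]
    return candidates[0]
```

With about 17 settings in the file this needs about five trials after
the initial check. Since the disk presumably has to be present again
before each trial, in practice every call to triggers may mean a reboot
with a trimmed sysctl.conf rather than an automated test.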
--
Alexander Motin
More information about the freebsd-current
mailing list