Probably Hardware Trouble But What Is It?

Sun Dec 7 17:08:20 UTC 2014

Drew,

Just trying to assist...

From the look of it, something is definitely failing, and it is either 
the controller or the disk.  FreeBSD is trying to stay alive.  (I've had 
something similar happen in the past.  When I rebooted, a disk showed up 
as faulted and inaccessible.)
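One way to narrow down "controller or disk" from the logs is to count the CAM command timeouts per disk and controller channel. This is a minimal sketch, assuming the one-line syslog format shown in your messages, e.g. "(ada0:siisch0:0:0:0): CAM status: Command timeout"; the function name cam_timeouts is hypothetical. As a rough heuristic only: timeouts confined to ada0 point toward the disk, while timeouts on several disks behind the same controller point more toward the controller or cabling.

```shell
#!/bin/sh
# Count "CAM status: Command timeout" messages per disk and controller
# channel, reading syslog text on stdin. The disk and channel names are
# taken from the "(ada0:siisch0:..." prefix CAM puts on each message.
# cam_timeouts is a hypothetical helper name.
cam_timeouts() {
    awk '/CAM status: Command timeout/ {
        if (match($0, /\([a-z]+[0-9]+:[a-z]+[0-9]+:/)) {
            # Drop the leading paren and trailing colon, then split
            # the remainder into disk name and controller channel.
            split(substr($0, RSTART + 1, RLENGTH - 2), p, ":")
            n[p[1] " " p[2]]++
        }
    }
    END { for (k in n) print k, n[k] }' | sort
}

# Usage (FreeBSD): cam_timeouts < /var/log/messages
```

Each output line is "disk channel count", for example "ada0 siisch0 2".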

I'd theorize that the first message, about kern.maxfiles being exceeded 
by root (barring your having changed that setting), is a side effect of 
the failure: requests that can't complete keep holding file handles, and 
new ones keep being allocated until the limit is reached.
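To check this theory once the machine responds again, compare the number of open files with the limit. This is a sketch assuming a FreeBSD system, where kern.openfiles and kern.maxfiles are sysctls; the helper files_pct is hypothetical. If the count sits far below the limit after the disk problem is dealt with, the maxfiles messages were most likely a symptom rather than the cause.

```shell
#!/bin/sh
# Report how much of the kern.maxfiles limit is currently in use.
# kern.openfiles / kern.maxfiles are FreeBSD sysctls; on other systems
# the lookups fail quietly and nothing is printed.

# Integer percentage of the limit in use: files_pct OPEN MAX
files_pct() {
    echo $(( $1 * 100 / $2 ))
}

if open=$(sysctl -n kern.openfiles 2>/dev/null) &&
   max=$(sysctl -n kern.maxfiles 2>/dev/null); then
    echo "open files: $open of $max ($(files_pct "$open" "$max")%)"
fi
```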

If you have access to the console and another drive, you may want to 
connect the second drive, attach it as a mirror of the first, and hope 
the failing drive stays readable long enough for the mirror to complete. 
If it works, great.  BTW, don't forget to install boot blocks if this is 
your boot drive.
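A minimal sketch of the mirror step, assuming the pool is zback on the whole disk ada0 (as your zpool status shows) and that the new disk appears as ada1, which is a hypothetical name you should verify first. The new disk must be at least as large as ada0. The gpart line applies only to a GPT-partitioned boot disk whose partition 1 is freebsd-boot, an assumed layout. By default the script only prints the commands; if zpool commands are still hanging, the real attach will likely hang too.

```shell
#!/bin/sh
# Attach a second disk to a single-disk pool so ZFS resilvers onto it.
# All names below are assumptions; check them before running for real.
POOL=${POOL:-zback}
OLD=${OLD:-ada0}
NEW=${NEW:-ada1}              # hypothetical name of the new disk
BOOT_DRIVE=${BOOT_DRIVE:-0}   # 1 only if this is a GPT boot disk
DRY_RUN=${DRY_RUN:-1}         # 1 = print commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

attach_mirror() {
    # Turns the single-disk vdev into a two-way mirror.
    run zpool attach "$POOL" "$OLD" "$NEW"
    if [ "$BOOT_DRIVE" = 1 ]; then
        # Assumes partition index 1 on $NEW is a freebsd-boot partition.
        run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$NEW"
    fi
    # Shows the resilver status.
    run zpool status "$POOL"
}

attach_mirror
```

Running it as "DRY_RUN=0 NEW=ada1 sh attach.sh" would execute the commands instead of printing them.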

Now, if it doesn't start to mirror the drive after being attached, 
you're going to have to reboot.  That's probably going to show you the 
real failure. :-(

If the controller is onboard, there's not much you can do.  If it's a 
PCIe card, try re-seating it.  Sometimes cards get pulled on or bumped 
inadvertently and are no longer sitting in the slot correctly.

I agree with the other post: replace the connecting cables and/or 
re-seat them.

If, after all this, it doesn't work, it's probably the disk itself.
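If smartmontools (sysutils/smartmontools, not part of the base system) is installed, the drive's own SMART counters can support the "it's the disk" conclusion. This sketch assumes the standard `smartctl -A` attribute table, where the raw value is the tenth column; the helper name smart_badness is hypothetical. Non-zero values here usually mean media problems, but zeros don't prove the disk is healthy, and smartctl itself may hang if the disk or controller is wedged.

```shell
#!/bin/sh
# Print the raw values of SMART attributes that commonly indicate a
# failing disk. Reads `smartctl -A` output on stdin.
smart_badness() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Usage (FreeBSD, as root): smartctl -A /dev/ada0 | smart_badness
```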

Now comes the patient part.  If it's the drive, it's probably pretty 
hot from failing and trying to do its job.  Don't laugh at this; it's 
worked for me 5 out of 7 times.  Remove the drive from the machine and 
let it cool to room temperature on an anti-static bag.  Once it's cool, 
put it in the bag and put it in your freezer for at least three hours. 
Re-insert it into the machine.  (At this point, you should have the 
other drive for the mirror connected.)  If the drive isn't a 
catastrophic loss, it will work for a short time.  I recommend you let 
it mirror.  Ask the drive to do NOTHING but sit and mirror while in 
single-user mode.
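While the chilled drive resilvers, a low-impact way to keep an eye on it is to poll `zpool status` occasionally rather than touching the pool otherwise. A minimal sketch, assuming the pool name zback and that an active resilver shows "resilver in progress" on the scan: line of `zpool status`; the helper is_resilvering and the 60-second interval are hypothetical choices.

```shell
#!/bin/sh
# Succeeds (exit 0) while `zpool status` output on stdin reports an
# active resilver.
is_resilvering() {
    grep -q 'resilver in progress'
}

POOL=${POOL:-zback}   # pool being mirrored (assumed name)

# Only poll on a system that actually has zpool(8).
if command -v zpool >/dev/null 2>&1; then
    while zpool status "$POOL" | is_resilvering; do
        sleep 60      # keep the load on the failing disk minimal
    done
    zpool status "$POOL"
fi
```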

However, before trying that last 'iffy' step, check everything that 
comes before it.

P.

On 12/06/2014 19:58, Drew Tomlinson wrote:
> I'm running FreeBSD 9.1-RELEASE, which I built several years ago. 
> It's mostly a Samba server and has "just worked", so I've never done 
> much more with it.  Recently, however, I've found it "locked up" with 
> thousands of these messages on the console:
>
> kernel: kern.maxfiles limit exceeded by uid 0, please see tuning(7)
>
> I've looked in /var/log/messages and also see lots of messages like 
> these:
>
> Dec  6 13:55:53 vm kernel: siisch0:  ... waiting for slots 18000000
> Dec  6 13:55:53 vm kernel: siisch0: Timeout on slot 28
> Dec  6 13:55:53 vm kernel: siisch0: siis_timeout is 00040000 ss 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
> Dec  6 13:55:53 vm kernel: siisch0:  ... waiting for slots 08000000
> Dec  6 13:55:55 vm kernel: siisch0: Timeout on slot 27
> Dec  6 13:55:55 vm kernel: siisch0: siis_timeout is 00040000 ss 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 01 fe d8 74 40 39 00 00 00 00 00
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 0a a5 7f 00 40 4c 00 00 00 00 00
>
> This machine uses zfs.  I have two pools:
>
> # zpool list
> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> zback  1.81T   848G  1008G    45%  1.00x  ONLINE  -
> zroot  1.81T  1.16T   666G    64%  1.00x  ONLINE  -
>
> Then I tried this and my ssh window is now stuck:
>
> # zpool status
>   pool: zback
>  state: ONLINE
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool clear'.
>    see: http://illumos.org/msg/ZFS-8000-HC
>   scan: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zback       ONLINE       3     0     0
>           ada0      ONLINE       4     0     0
>
> I opened another ssh window and tried 'zpool clear zback' as suggested 
> but it appears stuck too.
>
> I'm sure I haven't provided all the relevant information, so please 
> ask and I will provide it.  I'd appreciate any guidance on how to take 
> a proper backup of ada0 and what I should do next.  I think the zback 
> pool is just the one disk, a 2TB drive.  I'd like to know how to 
> confirm that, if possible, since the zpool commands don't seem able to 
> complete.
>
> I appreciate any suggestions or guidance.
>
> Thanks,
>
> Drew
>