Strange lock/crash - 100% cpu with basic command line utils

Ivan Dimitrov zlobber at gmail.com
Tue Nov 12 12:28:32 UTC 2013


Hello list

This is my first time reporting a problem, so please excuse me if this
is not the right place or format. I also apologize for my poor English.

Last month we started experiencing strange locks on some of our servers.
On semi-random occasions, when typing `cd`, `ls`, or `pwd`, the server
would crash and start behaving strangely. Sometimes the problem is
reproducible; at other times all commands work as expected.
All servers have Intel or AMD CPUs and run FreeBSD 9.2; they netboot the
latest kernel and load the OS into RAM.
We have tested with both preemptive and non-preemptive kernels.
All our servers use ZFS with an SSD for cache. Here is an example
server:

==========================================

[root@ph3storage5 ~]# zpool status -v
   pool: zstorage5p1
  state: ONLINE
   scan: scrub repaired 0 in 39h36m with 0 errors on Mon Nov  4 05:11:48 2013
config:

     NAME         STATE     READ WRITE CKSUM
     zstorage5p1  ONLINE       0     0     0
       mirror-0   ONLINE       0     0     0
         ada0     ONLINE       0     0     0
         ada1     ONLINE       0     0     0
     cache
       ada4p1     ONLINE       0     0     0

errors: No known data errors

   pool: zstorage5p2
  state: ONLINE
   scan: scrub repaired 0 in 14h59m with 0 errors on Sun Nov  3 04:41:50 2013
config:

     NAME         STATE     READ WRITE CKSUM
     zstorage5p2  ONLINE       0     0     0
       mirror-0   ONLINE       0     0     0
         ada2     ONLINE       0     0     0
         ada3     ONLINE       0     0     0
     cache
       ada4p2     ONLINE       0     0     0

errors: No known data errors

==========================================
The typical lock looks like this:
cd ~userdir/ ; ls
At this point, the `ls` command freezes and cannot be interrupted with
Ctrl+C.
We open another console and see that the `ls` command is using 100%
CPU. Also, some disk operations randomly start taking 1 to 2 minutes to
complete. For example, we ran `camcontrol` a few times, and it froze
at one point.
Also, while the server was in this state, we used zpool to remove the
SSD cache from the pool and then re-added it, but when we issued
zpool status, the command froze for a minute.

We managed to collect some data from two different incidents:

Incident 1: http://pastebin.com/EkCeSwY9
Incident 2: http://pastebin.com/5rj9BV68

Since the problem is reproducible, we welcome suggestions on how to run
further tests.

Thanks in advance
Best Regards
Ivan Dimitrov


More information about the freebsd-fs mailing list