Locked up processes after upgrade to ZFS v15

Sat Oct 9 13:37:08 UTC 2010

On 09.10.2010 at 13:12, Jeremy Chadwick wrote:

> On Wed, Oct 06, 2010 at 02:28:31PM +0200, Kai Gallasch wrote:
>> Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.
>> After zpool & zfs upgrade the server was running stable for about half a day, but then apache processes running inside jails would lock up and could not be terminated any more.

> On RELENG_7, the system used ZFS v14, had the same tunings, and had an
> uptime of 221 days w/out issue.

8.0 and 8.1-STABLE + ZFS v14 also ran very solid on my servers - dang!

> With RELENG_8, the system lasted approximately 12 hours (about half a
> day) before getting into a state that looks almost identical to Kai's
> system: existing processes were stuck (unkillable, even with -9).  New
> processes could be spawned (including ones which used the ZFS
> filesystems), and commands executed successfully.

Same here. I can provoke this locked-process problem by starting
one of my webserver jails. The first httpd process will lock up after at most 30 minutes.

The problem is that after a lot of httpd forks, apache cannot fork any more child processes, and the stuck (unkillable) httpd processes all hold a socket open on the IP address of the webserver. So a restart of apache is not possible, because $IP:80 is already occupied.

The jail also cannot be stopped or started in this state. The only choices are: restart the whole jail host (some processes would not die - ps -axl advised + unclean umounts of UFS partitions), or delete the IP address from the network interface and migrate the jail to another server (zfs send/receive). No fun at all. BTW: zfs destroy also does not work here.

> init complained about wedged processes when the system was rebooted:

I use 'procstat -k -k -a | grep faul' to look for this condition.

This finds all processes in the table whose kernel stack trace contains 'trap_pfault'.
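
A minimal sketch of how that check could be scripted, assuming the usual procstat -kk column layout (PID in the first column). The function name stuck_pids is made up for illustration and is not part of FreeBSD. It reads procstat output on stdin and prints each PID that has at least one thread with trap_pfault in its kernel stack:

```shell
# stuck_pids: hypothetical helper, not a FreeBSD tool.
# Reads `procstat -kk -a` output on stdin and prints the unique PIDs
# of processes with at least one thread whose stack mentions trap_pfault.
# The header line is skipped because its first field is not numeric.
stuck_pids() {
    awk '/trap_pfault/ && $1 ~ /^[0-9]+$/ { print $1 }' | sort -n -u
}

# Usage on a live system (typically run as root):
#   procstat -kk -a | stuck_pids
```

Printing only the PIDs makes it easy to compare two runs a few minutes apart and see whether the set of wedged processes is growing.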

> Oct  9 02:00:56 init: some processes would not die; ps axl advised
> 
> No indication of any hardware issues on the console.

here too.

> The administrator who was handling the issue did not use "ps -l", "top",
> nor "procstat -k", so we don't have any indication of what the process
> state was in, nor what the kernel calling stack looked like that lead up
> to the wedging.  All he stated was that the processes were in D/I
> states, which doesn't help since that's what they're in normally anyway.
> If I was around I would have forced DDB and done "call doadump" to
> investigate things post-mortem.

Another sign is an increased count of processes in 'top'. 

> Monitoring graphs of the system during this time don't indicate any
> signs of memory thrashing (though bsnmp-ucd doesn't provide as much
> granularity as top does); the system looks normal except for a slightly
> decreased load average (probably as a result of the deadlocked
> processes).

My server currently has 28 GB RAM, with < 60% usage and no special ZFS tuning in loader.conf - although I tried setting vm.pmap.pg_ps_enabled="0" to find out whether the locked processes had anything to do with it.
But setting it did not prevent the problem from recurring.

> Aside from the top/procstat/kernel dump aspect, what other information
> would kernel folks be interested in?  Is "call doadump" sufficient for
> post-mortem investigation?  I need to know since if/when this happens
> again (likely), I want to get folks as much information as possible.

I'm also willing to help, but need explicit instructions. I could provoke such a lockup on one of my servers, but I cannot leave the server in this state for long, so there is only a small time frame to collect the wanted debug data.

> Also, a question for Kai: what did you end up doing to resolve this
> problem?  Did you roll back to an older FreeBSD, or...?

This bug hit me really hard, because the affected server is not part of a cluster and hosts about 50 jails (mail, web, databases).
The problem is: sockets held open by locked processes cannot be closed, so a restart of a jammed service is not possible.
Theoretically I had the option to boot into the old world/kernel, but I'm sure that with the old zfs.ko, mounting the ZFS v15 filesystems wouldn't be possible. AFAIK there is no zfs downgrade command or utility.

Of course a bare-metal recovery of the whole server from tape was also an option of last resort. But really??

My 'solution':

- move the most unstable jails to other servers and restore them to UFS partitions
- move everything else in the zpool temporarily to other servers running ZFS (zfs send/receive)
- zfs destroy -r
- zpool destroy
- gpart create ...
- gpart add -t freebsd-ufs ...
- restore all jails from ZFS to UFS
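
As a rough sketch of the ZFS half of those steps, the following only prints the commands instead of running them. The dataset, host and pool names (tank/jails, otherhost, backup), the snapshot name and the helper plan_migration are all placeholders for illustration, not what was actually typed on this server:

```shell
# plan_migration: hypothetical helper that only PRINTS the commands.
# Arguments (all placeholders): source dataset, target host, target pool.
plan_migration() {
    src=$1; host=$2; dstpool=$3
    pool=${src%%/*}        # pool name is the part before the first "/"
    snap="$src@migrate"    # the snapshot name is an arbitrary choice
    echo "zfs snapshot -r $snap"
    echo "zfs send -R $snap | ssh $host zfs receive -d $dstpool"
    echo "zfs destroy -r $src"
    echo "zpool destroy $pool"
}

# Dry run with placeholder names:
plan_migration tank/jails otherhost backup
```

Printing first and executing by hand keeps the destructive steps (zfs destroy, zpool destroy) under manual control until the received copy has been checked.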

So the server is now reverted to UFS - just for (my) peace of mind, although I waste around 50% of the RAID capacity on reserved FS allocation and accept all the other disadvantages compared to a volume manager. I will still use ZFS on several machines, but for some time not for critical data. ZFS is a nifty thing, but I really depend on a stable FS. (Of course for other people ZFS v15 may be running smoothly.)

I must repeat: I offer my help if someone wants to dig into the locking problem.

Regards,
Kai.