Unkillable and runaway processes
Benjamin Close
Benjamin.Close at clearchain.com
Tue Sep 4 16:07:39 PDT 2007
Kenneth Vestergaard Schmidt wrote:
> Hello.
>
> Our ZFS testbed is experiencing some weird problems with rsync. We run a
> nightly backup of about 1.6 TB data (that's how much is stored, not how
> much is transferred), but after the initial sync I haven't been able to
> get the machine through one full cycle.
>
> After many hours of rsyncing data from 50+ machines, suddenly one
> rsync-process will hang, spinning on the CPU.
>
> It switches state between CPU0, CPU1, RUN and 'zfs:(&', but doesn't
> really do anything. It can't be killed, and you can't reboot the machine
> - it'll get past syncing disks, but won't shutdown or reboot.
>
> I can't do an 'ls' in the directory that rsync is running on - it'll
> just hang, too.
>
> The machine is running current from August 29th.
>
> I could use some pointers on what to do - is there some way I can debug
> this better, maybe give some better info?
>
>
I do a similar thing with close to 3 TB of data and have found that too
much activity causes the same hang you describe. Disabling the ZIL fixes
the issue. Add

vfs.zfs.zil_disable=1

to /boot/loader.conf.
Since ZFS is always consistent on disk, even without the ZIL, and this is
a nightly rsync that can simply be rerun, disabling the ZIL is quite safe.
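Because the change only takes effect after a reboot, it can be worth checking that the tunable is really in /boot/loader.conf first. The following is a minimal, hypothetical Python sketch of such a check. It assumes the simple `name=value` line format (value optionally in double quotes, `#` starting a comment) and assumes that a later assignment overrides an earlier one. `loader_conf_value` and `zil_disabled` are illustrative names, not part of FreeBSD.

```python
# Hypothetical helper: scans text in a simple loader.conf-like format
# (one name=value per line, value optionally double-quoted, '#' starts a
# comment) and returns the value of the last assignment to `name`, or None.
# Assumption: a later line overrides an earlier one. Naive '#' handling means
# a quoted value containing '#' is not supported by this sketch.
def loader_conf_value(text, name):
    value = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comment and surrounding blanks
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() != name:
            continue
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]  # strip surrounding double quotes
        value = val  # keep the most recent assignment
    return value


def zil_disabled(text):
    # True only if the last setting of the tunable is exactly "1".
    return loader_conf_value(text, "vfs.zfs.zil_disable") == "1"


if __name__ == "__main__":
    sample = "# ZFS tuning for the nightly rsync box\nvfs.zfs.zil_disable=1\n"
    print(zil_disabled(sample))  # True
```

After rebooting, `sysctl vfs.zfs.zil_disable` should report whether the kernel actually picked the setting up.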
I'd love to debug this here, but I can't: the box uses a USB
mouse/keyboard, so every time I drop into the debugger I lose keyboard
support :(
Cheers,
Benjamin
More information about the freebsd-current mailing list