New ZFSv28 patchset for 8-STABLE: Kernel Panic
Jean-Yves Avenard
jyavenard at gmail.com
Mon Dec 27 05:30:31 UTC 2010
Hi
On 27 December 2010 16:04, jhell <jhell at dataix.net> wrote:
>
> Before anything else can you: (in FreeBSD)
>
> 1) Set vfs.zfs.recover=1 at the loader prompt (OK set vfs.zfs.recover=1)
> 2) Boot into single user mode without opensolaris.ko and zfs.ko loaded
> 3) ( mount -w / ) to make sure you can remove and also write a new
> zpool.cache as needed.
> 4) Remove /boot/zfs/zpool.cache
> 5) kldload both zfs and opensolaris, i.e. ( kldload zfs ) should do the trick
> 6) Verify that vfs.zfs.recover=1 is set, then ( zpool import pool )
> 7) Give it a little time and monitor activity using Ctrl+T.
>
> You should have your pool back to a working condition after this. The
> reason why oi_127 can't work with your pool is because it cannot see
> FreeBSD generic labels. The only way to work around this for oi_127
> would be to either point it directly at the replacing device or to use
> actual slices or partitions for your slogs and other such devices.
>
> Use adaNsN or gpt or gptid labels for working with your pool if you plan
> on using other OS's for recovery efforts.
>
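Collected into one session, the quoted recovery steps look roughly like this (a sketch only; the pool name `pool` is taken from the quoted instructions, and the commands need a real damaged pool to be meaningful):

```shell
# At the loader prompt, before the kernel boots:
#   OK set vfs.zfs.recover=1
# Then boot to single user mode without zfs.ko/opensolaris.ko loaded.

mount -w /                   # remount root read-write
rm /boot/zfs/zpool.cache     # drop the stale pool cache
kldload zfs                  # pulls in opensolaris.ko as a dependency
sysctl vfs.zfs.recover       # verify it reports 1
zpool import pool            # re-import; press Ctrl+T to watch progress
```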
Hi,
Thank you for your response; I will keep it safe should this ever occur again.
Let me explain why I used labels.
It all started when I was trying to solve some serious performance
issue when running with zfs
http://forums.freebsd.org/showthread.php?t=20476
One of the steps in trying to troubleshoot the latency problem was to
use AHCI; I had always thought that activating AHCI in the BIOS was
sufficient to get it going on FreeBSD, but it turned out that was not
the case and that I needed to load ahci.ko as well.
After doing so, my system wouldn't boot anymore, as it was trying to
mount /dev/ad0, which didn't exist anymore and was now named /dev/ada0.
So I put a label on the boot disk to ensure that I would never
encounter that problem again.
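Labeling with glabel(8) works roughly like this (a sketch; the label name "boot0" is hypothetical). glabel writes the label onto the provider itself, so the /dev/label/ name survives whichever driver attaches the disk:

```shell
# Write a generic label onto the boot disk (hypothetical name "boot0"):
glabel label boot0 /dev/ada0

# The disk is now also available as /dev/label/boot0, so /etc/fstab
# can reference it independently of the ad0/ada0 driver naming, e.g.:
#   /dev/label/boot0a   /   ufs   rw   1   1
```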
In the same mindset, I used labels for the cache and log devices I
later added to the pool...
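Adding the log and cache devices by label would have looked something like the following (a sketch; the device names and the labels "zil" and "cache" are assumptions, though "label/zil" matches the remove command quoted below):

```shell
# Label the devices intended for the slog and the L2ARC
# (hypothetical devices and label names):
glabel label zil   /dev/ada1
glabel label cache /dev/ada2

# Attach them to the pool by their generic labels:
zpool add pool log   label/zil
zpool add pool cache label/cache
```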
I have to say, however, that zfs had no issue using the labels until I
tried to remove one. I had rebooted several times without any
problems, and zpool status never hung.
It all started to play up when I ran the command:
zpool remove pool log label/zil
zpool never came back from running that command (I let it run for a
good 30 minutes, during which I was fearing the worst; once I had
rebooted and nothing worked, suicide looked like an appealing
alternative).
It is very disappointing, however, that because the pool is in a
non-working state, none of the commands available to troubleshoot the
problem would actually work (which I'm guessing is related to zpool
looking for a device name, the label, that it can never find).
I also can't explain why FreeBSD would kernel panic once it was
finally in a state where an import was possible.
I have to say, unfortunately, that if I hadn't had OpenIndiana, I
would probably still be crying underneath my desk right now...
Thanks again for your email. I have no doubt that this would have
worked, but in my situation I got your answer in just 2 hours, which
is better than any paid support could provide!
Jean-Yves
PS: saving my 5MB files over the network went from 40-55s with v15
to a constant 16s with v28... I can't test with the ZIL completely
disabled; it seems that vfs.zfs.zil_disable has been removed, and so
has vfs.zfs.write_limit_override.
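For what it's worth, v28 replaces the global vfs.zfs.zil_disable sysctl with a per-dataset "sync" property, so the equivalent test would be something like this (the dataset name is a hypothetical):

```shell
# Disable synchronous semantics (and thus ZIL use) on one dataset:
zfs set sync=disabled pool/data

# ... run the benchmark ...

# Restore the default behaviour afterwards:
zfs set sync=standard pool/data
```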
More information about the freebsd-stable mailing list