zfs parition probing causing long delay at BTX loader

Fri Oct 21 15:19:25 UTC 2011

Dear Steven,

This script freezes zfs on my 2 systems with replicated data, and on an
independent VM I created. I believe it freezes  on the rename line. This
only happens if you have a zvol.

So I will be removing my zvols, since I am not using them.

eg.
zfs create -V 10m tank/testzvol

then run script

#!/usr/local/bin/bash
#
# Author: Peter Maloney
# Purpose: try to crash ZFS by doing IO and renaming snapshots
#
# Result: it crashes every system I put it on, as long as there is a
zvol in the pool

dataset=tank

count=0
nextprint=$(date +%s)

while true; do
    echo Snapshot
    zfs destroy -r ${dataset}@testcrashsnap >/dev/null 2>&1
    zfs snapshot -r ${dataset}@testcrashsnap || break

    current=""
    for next in 1 2 3 4 5; do
        echo Renaming from ${current} to ${next}
        zfs destroy -r ${dataset}@testcrashsnap${next} >/dev/null 2>&1
        zfs rename -r ${dataset}@testcrashsnap${current}
${dataset}@testcrashsnap${next} || break
        current=${next}
    done

    echo Destroy
    zfs destroy -r ${dataset}@testcrashsnap${current} || break
    let count++

    now=$(date +%s)
    if [ $now -gt $nextprint ]; then
        echo $count
        let nextprint+=1
    fi
done

I'll file a PR on Monday.

On 10/21/2011 01:37 PM, Peter Maloney wrote:
> First post failed, because the attachment was too big. (Or maybe you got
> a copy anyway since you are also in the To)
> Here it is again:
>
> On 10/21/2011 01:32 PM, Peter Maloney wrote:
>> On 10/21/2011 01:04 PM, Steven Hartland wrote:
>>> ----- Original Message ----- From: "Peter Maloney"
>>> <peter.maloney at brockmann-consult.de>
>>> To: <freebsd-stable at freebsd.org>
>>> Sent: Friday, October 21, 2011 11:17 AM
>>> Subject: Re: zfs parition probing causing long delay at BTX loader
>>>
>>>
>>>> On 10/20/2011 07:23 PM, Steven Hartland wrote:
>>>>> Installing a new machine here which has 10+ disks
>>>>> we're seeing BTX loader take 50+ seconds to enumerate
>>>>> the disks.
>>>> I am running 8-STABLE. On my system with 22 disks, it took much longer
>>>> than a minute (maybe 5 minutes... not sure, but overall boot was
>>>> about 7
>>>> minutes). While this time is passing, I can watch the leds on the disks
>>>> blink in order, many times in a loop.
>>>>
>>>> My IO card is a LSI SATA/SAS 9211-8i 6Gb/s.
>>> We are indeed using that 3 x 9211-8i's per chassis.
>>>> After I upgraded the firmware to version 11, it seems to take much less
>>>> time, but I didn't time it. And watching the LEDs last time I rebooted,
>>>> I don't notice them all blinking the same way. Instead, all were solid
>>>> for a second or two after the long wait, and then only the root disks.
>>> We are already running fw v11.00.00.00 but thanks for the heads up.
>>>
>> Are you running the IT or IR firmware version? I am running the IR one.
>>
>> And by the way, here is my uname -a:
>> # uname -a
>> FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29
>> 15:06:03 CEST 2011    
>> root at bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>> And I installed 8-STABLE using cvsup using this date filter in my
>> cvsup file:
>> *default date=2011.09.27.00.00.00
>>
>> And I remember one other thing I did since the firmware upgrade. I
>> booted off a Linux hard disk which I had to put in the first hard disk
>> bay, or it wouldn't boot from it. So I moved the root disk somewhere
>> else. FreeBSD still boots, so I left it where I moved it. I don't know
>> if that changed the boot time.
>>
>>>> So if you have the same card, I suggest you update the firmware. (I
>>>> updated for stability rather than boot speed, and it seemed stable
>>>> until
>>>> it froze today, after 2 weeks)
>>> Do you have any information about the hang?
>> I decided to rename some of my replication snapshots to fill in gaps
>> from daily snapshots (since I wasn't always doing them daily)... just
>> so I could delete old replication snapshots.
>>
>> So I wrote a bash script to take the first replication snapshot per
>> day and rename it
>> (need to get bash from ports or hope it runs in sh):
>>
>> for day in {4..16}; do
>>     if [ "$day" -lt 10 ]; then
>>         day="0$day"
>>     fi
>>
>>     firstSnapshotOfDay=$(zfs list -o name -t snapshot -r tank | grep
>> -E "^tank at replication-201110${day}" | head -n1)
>>
>>     if [ "$firstSnapshotOfDay" = "" ]; then
>>         continue
>>     fi
>>
>>     time=$(echo ${firstSnapshotOfDay} | cut -d'-' -f2)
>>     hour=${time:8:2}
>>     minute=${time:10:2}
>>     second=${time:12:2}
>>
>>     echo "============="
>>     echo $day $firstSnapshotOfDay $time $hour $minute $second
>>     echo zfs rename -r "${firstSnapshotOfDay}"
>> tank at daily-2011-10-${day}T${hour}:${minute}:${second}
>> done
>>
>> And then I took the output from it, and started running it.
>> For example:
>>
>> zfs rename -r tank at replication-20111004111436
>> tank at daily-2011-10-04T11:14:36
>>
>> I ran 8 of the commands like the above, which took about 1 second
>> each. Then the 9th command froze.
>>
>> root at bcnas1:~/bin/zfs/snapshots# zfs rename -r
>> tank at replication-20111013000000 tank at daily-2011-10-13T00:00:00
>> ^C
>> load: 0.13  cmd: zfs 70731 [tx->tx_sync_done_cv)] 489.40r 0.00u 0.07s
>> 0% 1760k
>> load: 0.01  cmd: zfs 70731 [tx->tx_sync_done_cv)] 638.13r 0.00u 0.07s
>> 0% 1328k
>>
>> I then tried other things in other windows (using screen). Anything
>> involving zpool or zfs would hang like this:
>>
>> root at bcnas1:~/bin/rsync# zpool status
>> ^C^C
>> load: 0.12  cmd: zpool 87352 [spa_namespace_lock] 479.77r 0.00u 0.00s
>> 0% 1628k
>> load: 0.01  cmd: zpool 87352 [spa_namespace_lock] 616.65r 0.00u 0.00s
>> 0% 1288k
>>
>> Other attempts to read from the tank zpool worked fine. But maybe it
>> was only reading from arc and l2arc. The system has 48 GB of memory.
>> And my NFS mounts stopped working. NFS requests would just timeout.
>>
>> top, gstat, etc. all show an idle system. I had a "zpool iostat 5"
>> running in another window, which was not frozen, but also just looked
>> normal and idle.
>>
>> It reminds me of when I was using 8.2-RELEASE, and zfs destroy on a
>> snapshot caused a kernel panic.
>>
>> I also tried kill -9 on the above "zfs rename" process, which didn't
>> work (normal for processes waiting for kernel calls).
>>
>> Then I did "shutdown -r now" which got me to a screen (attached a
>> photo... unfortunately I don't know how the serial console or KVM over
>> IP works so can't get a proper log) saying that shutdown terminated
>> abnormally, going to single user mode (never got a prompt though), and
>> "some processes would not die; ps axl advised". I don't know which
>> processes hung (58465 I guess) . If ctrl+t is showing the process id,
>> then it wasn't one of the commands above. I think it might have been
>> nfsd. (I have also found that "/etc/rc.d/nfsd restart" causes nfs to
>> stop and never come back, and use lots of cpu; here is my post about
>> it: http://forums.freebsd.org/showthread.php?t=26727)
>>
>> I looked through the logs, but don't see anything interesting.  Here
>> is the log from my reboot until not including the next boot:
>>
>> Oct 21 11:39:59 bcnas1 shutdown: reboot by peter:
>> Oct 21 11:40:01 bcnas1 nmbd[64827]: [2011/10/21 11:40:01.422014,  0]
>> nmbd/nmbd_browsesync.c:585(collect_all_workgroup_names_from_wins_server)
>> Oct 21 11:40:01 bcnas1 nmbd[64827]:  
>> collect_all_workgroup_names_from_wins_server:
>> Oct 21 11:40:01 bcnas1 nmbd[64827]:   Cannot find my workgroup
>> ARBEITSGRUPPE on subnet UNICAST_SUBNET.
>> Oct 21 11:40:01 bcnas1 nmbd[64827]: [2011/10/21 11:40:01.522809,  0]
>> nmbd/nmbd_browsesync.c:585(collect_all_workgroup_names_from_wins_server)
>> Oct 21 11:40:01 bcnas1 nmbd[64827]:  
>> collect_all_workgroup_names_from_wins_server:
>> Oct 21 11:40:01 bcnas1 nmbd[64827]:   Cannot find my workgroup
>> ARBEITSGRUPPE on subnet UNICAST_SUBNET.
>> Oct 21 11:40:01 bcnas1 ntpd[75976]: ntpd exiting on signal 15
>> Oct 21 11:40:01 bcnas1 nmbd[64827]: [2011/10/21 11:40:01.857859,  0]
>> nmbd/nmbd.c:71(terminate)
>> Oct 21 11:40:01 bcnas1 nmbd[64827]:   Got SIGTERM: going down...
>> Oct 21 11:40:31 bcnas1 rc.shutdown: 30 second watchdog timeout
>> expired. Shutdown terminated.
>> Oct 21 11:40:31 bcnas1 init: /bin/sh on /etc/rc.shutdown terminated
>> abnormally, going to single user mode
>> Oct 21 11:40:31 bcnas1 syslogd: exiting on signal 15
>> Oct 21 11:47:39 bcnas1 syslogd: kernel boot file is /boot/kernel/kernel
>>
>> Looking at the screenshot, maybe 58465 is the process id that wouldn't
>> die. I could not find that number in /var/log/messages.
>>
>>
>> Any suggestions on where I can find useful information?
>>
>> I plan to keep searching for something that shows error messages. On
>> every other panic, I would find SCSI errors, mps driver messages, etc.
>>
>>>    Regards
>>>    Steve
>>> This e.mail is private and confidential between Multiplay (UK) Ltd.
>>> and the person or entity to whom it is addressed. In the event of
>>> misdirection, the recipient is prohibited from using, copying,
>>> printing or otherwise disseminating it or any information contained
>>> in it.
>>>
>>> In the event of misdirection, illegible or incomplete transmission
>>> please telephone +44 845 868 1337 or return the E.mail to
>>> postmaster at multiplay.co.uk.
>>>
>>
>> -- 
>>
>> --------------------------------------------
>> Peter Maloney
>> Brockmann Consult
>> Max-Planck-Str. 2
>> 21502 Geesthacht
>> Germany
>> Tel: +49 4152 889 300
>> Fax: +49 4152 889 333
>> E-mail: peter.maloney at brockmann-consult.de
>> Internet: http://www.brockmann-consult.de
>> --------------------------------------------
>
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"

-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney at brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------