From bugmaster at FreeBSD.org Mon Feb 2 03:06:52 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Feb 2 03:07:49 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200902021106.n12B6oQP094413@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/131086  fs         [ext2fs] mkfs.ext2 creates rotten partition
o kern/131084  fs         [xfs] xfs destroys itself after copying data
o kern/131081  fs         [zfs] User cannot delete a file when a ZFS dataset is
o kern/131009  fs         [ext2fs] [hang] System freezes when attempting to copy
o kern/130979  fs         [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o bin/130105   fs         [zfs] zfs send -R dumps core
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129174  fs         [nfs] [zfs] [panic] NFS v3 Panic when under high load
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129084  fs         [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
f kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D

38 problems total.

From antik at bsd.ee Mon Feb 2 04:29:29 2009
From: antik at bsd.ee (Andrei Kolu)
Date: Mon Feb 2 04:29:35 2009
Subject: zfs compression and nfs
Message-ID: <4986E2F2.8070903@bsd.ee>

Hi,

I encountered a strange problem with a zfs compressed volume that is shared out over nfs.

volume is created with command:

# zpool create example /dev/da1

# zfs set compression=gzip data/configuration

Now my "data" is shared with NFS and all servers have access to "configuration" volume. All NFS clients can write to volume and show written files over network. What is missing is files from server side- it does not show any file on compressed volume that is written by clients over NFS. If I copy same files/directories to nfs root eg. "data" then I can access files from server.
Where are my files?

# zpool status
  pool: data
 state: ONLINE
 scrub: scrub in progress, 19.58% done, 1h9m to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          da1       ONLINE       0     0     0

errors: No known data errors

From antik at bsd.ee Mon Feb 2 05:40:57 2009
From: antik at bsd.ee (Andrei Kolu)
Date: Mon Feb 2 05:42:29 2009
Subject: zfs compression and nfs
In-Reply-To:
References: <4986E2F2.8070903@bsd.ee>
Message-ID: <4986F7E3.6020404@bsd.ee>

Markus Gebert wrote:
> Hi Andrei
>
> On 02.02.2009 at 13:11, Andrei Kolu wrote:
>
>> I encountered a strange problem with a zfs compressed volume that is
>> shared out over nfs.
>>
>> volume is created with command:
>>
>> # zpool create example /dev/da1
>>
>> # zfs set compression=gzip data/configuration
>
> Since 'zfs set' is usually used on a file system (i.e. not a
> directory), I assume 'data/configuration' is a zfs filesystem separate
> from 'data/'.
>
Yes, it is created with command (I forgot to add it in my first post):

# zfs create data/configuration

# mount
data on /data (zfs, NFS exported, local)
data/configuration on /data/configuration (zfs, local)
data/iscsi on /data/iscsi (zfs, local)

>
>> Now my "data" is shared with NFS and all servers have access to
>> "configuration" volume. All NFS clients can write to volume and show
>> written files over network. What is missing is files from server
>> side- it does not show any file on compressed volume that is written
>> by clients over NFS. If I copy same files/directories to nfs root eg.
>> "data" then I can access files from server. Where are my files?
>
>
> I don't think this is related to compression.
>
> If 'data/' and 'data/configuration' really happen to be different
> filesystems and you're mounting only 'data/' on the client, the
> behaviour you're seeing is expected. What's happening is that your
> client is able to see the configuration _directory_ inside the
> mounted 'data/' filesystem.
> But since the nfsclient won't be able to
> cross filesystem boundaries on the server (nfs restriction), changing
> to that directory and writing a file on the client will actually
> result in the file being written to the 'data/' filesystem on the
> server (inside its 'configuration' _directory_). You are not seeing
> these files on the server, because there 'data/configuration' is
> actually your compressed zfs filesystem that never got a write. You
> should be able to make the lost files visible on the server by
> umounting 'data/configuration':
>
> # zfs umount data/configuration
>
> Of course this does not solve your problem. I guess you need to export
> 'data/configuration' too and mount it on the client.
>
But I can see "configuration" directory from NFS client!? If I understand correctly then NFS can't use "filesystem on filesystem" for example my case with "data/configuration"? Can I compress "data" then? All other subfilesystems will be compressed also? How can I see what compression ratio I got on compressed filesystem? So many questions...

From markus.gebert at hostpoint.ch Mon Feb 2 05:45:55 2009
From: markus.gebert at hostpoint.ch (Markus Gebert)
Date: Mon Feb 2 05:46:01 2009
Subject: zfs compression and nfs
In-Reply-To: <4986E2F2.8070903@bsd.ee>
References: <4986E2F2.8070903@bsd.ee>
Message-ID:

Hi Andrei

On 02.02.2009 at 13:11, Andrei Kolu wrote:

> I encountered a strange problem with a zfs compressed volume that is
> shared out over nfs.
>
> volume is created with command:
>
> # zpool create example /dev/da1
>
> # zfs set compression=gzip data/configuration

Since 'zfs set' is usually used on a file system (i.e. not a directory), I assume 'data/configuration' is a zfs filesystem separate from 'data/'.

> Now my "data" is shared with NFS and all servers have access to
> "configuration" volume. All NFS clients can write to volume and show
> written files over network.
> What is missing is files from server
> side- it does not show any file on compressed volume that is written
> by clients over NFS. If I copy same files/directories to nfs root
> eg. "data" then I can access files from server. Where are my files?

I don't think this is related to compression.

If 'data/' and 'data/configuration' really happen to be different filesystems and you're mounting only 'data/' on the client, the behaviour you're seeing is expected. What's happening is that your client is able to see the configuration _directory_ inside the mounted 'data/' filesystem. But since the nfsclient won't be able to cross filesystem boundaries on the server (nfs restriction), changing to that directory and writing a file on the client will actually result in the file being written to the 'data/' filesystem on the server (inside its 'configuration' _directory_). You are not seeing these files on the server, because there 'data/configuration' is actually your compressed zfs filesystem that never got a write. You should be able to make the lost files visible on the server by umounting 'data/configuration':

# zfs umount data/configuration

Of course this does not solve your problem. I guess you need to export 'data/configuration' too and mount it on the client.

Markus

From 000.fbsd at quip.cz Mon Feb 2 06:31:11 2009
From: 000.fbsd at quip.cz (Miroslav Lachman)
Date: Mon Feb 2 06:31:18 2009
Subject: zfs compression and nfs
In-Reply-To: <4986F7E3.6020404@bsd.ee>
References: <4986E2F2.8070903@bsd.ee> <4986F7E3.6020404@bsd.ee>
Message-ID: <4986FF23.40502@quip.cz>

Andrei Kolu wrote:
> Markus Gebert wrote:

[...]

>> Of course this does not solve your problem. I guess you need to export
>> 'data/configuration' too and mount it on the client.
>>
> But I can see "configuration" directory from NFS client!? If I
> understand correctly then NFS can't use "filesystem on filesystem" for
> example my case with "data/configuration"? Can I compress "data" then?
> All other subfilesystems will be compressed also? How can I see what
> compression ratio I got on compressed filesystem?

'zfs get -r compressratio' will show you the ratio for all ZFS filesystems.

Miroslav Lachman

From markus.gebert at hostpoint.ch Mon Feb 2 06:33:44 2009
From: markus.gebert at hostpoint.ch (Markus Gebert)
Date: Mon Feb 2 06:33:50 2009
Subject: zfs compression and nfs
In-Reply-To: <4986F7E3.6020404@bsd.ee>
References: <4986E2F2.8070903@bsd.ee> <4986F7E3.6020404@bsd.ee>
Message-ID: <71E28FE1-0A16-499B-B240-A7D9BAC2D8FF@hostpoint.ch>

Andrei Kolu wrote:
>>
>>
>>> Now my "data" is shared with NFS and all servers have access to
>>> "configuration" volume. All NFS clients can write to volume and
>>> show written files over network. What is missing is files from
>>> server side- it does not show any file on compressed volume that
>>> is written by clients over NFS. If I copy same files/directories
>>> to nfs root eg. "data" then I can access files from server. Where
>>> are my files?
>>
>>
>> I don't think this is related to compression.
>>
>> If 'data/' and 'data/configuration' really happen to be different
>> filesystems and you're mounting only 'data/' on the client, the
>> behaviour you're seeing is expected. What's happening is that
>> your client is able to see the configuration _directory_
>> inside the mounted 'data/' filesystem. But since the nfsclient
>> won't be able to cross filesystem boundaries on the server (nfs
>> restriction), changing to that directory and writing a file on the
>> client will actually result in the file being written to the
>> 'data/' filesystem on the server (inside its 'configuration'
>> _directory_). You are not seeing these files on the server, because
>> there 'data/configuration' is actually your compressed zfs
>> filesystem that never got a write.
>> You should be able to make the
>> lost files visible on the server by umounting 'data/configuration':
>>
>> # zfs umount data/configuration
>>
>> Of course this does not solve your problem. I guess you need to
>> export 'data/configuration' too and mount it on the client.
>>
> But I can see "configuration" directory from NFS client!?

Yes, you can, because that directory is part of the 'data/' filesystem and used (by zfs on the server) as a mount point for the 'data/configuration' filesystem.

> If I understand correctly then NFS can't use "filesystem on
> filesystem" for example my case with "data/configuration"?

Well, at least nfsv3 and lower don't have this ability for sure. I once heard that nfsv4 might do it, but at least for me, that didn't work on FreeBSD (tested with 7.0 which has only quite basic nfsv4 support AFAIK). But you can mount all your zfs filesystems on the client, e.g.:

# mkdir /mnt/data
# mount_nfs -3 server:/data /mnt/data
# mount_nfs -3 server:/data/configuration /mnt/data/configuration

> Can I compress "data" then?

You could, since data is just another zfs filesystem. But if you mount like stated above, you should already have achieved your goal.

> All other subfilesystems will be compressed also?

'compression' is a zfs property. If you set a property on the top-level zfs of a pool, then usually it will be inherited by all filesystems within the pool. But you can override properties for subfilesystems.

> How can I see what compression ratio I got on compressed filesystem?

# zfs get compressratio data/configuration

btw:

# man zfs

Markus

From vova at fbsd.ru Mon Feb 2 07:06:29 2009
From: vova at fbsd.ru (Vladimir Grebenschikov)
Date: Mon Feb 2 07:06:36 2009
Subject: failed to build sysutils/fusefs-kmod on recent 8-CURRENT
Message-ID: <1233585885.56108.1.camel@localhost>

===> Found saved configuration for fusefs-kmod-0.3.0_5
===> Extracting for fusefs-kmod-0.3.9.p1.20080208_5
=> MD5 Checksum OK for fuse4bsd/498acaef33b0.tar.gz.
=> SHA256 Checksum OK for fuse4bsd/498acaef33b0.tar.gz.
===> Patching for fusefs-kmod-0.3.9.p1.20080208_5
===> Applying FreeBSD patches for fusefs-kmod-0.3.9.p1.20080208_5
===> fusefs-kmod-0.3.9.p1.20080208_5 depends on package: fusefs-libs>2.4.1 - found
===> fusefs-kmod-0.3.9.p1.20080208_5 depends on executable: deplate - found
===> Configuring for fusefs-kmod-0.3.9.p1.20080208_5
===> Building for fusefs-kmod-0.3.9.p1.20080208_5
===> fuse_module (all)
Warning: Object directory not changed from original /usr/ports/sysutils/fusefs-kmod/work/fuse4bsd-498acaef33b0/fuse_module
@ -> /usr/src/sys
machine -> /usr/src/sys/i386/include
awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -p
awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -q
awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -h
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -c fuse_main.c
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -c fuse_msg.c
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -c fuse_dev.c
cc1: warnings being treated as errors
fuse_dev.c: In function 'fusedev_clone':
fuse_dev.c:556: warning: implicit declaration of function 'unit2minor'
fuse_dev.c:556: warning: nested extern declaration of 'unit2minor'

-- 
Vladimir B. Grebenschikov
vova@fbsd.ru

From aryeh.friedman at gmail.com Mon Feb 2 07:39:38 2009
From: aryeh.friedman at gmail.com (Aryeh M. Friedman)
Date: Mon Feb 2 07:39:45 2009
Subject: failed to build sysutils/fusefs-kmod on recent 8-CURRENT
In-Reply-To: <1233585885.56108.1.camel@localhost>
References: <1233585885.56108.1.camel@localhost>
Message-ID: <49870CA1.9010804@gmail.com>

Vladimir Grebenschikov wrote:
> ===> Found saved configuration for fusefs-kmod-0.3.0_5
> ===> Extracting for fusefs-kmod-0.3.9.p1.20080208_5
> => MD5 Checksum OK for fuse4bsd/498acaef33b0.tar.gz.
> => SHA256 Checksum OK for fuse4bsd/498acaef33b0.tar.gz.
> [...]
> cc1: warnings being treated as errors
> fuse_dev.c: In function 'fusedev_clone':
> fuse_dev.c:556: warning: implicit declaration of function 'unit2minor'
> fuse_dev.c:556: warning: nested extern declaration of 'unit2minor'
>
>
Even though it is a total hack I had no issue by just removing the unit2minor call and using the raw param.

From aryeh.friedman at gmail.com Mon Feb 2 07:41:15 2009
From: aryeh.friedman at gmail.com (Aryeh M. Friedman)
Date: Mon Feb 2 07:41:21 2009
Subject: failed to build sysutils/fusefs-kmod on recent 8-CURRENT
In-Reply-To: <49870CA1.9010804@gmail.com>
References: <1233585885.56108.1.camel@localhost> <49870CA1.9010804@gmail.com>
Message-ID: <49870D0E.3010004@gmail.com>

Aryeh M. Friedman wrote:
> Vladimir Grebenschikov wrote:
>> [...]
>> cc1: warnings being treated as errors
>> fuse_dev.c: In function 'fusedev_clone':
>> fuse_dev.c:556: warning: implicit declaration of function 'unit2minor'
>> fuse_dev.c:556: warning: nested extern declaration of 'unit2minor'
>>
>>
> Even though it is a total hack I had no issue by just removing the
> unit2minor call and using the raw param.
>
Forgot to mention (even though off topic): the same hack *DOES NOT* work for x11/nvidia-driver, if that is an issue for you.

From gavin at FreeBSD.org Mon Feb 2 12:46:42 2009
From: gavin at FreeBSD.org (gavin@FreeBSD.org)
Date: Mon Feb 2 12:46:50 2009
Subject: kern/68978: [panic] [ufs] crashes with failing hard disk, loose pointers in kernel?
Message-ID: <200902022046.n12KkfiG038049@freefall.freebsd.org>

Old Synopsis: [firewire] firewire crashes with failing hard disk, loose pointers in kernel?
New Synopsis: [panic] [ufs] crashes with failing hard disk, loose pointers in kernel?

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: gavin
Responsible-Changed-When: Mon Feb 2 20:41:44 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s). To be honest, I'm not sure this will be a priority - the UFS code is known to expect that the underlying media is 100% working. I'll keep this PR open though, as some progress has been made recently on fixing up these assumptions.

http://www.freebsd.org/cgi/query-pr.cgi?pr=68978

From jmrueda at diatel.upm.es Mon Feb 2 17:33:11 2009
From: jmrueda at diatel.upm.es (Javier Martín Rueda)
Date: Mon Feb 2 17:33:18 2009
Subject: Raidz2 pool with single disk failure is faulted
Message-ID: <49879C62.6070509@diatel.upm.es>

On a FreeBSD 7.1-PRERELEASE amd64 system I had a raidz2 pool made up of 8 disks. Due to some things I tried in the past, the pool was currently like this:

    z1              ONLINE
      raidz2        ONLINE
        mirror/gm0  ONLINE
        mirror/gm1  ONLINE
        da2         ONLINE
        da3         ONLINE
        da4         ONLINE
        da5         ONLINE
        da6         ONLINE
        da7         ONLINE

da2 to da7 were originally mirror/gm2 to mirror/gm7, but I replaced them little by little, eliminating the corresponding gmirrors at the same time. I don't think this is relevant for what I'm going to explain, but I mention it just in case...

One day, after a system reboot, one of the disks (da4) was dead and FreeBSD renamed all of the other disks that used to be after it (da5 became da4, da6 became da5, and da7 became da6). The pool was unavailable (da4 to da6 marked as corrupt and da7 as unavailable) because I suppose ZFS couldn't match the contents in the last 3 disks to their new names. I was able to fix this by inserting a blank new disk, rebooting, now the disk names were correct again, and the pool showed up as degraded because da4 was unavailable, but usable. I resilvered the pool and everything was back to normal.
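The renaming described in this message can be modelled with a short sketch. It assumes, as a simplification, that each disk answering the probe gets the next free daN unit in target order with no unit wiring configured, so every disk probed after a missing one moves down by one unit. The function name and the disk names (diskA to diskH) are hypothetical placeholders, not anything taken from this system.

```sh
#!/bin/sh
# Simplified model (assumption, not CAM's actual code): each disk that
# answers the probe gets the next free daN unit, in target order, with
# no unit wiring configured.
# Arguments: one word per target; "-" marks a dead or missing disk.
assign_da_units() {
    unit=0
    target=0
    for disk in "$@"; do
        if [ "$disk" = "-" ]; then
            echo "target $target: (no disk)"
        else
            echo "target $target: $disk -> da$unit"
            unit=$((unit + 1))
        fi
        target=$((target + 1))
    done
}

# All eight hypothetical disks present: targets map straight to da0..da7.
assign_da_units diskA diskB diskC diskD diskE diskF diskG diskH
echo
# The disk at target 4 is dead: diskF..diskH shift down to da4..da6.
assign_da_units diskA diskB diskC diskD - diskF diskG diskH
```

With target 4 empty, the last three disks come up as da4, da5 and da6, the same shift reported above. That fits the guess that ZFS, still expecting its members under the old names, found unexpected data there.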
Yesterday, another disk died after a system reboot and the pool was unavailable again because of the automatic renaming of the SCSI disks. However, this time I didn't replace it with a blank disk, but with another identical disk which I had been using in the past in a different ZFS pool on a different computer, but with the same name (z1) and same characteristics (raidz2, 8 disks). The disk hadn't been erased and its pool hadn't been destroyed, so it still had whatever ZFS stored in it.

After rebooting, it seems ZFS got confused or something when it found out about two different active pools with the same name, etc. and it faulted the pool. I stopped ZFS, wiped the beginning and end of the disk with zeroes, but the problem persisted. Finally, I tried to export and import the pool, as I read somewhere that may help, but zpool import complains about an I/O error (which I imagine is fictitious, because all of the disks are fine; I can read from them with dd without any problem).

The current situation is this:

# zpool import
  pool: z1
    id: 8828203687312199578
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        z1              FAULTED   corrupted data
          raidz2        ONLINE
            mirror/gm0  ONLINE
            mirror/gm1  ONLINE
            da2         ONLINE
            da3         ONLINE
            da4         UNAVAIL   corrupted data
            da5         ONLINE
            da6         ONLINE
            da7         ONLINE
# zpool import -f z1
cannot import 'z1': I/O error

By the way, before exporting the pool, the CKSUM column in "zpool status" showed 6 errors. However, zpool status -v didn't give any additional information.

How come the pool is faulted if it is raidz2 and 7 out of 8 disks are reported as fine? Any idea how to recover the pool? The data has to be in there, as I haven't done any other destructive operation, as far as I can think of, and I imagine it should be some stupid little detail.
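The redundancy argument behind that question can be checked with simple arithmetic. A raidz2 vdev keeps two disks' worth of parity, so it stays readable with up to two members missing. The helper below is a hypothetical illustration, not a ZFS tool. Its capacity figure is only the usual rough estimate of width minus parity.

```sh
#!/bin/sh
# raidz_check WIDTH PARITY MISSING
# Hypothetical helper: reports whether a raidz vdev with WIDTH members
# and PARITY parity disks can still reconstruct data when MISSING
# members are unavailable. Capacity is the rough width - parity estimate.
raidz_check() {
    width=$1
    parity=$2
    missing=$3
    if [ "$missing" -le "$parity" ]; then
        echo "raidz$parity x $width: ~$((width - parity)) disks of data, $missing missing, can lose $((parity - missing)) more"
    else
        echo "raidz$parity x $width: $missing missing exceeds parity $parity, data lost"
    fi
}

raidz_check 8 2 1   # the situation reported above: only da4 UNAVAIL
raidz_check 8 2 3   # for contrast: one failure too many
```

By this count, one UNAVAIL member leaves the vdev with a disk of parity to spare. That is consistent with the raidz2 line reading ONLINE in the zpool import output. The arithmetic covers device redundancy only, so by itself it cannot explain why the pool as a whole is reported as FAULTED.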
I have dumped all of the labels in the 8 disks with zdb -l, and I don't see anything peculiar. They are fine in the 7 online disks, and it doesn't exist in the da4 disk.

Is there some kind of diagnostic tool similar to dumpfs, but for zfs?

I can provide additional information if needed.

From morganw at chemikals.org Mon Feb 2 20:03:05 2009
From: morganw at chemikals.org (Wes Morgan)
Date: Mon Feb 2 20:03:12 2009
Subject: Raidz2 pool with single disk failure is faulted
In-Reply-To: <49879C62.6070509@diatel.upm.es>
References: <49879C62.6070509@diatel.upm.es>
Message-ID:

On Tue, 3 Feb 2009, Javier Martín Rueda wrote:

> On a FreeBSD 7.1-PRERELEASE amd64 system I had a raidz2 pool made up of 8
> disks. Due to some things I tried in the past, the pool was currently like
> this:
>
>     z1              ONLINE
>       raidz2        ONLINE
>         mirror/gm0  ONLINE
>         mirror/gm1  ONLINE
>         da2         ONLINE
>         da3         ONLINE
>         da4         ONLINE
>         da5         ONLINE
>         da6         ONLINE
>         da7         ONLINE
>
> da2 to da7 were originally mirror/gm2 to mirror/gm7, but I replaced them
> little by little, eliminating the corresponding gmirrors at the same time. I
> don't think this is relevant for what I'm going to explain, but I mention it
> just in case...
>
> One day, after a system reboot, one of the disks (da4) was dead and FreeBSD
> renamed all of the other disks that used to be after it (da5 became da4, da6
> became da5, and da7 became da6). The pool was unavailable (da4 to da6 marked
> as corrupt and da7 as unavailable) because I suppose ZFS couldn't match the
> contents in the last 3 disks to their new names. I was able to fix this by
> inserting a blank new disk, rebooting, now the disk names were correct again,
> and the pool showed up as degraded because da4 was unavailable, but usable. I
> resilvered the pool and everything was back to normal.
>
> Yesterday, another disk died after a system reboot and the pool was
> unavailable again because of the automatic renaming of the SCSI disks.
> However, this time I didn't replace it with a blank disk, but with another
> identical disk which I had been using in the past in a different ZFS pool on
> a different computer, but with the same name (z1) and same characteristics
> (raidz2, 8 disks). The disk hadn't been erased and its pool hadn't been
> destroyed, so it still had whatever ZFS stored in it.
>
> After rebooting, it seems ZFS got confused or something when it found out
> about two different active pools with the same name, etc. and it faulted the
> pool. I stopped ZFS, wiped the beginning and end of the disk with zeroes, but
> the problem persisted. Finally, I tried to export and import the pool, as I
> read somewhere that may help, but zpool import complains about an I/O error
> (which I imagine is fictitious, because all of the disks are fine; I can read
> from them with dd without any problem).
>
> The current situation is this:
>
> # zpool import
>   pool: z1
>     id: 8828203687312199578
>  state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
>         The pool may be active on on another system, but can be imported using
>         the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-5E
> config:
>
>         z1              FAULTED   corrupted data
>           raidz2        ONLINE
>             mirror/gm0  ONLINE
>             mirror/gm1  ONLINE
>             da2         ONLINE
>             da3         ONLINE
>             da4         UNAVAIL   corrupted data
>             da5         ONLINE
>             da6         ONLINE
>             da7         ONLINE
> # zpool import -f z1
> cannot import 'z1': I/O error
>
> By the way, before exporting the pool, the CKSUM column in "zpool status"
> showed 6 errors. However, zpool status -v didn't give any additional
> information.
>
> How come the pool is faulted if it is raidz2 and 7 out of 8 disks are
> reported as fine? Any idea how to recover the pool? The data has to be in
> there, as I haven't done any other destructive operation, as far as I can
> think of, and I imagine it should be some stupid little detail.
>
> I have dumped all of the labels in the 8 disks with zdb -l, and I don't see
> anything peculiar. They are fine in the 7 online disks, and it doesn't exist
> in the da4 disk.
>
> Is there some kind of diagnostic tool similar to dumpfs, but for zfs?
>
> I can provide additional information if needed.

I would try removing /boot/zfs/zpool.cache and re-importing, and if that doesn't work detach da4 device (camcontrol stop da4 or so) and see if it will import.

Also make sure you wiped at least 512k from the front of the drive.

From jmrueda at diatel.upm.es Mon Feb 2 23:09:06 2009
From: jmrueda at diatel.upm.es (Javier Martín Rueda)
Date: Mon Feb 2 23:09:13 2009
Subject: Raidz2 pool with single disk failure is faulted
In-Reply-To:
References: <49879C62.6070509@diatel.upm.es>
Message-ID: <4987ED81.6080008@diatel.upm.es>

Wes Morgan wrote:
> On Tue, 3 Feb 2009, Javier Martín Rueda wrote:
>
>> On a FreeBSD 7.1-PRERELEASE amd64 system I had a raidz2 pool made up
>> of 8 disks. Due to some things I tried in the past, the pool was
>> currently like this:
>>
>>     z1              ONLINE
>>       raidz2        ONLINE
>>         mirror/gm0  ONLINE
>>         mirror/gm1  ONLINE
>>         da2         ONLINE
>>         da3         ONLINE
>>         da4         ONLINE
>>         da5         ONLINE
>>         da6         ONLINE
>>         da7         ONLINE
>>
>> da2 to da7 were originally mirror/gm2 to mirror/gm7, but I replaced
>> them little by little, eliminating the corresponding gmirrors at the
>> same time. I don't think this is relevant for what I'm going to
>> explain, but I mention it just in case...
>>
>> One day, after a system reboot, one of the disks (da4) was dead and
>> FreeBSD renamed all of the other disks that used to be after it (da5
>> became da4, da6 became da5, and da7 became da6). The pool was
>> unavailable (da4 to da6 marked as corrupt and da7 as unavailable)
>> because I suppose ZFS couldn't match the contents in the last 3 disks
>> to their new names.
>> I was able to fix this by inserting a blank new
>> disk and rebooting; the disk names were correct again, and the pool
>> showed up as degraded because da4 was unavailable, but usable. I
>> resilvered the pool and everything was back to normal.
>>
>> Yesterday, another disk died after a system reboot and the pool was
>> unavailable again because of the automatic renaming of the SCSI
>> disks. However, this time I didn't replace it with a blank disk, but
>> with another identical disk which I had been using in the past in a
>> different ZFS pool on a different computer, but with the same name
>> (z1) and same characteristics (raidz2, 8 disks). The disk hadn't been
>> erased and its pool hadn't been destroyed, so it still had whatever
>> ZFS stored in it.
>>
>> After rebooting, it seems ZFS got confused or something when it found
>> out about two different active pools with the same name, etc., and it
>> faulted the pool. I stopped ZFS and wiped the beginning and end of the
>> disk with zeroes, but the problem persisted. Finally, I tried to
>> export and import the pool, as I read somewhere that this may help,
>> but zpool import complains about an I/O error (which I imagine is
>> fictitious, because all of the disks are fine; I can read from them
>> with dd with no problem).
>>
>> The current situation is this:
>>
>> # zpool import
>>   pool: z1
>>     id: 8828203687312199578
>>  state: FAULTED
>> status: One or more devices contains corrupted data.
>> action: The pool cannot be imported due to damaged devices or data.
>>         The pool may be active on on another system, but can be
>>         imported using the '-f' flag.
>>    see: http://www.sun.com/msg/ZFS-8000-5E
>> config:
>>
>>         z1              FAULTED  corrupted data
>>           raidz2        ONLINE
>>             mirror/gm0  ONLINE
>>             mirror/gm1  ONLINE
>>             da2         ONLINE
>>             da3         ONLINE
>>             da4         UNAVAIL  corrupted data
>>             da5         ONLINE
>>             da6         ONLINE
>>             da7         ONLINE
>> # zpool import -f z1
>> cannot import 'z1': I/O error
>>
>> By the way, before exporting the pool, the CKSUM column in "zpool
>> status" showed 6 errors. However, zpool status -v didn't give any
>> additional information.
>>
>> How come the pool is faulted if it is raidz2 and 7 out of 8 disks are
>> reported as fine? Any idea how to recover the pool? The data has to
>> be in there, as I haven't done any other destructive operation, as
>> far as I can think of, and I imagine it should be some stupid little
>> detail.
>>
>> I have dumped all of the labels in the 8 disks with zdb -l, and I
>> don't see anything peculiar. They are fine in the 7 online disks, and
>> they don't exist on the da4 disk.
>>
>> Is there some kind of diagnostic tool similar to dumpfs, but for ZFS?
>>
>> I can provide additional information if needed.
>
> I would try removing /boot/zfs/zpool.cache and re-importing, and if
> that doesn't work, detach the da4 device (camcontrol stop da4 or
> similar) and see if it will import.
>
> Also make sure you wiped at least 512k from the front of the drive.

I tried all that, but nothing worked.

I've tried to trace what's going on in the kernel when I try to import
the pool. The problem seems to be in dsl_pool_open().
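Wes's advice to wipe at least 512k from the front of the drive follows from where ZFS keeps its vdev labels. The C sketch below is an editorial illustration, not code from the ZFS sources or from this thread. It assumes the layout described in the ZFS on-disk format document: four 256 KiB labels per device, labels 0 and 1 at the start and labels 2 and 3 at the end, with the device size first rounded down to a multiple of 256 KiB. The names LABEL_SIZE, NLABELS, label_offset(), head_wipe_bytes() and tail_wipe_bytes() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (hypothetical names) for sizeof(vdev_label_t) and VDEV_LABELS. */
#define LABEL_SIZE ((uint64_t)256 * 1024)
#define NLABELS    4

/* Device size rounded down to a whole number of labels. */
static uint64_t label_aligned_size(uint64_t devsize)
{
    return devsize - devsize % LABEL_SIZE;
}

/* Byte offset of label l (0..3): labels 0 and 1 at the front,
 * labels 2 and 3 in the last 512 KiB of the aligned size. */
static uint64_t label_offset(uint64_t devsize, int l)
{
    uint64_t psize = label_aligned_size(devsize);

    assert(l >= 0 && l < NLABELS);
    assert(psize >= NLABELS * LABEL_SIZE);
    return (uint64_t)l * LABEL_SIZE +
        (l < NLABELS / 2 ? 0 : psize - NLABELS * LABEL_SIZE);
}

/* Bytes to zero at the front: everything up to the end of label 1. */
static uint64_t head_wipe_bytes(uint64_t devsize)
{
    return label_offset(devsize, 1) + LABEL_SIZE;
}

/* Bytes to zero at the end: from the start of label 2 to the end of the device. */
static uint64_t tail_wipe_bytes(uint64_t devsize)
{
    return devsize - label_offset(devsize, 2);
}
```

Under these assumptions, a device whose size is an exact multiple of 256 KiB needs 512 KiB zeroed at each end. If the size is not a multiple of 256 KiB, labels 2 and 3 sit further from the physical end (for a device of 1 GiB plus 100000 bytes the tail region is 624288 bytes), so zeroing only the last 512 KiB could leave part of label 2 intact.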
In the first part of the function, there is this call:

    err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
        DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1, &dp->dp_root_dir_obj);
    if (err)
        goto out;

zap_lookup() was returning EIO, but I don't think it is a real I/O
problem; rather, it is a checksumming problem, because I also got these
messages:

zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 15 error 86
retry #1 for read to raidz offset 6eb6d6d9000
zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 15 error 86
zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 16 error 86
zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 14 error 86
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 15 error 86
retry #1 for read to raidz offset c03bf45800
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 15 error 86
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 16 error 86
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 14 error 86
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 15 error 86
retry #1 for read to raidz offset 5902760f800
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 15 error 86
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 16 error 86
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 14 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 15 error 86
retry #1 for read to offset 0
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 15 error 86
retry #1 for read to raidz offset 6eb6d6d9000
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 15 error 86
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 16 error 86
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 14 error 86
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 15 error 86
retry #1 for read to raidz offset c03bf45800
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 15 error 86
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 16 error 86
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 14 error 86
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 15 error 86
retry #1 for read to raidz offset 5902760f800
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 15 error 86
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 16 error 86
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 14 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 15 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 16 error 86
zio 0xffffff000bcb5ac0 vdev offset 0 stage 17 error 86
zio 0xffffff0003f44000 vdev offset 0 stage 17 error 5

Error 86 seems to be ECKSUM, so I decided to disable checksumming and see
what happened. To do so, I edited
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c and
simply put a "return 0" at the beginning of zio_checksum_error(). I tried
to import again, and it still didn't work. Only this time, zap_lookup()
was returning ENOENT, which I imagine means that ZFS cannot locate the
root of the pool or something like that.

Any ideas?

From jmrueda at diatel.upm.es Tue Feb 3 04:40:51 2009
From: jmrueda at diatel.upm.es (Javier Martín Rueda)
Date: Tue Feb 3 04:40:58 2009
Subject: Raidz2 pool with single disk failure is faulted
In-Reply-To: <4987ED81.6080008@diatel.upm.es>
References: <49879C62.6070509@diatel.upm.es> <4987ED81.6080008@diatel.upm.es>
Message-ID: <49883B45.3040606@diatel.upm.es>

I solved the problem.
This is how I did it, in case one day it prevents somebody from jumping in
front of a train :-)

First of all, I got some insight from various sites, mailing list archives,
documents, etc. These two were probably the most helpful:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051643.html
http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

I suspected that maybe my uberblock was somehow corrupted, and thought it
would be worthwhile to roll back to an earlier uberblock. However, my pool
was raidz2 and the examples I had seen of how to do this were for simple
pools, so I tried a different approach, which in the end proved very
successful.

First, I added a couple of printf() calls to vdev_uberblock_load_done(),
which is in /sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c:

--- vdev_label.c.orig	2009-02-03 13:14:35.000000000 +0100
+++ vdev_label.c	2009-02-03 13:14:52.000000000 +0100
@@ -659,10 +659,12 @@
 
 	if (zio->io_error == 0 && uberblock_verify(ub) == 0) {
 		mutex_enter(&spa->spa_uberblock_lock);
+		printf("JMR: vdev_uberblock_load_done ub_txg=%qd ub_timestamp=%qd\n", ub->ub_txg, ub->ub_timestamp);
 		if (vdev_uberblock_compare(ub, ubbest) > 0)
 			*ubbest = *ub;
 		mutex_exit(&spa->spa_uberblock_lock);
 	}
+	printf("JMR: vdev_uberblock_load_done ubbest ub_txg=%qd ub_timestamp=%qd\n", ubbest->ub_txg, ubbest->ub_timestamp);
 
 	zio_buf_free(zio->io_data, zio->io_size);
 }

After compiling and loading the zfs.ko module, I executed "zpool import"
and these messages came up:

...
JMR: vdev_uberblock_load_done ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254782 ub_timestamp=1233545533
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254781 ub_timestamp=1233545528
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254780 ub_timestamp=1233545523
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254779 ub_timestamp=1233545518
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254778 ub_timestamp=1233545513
...
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538

So, the uberblock with transaction group (txg) 4254783 was the most recent.
I convinced ZFS to use an earlier one with this patch (note the second
expression I added to the if statement):

--- vdev_label.c.orig	2009-02-03 13:14:35.000000000 +0100
+++ vdev_label.c	2009-02-03 13:25:43.000000000 +0100
@@ -659,10 +659,12 @@
 
 	if (zio->io_error == 0 && uberblock_verify(ub) == 0) {
 		mutex_enter(&spa->spa_uberblock_lock);
-		if (vdev_uberblock_compare(ub, ubbest) > 0)
+		printf("JMR: vdev_uberblock_load_done ub_txg=%qd ub_timestamp=%qd\n", ub->ub_txg, ub->ub_timestamp);
+		if (vdev_uberblock_compare(ub, ubbest) > 0 && ub->ub_txg < 4254783)
 			*ubbest = *ub;
 		mutex_exit(&spa->spa_uberblock_lock);
 	}
+	printf("JMR: vdev_uberblock_load_done ubbest ub_txg=%qd ub_timestamp=%qd\n", ubbest->ub_txg, ubbest->ub_timestamp);
 
 	zio_buf_free(zio->io_data, zio->io_size);
 }

After compiling and loading the zfs.ko module, I executed "zpool import"
and the pool was still faulted. So, I lowered the txg limit to
"< 4254782", and this time the pool came up as ONLINE. After crossing my
fingers I executed "zpool import z1", and it worked OK.
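The effect of the extra ub->ub_txg condition can be reproduced outside the kernel. The C sketch below is a simplified model, not the real vdev_label.c logic: struct ub keeps only the two fields the patch prints, ub_compare() assumes that vdev_uberblock_compare() orders uberblocks by txg first and timestamp second, and pick_uberblock() with its max_txg cap is a hypothetical stand-in for the limit hard-coded in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for uberblock_t: only the fields printed by the patch. */
struct ub {
    uint64_t txg;
    uint64_t timestamp;
};

/* Assumed ordering: higher txg wins; on a tie, the later timestamp wins. */
static int ub_compare(const struct ub *a, const struct ub *b)
{
    if (a->txg != b->txg)
        return a->txg < b->txg ? -1 : 1;
    if (a->timestamp != b->timestamp)
        return a->timestamp < b->timestamp ? -1 : 1;
    return 0;
}

/*
 * Index of the best uberblock whose txg is strictly below max_txg, or -1 if
 * none qualifies. max_txg = UINT64_MAX models the unpatched selection;
 * max_txg = 4254783 models the first patched build.
 */
static int pick_uberblock(const struct ub *ubs, int n, uint64_t max_txg)
{
    int best = -1;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (ubs[i].txg >= max_txg)
            continue;   /* the extra "ub->ub_txg < limit" condition */
        if (best < 0 || ub_compare(&ubs[i], &ubs[best]) > 0)
            best = i;
    }
    return best;
}
```

Fed the six txg/timestamp pairs printed in the JMR: log lines above, this model picks txg 4254783 with no cap, txg 4254782 with the first patch's limit of 4254783 (the build that still left the pool faulted), and txg 4254781 with the limit of 4254782 that finally let the pool import.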
No data loss; everything is back to normal. The only curious thing I've
noticed is this:

# zpool status
  pool: z1
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Feb 3 09:26:40 2009
config:

        NAME                     STATE     READ WRITE CKSUM
        z1                       ONLINE       0     0     0
          raidz2                 ONLINE       0     0     0
            mirror/gm0           ONLINE       0     0     0
            mirror/gm1           ONLINE       0     0     0
            da2                  ONLINE       0     0     0
            da3                  ONLINE       0     0     0
            8076139616933977534  UNAVAIL      0     0     0  was /dev/da4
            da5                  ONLINE       0     0     0
            da6                  ONLINE       0     0     0
            da7                  ONLINE       0     0     0

errors: No known data errors

As you can see, the raidz2 vdev is marked as ONLINE, when I think it
should be DEGRADED. Nevertheless, the pool is readable and writable, and
so far I haven't detected any problem. To be safe, I am extracting all the
data and will recreate the pool from scratch.

Pending questions:

1) Why did the "supposed corruption" happen in the first place? I advise
people not to mix disks from different zpools with the same name on the
same computer. That's what I did, and maybe it's what caused my problems.

2) Rolling back to an earlier uberblock seems to solve some faulted zpool
problems. I think it would be interesting to have a program that lets you
do it in a user-friendly way (after warning you about the dangers, etc.).

From morganw at chemikals.org Tue Feb 3 05:36:39 2009
From: morganw at chemikals.org (Wesley Morgan)
Date: Tue Feb 3 05:37:13 2009
Subject: Raidz2 pool with single disk failure is faulted
In-Reply-To: <49883B45.3040606@diatel.upm.es>
References: <49879C62.6070509@diatel.upm.es> <4987ED81.6080008@diatel.upm.es> <49883B45.3040606@diatel.upm.es>
Message-ID:

On Tue, 3 Feb 2009, Javier Martín Rueda wrote:

> I solved the problem.

It would be interesting to see if the txg from all of your labels was
the same. I would strongly advise scrubbing your array.

I believe the reason that your "da4" is showing up with only a GUID is
that ZFS now recognizes that the da4 it sees is not the correct one.
Still very curious how you ended up in that situation.
I wonder if you had corruption that went undetected before you removed da4.

From jmrueda at diatel.upm.es Tue Feb 3 05:41:28 2009
From: jmrueda at diatel.upm.es (Javier Martín Rueda)
Date: Tue Feb 3 05:41:34 2009
Subject: Raidz2 pool with single disk failure is faulted
In-Reply-To:
References: <49879C62.6070509@diatel.upm.es> <4987ED81.6080008@diatel.upm.es> <49883B45.3040606@diatel.upm.es>
Message-ID: <49884979.7080103@diatel.upm.es>

Wesley Morgan wrote:
> On Tue, 3 Feb 2009, Javier Martín Rueda wrote:
>
>> I solved the problem.
>
> It would be interesting to see if the txg from all of your labels was
> the same. I would strongly advise scrubbing your array.

I ran zdb -l on all the healthy disks, and all the labels (4 copies x 7
devices) were identical, except for the "guid" field at the beginning.
That's the vdev's GUID, so I think it's normal for it to differ from disk
to disk. The txg field was identical in all of them.

>
> I believe the reason that your "da4" is showing up with only a GUID is
> that ZFS now recognizes that the da4 it sees is not the correct
> one. Still very curious how you ended up in that situation. I wonder
> if you had corruption that went undetected before you removed da4.

Definitely the current da4 has nothing to do with the zpool.
First it belonged to a different zpool, and later I zeroed its beginning
and end. The GUID that is listed in "zpool status" is the same one that
appears in the zpool labels for the old da4. I don't recall seeing any
corruption before, and I scrubbed the pool from time to time.

By the way, thinking again about this, the timestamp on the most recent
uberblock was 6:32 CET, which also coincides with the time that the server
froze, while the change of disks took place about 2-3 hours later. So maybe
the change of disks had nothing to do with all this after all. The disks
are connected to a RAID controller, although they are exported in
pass-through mode.

From dan.cojocar at gmail.com Tue Feb 3 07:03:58 2009
From: dan.cojocar at gmail.com (Dan Cojocar)
Date: Tue Feb 3 07:04:05 2009
Subject: zfs replace disk has failed
Message-ID:

Hello all,

In a mirror (ad1, ad2) configuration, one of my disks (ad1) failed. After
replacing the failed disk with a new one using:

zpool replace tank ad1

I noticed that the replace was taking too long and that the system was not
responding. After a restart, the new disk was no longer recognized by the
BIOS :(. I also tested it in another box, and the disk was not recognized
there either. I installed a new one in the same location (ad1, I think).
Then zpool status reported something like this (this is from memory,
because I made many changes back then; I don't remember exactly whether
the online disk was ad1 or ad2):

zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          mirror                    DEGRADED     0     0     0
            replacing               UNAVAIL      0   387     0  insufficient replicas
              10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
              9318348042598806923   FAULTED      0     0     0  was /dev/ad1
            ad2                     ONLINE       0     0     0

At this stage I thought that if I attached the new disk (ad1) to the
mirror, I would get sufficient replicas to detach 9318348042598806923 (the
disk that had failed the second time), so I did an attach. After the
resilvering process completed successfully, I had:

zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          mirror                    DEGRADED     0     0     0
            replacing               UNAVAIL      0   387     0  insufficient replicas
              10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
              9318348042598806923   FAULTED      0     0     0  was /dev/ad1
            ad2                     ONLINE       0     0     0
            ad1                     ONLINE       0     0     0

And I'm not able to detach 9318348042598806923 :(. More bad news: if I
try to access anything under /tank, the operation hangs. For example,
"ls /tank" freezes, and if I then run "zpool status" in another console
(which was working before the ls), it freezes too.

What should I do next?

Thanks,
Dan

From jh at saunalahti.fi Tue Feb 3 07:40:07 2009
From: jh at saunalahti.fi (Jaakko Heinonen)
Date: Tue Feb 3 07:40:14 2009
Subject: kern/131009: System freezes when attempting to copy from one mounted (USB-disk-resident) ext2 filesystem to another
Message-ID: <200902031540.n13Fe64U049194@freefall.freebsd.org>

The following reply was made to PR kern/131009; it has been noted by GNATS.
From: Jaakko Heinonen
To: Donald Allen
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/131009: System freezes when attempting to copy from one mounted (USB-disk-resident) ext2 filesystem to another
Date: Tue, 3 Feb 2009 17:33:47 +0200

On 2009-01-26, Donald Allen wrote:
> Thank you for the patch. Building a kernel is next on my agenda, and I
> will install the fix when I do.

Can you confirm that the patch fixed the problem for you?

-- 
Jaakko

From linimon at FreeBSD.org Tue Feb 3 08:13:12 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Tue Feb 3 08:13:18 2009
Subject: kern/131342: [nfs] mounting/unmounting of disks causes NFS to fail
Message-ID: <200902031613.n13GD7Ll076863@freefall.freebsd.org>

Synopsis: [nfs] mounting/unmounting of disks causes NFS to fail

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Feb 3 16:12:58 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=131342

From linimon at FreeBSD.org Tue Feb 3 08:40:36 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Tue Feb 3 08:40:44 2009
Subject: kern/131009: [ext2fs] [hang] System freezes when attempting to copy from one mounted (USB-disk-resident) ext2 filesystem to another
Message-ID: <200902031640.n13GeZvO098410@freefall.freebsd.org>

Synopsis: [ext2fs] [hang] System freezes when attempting to copy from one mounted (USB-disk-resident) ext2 filesystem to another

State-Changed-From-To: open->closed
State-Changed-By: linimon
State-Changed-When: Tue Feb 3 16:39:25 UTC 2009
State-Changed-Why:
Closed at submitter's request.
http://www.freebsd.org/cgi/query-pr.cgi?pr=131009

From cattelan at thebarn.com Tue Feb 3 09:30:04 2009
From: cattelan at thebarn.com (Russell Cattelan)
Date: Tue Feb 3 09:30:10 2009
Subject: kern/131084: [xfs] xfs destroys itself after copying data
Message-ID: <200902031730.n13HU3sj030496@freefall.freebsd.org>

The following reply was made to PR kern/131084; it has been noted by GNATS.

From: Russell Cattelan
To: bug-followup@FreeBSD.org, estellnb@gmail.com
Cc:
Subject: Re: kern/131084: [xfs] xfs destroys itself after copying data
Date: Tue, 03 Feb 2009 10:53:25 -0600

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Write support is not anywhere close to finished.

I have a much more modern version of XFS working with FreeBSD-CURRENT,
but it is also at a very early stage of being able to write to an XFS
filesystem.

- -Russell Cattelan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJiHaFNRmM+OaGhBgRAkYHAJ9lnFSN64WSKTdOnn35y2+7DlPl5wCfb//H
CCCrq6fRlHtlhTt7+3pX/UM=
=hRXC
-----END PGP SIGNATURE-----

From gavin at FreeBSD.org Wed Feb 4 05:33:06 2009
From: gavin at FreeBSD.org (gavin@FreeBSD.org)
Date: Wed Feb 4 05:33:18 2009
Subject: kern/131356: [tmpfs][patch] unlink(2) on tmpfs removs wrong files with hard-links
Message-ID: <200902041333.n14DX5xt091590@freefall.freebsd.org>

Synopsis: [tmpfs][patch] unlink(2) on tmpfs removs wrong files with hard-links

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: gavin
Responsible-Changed-When: Wed Feb 4 13:13:39 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s)

http://www.freebsd.org/cgi/query-pr.cgi?pr=131356

From linimon at FreeBSD.org Wed Feb 4 08:13:39 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Wed Feb 4 08:13:45 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902041613.n14GDckg012126@freefall.freebsd.org>

Synopsis: [nfs] poor scaling behavior of the NFS server under load

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Feb 4 16:13:29 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=131360

From remko at FreeBSD.org Wed Feb 4 10:40:23 2009
From: remko at FreeBSD.org (remko@FreeBSD.org)
Date: Wed Feb 4 10:40:30 2009
Subject: bin/131341: makefs: error "Bad file descriptor" on the mount point of md-presentation makefs image
Message-ID: <200902041840.n14IeLqI022736@freefall.freebsd.org>

Old Synopsis: error "Bad file descriptor" on the mount point of md-presentation makefs image
New Synopsis: makefs: error "Bad file descriptor" on the mount point of md-presentation makefs image

Responsible-Changed-From-To: freebsd-i386->freebsd-fs
Responsible-Changed-By: remko
Responsible-Changed-When: Wed Feb 4 18:40:08 UTC 2009
Responsible-Changed-Why:
reassign to -fs

http://www.freebsd.org/cgi/query-pr.cgi?pr=131341

From remko at FreeBSD.org Wed Feb 4 12:52:37 2009
From: remko at FreeBSD.org (remko@FreeBSD.org)
Date: Wed Feb 4 12:52:42 2009
Subject: kern/131353: gjournal kernel lock
Message-ID: <200902042052.n14KqaNM024682@freefall.freebsd.org>

Synopsis: gjournal kernel lock

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: remko
Responsible-Changed-When: Wed Feb 4 20:52:26 UTC 2009
Responsible-Changed-Why:
reassign to -fs

http://www.freebsd.org/cgi/query-pr.cgi?pr=131353

From admin at kedvenc.hu Thu Feb 5 12:52:12 2009
From: admin at kedvenc.hu (Joe7)
Date: Thu Feb 5 12:52:37 2009
Subject: nfs delay causing broken files
Message-ID: <20090205212511.jrmipmazeo4o0c8w@www.site.hu>

Hi there,

I'd like to know how to tune NFS (or related kernel) settings so that NFS
client writes have NO delay.
I'm getting application-level errors: an NFS client writes data, then the
application reads the data from the NFS master just 1 second later and it's
not there yet, so outdated data is read.
Thanks for any ideas in advance,
Joe

From rmacklem at uoguelph.ca Fri Feb 6 08:55:44 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Fri Feb 6 08:55:50 2009
Subject: nfs delay causing broken files
In-Reply-To: <20090205212511.jrmipmazeo4o0c8w@www.site.hu>
References: <20090205212511.jrmipmazeo4o0c8w@www.site.hu>
Message-ID:

On Thu, 5 Feb 2009, Joe7 wrote:

>
> Hi there,
>
> I'd like to know how to tune NFS (or related kernel) settings so the NFS
> client writes would have NO delay?
> I'm getting application level errors as NFS client writes data, then
> application would read the data from NFS master just 1 second later and it's
> not there yet, so outdated data is read.
>
You will have to enable synchronous writing (which will be a big performance hit, but...):
- either mount with the "sync" option
OR
- have the apps open the files with O_SYNC (or O_FSYNC)

If the recent writes need to be visible on other clients (and not just the NFS server), you will also have to bypass the client side caching for readers. This can be done by setting nfs_directio_enable to non-zero using sysctl and opening the files with O_DIRECT. (I think setting acregmax=0 as a mount option should achieve the same result, if you can't add O_DIRECT to the apps.) Again, a big performance hit, but...

As an historical aside, way back when (before NFSv3) I concocted something called Not Quite NFS, which included a cache coherency protocol, but it didn't catch on. Cache coherency seems never to have been a priority for the NFS crowd. Even in NFSv4, there isn't a cache coherency protocol, nor any interest in adding one. (In fairness, coherency between multiple clients can be achieved in NFSv4 for data by the clients putting byte range locks on all the byte ranges.)
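Rick's second option, seen from the application side: the sketch below assumes a POSIX system. It opens the output file with O_SYNC so that each write(2) returns only after the data has been committed, which on an NFS mount means sent to the server. The helper name write_file_sync() is hypothetical and not part of any FreeBSD API. As Rick notes, this only covers the writer. Other clients reading the file may still serve data from their own caches.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path, opening the file
 * with O_SYNC so every write(2) completes synchronously.
 * Returns 0 on success, -1 on error (errno is set).
 */
int write_file_sync(const char *path, const void *buf, size_t len)
{
    /* With O_CREAT, the third argument supplies the new file's mode. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                      /* handle short writes */
        len -= (size_t)n;
    }
    return close(fd);
}
```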
rick

From rmacklem at uoguelph.ca Fri Feb 6 09:22:33 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Fri Feb 6 09:22:41 2009
Subject: nfs delay causing broken files
In-Reply-To:
References: <20090205212511.jrmipmazeo4o0c8w@www.site.hu>
Message-ID:

On Fri, 6 Feb 2009, Rick Macklem wrote:

[stuff snipped]
>
> If the recent writes need to be visible on other clients (and not just the
> NFS server), you will also have to bypass the client side caching for
> readers. This can be done by setting nfs_directio_enable to non-zero
> using sysctl and opening the files with O_DIRECT. (I think setting
> acregmax=0 as a mount option should achieve the same result, if you can't
> add O_DIRECT to the apps.) Again, a big performance hit, but...
>
I know it's weird to reply to my own post, but I realized I should clarify that setting acregmax=0 will only achieve this approximately, based on the clock resolution used for the file's modify time. Setting acregmax=0 should disable client side attribute caching, such that the client always does a Getattr against the server. Then, if the mtime attribute for the file has changed since the client cached data, it will be purged. As such, this only works when the mtime has changed and that will be based upon clock resolution (and if the mtime got saved on the server's disk, for the case where the server has crashed/rebooted).

Have a good weekend, rick

From admin at kedvenc.hu Fri Feb 6 09:27:21 2009
From: admin at kedvenc.hu (Joe7)
Date: Fri Feb 6 09:27:29 2009
Subject: nfs delay causing broken files
In-Reply-To:
References: <20090205212511.jrmipmazeo4o0c8w@www.site.hu>
Message-ID: <20090206182718.dhsvxzt7r4sswos4@www.site.hu>

Hi,

Thank you for reply, but:
I'm afraid sync mount is not an option:
mount_nfs: -o sync: option not supported

Tried acregmax=0 already, and that made no difference unfortunately.
Application level O_DIRECT is not an option for us, so I'm wondering what can be done at mount level?
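Rick's caveat about acregmax=0 can be illustrated with a toy model. This is not FreeBSD's NFS client code. In the model, the client fetches attributes on every access but discards cached data only when the returned mtime differs from the one recorded when the data was cached. The struct and function names are hypothetical. The model also assumes the server keeps mtimes at whole-second resolution, so two writes within the same second leave the mtime unchanged and the stale data goes undetected.

```c
#include <stdbool.h>

/* Hypothetical per-file state kept by a caching NFS client (toy model). */
struct cached_file {
    long long cached_mtime_ms; /* server mtime seen when the data was cached */
    bool      has_data;        /* true while cached data blocks are held     */
};

/*
 * Toy server: the stored mtime is the write time truncated to the
 * timestamp resolution (e.g. 1000 ms for whole-second mtimes).
 */
long long server_mtime(long long write_time_ms, long long resolution_ms)
{
    return write_time_ms - (write_time_ms % resolution_ms);
}

/*
 * With attribute caching disabled, every access does a fresh Getattr.
 * Cached data is purged only if the returned mtime differs from the
 * cached one. Returns true if the cached data was purged.
 */
bool revalidate(struct cached_file *cf, long long getattr_mtime_ms)
{
    if (cf->has_data && getattr_mtime_ms != cf->cached_mtime_ms) {
        cf->has_data = false;                /* mtime changed: data is stale */
        cf->cached_mtime_ms = getattr_mtime_ms;
        return true;
    }
    return false;  /* same mtime: cached (possibly stale) data is kept */
}
```

For example, with a 1000 ms resolution, writes at 5000200 ms and 5000700 ms both yield an mtime of 5000000, so revalidate() keeps the old data. A write at 5001100 ms changes the mtime to 5001000 and the cache is purged.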
thanks,
Joe

Quoting Rick Macklem:

>
>
> On Fri, 6 Feb 2009, Rick Macklem wrote:
>
> [stuff snipped]
>>
>> If the recent writes need to be visible on other clients (and not just the
>> NFS server), you will also have to bypass the client side caching for
>> readers. This can be done by setting nfs_directio_enable to non-zero
>> using sysctl and opening the files with O_DIRECT. (I think setting
>> acregmax=0 as a mount option should achieve the same result, if you can't
>> add O_DIRECT to the apps.) Again, a big performance hit, but...
>>
> I know it's weird to reply to my own post, but I realized I should clarify
> that setting acregmax=0 will only achieve this approximately, based on
> the clock resolution used for the file's modify time. Setting acregmax=0
> should disable client side attribute caching, such that the client always
> does a Getattr against the server. Then, if the mtime attribute for the
> file has changed since the client cached data, it will be purged. As such,
> this only works when the mtime has changed and that will be based upon
> clock resolution (and if the mtime got saved on the server's disk, for the
> case where the server has crashed/rebooted).
>
> Have a good weekend, rick

From rmacklem at uoguelph.ca Fri Feb 6 10:36:43 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Fri Feb 6 10:36:50 2009
Subject: nfs delay causing broken files
In-Reply-To: <20090206182718.dhsvxzt7r4sswos4@www.site.hu>
References: <20090205212511.jrmipmazeo4o0c8w@www.site.hu> <20090206182718.dhsvxzt7r4sswos4@www.site.hu>
Message-ID:

On Fri, 6 Feb 2009, Joe7 wrote:

> Hi,
>
> Thank you for reply, but:
> I'm afraid sync mount is not an option:
> mount_nfs: -o sync: option not supported
>
> Tried acregmax=0 already, and that made no difference unfortunately.
> Application level O_DIRECT is not an option for us, so I'm wondering what can
> be done at mount level?
>
Oops, I saw that MNT_SYNCHRONOUS would set IO_SYNC, but didn't check to see if the mount would actually work. Sorry about that.

Hmm, I think that, without O_DIRECT, you are SOL (shit outa luck:-).

rick

From rmacklem at uoguelph.ca Fri Feb 6 11:03:47 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Fri Feb 6 11:03:53 2009
Subject: nfs delay causing broken files
Message-ID:

Well, maybe not SOL if you are willing to hack the kernel sources. If you change the following line in sys/nfsclient/nfs_bio.c as follows:

	if (nfs_directio_enable && (ioflag & IO_DIRECT) && vp->v_type == VREG)
		return nfs_directio_write(vp, uio, cred, ioflag);
to
	if (nfs_directio_enable && vp->v_type == VREG)
		return nfs_directio_write(vp, uio, cred, ioflag);

then all writes would be pushed to the server if nfs_directio_enable is set non-zero by sysctl. (In other words, O_DIRECT would be set for all opens on the NFS mounts.)

Ugly, but if it fixes your problem...rick
ps: I'm thinking that a mount option that does this might be useful?

From admin at kedvenc.hu Fri Feb 6 11:37:48 2009
From: admin at kedvenc.hu (Joe7)
Date: Fri Feb 6 11:37:58 2009
Subject: nfs delay causing broken files
In-Reply-To:
References:
Message-ID: <20090206203745.57a035vq2so44ok0@www.site.hu>

Okay,

So although it's likely that kernel hack would do it,
am I right that a wrapper script with open(...O_DIRECT) would do the job?

Basically, I'm creating a file with imagemagick and want to place it
on the NFS server.
So I assume if I create the file locally and copy it over using a
little script that uses open(.. O_DIRECT), it would just work?
The application is PHP + the imagemagick binary, thus pretty high level
compared to this stuff, but please let me know if you agree!

Thanks
Joe

Quoting Rick Macklem:

> Well, maybe not SOL if you are willing to hack the kernel sources. If you
If you > change the following line in sys/nfsclient/nfs_bio.c as follows: > if (nfs_directio_enable && (ioflag & IO_DIRECT) && vp->v_type == VREG) > return nfs_directio_write(vp, uio, cred, ioflag); > to > if (nfs_directio_enable && vp->v_type == VREG) > return nfs_directio_write(vp, uio, cred, ioflag); > > then all writes would be pushed to the server if nfs_directio_enable is > set non-zero by sysctl. (In other words, O_DIRECT would be set for all > opens on the NFS mounts.) > > Ugly, but if it fixes your problem...rick > ps: I'm thinking that a mount option that does this might be useful? From admin at kedvenc.hu Fri Feb 6 13:32:59 2009 From: admin at kedvenc.hu (Joe7) Date: Fri Feb 6 13:33:06 2009 Subject: nfs delay causing broken files Message-ID: <20090206223255.nx3wziczkwso0w4o@www.site.hu> So I tried, wrote a tenliner c script that copies the files ouF = open(argv[2], O_WRONLY | O_CREAT | O_DIRECT | O_SYNC) opening the target with that, but tested and the same problem occurs am I missing something obvious? Thanks for all your help, Joe Id?zet (Rick Macklem ): > > > On Fri, 6 Feb 2009, Joe7 wrote: > >> Okay, >> >> So although it's likely that kernel hack would do it, >> am I right that a wrapper script with open(...O_DIRECT) would do the job? >> >> Basicly i'm creating an file with imagemagick and wanna place that >> on the nfs server. >> So I assume if I create the file locally and copy over using a >> little script that uses open(.. O_DIRECT), it would just work? >> Application is PHP+imagemagick binary thus pretty high level >> compared to this stuff, but please let me know if you agree! >> > Sounds like it might work. 
> Good luck with it, rick

From martin at email.aon.at Sat Feb 7 07:40:06 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sat Feb 7 07:40:13 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902071540.n17Fe5nN062750@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sat, 7 Feb 2009 16:31:21 +0100 (CET)

I am now very sure that this is an interaction with pppoa, and it is also worse than I originally thought: it will even lead to failed NFS transactions for the client.

Here is what I have:

Machine A ('server', a mini home server) does the following:
- connecting to the Internet using usermode ppp over pppoa over an Alcatel ADSL modem
- NFS serving FreeBSD sources

Machine B does the following:
- Mounting the FreeBSD sources from A (using amd), under directory /vol/SRC/FreeBSD/HEAD/src
- Compiling the FreeBSD sources: make -j4 buildworld, such that the corresponding obj is local (via amd again)

Especially in the first phase of the buildworld (clean, depend, obj), there is a lot of simultaneous NFS traffic from B to A. As soon as a download is started at A (going via pppoa, of course), the load on A rises to very high values (> 20 not uncommon). This may lead to B aborting the compile; it just did that with "directory not found".

Both machines are running 7.1.0. No such problem happened when both were running 6.3.0.

From martin at email.aon.at Sat Feb 7 08:40:07 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sat Feb 7 08:40:14 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902071640.n17Ge6oQ007703@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.
From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sat, 7 Feb 2009 17:32:15 +0100 (CET)

o.k. pppoa does not have (much) to do with it... even when it is not running, the excessive load happens.

From martin at email.aon.at Sat Feb 7 09:00:17 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sat Feb 7 09:00:57 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902071700.n17H0HJT021579@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sat, 7 Feb 2009 17:59:05 +0100 (CET)

o.k. more info... I have this on the server (machine A in my previous post); it is showing quite a high number for "Server Ret-Failed":

# nfsstat
Client Info:
Rpc Counts:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
   2304000         0       221       165         0         0         0         0
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
         0         0         0         0         0        22         0         0
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0        58         0         0         0
Rpc Info:
  TimedOut   Invalid X Replies   Retries  Requests
         0         0        17        17   2304466
Cache Info:
 Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
         0   2304000    885499       221         0         0         0         0
 BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
    532202       165        89        11        50         0

Server Info:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
         8        19   3123468        82     34382      3156        19        11
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
         0         0        11         0         0      9522         0    868616
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0     31396        23         0       147
Server Ret-Failed
          3046139
Server Faults
            0
Server Cache Stats:
    Inprog      Idem  Non-idem    Misses
         0        51         0       412
Server Write Gathering:
  WriteOps  WriteRPC   Opsaved
      3156      3156         0

and this on the client (machine B):

# nfsstat
Client Info:
Rpc Counts:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
  48679649         0   3035823        58     26400         0         0         0
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
         0         0         0         0         0      7255         0    767940
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0     27347         1         0         0
Rpc Info:
  TimedOut   Invalid X Replies   Retries  Requests
         0         0         0         1  52543821
Cache Info:
 Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
  46870061  49414909  64135869   3035823   1144711     26389         0         0
 BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
   8748412        58     43987      7218     25358         0

Server Info:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
         0         0         0         0         0         0         0         0
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
         0         0         0         0         0         0         0         0
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0         0         0         0         0
Server Ret-Failed
                0
Server Faults
            0
Server Cache Stats:
    Inprog      Idem  Non-idem    Misses
         0         0         0         0
Server Write Gathering:
  WriteOps  WriteRPC   Opsaved
         0         0         0

From peter at vk2pj.dyndns.org Sat Feb 7 12:46:53 2009
From: peter at vk2pj.dyndns.org (Peter Jeremy)
Date: Sat Feb 7 12:47:00 2009
Subject: Unable to pwd in ZFS snapshot
Message-ID: <20090207200918.GA58657@test71.vk2pj.dyndns.org>

I'm running -current from late last year (just after the ZFS v13 import) and have found that I can't determine the current working directory inside a snapshot:

# df -k /usr/ports/.zfs/snapshot/pre_7.4
Filesystem          1024-blocks     Used     Avail Capacity  Mounted on
tank/ports@pre_7.4    877019136 31293824 845725312       4%  /usr/ports/.zfs/snapshot/pre_7.4
# cd /usr/ports/.zfs/snapshot/pre_7.4
# pwd
pwd: .: No such file or directory
# ls .
.cvsignore      accessibility   dns             math            security
CHANGES         arabic          editors         mbone           shells
COPYRIGHT       archivers       emulators       misc            sysutils
CVS             astro           finance         multimedia      textproc
GIDs            audio           french          net             ukrainian
INDEX-7         benchmarks      ftp             net-im          vietnamese
INDEX-8         biology         games           net-mgmt        work
KNOBS           build.cvsupdate german          net-p2p         www
LEGAL           build.diffs     graphics        news            x11
MOVED           cad             hebrew          packages        x11-clocks
Makefile        chinese         hungarian       palm            x11-drivers
Mk              comms           irc             polish          x11-fm
README          converters      japanese        ports-mgmt      x11-fonts
Templates       databases       java            portuguese      x11-servers
Tools           deskutils       korean          print           x11-themes
UIDs            devel           lang            russian         x11-toolkits
UPDATING        distfiles       mail            science         x11-wm
#

This breaks (e.g.) make. I got around it by cloning the snapshot, but this behaviour strikes me as counter-intuitive (and the error message leaves something to be desired).

-- 
Peter Jeremy

From freebsd-listen at fabiankeil.de Sat Feb 7 13:25:38 2009
From: freebsd-listen at fabiankeil.de (Fabian Keil)
Date: Sat Feb 7 13:25:45 2009
Subject: Unable to pwd in ZFS snapshot
In-Reply-To: <20090207200918.GA58657@test71.vk2pj.dyndns.org>
References: <20090207200918.GA58657@test71.vk2pj.dyndns.org>
Message-ID: <20090207221202.1a19456a@fabiankeil.de>

Peter Jeremy wrote:

> I'm running -current from late last year (just after the ZFS v13
> import) and have found that I can't determine the current working
> directory inside a snapshot:
> # df -k /usr/ports/.zfs/snapshot/pre_7.4
> Filesystem 1024-blocks Used Avail Capacity Mounted on
> tank/ports@pre_7.4 877019136 31293824 845725312 4% /usr/ports/.zfs/snap
> shot/pre_7.4
> # cd /usr/ports/.zfs/snapshot/pre_7.4
> # pwd
> pwd: .: No such file or directory

I can reproduce this on:
FreeBSD TP51.local 8.0-CURRENT FreeBSD
8.0-CURRENT #30: Sat Feb 7 19:37:07 CET 2009 fk@TP51.local:/usr/obj/usr/src/sys/THINKPAD i386

It seems to work with bash's builtin though:

fk@TP51 /tank/privoxy/.zfs/snapshot/2009-02-07 $df -k .
Filesystem              1024-blocks  Used   Avail Capacity  Mounted on
tank/privoxy@2009-02-07     3704832 45824 3659008       1%  /tank/privoxy/.zfs/snapshot/2009-02-07
fk@TP51 /tank/privoxy/.zfs/snapshot/2009-02-07 $/bin/pwd
pwd: .: No such file or directory
fk@TP51 /tank/privoxy/.zfs/snapshot/2009-02-07 $pwd
/tank/privoxy/.zfs/snapshot/2009-02-07

Fabian

From morganw at chemikals.org Sat Feb 7 14:04:33 2009
From: morganw at chemikals.org (Wesley Morgan)
Date: Sat Feb 7 14:04:40 2009
Subject: zfs replace disk has failed
In-Reply-To:
References:
Message-ID:

On Tue, 3 Feb 2009, Dan Cojocar wrote:

> Hello all,
> In a mirror(ad1,ad2) configuration one of my disk(ad1) had failed,
> after replacing the failed disk with a new one using:
> zpool replace tank ad1
> I have noticed that the replace is taking too long and that the system
> is not responding, after restart the new disk was not recognized any
> more in bios :(, I have tested also in another box and the disk was
> not recognized there too.
> I have installed a new one on the same location (ad1 I think).
> Then the zpool status has reported something like this (this is from memory
> because I have made many changes back then, I don't remember exactly
> if the online disk was ad1 or ad2):
>
> zpool status
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         tank                        DEGRADED     0     0     0
>           mirror                    DEGRADED     0     0     0
>             replacing               UNAVAIL      0   387     0  insufficient replicas
>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>             ad2                     ONLINE       0     0     0
> At this stage I was thinking that if I will attach the new disk (ad1)
> to the mirror I will get sufficient replicas to detach
> 9318348042598806923 (this one was the disk that has failed the second
> time), so I did an attach, after the resilvering process has completed
> with success, I had:
> zpool status
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         tank                        DEGRADED     0     0     0
>           mirror                    DEGRADED     0     0     0
>             replacing               UNAVAIL      0   387     0  insufficient replicas
>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>             ad2                     ONLINE       0     0     0
>             ad1                     ONLINE       0     0     0
> And I'm not able to detach 9318348042598806923 :(, and another bad
> news is that if I try to access something under /tank the operation is
> hanging, eg: if I do a ls /tank is freezing and if I do in another
> console: zpool status which was working before ls, now it's freezing
> too.
> What should I do next?
> Thanks,
> Dan

ZFS seems to fall over on itself if a disk replacement is interrupted and the replacement drive goes missing.

By attaching the disk, you now have a 3-way mirror. The two possibilities for you would be to roll the array back to a previous txg, which I'm not at all sure would work, or to create a fake device the same size as the array devices and put a label on it that emulates the missing device, and you can then cancel the replacement.
Once the replacement is cancelled, you should be able to remove the nonexistent device. Note that the labels are all checksummed with sha256, so it's not a simple hex edit (unless you can calculate checksums by hand also!).

If you send me the first 512k of either ad1 or ad2 (off-list of course), I can alter the labels to be the missing guids, and you can use md devices and sparse files to fool zpool.

-- 
This .signature sanitized for your protection

From martin at email.aon.at Sat Feb 7 23:50:03 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sat Feb 7 23:50:09 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902080750.n187o3kl026625@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sun, 8 Feb 2009 08:40:31 +0100 (CET)

Yet more info... here is output from top. Also, the following just happened:
- I am editing this mail on the NFS server.
- Together with the top output from below, I was pasting a total of 1000 lines (my XTerm scroll size).
- This caused the load on this server to effectively double again (over the pasted values shown below).

Basically, I can only continue editing this mail if I suspend the build on the client machine, in which case the server immediately becomes responsive again.

So maybe it is not a pppoa interaction with NFS serving, but any load on the server + NFS server makes the load on the server go to insane values. Or maybe it is just additional TCP load, because I am displaying this XTerm on the NFS client (where the X server is running), and all the pasting has to go via the X server's TCP connection.
Also, I have the impression that as long as only one of the 8 nfsd's on the server is busy, things are mostly normal, but as soon as more than one starts doing work (as seen in the output below), the load on the server goes way up. And regarding "mostly normal": even if only one nfsd seems to be active, the load on the server is already close to one - assuming that an nfsd does not do much more than network and disk i/o, this really should not be the case (and was not under 6.3, where the load was low even under quite heavy NFS i/o).

So maybe it is a ULE problem, after all?

last pid: 2527; load averages: 14.71, 10.36, 6.13  up 0+01:04:43 08:21:08
111 processes: 9 running, 102 sleeping
CPU: 1.4% user, 0.0% nice, 90.5% system, 8.1% interrupt, 0.0% idle
Mem: 135M Active, 745M Inact, 119M Wired, 1012K Cache, 112M Buf, 248M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  971 root        1   4    0  3128K   944K -       13:45 40.28% nfsd
  972 root        1   4    0  3128K   944K -        2:19 15.09% nfsd
  973 root        1   4    0  3128K   944K -        1:31 10.16% nfsd
  974 root        1   4    0  3128K   944K -        1:03  6.05% nfsd
  975 root        1   4    0  3128K   944K -        0:49  4.59% nfsd
  977 root        1   4    0  3128K   944K -        0:41  3.56% nfsd
  978 root        1   4    0  3128K   944K -        0:35  2.64% nfsd
  976 root        1   4    0  3128K   944K -        0:31  1.81% nfsd
 2527 root        1  96    0  3164K   992K RUN      0:00  1.54% rsh
 1471 root        1  81  -15  5032K  2716K select   0:05  0.05% ppp
  919 root        1  96    0  3128K  3148K select   2:16  0.00% amd
 1539 root        1  96    0  6508K  4964K RUN      0:10  0.00% xterm
 1140 squid       1   4    0 12000K 10152K sbwait   0:05  0.00% perl5.8.9
 1130 squid       1  96    0 15660K 10820K RUN      0:05  0.00% squid
 1141 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1142 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1143 squid       1   4    0 12000K 10104K sbwait   0:03  0.00% perl5.8.9

From martin at email.aon.at Sun Feb 8 00:10:07 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sun Feb 8 00:10:13 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID:
<200902080810.n188A6oB044398@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sun, 8 Feb 2009 09:00:20 +0100 (CET)

Sorry for the many posts... this problem is really nagging me, and I need to clarify an error I made: sysctl kern.sched.name on the server reports '4BSD', so it's not ULE as I wrote in my previous posting.

From martin at email.aon.at Sun Feb 8 00:20:05 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sun Feb 8 00:20:11 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902080820.n188K3st052428@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sun, 8 Feb 2009 09:12:46 +0100 (CET)

Another top snapshot on the server; the scenario is the following:
- make -j4 buildworld running on the client, currently in "===> kerberos5/lib/libkadm5srv (all)"
- on the server, in addition to serving NFS, also running a 'svn log' command, where the repository is also served by the server via http (httpd, subversion repo, and svn log all running on the server, displaying in an XTerm running on the client).

If I do the 'svn log' without NFS load from the client (buildworld stopped), there is nearly instantaneous output. If I do it with a running buildworld on the client, I get a top output similar to the one below (actually, it is very hard to capture the "worst" moments, as the server is so unresponsive - in fact the load was something like 25, and each of the 8 nfsds consumed about 10% of CPU).
last pid: 2527; load averages: 14.71, 10.36, 6.13  up 0+01:04:43 08:21:08
111 processes: 9 running, 102 sleeping
CPU: 1.4% user, 0.0% nice, 90.5% system, 8.1% interrupt, 0.0% idle
Mem: 135M Active, 745M Inact, 119M Wired, 1012K Cache, 112M Buf, 248M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  971 root        1   4    0  3128K   944K -       13:45 40.28% nfsd
  972 root        1   4    0  3128K   944K -        2:19 15.09% nfsd
  973 root        1   4    0  3128K   944K -        1:31 10.16% nfsd
  974 root        1   4    0  3128K   944K -        1:03  6.05% nfsd
  975 root        1   4    0  3128K   944K -        0:49  4.59% nfsd
  977 root        1   4    0  3128K   944K -        0:41  3.56% nfsd
  978 root        1   4    0  3128K   944K -        0:35  2.64% nfsd
  976 root        1   4    0  3128K   944K -        0:31  1.81% nfsd
 2527 root        1  96    0  3164K   992K RUN      0:00  1.54% rsh
 1471 root        1  81  -15  5032K  2716K select   0:05  0.05% ppp
  919 root        1  96    0  3128K  3148K select   2:16  0.00% amd
 1539 root        1  96    0  6508K  4964K RUN      0:10  0.00% xterm
 1140 squid       1   4    0 12000K 10152K sbwait   0:05  0.00% perl5.8.9
 1130 squid       1  96    0 15660K 10820K RUN      0:05  0.00% squid
 1141 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1142 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1143 squid       1   4    0 12000K 10104K sbwait   0:03  0.00% perl5.8.9

From dan.cojocar at gmail.com Sun Feb 8 00:44:55 2009
From: dan.cojocar at gmail.com (Dan Cojocar)
Date: Sun Feb 8 00:45:01 2009
Subject: zfs replace disk has failed
In-Reply-To:
References:
Message-ID:

On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan wrote:
> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>
>> Hello all,
>> In a mirror(ad1,ad2) configuration one of my disk(ad1) had failed,
>> after replacing the failed disk with a new one using:
>> zpool replace tank ad1
>> I have noticed that the replace is taking too long and that the system
>> is not responding, after restart the new disk was not recognized any
>> more in bios :(, I have tested also in another box and the disk was
>> not recognized there too.
>> I have installed a new one on the same location (ad1 I think).
>> Then the zpool status has reported something like this (this is from memory
>> because I have made many changes back then, I don't remember exactly
>> if the online disk was ad1 or ad2):
>>
>> zpool status
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>> At this stage I was thinking that if I will attach the new disk (ad1)
>> to the mirror I will get sufficient replicas to detach
>> 9318348042598806923 (this one was the disk that has failed the second
>> time), so I did an attach, after the resilvering process has completed
>> with success, I had:
>> zpool status
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>             ad1                     ONLINE       0     0     0
>> And I'm not able to detach 9318348042598806923 :(, and another bad
>> news is that if I try to access something under /tank the operation is
>> hanging, eg: if I do a ls /tank is freezing and if I do in another
>> console: zpool status which was working before ls, now it's freezing
>> too.
>> What should I do next?
>> Thanks,
>> Dan
>
> ZFS seems to fall over on itself if a disk replacement is interrupted and
> the replacement drive goes missing.
>
> By attaching the disk, you now have a 3-way mirror.
> The two possibilities for you would be to roll the array back to a previous txg, which I'm not at all
> sure would work, or to create a fake device the same size as the array
> devices and put a label on it that emulates the missing device, and you can
> then cancel the replacement. Once the replacement is cancelled, you should
> be able to remove the nonexistent device. Note that the labels are all
> checksummed with sha256, so it's not a simple hex edit (unless you can
> calculate checksums by hand also!).
>
> If you send me the first 512k of either ad1 or ad2 (off-list of course), I
> can alter the labels to be the missing guids, and you can use md devices and
> sparse files to fool zpool.
>

Hello Wesley,
This was a production server so I had to restore the mirror from the backup.
Can you explain a bit how can someone alter the labels of a disk in a pool?
Thanks,
Dan

From kib at FreeBSD.org Sun Feb 8 05:28:11 2009
From: kib at FreeBSD.org (kib@FreeBSD.org)
Date: Sun Feb 8 05:28:18 2009
Subject: kern/131356: [tmpfs][patch] unlink(2) on tmpfs removes wrong files with hard-links
Message-ID: <200902081328.n18DSBhA088380@freefall.freebsd.org>

Synopsis: [tmpfs][patch] unlink(2) on tmpfs removes wrong files with hard-links

Responsible-Changed-From-To: freebsd-fs->kib
Responsible-Changed-By: kib
Responsible-Changed-When: Sun Feb 8 13:27:57 UTC 2009
Responsible-Changed-Why: Take.

http://www.freebsd.org/cgi/query-pr.cgi?pr=131356

From stb at lassitu.de Sun Feb 8 06:13:12 2009
From: stb at lassitu.de (Stefan Bethke)
Date: Sun Feb 8 06:13:19 2009
Subject: zfs: using, then destroying a snapshot sometimes confuses zfs
Message-ID: <76873DDF-D21B-48AF-9AFB-5A2747BE406B@lassitu.de>

Sorry I can't be more precise at the moment, but while creating a script that mirrors some zfs filesystems to another machine, I've now twice gotten weird behaviour and then a panic.
The script iterates over a couple of zfs file systems:
- creates a snapshot with zfs snapshot tank/foo@mirror
- uses rsync to copy the contents of the snapshot with rsync /tank/foo/.zfs/snapshot/mirror/ dest:...
- destroys the snapshot with zfs destroy tank/foo@mirror

While testing the script, I twice got to a point where, after the snapshot was created without an error message, rsync dropped out with an error message similar to "invalid file handle" on /tank/foo/.zfs/snapshot. At that point, I could cd to /tank/foo/.zfs, but ls produced the same error message. I then tried to unmount the snapshot with zfs umount, and got a panic (which I also didn't manage to capture).

Is this a generally known issue, or should I try to capture more information when this happens again?

I'm running with these loader variables on amd64:
vfs.zfs.arc_max="512M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zil_disable="1"

Stefan

-- 
Stefan Bethke  Fon +49 151 14070811

From morganw at chemikals.org Sun Feb 8 09:26:10 2009
From: morganw at chemikals.org (Wesley Morgan)
Date: Sun Feb 8 09:26:17 2009
Subject: zfs replace disk has failed
In-Reply-To:
References:
Message-ID:

On Sun, 8 Feb 2009, Dan Cojocar wrote:

> On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan wrote:
>> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>>
>>> Hello all,
>>> In a mirror(ad1,ad2) configuration one of my disk(ad1) had failed,
>>> after replacing the failed disk with a new one using:
>>> zpool replace tank ad1
>>> I have noticed that the replace is taking too long and that the system
>>> is not responding, after restart the new disk was not recognized any
>>> more in bios :(, I have tested also in another box and the disk was
>>> not recognized there too.
>>> I have installed a new one on the same location (ad1 I think).
Then >>> the zpool status has reported something like this (this is from memory >>> because I have made many changes back then, I don't remember exactly >>> if the online disk was ad1 or ad2): >>> >>> zpool status >>> pool: tank >>> state: DEGRADED >>> scrub: none requested >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> tank DEGRADED 0 0 0 >>> mirror DEGRADED 0 0 0 >>> replacing UNAVAIL 0 387 0 >>> insufficient replicas >>> 10193841952954445329 REMOVED 0 0 0 was >>> /dev/ad1/old >>> 9318348042598806923 FAULTED 0 0 0 was /dev/ad1 >>> ad2 ONLINE 0 0 0 >>> At this stage I was thinking that if I will attach the new disk (ad1) >>> to the mirror I will get sufficient replicas to detach >>> 9318348042598806923 (this one was the disk that has failed the second >>> time), so I did an attach, after the resilvering process has completed >>> with success, I had: >>> zpool status >>> pool: tank >>> state: DEGRADED >>> scrub: none requested >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> tank DEGRADED 0 0 0 >>> mirror DEGRADED 0 0 0 >>> replacing UNAVAIL 0 387 0 >>> insufficient replicas >>> 10193841952954445329 REMOVED 0 0 0 was >>> /dev/ad1/old >>> 9318348042598806923 FAULTED 0 0 0 was /dev/ad1 >>> ad2 ONLINE 0 0 0 >>> ad1 ONLINE 0 0 0 >>> And I'm not able to detach 9318348042598806923 :(, and another bad >>> news is that if I try to access something under /tank the operation is >>> hanging, eg: if I do a ls /tank is freezing and if I do in another >>> console: zpool status which was working before ls, now it's freezing >>> too. >>> What should I do next? >>> Thanks, >>> Dan >> >> ZFS seems to fall over on itself if a disk replacement is interrupted and >> the replacement drive goes missing. >> >> By attaching the disk, you now have a 3-way mirror. 
The two possibilities for >> you would be to roll the array back to a previous txg, which I'm not at all >> sure would work, or to create a fake device the same size as the array >> devices and put a label on it that emulates the missing device, and you can >> then cancel the replacement. Once the replacement is cancelled, you should >> be able to remove the nonexistent device. Note that the labels are all >> checksummed with sha256 so it's not a simple hex edit (unless you can >> calculate checksums by hand also!). >> >> If you send me the first 512k of either ad1 or ad2 (off-list of course), I >> can alter the labels to be the missing guids, and you can use md devices and >> sparse files to fool zpool. >> > > Hello Wesley, > This was a production server so I had to restore the mirror from the backup. > Can you explain a bit how someone can alter the labels of a disk in a pool? > Thanks, > Dan > As far as I know there is no tool available to interactively edit a label, although since the source code that defines the labels and the data within them is available, it should be possible to write one. Devices in the same pool should all have nearly identical labels, differing only in the actual guid of the device itself. In my situation, I simply altered the guid with a hex editor, borrowed the zfs sha256 code to write the correct checksum to the label, and, using gvirstor (md would probably have worked as well), was able to cancel the failed replacement. From toasty at dragondata.com Sun Feb 8 21:16:20 2009 From: toasty at dragondata.com (Kevin Day) Date: Sun Feb 8 21:16:27 2009 Subject: zio->io_cv deadlock Message-ID: <8E12CEFC-25DE-4B82-97BD-7ED717650089@dragondata.com> I'm playing with a -CURRENT install from a couple of weeks ago. Everything seems okay for a few days, then eventually every process ends up stuck in zio->io_cv. If I go to the console, it's responsive until I try logging in, then login is stuck in zio->io_cv as well.
Ctrl-Alt-Esc drops me into ddb, but then ddb hangs instantly. Nothing on the console or syslog before it hangs. Anyone seen anything similar? -- Kevin Possibly relevant info: 8 core Opteron 64GB RAM da1 at twa0 bus 0 target 0 lun 1 da1: Fixed Direct Access SCSI-5 device da1: 100.000MB/s transfers da1: 4678158MB (9580867585 512 byte sectors: 255H 63S/T 596381C) server5# zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT z 4.44T 1.19T 3.25T 26% ONLINE - server5# zpool status -v pool: z state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM z ONLINE 0 0 0 da1 ONLINE 0 0 0 errors: No known data errors server5# cat /boot/loader.conf vm.kmem_size_max="2048M" vm.kmem_size="2048M" vfs.zfs.arc_max="100M" zfs_load="YES" vfs.root.mountfrom="zfs:z" (tried lowering arc_max, didn't seem to help) From bugmaster at FreeBSD.org Mon Feb 9 03:06:51 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Feb 9 03:07:54 2009 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200902091106.n19B6o7r009094@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131353 fs gjournal kernel lock o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] mkfs.ext2 creates rotten partition o kern/131084 fs [xfs] xfs destroys itself after copying data o kern/131081 fs [zfs] User cannot delete a file when a ZFS dataset is o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o bin/130105 fs [zfs] zfs send -R dumps core o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129174 fs [nfs] [zfs] [panic] NFS v3 Panic when under high load o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129084 fs [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o 
bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po 42 problems total. From avg at icyb.net.ua Mon Feb 9 05:40:10 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Mon Feb 9 05:40:16 2009 Subject: nfs umount soft hang In-Reply-To: <498AF8E1.7020206@icyb.net.ua> References: <498AF8E1.7020206@icyb.net.ua> Message-ID: <49902DF2.8050206@icyb.net.ua> on 05/02/2009 16:34 Andriy Gapon said the following: > I have an NFS server and NFS client separated by a firewall. Both > servers are FreeBSD 7.1. > > Server configuration: > nfs_server_enable="YES" > nfs_server_flags="-t -n 4" > rpcbind_enable="YES" > mountd_flags="-r -p 737" > mountd_enable="YES" > > The firewall allows tcp and udp to port 111, but only tcp to ports 2049 > and 737 (configured for mountd, see above). > > On the client I use e.g. the following command for mounting: > mount -t nfs -o nfsv3,tcp,intr,rdirplus,-r=32768,-w=32768 > XXXX:/export/usr/obj /usr/obj > > Mounting and subsequent fs operations work flawlessly. > > When I unmount umount command hangs but can be interrupted with ^C. 
> Everything seems to be clean after that - the filesystem is unmounted, > there are no post-effects on either client or server. I think this is it:

	/*
	 * Report to mountd-server which nfsname
	 * has been unmounted.
	 */
	if (ai != NULL && !(fflag & MNT_FORCE) && do_rpc) {
		clp = clnt_create(hostp, RPCPROG_MNT, RPCMNT_VER1, "udp");

I wonder if umount could be smarter about whether to use udp or tcp here. -- Andriy Gapon From jh at saunalahti.fi Mon Feb 9 07:55:26 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Mon Feb 9 07:55:44 2009 Subject: Unable to pwd in ZFS snapshot In-Reply-To: <20090207200918.GA58657@test71.vk2pj.dyndns.org> References: <20090207200918.GA58657@test71.vk2pj.dyndns.org> Message-ID: <20090209155521.GA3418@a91-153-125-115.elisa-laajakaista.fi> Hi, On 2009-02-08, Peter Jeremy wrote: > I'm running -current from late last year (just after the ZFS v13 > import) and have found that I can't determine the current working > directory inside a snapshot: getcwd(3) first tries the __getcwd() system call, but it always fails because the VFS name cache is not supported for .zfs control directories. Next, getcwd(3) tries to resolve the working directory by traversing the directory tree up to the root, but this fails too because the .zfs directory is normally hidden from the directory listing. (getcwd(3) uses readdir(3) to find component names.) > This breaks (eg) make. I got around it by cloning the snapshot but > this behaviour strikes me as counter-intuitive (and the error message > leaves something to be desired).
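Jaakko's description of the userland fallback can be made concrete. The sketch below is a hypothetical helper, `find_name_in_parent()`, written in the spirit of the classic BSD getcwd(3) algorithm rather than copied from FreeBSD's libc: it scans a parent directory with readdir(3) and matches each entry's device and inode numbers against the directory being named. An entry that readdir(3) never returns, such as a hidden ".zfs", can never match, so the upward traversal fails at that component.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the name under which the directory
 * identified by (dev, ino) appears in parent_path, by scanning
 * parent_path with readdir(3). A component that readdir(3) never
 * returns (e.g. a hidden ".zfs") cannot be found.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
find_name_in_parent(const char *parent_path, dev_t dev, ino_t ino,
    char *out, size_t outlen)
{
	DIR *dir;
	struct dirent *ent;
	struct stat st;

	if ((dir = opendir(parent_path)) == NULL)
		return (-1);
	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		/* Compare device and inode numbers of each entry. */
		if (fstatat(dirfd(dir), ent->d_name, &st,
		    AT_SYMLINK_NOFOLLOW) != 0)
			continue;
		if (st.st_dev == dev && st.st_ino == ino) {
			if (strlen(ent->d_name) >= outlen) {
				closedir(dir);
				errno = ENAMETOOLONG;
				return (-1);
			}
			strcpy(out, ent->d_name);
			closedir(dir);
			return (0);
		}
	}
	closedir(dir);
	errno = ENOENT;	/* not listed, e.g. a hidden control directory */
	return (-1);
}
```

A full getcwd(3) repeats this step, moving to ".." each time, until it reaches the root; the helper only shows the single step that breaks inside a snapshot.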
You can also work around it by making the ".zfs" directory visible:

zfs set snapdir=visible volume

-- Jaakko From rwatson at FreeBSD.org Mon Feb 9 09:56:17 2009 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Feb 9 09:56:27 2009 Subject: Unable to pwd in ZFS snapshot In-Reply-To: <20090209155521.GA3418@a91-153-125-115.elisa-laajakaista.fi> References: <20090207200918.GA58657@test71.vk2pj.dyndns.org> <20090209155521.GA3418@a91-153-125-115.elisa-laajakaista.fi> Message-ID: On Mon, 9 Feb 2009, Jaakko Heinonen wrote: > On 2009-02-08, Peter Jeremy wrote: >> I'm running -current from late last year (just after the ZFS v13 import) >> and have found that I can't determine the current working directory inside >> a snapshot: > > getcwd(3) first tries __getcwd() system call but it always fails because the > VFS name cache is not supported for .zfs control directories. Secondly > getcwd(3) tries to resolve working directory by traversing the directory > tree to root but this fails too because the .zfs directory is normally > hidden from the directory listing. (getcwd(3) uses readdir(3) to find > component names) Now that we have a new VOP to assist in reverse-name resolution, it could be that ZFS could provide the back-end lookup to address this issue without forcing the use of the namecache for things we don't want to cache. Robert N M Watson Computer Laboratory University of Cambridge From nbari at k9.cx Mon Feb 9 23:10:14 2009 From: nbari at k9.cx (Nicolas de Bari Embriz Garcia Rojas) Date: Mon Feb 9 23:10:20 2009 Subject: GEOM: mfid1: corrupt or invalid GPT detected. GEOM: mfid1: GPT rejected -- may not be recoverable.
Message-ID: <5EE9210C-B4C3-4579-BD33-A6C3CA392190@k9.cx> I have a Dell PowerEdge 2900 III with a Dell PERC 6 (MegaRAID SAS driver ver. 3.00). On a RAID1 (bootable) I have FreeBSD 7.1 (latest stable version) with the default partition scheme and UFS, and on the RAID5 I have a '/tank' partition made with ZFS. The system works fine; the only problem is that I keep getting these messages:
GEOM: mfid1: corrupt or invalid GPT detected.
GEOM: mfid1: GPT rejected -- may not be recoverable.
Any idea on how to solve this, or at least stop logging that message? regards. -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 163 bytes Desc: This is a digitally signed message part Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090210/c72e9d78/PGP.pgp From jh at saunalahti.fi Tue Feb 10 08:52:10 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Tue Feb 10 08:52:22 2009 Subject: Unable to pwd in ZFS snapshot In-Reply-To: References: <20090207200918.GA58657@test71.vk2pj.dyndns.org> <20090209155521.GA3418@a91-153-125-115.elisa-laajakaista.fi> Message-ID: <20090210165204.GA4300@a91-153-125-115.elisa-laajakaista.fi> Hi, On 2009-02-09, Robert Watson wrote: > Now that we have a new VOP to assist in reverse-name resolution, it could be > that ZFS could provide the back-end lookup to address this issue without > forcing the use of the namecache for things we don't want to cache. I think that a bigger problem is how __getcwd() works. If the lookup of a single path component fails, both from the cache and with VOP_VPTOCNP, __getcwd() will abort. So even if ZFS supported VOP_VPTOCNP perfectly, some path components may be on a file system which doesn't, and the hidden ".zfs" prevents userspace traversal from succeeding. Actually ZFS caches the hidden ".zfs" directory (but nothing below it).
Thus, if __getcwd() fell back to a readdir scan only for those components which really require it, getcwd(3) should work right now (as long as ".zfs" is in the cache). Looks like someone has tried to do this already: http://lists.freebsd.org/pipermail/freebsd-current/2004-May/027020.html The patch doesn't apply against head anymore. -- Jaakko From kostikbel at gmail.com Tue Feb 10 09:33:36 2009 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Feb 10 09:33:44 2009 Subject: Unable to pwd in ZFS snapshot In-Reply-To: <20090210165204.GA4300@a91-153-125-115.elisa-laajakaista.fi> References: <20090207200918.GA58657@test71.vk2pj.dyndns.org> <20090209155521.GA3418@a91-153-125-115.elisa-laajakaista.fi> <20090210165204.GA4300@a91-153-125-115.elisa-laajakaista.fi> Message-ID: <20090210173329.GA62256@deviant.kiev.zoral.com.ua> On Tue, Feb 10, 2009 at 06:52:05PM +0200, Jaakko Heinonen wrote: > > Hi, > > On 2009-02-09, Robert Watson wrote: > > Now that we have a new VOP to assist in reverse-name resolution, it could be > > that ZFS could provide the back-end lookup to address this issue without > > forcing the use of the namecache for things we don't want to cache. > > I think that a bigger problem is how __getcwd() works. If single path > component lookup fails from cache or with VOP_VPTOCNP __getcwd() will > abort. So even if ZFS supported VOP_VPTOCNP perfectly some path > components may be on a file system which doesn't and the hidden ".zfs" > prevents userspace traversal from succeeding. > > Actually ZFS caches the hidden ".zfs" directory (but nothing below it). > Thus if __getcwd() reverted to readdir scan only for those components > which really require it getcwd(3) should work right now (as long as > ".zfs" is in cache). > > Looks like someone has tried to do this already: > > http://lists.freebsd.org/pipermail/freebsd-current/2004-May/027020.html > > The patch doesn't apply against head anymore. It is being worked on right now.
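Jaakko's argument, that a fallback only for uncached components would already work, can be checked with a small toy model. Everything here (`struct node`, the `cached` and `listed` flags, `resolve()`) is hypothetical and models no real kernel interface. The flags encode his description that ZFS caches ".zfs" but nothing below it and that ".zfs" is hidden from readdir(3); that the entries below ".zfs" are listed normally is an assumption of the model. The three strategies are the current kernel path (abort on the first uncached component), the pure userland traversal, and the proposed hybrid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model (hypothetical, not kernel code) of one path component. */
struct node {
	const char	*name;
	struct node	*parent;	/* NULL for the root */
	bool		 cached;	/* resolvable via name cache / VOP_VPTOCNP */
	bool		 listed;	/* returned by readdir(3) on the parent */
};

enum strategy {
	KERNEL_ONLY,	/* abort as soon as one component is not cached */
	USERLAND_ONLY,	/* readdir scan for every component */
	HYBRID		/* cache first, readdir scan only when needed */
};

/*
 * Build the path of n from the leaf up to the root.
 * Returns true and fills buf on success; false if any component fails.
 */
static bool
resolve(const struct node *n, enum strategy s, char *buf, size_t len)
{
	char tmp[256];

	buf[0] = '\0';
	for (; n->parent != NULL; n = n->parent) {
		bool ok;

		switch (s) {
		case KERNEL_ONLY:
			ok = n->cached;
			break;
		case USERLAND_ONLY:
			ok = n->listed;
			break;
		default:	/* HYBRID */
			ok = n->cached || n->listed;
			break;
		}
		if (!ok)
			return (false);
		/* Prepend "/name" to the part built so far. */
		if (snprintf(tmp, sizeof(tmp), "/%s%s", n->name, buf) >=
		    (int)sizeof(tmp) || strlen(tmp) >= len)
			return (false);
		strcpy(buf, tmp);
	}
	if (buf[0] == '\0')
		snprintf(buf, len, "/");
	return (true);
}
```

With the path /tank/foo/.zfs/snapshot/mirror from Stefan Bethke's script, the first two strategies fail (at "mirror" and at ".zfs" respectively) and only the hybrid one succeeds.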
-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090210/13303464/attachment.pgp From avg at icyb.net.ua Tue Feb 10 10:35:18 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Feb 10 10:35:33 2009 Subject: [repost] multiple filesystems sharing/clobbering device vnode Message-ID: <4991C8DF.1020805@icyb.net.ua> Unfortunately I wasn't able to devote enough time/thinking to this issue, so I am cowardly resorting to just sending a reminder about it. -------- Original Message -------- Subject: multiple filesystems sharing/clobbering device vnode Date: Sat, 01 Mar 2008 11:33:37 +0200 From: Andriy Gapon To: freebsd-arch@freebsd.org First, a little demonstration suggested by Bruce Evans: [I hope you will continue reading after reboot]
1. mount_cd9660 /dev/acd0 /mnt1
2. mount -r /dev/acd0 /mnt2 # -r is important
3. ls -l /mnt1
The issue can be laconically described as follows:
1. We do not disallow multiple RO mounts of the same device (which could be done either on purpose or by accident).
2. All popular (on-disk) filesystems use/clobber the bufobj of the device's vnode, even for RO mounts; some (ufs) do that even if the mount fails.
3. There is no provision for such shared access; all filesystems act as if they were the exclusive owner of the vnode / its bufobj.
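The sharing problem in point 3 can be reduced to a toy model. The types and functions below (`toy_vnode`, `toy_bufobj`, `toy_mount`, `toy_vfs_open()`, `toy_read_path()`) are hypothetical stand-ins, not the real `struct bufobj` or GEOM API; they only reproduce the pattern of the XXX lines in g_vfs_open(), where each opener unconditionally stores its own consumer in the single bufobj of the device vnode. After the second mount, reads issued through the first mount are routed to the second mount's consumer, which is consistent with the ffs_geom_strategy frame appearing under cd9660_readdir in Andriy's stack trace excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a GEOM consumer, a bufobj and a vnode. */
struct toy_consumer {
	const char *fsname;
};

struct toy_bufobj {
	struct toy_consumer *bo_private;	/* who services the I/O */
};

struct toy_vnode {
	struct toy_bufobj v_bufobj;		/* one per device vnode */
};

struct toy_mount {
	struct toy_consumer cp;
	struct toy_bufobj *bo;			/* bufobj this mount reads through */
};

/* Mirrors the XXX lines: the last opener silently wins. */
static void
toy_vfs_open(struct toy_vnode *devvp, struct toy_mount *mp, const char *fsname)
{
	mp->cp.fsname = fsname;
	mp->bo = &devvp->v_bufobj;
	/* Unconditional overwrite: no check for an existing owner. */
	mp->bo->bo_private = &mp->cp;
}

/* Name of the filesystem whose consumer actually services mp's reads. */
static const char *
toy_read_path(const struct toy_mount *mp)
{
	return (mp->bo->bo_private->fsname);
}
```

Any fix has to either refuse the second open or give each mount its own private buffer object; the model only shows why the current unconditional assignment cannot be safe.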
Small snippet of code that speaks for itself (the most interesting lines are marked with XXX at the beginning):

int
g_vfs_open(struct vnode *vp, struct g_consumer **cpp, const char *fsname, int wr)
{
	struct g_geom *gp;
	struct g_provider *pp;
	struct g_consumer *cp;
	struct bufobj *bo;
	int vfslocked;
	int error;

	g_topology_assert();

	*cpp = NULL;
	pp = g_dev_getprovider(vp->v_rdev);
	if (pp == NULL)
		return (ENOENT);
	gp = g_new_geomf(&g_vfs_class, "%s.%s", fsname, pp->name);
	cp = g_new_consumer(gp);
	g_attach(cp, pp);
	error = g_access(cp, 1, wr, 1);
	if (error) {
		g_wither_geom(gp, ENXIO);
		return (error);
	}
	vfslocked = VFS_LOCK_GIANT(vp->v_mount);
	vnode_create_vobject(vp, pp->mediasize, curthread);
	VFS_UNLOCK_GIANT(vfslocked);
	*cpp = cp;
XXX	bo = &vp->v_bufobj;
XXX	bo->bo_ops = g_vfs_bufops;
XXX	bo->bo_private = cp;
XXX	bo->bo_bsize = pp->sectorsize;
	gp->softc = bo;

	return (error);
}

In addition to this, some filesystems (ufs) directly modify v_bufobj. I've been pondering this issue for over a month now; I have some ideas, but they are all wanting in one aspect or another. I would like to hear the ideas and opinions of the people on this list. P.S. For those who didn't actually run the test, here's a hand-copied excerpt from the stack trace:

g_io_request
g_vfs_strategy
ffs_geom_strategy
cd9660_strategy
VOP_STRATEGY_APV
bufstrategy
breadn
bread
cd9660_readdir

-- Andriy Gapon
From dan.cojocar at gmail.com Tue Feb 10 22:59:46 2009 From: dan.cojocar at gmail.com (Dan Cojocar) Date: Tue Feb 10 22:59:54 2009 Subject: zfs page fault Message-ID: Hello, I found this morning that one of my systems was not responding. After attaching a monitor, I found that the system had encountered a page fault.
Here is the bt: Fatal trap 12: page fault while in kernel mode fault virtual address = 0x4c fault code = supervisor write, protection violation instruction pointer = 0x20:0x8051ba96 stack pointer = 0x28:0x83a8db7c frame pointer = 0x28:0x83a8db94 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 18 (vnlru) Physical memory: 1015 MB Dumping 273 MB: 258 242 226 210 194 178 162 146 130 114 98 82 66 50 34 18 2 #0 doadump () at pcpu.h:246 246 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) bt #0 doadump () at pcpu.h:246 #1 0x80477e99 in db_fncall (dummy1=-2086086336, dummy2=0, dummy3=-2139548352, dummy4=0x83a8d920 "?yI\200\200??\203") at /usr/src/sys/ddb/db_command.c:548 #2 0x80478291 in db_command (last_cmdp=0x8073531c, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445 #3 0x804783ea in db_command_loop () at /usr/src/sys/ddb/db_command.c:498 #4 0x8047a23c in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229 #5 0x8053df03 in kdb_trap (type=12, code=0, tf=0x83a8db3c) at /usr/src/sys/kern/subr_kdb.c:534 #6 0x806a804f in trap_fatal (frame=0x83a8db3c, eva=76) at /usr/src/sys/i386/i386/trap.c:920 #7 0x806a8310 in trap_pfault (frame=0x83a8db3c, usermode=0, eva=76) at /usr/src/sys/i386/i386/trap.c:842 #8 0x806a8ce5 in trap (frame=0x83a8db3c) at /usr/src/sys/i386/i386/trap.c:522 #9 0x8069062b in calltrap () at /usr/src/sys/i386/i386/exception.s:165 #10 0x8051ba96 in _sx_xlock (sx=0x3c, opts=0, file=0x80969370 "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c", line=1807) at atomic.h:153 #11 0x80899235 in dmu_buf_update_user () from /boot/kernel/zfs.ko #12 0x808f0a53 in zfs_znode_dmu_fini () from /boot/kernel/zfs.ko #13 0x809136d6 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko #14 0x806b3132 in VOP_RECLAIM_APV (vop=0x80975580, a=0x83a8dc30) at vnode_if.c:1619 #15 0x80591b22 in vgonel (vp=0x874c4d9c) at 
vnode_if.h:830 #16 0x80596f13 in vnlru_free (count=245) at /usr/src/sys/kern/vfs_subr.c:899 #17 0x8059759e in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:768 #18 0x804f0f4f in fork_exit (callout=0x80597500 , arg=0x0, frame=0x83a8dd38) at /usr/src/sys/kern/kern_fork.c:821 #19 0x806906a0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 (kgdb) If someone needs more details I'm here. Thanks, Dan From peterjeremy at optushome.com.au Wed Feb 11 10:07:53 2009 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Wed Feb 11 10:08:01 2009 Subject: zio->io_cv deadlock In-Reply-To: <8E12CEFC-25DE-4B82-97BD-7ED717650089@dragondata.com> References: <8E12CEFC-25DE-4B82-97BD-7ED717650089@dragondata.com> Message-ID: <20090211180743.GC1467@server.vk2pj.dyndns.org> On 2009-Feb-08 22:59:31 -0600, Kevin Day wrote: > > I'm playing with a -CURRENT install from a couple of weeks ago. Everything > seems okay for a few days, then eventually every process ends up stuck in > zio->io_cv. If I go to the console, it's responsive until I try logging in, > then login is stuck in zio->io_cv as well. Ctrl-Alt-Esc drops me into ddb, > but then ddb hangs instantly. I think I've seen this as well, though I can't be sure because X.org 7.4 had trashed my console output. Definitely, I could enter DDB but the crashdump I requested (blind) never appeared. I don't believe I was doing anything unusual (other than trying to use X.org 7.4). -- Peter Jeremy -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090211/548890e2/attachment.pgp From glz at hidden-powers.com Thu Feb 12 02:25:28 2009 From: glz at hidden-powers.com (Goran Lowkrantz) Date: Thu Feb 12 02:25:36 2009 Subject: zio->io_cv deadlock In-Reply-To: <20090211180743.GC1467@server.vk2pj.dyndns.org> References: <8E12CEFC-25DE-4B82-97BD-7ED717650089@dragondata.com> <20090211180743.GC1467@server.vk2pj.dyndns.org> Message-ID: --On February 12, 2009 5:07:43 +1100 Peter Jeremy wrote: > On 2009-Feb-08 22:59:31 -0600, Kevin Day wrote: >> >> I'm playing with a -CURRENT install from a couple of weeks ago. >> Everything seems okay for a few days, then eventually every process >> ends up stuck in zio->io_cv. If I go to the console, it's responsive >> until I try logging in, then login is stuck in zio->io_cv as well. >> Ctrl-Alt-Esc drops me into ddb, but then ddb hangs instantly. > > I think I've seen this as well, though I can't be sure because X.org 7.4 > had trashed my console output. Definitely, I could enter DDB but the > crashdump I requested (blind) never appeared. I don't believe I was > doing anything unusual (other than trying to use X.org 7.4). > I see this now and then in single-user mode during make installworld or ezjail-admin update -i, or in multi-user mode when building a system in the background and updating ports at the same time. I have a USB keyboard and have not been able to break into the debugger with it; I have legacy mode enabled in the BIOS (Award). Does anyone know how to send a break from a USB keyboard?
/glz From delphij at delphij.net Fri Feb 13 00:39:12 2009 From: delphij at delphij.net (Xin LI) Date: Fri Feb 13 00:39:18 2009 Subject: patch: let msdosfs(vfat)/ntfs to support UTF-8 locale well In-Reply-To: <20090213001350.52470f39.ota@j.email.ne.jp> References: <98869b7c0902100112s6dae54bm4c14487076ceb75c@mail.gmail.com> <20090212183440.GA1446@tops> <20090213001350.52470f39.ota@j.email.ne.jp> Message-ID: <499531A4.3020308@delphij.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 (cc'ed to freebsd-fs@) I think it's important that someone familiar with the code review and evaluate the current patches and commit it against -HEAD... MSDOSFS patch (against 7.1): http://btload.googlegroups.com/web/msdosfs.patch?gda=MzIscT8AAABs_gmy4a1S9lRiXjEy-V5OpwtI67JnIGlz0zr18tjObOtoi5oIt3BJMRGeqGBbbj-ccyFKn-rNKC-d1pM_IdV0 NTFS patch: http://btload.googlegroups.com/web/ntfs.patch?gda=OqsHoDwAAABs_gmy4a1S9lRiXjEy-V5O7RN7t-m4MjZ-5dQn_EvaqDVCWO9_HyYEQJyRQYPtRCL9Wm-ajmzVoAFUlE7c_fAt Cheers, - -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.10 (FreeBSD) iEUEARECAAYFAkmVMaQACgkQi+vbBBjt66DN+wCghJbOUO7IfEwt5gFOB01uAAe1 NLwAmOQXPJsB+lT7o5MMk16Ck6eUJrQ= =ZGMA -----END PGP SIGNATURE----- From scjamorim at bsd.com.br Sat Feb 14 04:53:50 2009 From: scjamorim at bsd.com.br (Sylvio César Teixeira Amorim) Date: Sat Feb 14 04:53:56 2009 Subject: Pendrive 8G+CAM_REQ_CMP_ERR Message-ID: <5859850b0902140430r585bf77fn6d70c3ce79a0c439@mail.gmail.com> Hello everyone, Gentlemen, I wonder if anyone has run into the following problem. I have 3 pendrives (two of 1G and one of 8G) and I'm using FreeBSD-7.1-stable. The problem is that when I connect the 8G one, FreeBSD detects the device (da0, etc.) but does not create /dev/da0; it spends about 10 minutes trying to create this device, and /dev/da0 only appears after that, together with various error messages such as IOERROR and CAM_REQ_CMP_ERR, and it is not mounted.
The 8G one is FAT32 and is recognized and mounted normally in Linux; the 1G one mounts normally in FreeBSD, but its filesystem is FAT16. Has anyone run into this problem? -- -=-=-=-=-=-=-=- Live free or die - UNIX* -=-=-=-=-=-=-= From zbeeble at gmail.com Sat Feb 14 11:29:20 2009 From: zbeeble at gmail.com (Zaphod Beeblebrox) Date: Sat Feb 14 11:29:26 2009 Subject: When does the pool get bigger? Message-ID: <5f67a8c40902141100w406b0a73h7cf487369e15ec8f@mail.gmail.com> I have a ZFS raid-Z array (FreeBSD-7.1p2) that I use for storing backups and media. I'm keenly awaiting the MFC of the ZFS v13 code, but I'm not in a hurry to run -CURRENT on this box. Anyways... The array was 5x 750G drives and I decided to upgrade to 5x 1.5T drives. I removed one 750G drive and inserted a 1.5T drive each time. All 5 are done resilvering now. When does the pool get bigger? The resilver of the last drive has finished, but the pool still reads

[1:20:320]root@virtual:/usr/local/etc> zpool list
NAME   SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
vr2    3.41T  3.16T  251G   92%  ONLINE  -

... which is the size with 750G drives. From ticso at cicely7.cicely.de Sat Feb 14 12:58:30 2009 From: ticso at cicely7.cicely.de (Bernd Walter) Date: Sat Feb 14 12:58:43 2009 Subject: When does the pool get bigger? In-Reply-To: <5f67a8c40902141100w406b0a73h7cf487369e15ec8f@mail.gmail.com> References: <5f67a8c40902141100w406b0a73h7cf487369e15ec8f@mail.gmail.com> Message-ID: <20090214203919.GV84964@cicely7.cicely.de> On Sat, Feb 14, 2009 at 02:00:23PM -0500, Zaphod Beeblebrox wrote: > I have a ZFS raid-Z array (FreeBSD-7.1p2) that I use for storing backups and > media. I'm keenly awaiting the MFC of the ZFS v13 code, but I'm not in a > hurry to run -CURRENT on this box. > > Anyways... The array was 5x 750G drives and I decided to upgrade to 5x 1.5T > drives. I removed one 750G drive and inserted a 1.5T drive each time. All > 5 are done resilvering now. > > When does the pool get bigger?
The resilver of the last drive has finished, > but the pool still reads > > [1:20:320]root@virtual:/usr/local/etc> zpool list > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > vr2 3.41T 3.16T 251G 92% ONLINE - > > ... which is the size with 750G drives. You need to export/import the pool once. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. From stb at lassitu.de Sun Feb 15 02:40:00 2009 From: stb at lassitu.de (Stefan Bethke) Date: Sun Feb 15 02:40:07 2009 Subject: zfs: using, then destroying a snapshot sometimes panics zfs In-Reply-To: <76873DDF-D21B-48AF-9AFB-5A2747BE406B@lassitu.de> References: <76873DDF-D21B-48AF-9AFB-5A2747BE406B@lassitu.de> Message-ID: <3A302EE1-F54D-4415-BC13-CA8ABBA320EC@lassitu.de> Am 08.02.2009 um 14:37 schrieb Stefan Bethke: > Sorry I can't be more precise at the moment, but while creating a > script that mirrors some zfs filesystems to another machine, I've > now twice gotten weird behaviour and then a panic. > > The script iterates over a couple of zfs file systems: > - creates a snapshot with zfs snapshot tank/foo@mirror > - uses rsync to copy the contents of the snapshot with rsync /tank/ > foo/.zfs/snapshot/mirror/ dest:... > - destroys the snapshot with zfs destroy tank/foo@mirror > > During testing the script, I twice got to a point where, after the > snapshot was created without an error message, rsync dropped out > with an error message similar to "invalid file handle" on /tank/ > foo/.zfs/snapshot. > > At that point, I could cd to /tank/foo/.zfs, but ls produced the > same error message. > > I then tried to unmount the snapshot with zfs umount, and got a > panic (which I also didn't manage to capture). > > Is this a generally known issue, or should I try to capture more > information when this happens again? 
# cd /tank/foo/.zfs # ls -l ls: snapshot: Bad file descriptor total 0 # cd snapshot -su: cd: snapshot: Not a directory I currently have no snapshots: # zfs list -t snapshot no datasets available However, on a different file system, I can list and cd into snapshot: # /tank/bar/.zfs # ls -l total 0 dr-xr-xr-x 2 root wheel 2 Feb 8 00:43 snapshot/ # cd snapshot Trying to umount produces a panic: # zfs umount /jail/foo Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0xa8 fault code = supervisor write data, page not present instruction pointer = 0x8:0xffffffff802ee565 stack pointer = 0x10:0xfffffffea29c39e0 frame pointer = 0x10:0xfffffffea29c39f0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 51383 (zfs) [thread pid 51383 tid 100298 ] Stopped at _sx_xlock+0x15: lock cmpxchgq %rsi,0x18(%rdi) db> bt Tracing pid 51383 tid 100298 td 0xffffff00a598e720 _sx_xlock() at _sx_xlock+0x15 zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5 zfs_umount() at zfs_umount+0xdd dounmount() at dounmount+0x2b4 unmount() at unmount+0x24b syscall() at syscall+0x1a5 Xfast_syscall() at Xfast_syscall+0xab --- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f412fc, rsp = 0x7fffffffd1a8, rbp = 0x801202300 --- db> call doadump Physical memory: 3314 MB Dumping 1272 MB: 1257 1241 1225 1209 1193 1177 1161 1145 1129 1113 1097 1081 1065 1049 1033 1017 1001 985 969 953 937 921 905 889 873 857 841 825 809 793 777 761 745 729 713 697 681 665 649 633 617 601 585 569 553 537 521 505 489 473 457 441 425 409 393 377 361 345 329 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41 25 9 Dump complete = 0 I've got the crashdump saved, if there's any information in there that can be helpful. This is -current from a week ago on amd64. 
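The small faulting addresses in these traces suggest a near-NULL `struct sx` pointer rather than random memory corruption, and the arithmetic can be checked against the faulting instruction `lock cmpxchgq %rsi,0x18(%rdi)`. The sketch below uses local replicas of `struct lock_object` and `struct sx`; their layout is an assumption based on sys/_lock.h and sys/_sx.h of that period, and the type and function names are hypothetical. On amd64 the lock word sits at offset 0x18, so a fault at 0xa8 implies a lock pointer of 0x90; on i386 the offset is 0x10, which matches Dan Cojocar's earlier trace where `_sx_xlock(sx=0x3c)` faulted at 0x4c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local replicas of the lock layout (assumed to match sys/_lock.h and
 * sys/_sx.h of early 2009); defined here for illustration only.
 */
struct witness;

struct lock_object_model {
	const char	*lo_name;
	unsigned int	 lo_flags;
	unsigned int	 lo_data;
	struct witness	*lo_witness;
};

struct sx_model {
	struct lock_object_model lock_object;
	volatile uintptr_t	 sx_lock;	/* word updated by cmpxchg */
};

/*
 * Given the faulting virtual address, recover the struct sx pointer
 * that _sx_xlock() was presumably handed.
 */
static uintptr_t
sx_pointer_from_fault(uintptr_t fault_va)
{
	return (fault_va - offsetof(struct sx_model, sx_lock));
}
```

A lock pointer this close to zero usually means the code dereferenced a field of a structure whose own pointer was NULL; the model does not say which ZFS structure that was.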
At the current rate, this happens every couple of days, so gathering more information on the live system probably won't be a problem. Stefan -- Stefan Bethke Fon +49 151 14070811 From stb at lassitu.de Sun Feb 15 03:08:55 2009 From: stb at lassitu.de (Stefan Bethke) Date: Sun Feb 15 03:09:02 2009 Subject: zfs: using, then destroying a snapshot sometimes panics zfs In-Reply-To: <3A302EE1-F54D-4415-BC13-CA8ABBA320EC@lassitu.de> References: <76873DDF-D21B-48AF-9AFB-5A2747BE406B@lassitu.de> <3A302EE1-F54D-4415-BC13-CA8ABBA320EC@lassitu.de> Message-ID: <171C5946-63D1-4AC7-89F7-A951BEF3D1C6@lassitu.de> Am 15.02.2009 um 11:39 schrieb Stefan Bethke: > Am 08.02.2009 um 14:37 schrieb Stefan Bethke: > >> Sorry I can't be more precise at the moment, but while creating a >> script that mirrors some zfs filesystems to another machine, I've >> now twice gotten weird behaviour and then a panic. >> >> The script iterates over a couple of zfs file systems: >> - creates a snapshot with zfs snapshot tank/foo@mirror >> - uses rsync to copy the contents of the snapshot with rsync /tank/ >> foo/.zfs/snapshot/mirror/ dest:... >> - destroys the snapshot with zfs destroy tank/foo@mirror >> >> During testing the script, I twice got to a point where, after the >> snapshot was created without an error message, rsync dropped out >> with an error message similar to "invalid file handle" on /tank/ >> foo/.zfs/snapshot. >> >> At that point, I could cd to /tank/foo/.zfs, but ls produced the >> same error message. >> >> I then tried to unmount the snapshot with zfs umount, and got a >> panic (which I also didn't manage to capture). >> >> Is this a generally known issue, or should I try to capture more >> information when this happens again? 
>
>
> # cd /tank/foo/.zfs
> # ls -l
> ls: snapshot: Bad file descriptor
> total 0
> # cd snapshot
> -su: cd: snapshot: Not a directory
>
> I currently have no snapshots:
> # zfs list -t snapshot
> no datasets available
>
> However, on a different file system, I can list and cd into snapshot:
> # /tank/bar/.zfs
> # ls -l
> total 0
> dr-xr-xr-x 2 root wheel 2 Feb 8 00:43 snapshot/
> # cd snapshot
>
> Trying to umount produces a panic:
> # zfs umount /jail/foo
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0xa8
> fault code = supervisor write data, page not present
> instruction pointer = 0x8:0xffffffff802ee565
> stack pointer = 0x10:0xfffffffea29c39e0
> frame pointer = 0x10:0xfffffffea29c39f0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 51383 (zfs)
> [thread pid 51383 tid 100298 ]
> Stopped at _sx_xlock+0x15: lock cmpxchgq %rsi,0x18(%rdi)
> db> bt
> Tracing pid 51383 tid 100298 td 0xffffff00a598e720
> _sx_xlock() at _sx_xlock+0x15
> zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
> zfs_umount() at zfs_umount+0xdd
> dounmount() at dounmount+0x2b4
> unmount() at unmount+0x24b
> syscall() at syscall+0x1a5
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f412fc, rsp =
> 0x7fffffffd1a8, rbp = 0x801202300 ---
> db> call doadump
> Physical memory: 3314 MB
> Dumping 1272 MB: 1257 1241 1225 1209 1193 1177 1161 1145 1129 1113
> 1097 1081 1065 1049 1033 1017 1001 985 969 953 937 921 905 889 873
> 857 841 825 809 793 777 761 745 729 713 697 681 665 649 633 617 601
> 585 569 553 537 521 505 489 473 457 441 425 409 393 377 361 345 329
> 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41
> 25 9
> Dump complete
> = 0
>
> I've got the crashdump saved, if there's any information in there
> that can be helpful.
>
> This is -current from a week ago on amd64.
>
> At the current rate, this happens every couple of days, so gathering
> more information on the live system probably won't be a problem.

On a different machine with an identical configuration, I just got this
panic on reboot:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xa8
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff802ee3b5
stack pointer = 0x10:0xfffffffe40016980
frame pointer = 0x10:0xfffffffe40016990
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1 (init)
[thread pid 1 tid 100002 ]
Stopped at _sx_xlock+0x15: lock cmpxchgq %rsi,0x18(%rdi)
db> bt
Tracing pid 1 tid 100002 td 0xffffff000141fab0
_sx_xlock() at _sx_xlock+0x15
zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
zfs_umount() at zfs_umount+0xdd
dounmount() at dounmount+0x2b4
vfs_unmountall() at vfs_unmountall+0x42
boot() at boot+0x655
reboot() at reboot+0x42
syscall() at syscall+0x1a5
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (55, FreeBSD ELF64, reboot), rip = 0x40897c, rsp = 0x7fffffffe7b8, rbp = 0x402420 ---

-- 
Stefan Bethke Fon +49 151 14070811

From ota at j.email.ne.jp Sun Feb 15 21:18:02 2009
From: ota at j.email.ne.jp (Yoshihiro Ota)
Date: Sun Feb 15 21:18:09 2009
Subject: patch: let msdosfs(vfat)/ntfs to support UTF-8 locale well
In-Reply-To: <499531A4.3020308@delphij.net>
References: <98869b7c0902100112s6dae54bm4c14487076ceb75c@mail.gmail.com> <20090212183440.GA1446@tops> <20090213001350.52470f39.ota@j.email.ne.jp> <499531A4.3020308@delphij.net>
Message-ID: <20090216000044.d77fec80.ota@j.email.ne.jp>

FYI: This is another person who attempted the same or similar.
The thread begins with the following message and got a couple of replies.
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=262846+0+archive/2006/freebsd-hackers/20060813.freebsd-hackers

Does anyone intend to work on this issue?

Regards,
Hiro

On Fri, 13 Feb 2009 00:39:00 -0800
Xin LI wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> (cc'ed to freebsd-fs@)
>
> I think it's important that someone familiar with the code review and
> evaluate the current patches and commit them against -HEAD...
>
> MSDOSFS patch (against 7.1):
> http://btload.googlegroups.com/web/msdosfs.patch?gda=MzIscT8AAABs_gmy4a1S9lRiXjEy-V5OpwtI67JnIGlz0zr18tjObOtoi5oIt3BJMRGeqGBbbj-ccyFKn-rNKC-d1pM_IdV0
> NTFS patch:
> http://btload.googlegroups.com/web/ntfs.patch?gda=OqsHoDwAAABs_gmy4a1S9lRiXjEy-V5O7RN7t-m4MjZ-5dQn_EvaqDVCWO9_HyYEQJyRQYPtRCL9Wm-ajmzVoAFUlE7c_fAt
>
> Cheers,
> - --
> Xin LI http://www.delphij.net/
> FreeBSD - The Power to Serve!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.10 (FreeBSD)
>
> iEUEARECAAYFAkmVMaQACgkQi+vbBBjt66DN+wCghJbOUO7IfEwt5gFOB01uAAe1
> NLwAmOQXPJsB+lT7o5MMk16Ck6eUJrQ=
> =ZGMA
> -----END PGP SIGNATURE-----
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

From bugmaster at FreeBSD.org Mon Feb 16 03:06:51 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Feb 16 03:07:52 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200902161106.n1GB6o2I096104@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131353  fs         gjournal kernel lock
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/131086  fs         [ext2fs] mkfs.ext2 creates rotten partition
o kern/131084  fs         [xfs] xfs destroys itself after copying data
o kern/131081  fs         [zfs] User cannot delete a file when a ZFS dataset is
o kern/130979  fs         [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o bin/130105   fs         [zfs] zfs send -R dumps core
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129174  fs         [nfs] [zfs] [panic] NFS v3 Panic when under high load
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129084  fs         [udf] [panic] [lor] udf panic: getblk: size(67584) > M
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
f kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po

42 problems total.

From des at des.no Mon Feb 16 07:15:26 2009
From: des at des.no (Dag-Erling Smørgrav)
Date: Mon Feb 16 07:15:33 2009
Subject: Pseudofs and pfs_attr_t for non-process based files
In-Reply-To: (Andrew Brampton's message of "Fri, 23 Jan 2009 14:05:55 +0000")
References: <868wp4pqwm.fsf@ds4.des.no> <86iqo74ba1.fsf@ds4.des.no>
Message-ID: <86y6w6tnud.fsf@ds4.des.no>

Andrew Brampton writes:
> Here are both patches again, but only the procfs patch has changed. I
> have now removed the redundant code.

Sorry, I had completely forgotten about these. I'll commit them ASAP.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From brampton+freebsd-fs at gmail.com Mon Feb 16 07:17:38 2009
From: brampton+freebsd-fs at gmail.com (Andrew Brampton)
Date: Mon Feb 16 07:17:48 2009
Subject: Pseudofs and pfs_attr_t for non-process based files
In-Reply-To: <86y6w6tnud.fsf@ds4.des.no>
References: <868wp4pqwm.fsf@ds4.des.no> <86iqo74ba1.fsf@ds4.des.no> <86y6w6tnud.fsf@ds4.des.no>
Message-ID:

Thanks,

To which branch will they be committed? 7.1 or 8.0?
thanks
Andrew

2009/2/16 Dag-Erling Smørgrav:
> Andrew Brampton writes:
>> Here are both patches again, but only the procfs patch has changed. I
>> have now removed the redundant code.
>
> Sorry, I had completely forgotten about these. I'll commit them ASAP.
>
> DES
> --
> Dag-Erling Smørgrav - des@des.no
>

From des at des.no Mon Feb 16 07:28:16 2009
From: des at des.no (Dag-Erling Smørgrav)
Date: Mon Feb 16 07:28:22 2009
Subject: Pseudofs and pfs_attr_t for non-process based files
In-Reply-To: (Andrew Brampton's message of "Mon, 16 Feb 2009 15:17:33 +0000")
References: <868wp4pqwm.fsf@ds4.des.no> <86iqo74ba1.fsf@ds4.des.no> <86y6w6tnud.fsf@ds4.des.no>
Message-ID: <86mycmtn8x.fsf@ds4.des.no>

Andrew Brampton writes:
> To which branch will they be committed? 7.1 or 8.0?

head about 10 minutes ago, stable/7 will follow in a couple of weeks.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From linimon at FreeBSD.org Mon Feb 16 08:31:28 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Mon Feb 16 08:31:40 2009
Subject: kern/131743: [ext2fs] utf-8 file names of ext2 partitions cause problems
Message-ID: <200902161631.n1GGVRdn052520@freefall.freebsd.org>

Old Synopsis: utf-8 file names of ext2 partitions cause problems
New Synopsis: [ext2fs] utf-8 file names of ext2 partitions cause problems

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Feb 16 16:31:11 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=131743

From jh at saunalahti.fi Mon Feb 16 09:40:04 2009
From: jh at saunalahti.fi (Jaakko Heinonen)
Date: Mon Feb 16 09:40:11 2009
Subject: kern/131743: utf-8 file names of ext2 partitions cause problems
Message-ID: <200902161740.n1GHe3Uj001274@freefall.freebsd.org>

The following reply was made to PR kern/131743; it has been noted by GNATS.
From: Jaakko Heinonen
To: Elmar Stellnberger
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/131743: utf-8 file names of ext2 partitions cause problems
Date: Mon, 16 Feb 2009 19:38:33 +0200

On 2009-02-16, Elmar Stellnberger wrote:
> >Class: sw-bug
>
> FreeBSD refuses to mount ext2 partitions with the iocharset=utf8 and
> utf8 options. The consequence are not only wrongly displayed file
> names.

I don't think this is a bug. For me FreeBSD ext2fs works fine with UTF-8
encoded file names providing that you have configured locale settings
correctly.

Do you expect "iocharset=utf8" and "utf8" mount options to convert
file names to some other encoding? AFAIK even Linux doesn't support such
options for ext2.

-- 
Jaakko

From estellnb at googlemail.com Mon Feb 16 10:50:02 2009
From: estellnb at googlemail.com (Elmar Stellnberger)
Date: Mon Feb 16 10:50:08 2009
Subject: kern/131743: utf-8 file names of ext2 partitions cause problems
Message-ID: <200902161850.n1GIo2FX054731@freefall.freebsd.org>

The following reply was made to PR kern/131743; it has been noted by GNATS.

From: Elmar Stellnberger
To: bug-followup@FreeBSD.org, jh@saunalahti.fi
Cc:
Subject: Re: Re: kern/131743: utf-8 file names of ext2 partitions cause problems
Date: Mon, 16 Feb 2009 18:50:45 +0000

Basically I would expect the utf8 file system option to convert from utf8
to whatever charset is selected by default at the moment.
I will have to have another look at how to enable utf8 on FreeBSD.
Previously consulted web resources have not worked as expected.
Nevertheless, if enabling utf8 as the default charset is all I need to do
in order to view and access these files correctly, that should be fine for me.
thx for any further hint.

On 2009-02-16, Elmar Stellnberger wrote:
> >Class: sw-bug
>
> FreeBSD refuses to mount ext2 partitions with the iocharset=utf8 and
> utf8 options. The consequence are not only wrongly displayed file
> names.

I don't think this is a bug. For me FreeBSD ext2fs works fine with UTF-8
encoded file names providing that you have configured locale settings
correctly.

Do you expect "iocharset=utf8" and "utf8" mount options to convert
file names to some other encoding? AFAIK even Linux doesn't support such
options for ext2.

-- 
Jaakko

From estellnb at googlemail.com Wed Feb 18 04:20:03 2009
From: estellnb at googlemail.com (Elmar Stellnberger)
Date: Wed Feb 18 04:20:10 2009
Subject: kern/131743: utf-8 file names of ext2 partitions cause problems
Message-ID: <200902181220.n1ICK3aZ077776@freefall.freebsd.org>

The following reply was made to PR kern/131743; it has been noted by GNATS.

From: Elmar Stellnberger
To: Jaakko Heinonen, bug-followup@freebsd.org
Cc:
Subject: Re: kern/131743: utf-8 file names of ext2 partitions cause problems
Date: Wed, 18 Feb 2009 12:16:09 +0000

Interestingly LC_ALL=en_US.UTF-8 works well with xterm, but with xterm only.
All other apps from the ports tree like VLC or any KDE app like konsole or
konqueror do not support utf-8. Perhaps we should reassign this bug to the
ports project.

Or are you thinking of a kernel charset conversion for file names
(iocharset=utf8) as an additional option?

Jaakko Heinonen wrote:
> On 2009-02-16, Elmar Stellnberger wrote:
>
>>> Class: sw-bug
>>
>> FreeBSD refuses to mount ext2 partitions with the iocharset=utf8 and
>> utf8 options. The consequence are not only wrongly displayed file
>> names.
>
> I don't think this is a bug. For me FreeBSD ext2fs works fine with UTF-8
> encoded file names providing that you have configured locale settings
> correctly.
>
> Do you expect "iocharset=utf8" and "utf8" mount options to convert
> file names to some other encoding? AFAIK even Linux doesn't support such
> options for ext2.

From gavin at FreeBSD.org Wed Feb 18 05:36:25 2009
From: gavin at FreeBSD.org (gavin@FreeBSD.org)
Date: Wed Feb 18 05:36:31 2009
Subject: kern/131743: [ext2fs] utf-8 file names of ext2 partitions cause problems
Message-ID: <200902181336.n1IDaOfb039741@freefall.freebsd.org>

Synopsis: [ext2fs] utf-8 file names of ext2 partitions cause problems

State-Changed-From-To: open->closed
State-Changed-By: gavin
State-Changed-When: Wed Feb 18 13:34:49 UTC 2009
State-Changed-Why:
Close, this was a misconfiguration rather than a bug.

http://www.freebsd.org/cgi/query-pr.cgi?pr=131743

From hartzell at alerce.com Thu Feb 19 10:19:49 2009
From: hartzell at alerce.com (George Hartzell)
Date: Thu Feb 19 10:19:55 2009
Subject: Patch for 'zfs send -R' core dump (pr bin/130105)
Message-ID: <18845.40641.33220.936902@almost.alerce.com>

The following patch to

  /usr/src/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c

seems to keep 'zfs send -R' from dumping core. I've only been able to
test sending the stream to /dev/null or a file, I'm still setting up a
pool to do the receiving.

This is based on a bit of gdb debugging and a thread from zfs-fuse:

  http://groups.google.com/group/zfs-fuse/browse_thread/thread/158cb78bc3325ae3/6a0109c7b0942707?#6a0109c7b0942707

g.

--- zfs_prop.c	2009/02/17 18:58:58	1.1
+++ zfs_prop.c	2009/02/19 09:54:04
@@ -297,7 +297,7 @@
 
 	/* hidden properties */
 	register_hidden(ZFS_PROP_CREATETXG, "createtxg", PROP_TYPE_NUMBER,
-	    PROP_READONLY, ZFS_TYPE_DATASET, NULL);
+	    PROP_READONLY, ZFS_TYPE_DATASET, "CREATETXG");
 	register_hidden(ZFS_PROP_NUMCLONES, "numclones", PROP_TYPE_NUMBER,
 	    PROP_READONLY, ZFS_TYPE_SNAPSHOT, NULL);
 	register_hidden(ZFS_PROP_NAME, "name", PROP_TYPE_STRING,

From rdivacky at freebsd.org Thu Feb 19 10:29:54 2009
From: rdivacky at freebsd.org (Roman Divacky)
Date: Thu Feb 19 10:30:00 2009
Subject: Patch for 'zfs send -R' core dump (pr bin/130105)
In-Reply-To: <18845.40641.33220.936902@almost.alerce.com>
References: <18845.40641.33220.936902@almost.alerce.com>
Message-ID: <20090219181114.GA57360@freebsd.org>

btw.... I track the opensolaris hg at work and there are quite a few commits
to zfs (almost) every week, fixing coredumps and other problems.

maybe we can track the opensolaris a little closer?

On Thu, Feb 19, 2009 at 10:02:41AM -0800, George Hartzell wrote:
>
> The following patch to
>
>   /usr/src/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c
>
> seems to keep 'zfs send -R' from dumping core. I've only been able to
> test sending the stream to /dev/null or a file, I'm still setting up a
> pool to do the receiving.
>
> This is based on a bit of gdb debugging and a thread from zfs-fuse:
>
>   http://groups.google.com/group/zfs-fuse/browse_thread/thread/158cb78bc3325ae3/6a0109c7b0942707?#6a0109c7b0942707
>
> g.
>
> --- zfs_prop.c	2009/02/17 18:58:58	1.1
> +++ zfs_prop.c	2009/02/19 09:54:04
> @@ -297,7 +297,7 @@
>  
>  	/* hidden properties */
>  	register_hidden(ZFS_PROP_CREATETXG, "createtxg", PROP_TYPE_NUMBER,
> -	    PROP_READONLY, ZFS_TYPE_DATASET, NULL);
> +	    PROP_READONLY, ZFS_TYPE_DATASET, "CREATETXG");
>  	register_hidden(ZFS_PROP_NUMCLONES, "numclones", PROP_TYPE_NUMBER,
>  	    PROP_READONLY, ZFS_TYPE_SNAPSHOT, NULL);
>  	register_hidden(ZFS_PROP_NAME, "name", PROP_TYPE_STRING,
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

From hartzell at alerce.com Thu Feb 19 10:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Thu Feb 19 10:40:11 2009
Subject: bin/130105: [zfs] zfs send -R dumps core
Message-ID: <200902191840.n1JIe3Ep087537@freefall.freebsd.org>

The following reply was made to PR bin/130105; it has been noted by GNATS.

From: George Hartzell
To: bug-followup@FreeBSD.org, goran.lowkrantz@ismobile.com
Cc:
Subject: Re: bin/130105: [zfs] zfs send -R dumps core
Date: Thu, 19 Feb 2009 10:05:27 -0800

The following patch seems to fix the core dumps. I've only tested
sending the stream to /dev/null and to a file, still working on
setting up a receiver.

This is based on a thread in the zfs-fuse mailing list.

g.

--- zfs_prop.c	2009/02/17 18:58:58	1.1
+++ zfs_prop.c	2009/02/19 09:54:04
@@ -297,7 +297,7 @@
 
 	/* hidden properties */
 	register_hidden(ZFS_PROP_CREATETXG, "createtxg", PROP_TYPE_NUMBER,
-	    PROP_READONLY, ZFS_TYPE_DATASET, NULL);
+	    PROP_READONLY, ZFS_TYPE_DATASET, "CREATETXG");
 	register_hidden(ZFS_PROP_NUMCLONES, "numclones", PROP_TYPE_NUMBER,
 	    PROP_READONLY, ZFS_TYPE_SNAPSHOT, NULL);
 	register_hidden(ZFS_PROP_NAME, "name", PROP_TYPE_STRING,

From pjd at FreeBSD.org Thu Feb 19 13:04:22 2009
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Thu Feb 19 13:04:35 2009
Subject: Patch for 'zfs send -R' core dump (pr bin/130105)
In-Reply-To: <20090219181114.GA57360@freebsd.org>
References: <18845.40641.33220.936902@almost.alerce.com> <20090219181114.GA57360@freebsd.org>
Message-ID: <20090219203541.GB2083@garage.freebsd.pl>

On Thu, Feb 19, 2009 at 07:11:14PM +0100, Roman Divacky wrote:
>
> btw.... I track the opensolaris hg at work and there are quite a few commits
> to zfs (almost) every week, fixing coredumps and other problems.
>
> maybe we can track the opensolaris a little closer?

It is being tracked very closely, I'd say, in perforce
(//depot/user/pjd/zfs/...).

The problem is that trivial fixes are mixed with very intrusive changes,
so it is too risky to track it even in HEAD (I'm running HEAD on my
ZFS-only laptop!). OpenSolaris development is also very different from
FreeBSD's. We usually describe every single change very carefully in
commit logs, whereas OpenSolaris commits are based only on bug numbers and
bug descriptions. Many changes are committed at once, so it is hard to
pick only some changes.

-- 
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

From hartzell at alerce.com Thu Feb 19 16:44:46 2009
From: hartzell at alerce.com (George Hartzell)
Date: Thu Feb 19 16:44:52 2009
Subject: Patch for 'zfs send -R' core dump (pr bin/130105)
In-Reply-To: <20090219203541.GB2083@garage.freebsd.pl>
References: <18845.40641.33220.936902@almost.alerce.com> <20090219181114.GA57360@freebsd.org> <20090219203541.GB2083@garage.freebsd.pl>
Message-ID: <18845.64765.71312.244570@almost.alerce.com>

Pawel Jakub Dawidek writes:
> On Thu, Feb 19, 2009 at 07:11:14PM +0100, Roman Divacky wrote:
> >
> > btw.... I track the opensolaris hg at work and there are quite a few commits
> > to zfs (almost) every week, fixing coredumps and other problems.
> >
> > maybe we can track the opensolaris a little closer?
>
> It is being tracked very closely, I'd say, in perforce
> (//depot/user/pjd/zfs/...).
>
> The problem is that trivial fixes are mixed with very intrusive changes,
> so it is too risky to track it even in HEAD (I'm running HEAD on my
> ZFS-only laptop!). OpenSolaris development is also very different from
> FreeBSD's. We usually describe every single change very carefully in
> commit logs, whereas OpenSolaris commits are based only on bug numbers and
> bug descriptions. Many changes are committed at once, so it is hard to
> pick only some changes.

Is it possible for fixes like the one that I posted to get merged into
-CURRENT w/out waiting for your next mega-release? On the one hand it
would be very useful; on the other, I can imagine that if it happens too
often it would make your life difficult.

g.

From k0802647 at telus.net Sun Feb 22 00:38:49 2009
From: k0802647 at telus.net (Carl)
Date: Sun Feb 22 00:39:00 2009
Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state?
Message-ID: <49A10626.8060705@telus.net>

I've come across what I'm thinking may be a bug in the context of
FreeBSD 7.0 with a pair of gmirrored drives and gjournaled partitions
when copying a large number of files into a file-backed memory device.

The consequence of this problem is that a process enters the 'D' state
(process in disk) indefinitely, cannot be killed, and the system cannot
be shut down. The only solution is to cold reboot the system, which is a
really big problem for remote systems. This is happening to me
intermittently with the standard tar-tar pipeline form of copying, but
has happened with the rsync 3.0.4 port as well.

I would appreciate it if some of you would see if you can repeat this
problem. Here is a sequence of tcsh shell commands which manifest the
problem (on occasion but not every time), which I will refer to as the
"truncate sequence" (depends on fully populated /usr/src tree as data set):

# truncate -s 671088640 target
# mdconfig -f target -S 512 -y 255 -x 63 -u 7
# bsdlabel -w /dev/md7 auto
# newfs -O2 -m 0 -o space /dev/md7a
# mount /dev/md7a /media
# tar -cvf - -C /usr/src . | tar -xvpof - -C /media
# umount /media ; mdconfig -d -u 7 ; rm target

An alternate version has yet to fail for me and involves replacing the
first line with this one:

# dd if=/dev/zero of=target bs=1M count=640

I'll call that the "dd sequence". Here is an ordered series of tests I
just completed:

a) Repeated truncate sequence 7 times - 1st, 5th, and 7th failed.
b) Repeated dd sequence 7 times - no failures.
c) Repeated truncate sequence 6 times - no failures.
d) Used following sequence to ensure all disk caches flushed:

# dd if=/dev/random of=target bs=1M count=4096
# dd if=target of=/dev/null bs=1M
# rm target

e) Repeated truncate sequence 4 times - no failures.
f) Performed orderly reboot.
g) Repeated truncate sequence 2 times - 2nd failed.
h) Performed orderly reboot.
i) Repeated dd sequence 7 times - no failures.

All failures involve the second tar in the pipeline hanging in the 'D'
state. In each case I do a cold reboot before proceeding with the next test.

It's tempting to speculate that a bug exists in code related to handling
sparse files specifically, but perhaps it just raises the probability of
tripping a bug that would eventually manifest in the dd sequence as
well. OTOH, I don't know how to rule out a physical disk or disk
firmware problem.

This problem has occurred with different data sets and different sized
memory disks, but only with the source and destination filesystems being
UFS2. I have done similar sequences with EXT2 and FAT16 destinations
with no failures thus far, but the memory disks and data sets were
smaller so it's conceivable that probability worked against me.

I should note that the drives are Seagate ST31000340AS Barracudas, but
both drives have been upgraded to firmware version SD1A and are
therefore supposedly free of the infamous little horror Seagate
inflicted on so many of us. smartctl tells me that both disks still have
a raw value of 0 for Reallocated_Sector_Ct and both pass the "short"
self test.

Carl / K0802647

From kostikbel at gmail.com Sun Feb 22 03:01:00 2009
From: kostikbel at gmail.com (Kostik Belousov)
Date: Sun Feb 22 03:01:07 2009
Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state?
In-Reply-To: <49A10626.8060705@telus.net>
References: <49A10626.8060705@telus.net>
Message-ID: <20090222110052.GH41617@deviant.kiev.zoral.com.ua>

On Sun, Feb 22, 2009 at 12:00:38AM -0800, Carl wrote:
> I've come across what I'm thinking may be a bug in the context of
> FreeBSD 7.0 with a pair of gmirrored drives and gjournaled partitions
> when copying a large number of files into a file-backed memory device.
>
> The consequence of this problem is that a process enters the 'D' state
> (process in disk) indefinitely, cannot be killed, and the system cannot
> be shut down. The only solution is to cold reboot the system, which is a
> really big problem for remote systems. This is happening to me
> intermittently with the standard tar-tar pipeline form of copying, but
> has happened with the rsync 3.0.4 port as well.
>
> I would appreciate it if some of you would see if you can repeat this
> problem. Here is a sequence of tcsh shell commands which manifest the
> problem (on occasion but not every time), which I will refer to as the
> "truncate sequence" (depends on fully populated /usr/src tree as data set):
>
> # truncate -s 671088640 target
> # mdconfig -f target -S 512 -y 255 -x 63 -u 7
> # bsdlabel -w /dev/md7 auto
> # newfs -O2 -m 0 -o space /dev/md7a
> # mount /dev/md7a /media
> # tar -cvf - -C /usr/src . | tar -xvpof - -C /media
> # umount /media ; mdconfig -d -u 7 ; rm target
>
> An alternate version has yet to fail for me and involves replacing the
> first line with this one:
>
> # dd if=/dev/zero of=target bs=1M count=640
>
> I'll call that the "dd sequence". Here is an ordered series of tests I
> just completed:
>
> a) Repeated truncate sequence 7 times - 1st, 5th, and 7th failed.
> b) Repeated dd sequence 7 times - no failures.
> c) Repeated truncate sequence 6 times - no failures.
> d) Used following sequence to ensure all disk caches flushed:
>
> # dd if=/dev/random of=target bs=1M count=4096
> # dd if=target of=/dev/null bs=1M
> # rm target
>
> e) Repeated truncate sequence 4 times - no failures.
> f) Performed orderly reboot.
> g) Repeated truncate sequence 2 times - 2nd failed.
> h) Performed orderly reboot.
> i) Repeated dd sequence 7 times - no failures.
>
> All failures involve the second tar in the pipeline hanging in the 'D'
> state. In each case I do a cold reboot before proceeding with the next test.
>
> It's tempting to speculate that a bug exists in code related to handling
> sparse files specifically, but perhaps it just raises the probability of
> tripping a bug that would eventually manifest in the dd sequence as
> well. OTOH, I don't know how to rule out a physical disk or disk
> firmware problem.
>
> This problem has occurred with different data sets and different sized
> memory disks, but only with the source and destination filesystems being
> UFS2. I have done similar sequences with EXT2 and FAT16 destinations
> with no failures thus far, but the memory disks and data sets were
> smaller so it's conceivable that probability worked against me.
>
> I should note that the drives are Seagate ST31000340AS Barracudas, but
> both drives have been upgraded to firmware version SD1A and are
> therefore supposedly free of the infamous little horror Seagate
> inflicted on so many of us. smartctl tells me that both disks still have
> a raw value of 0 for Reallocated_Sector_Ct and both pass the "short"
> self test.

Please see
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for instructions on how to gather the required information to diagnose
the issue.

From k0802647 at telus.net Sun Feb 22 15:55:02 2009
From: k0802647 at telus.net (Carl)
Date: Sun Feb 22 15:55:09 2009
Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state?
In-Reply-To: <20090222110052.GH41617@deviant.kiev.zoral.com.ua> References: <49A10626.8060705@telus.net> <20090222110052.GH41617@deviant.kiev.zoral.com.ua> Message-ID: <49A1E5CE.5000501@telus.net> Kostik Belousov wrote: > Please, see > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html > for instructions on how to gather the required information to diagnose > the issue. I'm not sure that it's possible for me to get into rebuilding and debugging the kernel, tar, or whatever component is at issue right now. If others can reproduce the problem, that would at least rule out a hardware problem as a starting point and hopefully garner further insight from someone more knowledgeable than I. FWIW, my system did not "stop doing useful work". Since I was using 'screen', upon the tar process hanging I switched to another window and was able to use ps, mount the procfs, try killing things, etc. Aside from being unable to kill the tar process or reboot the system, at least some other forms of work are still possible. Does this qualify as a kernel deadlock? Is there some other way to forcibly reboot a remote system from the command line when a normal shutdown command is going to totally hang the system in this way? Or perhaps some kind of watchdog that has a good chance of surviving long enough to unjam a situation like this? Carl / K0802647 From toasty at dragondata.com Sun Feb 22 17:16:57 2009 From: toasty at dragondata.com (Kevin Day) Date: Sun Feb 22 17:17:04 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: <49A1E5CE.5000501@telus.net> References: <49A10626.8060705@telus.net> <20090222110052.GH41617@deviant.kiev.zoral.com.ua> <49A1E5CE.5000501@telus.net> Message-ID: On Feb 22, 2009, at 5:54 PM, Carl wrote: > > Is there some other way to forcibly reboot a remote system from the > command line when a normal shutdown command is going to totally hang > the system in this way? 
Or perhaps some kind of watchdog that has a > good chance of surviving long enough to unjam a situation like this? reboot(8)'s man page: -n The file system cache is not flushed. This option should probably not be used. -q The system is halted or restarted quickly and ungracefully, and only the flushing of the file system cache is performed (if the -n option is not specified). This option should probably not be used. One or both of those would probably do it. -- Kevin From k0802647 at telus.net Sun Feb 22 19:43:44 2009 From: k0802647 at telus.net (Carl) Date: Sun Feb 22 19:43:50 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: References: <49A10626.8060705@telus.net> <20090222110052.GH41617@deviant.kiev.zoral.com.ua> <49A1E5CE.5000501@telus.net> Message-ID: <49A21B6B.7060709@telus.net> Kevin Day wrote: > On Feb 22, 2009, at 5:54 PM, Carl wrote: >> Is there some other way to forcibly reboot a remote system from the >> command line when a normal shutdown command is going to totally hang >> the system in this way? Or perhaps some kind of watchdog that has a >> good chance of surviving long enough to unjam a situation like this? > > reboot(8)'s man page: > > -n The file system cache is not flushed. This option should > probably not be used. > > -q The system is halted or restarted quickly and ungracefully, > and only the flushing of the file system cache is > performed (if the -n option is not specified). This > option should probably not be used. > > One or both of those would probably do it. Obviously I need to work on RTFM. I didn't look past shutdown(8). In my defence... I've got nothing. Thanks Kevin :-) A watchdog would still be valuable for those situations when one attempts a normal remote reboot, the system hangs, and sshd dies with it.
Carl / K0802647 From linimon at FreeBSD.org Mon Feb 23 00:50:02 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Feb 23 00:50:12 2009 Subject: kern/131995: [nfs] Failure to mount NFSv4 server Message-ID: <200902230849.n1N8nxXa049593@freefall.freebsd.org> Old Synopsis: Failure to mount NFSv4 server New Synopsis: [nfs] Failure to mount NFSv4 server Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Mon Feb 23 08:49:12 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=131995 From rwatson at FreeBSD.org Mon Feb 23 02:26:16 2009 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Feb 23 02:26:22 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: <49A10626.8060705@telus.net> References: <49A10626.8060705@telus.net> Message-ID: On Sun, 22 Feb 2009, Carl wrote: > I've come across what I'm thinking may be a bug in the context of FreeBSD > 7.0 with a pair of gmirrored drives and gjournaled partitions when copying a > large number of files into a file-backed memory device. > > The consequence of this problem is that a process enters the 'D' state > (process in disk) indefinitely, cannot be killed, and the system cannot be > shutdown. The only solution is to cold reboot the system, which is a really > big problem for remote systems. This is happening to me intermittently with > the standard tar-tar pipeline form of copying, but has happened with the > rsync 3.0.4 port as well. It would be interesting to get kernel stack traces of the involved processes/threads; there are various ways to do this, such as using DDB. If you have a kernel.symbols for the kernel, then you can run kgdb on kernel.symbols and /dev/mem to generate traces without interrupting operation (although if the system is in the throes of deadlocking, that may not be a concern or even possible). 
You can also use procstat -kk to retrieve kernel stack traces, with a bit less information (such as no arguments) to help narrow things down more. Unfortunately, debugging this type of problem, as you've intuited, is best done with serial console access and a local box so that the debugging information can be extracted. It would be interesting to know if you can force a crashdump on the box to get the information for post-mortem debugging. This may be possible using "reboot -d" -- I've never used this, but have every reason to think it will work. Robert N M Watson Computer Laboratory University of Cambridge > > I would appreciate it if some of you would see if you can repeat this > problem. Here is a sequence of tcsh shell commands which manifest the problem > (on occasion but not every time), which I will refer to as the "truncate > sequence" (depends on fully populated /usr/src tree as data set): > > # truncate -s 671088640 target > # mdconfig -f target -S 512 -y 255 -x 63 -u 7 > # bsdlabel -w /dev/md7 auto > # newfs -O2 -m 0 -o space /dev/md7a > # mount /dev/md7a /media > # tar -cvf - -C /usr/src . | tar -xvpof - -C /media > # umount /media ; mdconfig -d -u 7 ; rm target > > An alternate version has yet to fail for me and involves replacing the first > line with this one: > > # dd if=/dev/zero of=target bs=1M count=640 > > I'll call that the "dd sequence". Here is an ordered series of tests I just > completed: > > a) Repeated truncate sequence 7 times - 1st, 5th, and 7th failed. > b) Repeated dd sequence 7 times - no failures. > c) Repeated truncate sequence 6 times - no failures. > d) Used following sequence to ensure all disk caches flushed: > > # dd if=/dev/random of=target bs=1M count=4096 > # dd if=target of=/dev/null bs=1M > # rm target > > e) Repeated truncate sequence 4 times - no failures. > f) Performed orderly reboot. > g) Repeated truncate sequence 2 times - 2nd failed. > h) Performed orderly reboot. > i) Repeated dd sequence 7 times - no failures.
> > All failures involve the second tar in the pipeline hanging in the 'D' state. > In each case I do a cold reboot before proceeding with the next test. > > It's tempting to speculate that a bug exists in code related to handling > sparse files specifically, but perhaps it just raises the probability of > tripping a bug that would eventually manifest in the dd sequence as well. > OTOH, I don't know how to rule out a physical disk or disk firmware problem. > > This problem has occurred with different data sets and different sized memory > disks, but only with the source and destination filesystems being UFS2. I > have done similar sequences with EXT2 and FAT16 destinations with no failures > thus far, but the memory disks and data sets were smaller so it's conceivable > that probability worked against me. > > I should note that the drives are Seagate ST31000340AS Barracudas, but both > drives have been upgraded to firmware version SD1A and are therefore > supposedly free of the infamous little horror Seagate inflicted on so many of > us. smartctl tells me that both disks still have a raw value of 0 for > Reallocated_Sector_Ct and both pass the "short" self test. > > Carl / K0802647 > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From bugmaster at FreeBSD.org Mon Feb 23 03:06:51 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Feb 23 03:07:40 2009 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200902231106.n1NB6oHh055482@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131353 fs gjournal kernel lock o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] mkfs.ext2 creates rotten partition o kern/131084 fs [xfs] xfs destroys itself after copying data o kern/131081 fs [zfs] User cannot delete a file when a ZFS dataset is o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o bin/130105 fs [zfs] zfs send -R dumps core o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129174 fs [nfs] [zfs] [panic] NFS v3 Panic when under high load o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129084 fs [udf] [panic] [lor] udf panic: getblk: size(67584) > M f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs 
hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po 43 problems total. From ktk at netlabs.org Mon Feb 23 07:11:22 2009 From: ktk at netlabs.org (Adrian Gschwend) Date: Mon Feb 23 07:11:29 2009 Subject: kernel panic while writing to a ZFS volume on iSCSI LUN Message-ID: Hi, I am testing ZFS, iSCSI & snapshots on FreeBSD as I want to use it later for some productive data. While doing an rsync from a maildir with lots of small files, the box crashed. Until then there were about 3.5GB of data transferred Platform: FreeBSD 7.1, i386, install from CD Box: Xeon 3.2 something RAM: 1.5GB 50GB iSCSI LUN on Network Appliance Filer I tried to get some hints out of the dump file according to http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html but I noticed I had to get source first & compile the kernel on my own. The backtrace here is done using a kernel.debug which got compiled out of RELENG_7_1, but it was *not* the same kernel which was running at the time of the crash (kernel was the default one, not the compiled one). 
Does that make a difference for my case or is that ok like this? Message in /var/log/messages on reboot: - Feb 20 00:01:23 chewbacca savecore: reboot after panic: kmem_malloc(131072): kmem_map too small: 228896768 total allocated Feb 20 00:01:23 chewbacca savecore: writing core to vmcore.0 - backtrace: http://freebsd.pastebin.com/m36e7ddd0 My question is if it is crashing in iSCSI or ZFS code & what I should try to get rid of it :-) thanks Adrian -- Adrian Gschwend @ netlabs.org ktk [a t] netlabs.org ------- Open Source Project http://www.netlabs.org From ml-ktk at netlabs.org Mon Feb 23 08:26:49 2009 From: ml-ktk at netlabs.org (ml-ktk@netlabs.org) Date: Mon Feb 23 08:26:55 2009 Subject: kernel panic while writing to a ZFS volume on iSCSI LUN Message-ID: <660f28ee8aa9c3a76b7d736e5ae3c229.squirrel@mail.netlabs.org> Hi, I am testing ZFS, iSCSI & snapshots on FreeBSD as I want to use it later for some productive data. While doing an rsync from a maildir with lots of small files, the box crashed. Until then there were about 3.5GB of data transferred Platform: FreeBSD 7.1, i386, install from CD Box: Xeon 3.2 something RAM: 1.5GB 50GB iSCSI LUN on Network Appliance Filer I tried to get some hints out of the dump file according to http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html but I noticed I had to get source first & compile the kernel on my own. The backtrace here is done using a kernel.debug which got compiled out of RELENG_7_1, but it was *not* the same kernel which was running at the time of the crash (kernel was the default one, not the compiled one). Does that make a difference for my case or is that ok like this? 
Message in /var/log/messages on reboot: - Feb 20 00:01:23 chewbacca savecore: reboot after panic: kmem_malloc(131072): kmem_map too small: 228896768 total allocated Feb 20 00:01:23 chewbacca savecore: writing core to vmcore.0 - backtrace: http://freebsd.pastebin.com/m36e7ddd0 My question is if it is crashing in iSCSI or ZFS code & what I should try to get rid of it :-) thanks Adrian From k0802647 at telus.net Mon Feb 23 15:47:14 2009 From: k0802647 at telus.net (Carl) Date: Mon Feb 23 15:47:20 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: References: <49A10626.8060705@telus.net> Message-ID: <49A3357A.7080008@telus.net> Robert Watson wrote: > It would be interesting to get kernel stack traces of the involved > processes/threads; there are various ways to do this, such as using > DDB. If you have a kernel.symbols for the kernel, then you can run kgdb > on kernel.symbols and /dev/mem to generate traces without interrupting > operation (although if the system is in the throes of deadlocking, that > may not be a concern or even possible). You can also use procstat -kk > to retrieve kernel stack traces, with a bit less information (such as no > arguments) to help narrow things down more. > > Unfortunately, debugging this type of problem, as you've intuited, is > best done with serial console access and a local box so that the > debugging information can be extracted. It would be interesting to know > if you can force a crashdump on the box to get the information for > post-mortem debugging. This may be possible using "reboot -d" -- I've > never used this, but have every reason to think it will work. I have both a local and remote box. The problems I'm seeing are all occurring on the local box because as yet I cannot afford to cause them on a remote box. If you were to guess I've never used DDB or any other kernel debugging, you'd be spot on. I'm currently running the 7.0-RELEASE GENERIC kernel. 
I see a /boot/kernel/kernel.symbols in the filesystem. The system is nominally headless with a serial console, although I primarily use SSH. Even if I knew what to do with them, actually collecting kernel dumps is a hit or miss affair because of gmirror, but this particular problem doesn't cause kernel core dumps on its own (thankfully, since gmirror resyncs take a long time on terabyte drives). So, if you were able to clearly spell out the stripped down steps I should take in conjunction with my earlier truncate sequence and if it doesn't require rebuilding the kernel, I might be able to accommodate. Learning all about kernel debugging would be interesting but doesn't fit in my schedule right now. Anyone willing to attempt to reproduce this problem on their system? Carl / K0802647 From dimitar.vassilev at gmail.com Mon Feb 23 21:31:57 2009 From: dimitar.vassilev at gmail.com (Dimitar Vasilev) Date: Mon Feb 23 21:32:03 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: <49A3357A.7080008@telus.net> References: <49A10626.8060705@telus.net> <49A3357A.7080008@telus.net> Message-ID: <59adc1a0902232102q6c0f6034r354ff9ad3a2b3222@mail.gmail.com> 2009/2/24 Carl > Robert Watson wrote: > >> It would be interesting to get kernel stack traces of the involved >> processes/threads; there are various ways to do this, such as using DDB. If >> you have a kernel.symbols for the kernel, then you can run kgdb on >> kernel.symbols and /dev/mem to generate traces without interrupting >> operation (although if the system is in the throes of deadlocking, that may >> not be a concern or even possible). You can also use procstat -kk to >> retrieve kernel stack traces, with a bit less information (such as no >> arguments) to help narrow things down more. >> >> Unfortunately, debugging this type of problem, as you've intuited, is best >> done with serial console access and a local box so that the debugging >> information can be extracted. 
It would be interesting to know if you can >> force a crashdump on the box to get the information for post-mortem >> debugging. This may be possible using "reboot -d" -- I've never used this, >> but have every reason to think it will work. >> > > I have both a local and remote box. The problems I'm seeing are all > occurring on the local box because as yet I cannot afford to cause them on a > remote box. > > If you were to guess I've never used DDB or any other kernel debugging, > you'd be spot on. I'm currently running the 7.0-RELEASE GENERIC kernel. I > see a /boot/kernel/kernel.symbols in the filesystem. The system is nominally > headless with a serial console, although I primarily use SSH. Even if I knew > what to do with them, actually collecting kernel dumps is a hit or miss > affair because of gmirror, but this particular problem doesn't cause kernel > core dumps on its own (thankfully, since gmirror resyncs take a long time on > terabyte drives). So, if you were able to clearly spell out the stripped > down steps I should take in conjunction with my earlier truncate sequence > and if it doesn't require rebuilding the kernel, I might be able to > accommodate. Learning all about kernel debugging would be interesting but > doesn't fit in my schedule right now. > > Anyone willing to attempt to reproduce this problem on their system? > > > Carl / K0802647 > Hi Carl, How about a soekris board, a USB stick and a null modem cable for collecting data from the local box? Or a laptop with a USB to serial adapter ? Cheers, Dimitar From k0802647 at telus.net Tue Feb 24 00:42:39 2009 From: k0802647 at telus.net (Carl) Date: Tue Feb 24 00:42:46 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state?
In-Reply-To: <59adc1a0902232102q6c0f6034r354ff9ad3a2b3222@mail.gmail.com> References: <49A10626.8060705@telus.net> <49A3357A.7080008@telus.net> <59adc1a0902232102q6c0f6034r354ff9ad3a2b3222@mail.gmail.com> Message-ID: <49A3B2FC.4050601@telus.net> The same truncate sequence has now been tried on a second system which is hardware- and OS-identical. The same hanging scenario occurs on it. That should eliminate the possibility of a defective hard disk as the cause. The same sequence has also now been tried on a laptop, both natively and in a virtual machine on top of WinXP, although both were FreeBSD 7.1 rather than 7.0. Neither have failed so far. Given the latest hang opportunity, "reboot -q" and "reboot -nq" were tried as suggested by Kevin Day. The former didn't work and itself went into the 'D' state permanently. The "-nq" option *did* work, but obviously one has to recognize it's needed before totally hanging the remote system. Once the tar process has hung, screen keeps working, which gives a chance to create/change to a new window and issue the reboot. New SSH connections don't work. Commands like ps and date still work, but commands like ls go straight into a permanent 'D' state themselves. Bad scene for a remote system :-( Dimitar Vasilev wrote: > How about a soekris board, a USB stick and a null modem cable for collecting > data from the local box? Or a laptop with a USB to serial adapter ? I'll take them ;-) Sorry, Dimitar, if I was unclear. I've already got equipment connected to the serial console on my local machine, so capturing output is no problem. It's all the kernel debugging stuff I've no knowledge of. 
Carl / K0802647 From ulf.lilleengen at gmail.com Tue Feb 24 06:02:56 2009 From: ulf.lilleengen at gmail.com (Ulf Lilleengen) Date: Tue Feb 24 06:03:03 2009 Subject: kernel panic while writing to a ZFS volume on iSCSI LUN In-Reply-To: <660f28ee8aa9c3a76b7d736e5ae3c229.squirrel@mail.netlabs.org> References: <660f28ee8aa9c3a76b7d736e5ae3c229.squirrel@mail.netlabs.org> Message-ID: <20090224150217.GA15114@carrot> On Mon, Feb 23, 2009 at 05:00:06PM +0100, ml-ktk@netlabs.org wrote: > Hi, > > I am testing ZFS, iSCSI & snapshots on FreeBSD as I want to use it later > for some productive data. While doing an rsync from a maildir with lots > of small files, the box crashed. Until then there were about 3.5GB of > data transferred > > Platform: FreeBSD 7.1, i386, install from CD > Box: Xeon 3.2 something > RAM: 1.5GB > 50GB iSCSI LUN on Network Appliance Filer > > I tried to get some hints out of the dump file according to > > http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html > > but I noticed I had to get source first & compile the kernel on my own. > The backtrace here is done using a kernel.debug which got compiled out > of RELENG_7_1, but it was *not* the same kernel which was running at the > time of the crash (kernel was the default one, not the compiled one). > Does that make a difference for my case or is that ok like this? > > Message in /var/log/messages on reboot: > > - > Feb 20 00:01:23 chewbacca savecore: reboot after panic: > kmem_malloc(131072): kmem_map too small: 228896768 total allocated > Feb 20 00:01:23 chewbacca savecore: writing core to vmcore.0 > - > > backtrace: > http://freebsd.pastebin.com/m36e7ddd0 > > My question is if it is crashing in iSCSI or ZFS code & what I should > try to get rid of it :-) > This is a problem with ZFS due to exhaustion of the kernel memory address space (ZFS is quite hungry for memory). 
This can be solved by fine-tuning the different limits specified here: http://wiki.freebsd.org/ZFSTuningGuide You might have to do some trial and error to get it to work, but my experience is that the problem is usually solvable if you invest enough time in the tuning. -- Ulf Lilleengen From linimon at FreeBSD.org Tue Feb 24 09:47:41 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Tue Feb 24 09:47:54 2009 Subject: kern/132068: [zfs] page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Message-ID: <200902241747.n1OHlfhW098277@freefall.freebsd.org> Synopsis: [zfs] page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue Feb 24 17:47:30 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=132068 From dimitar.vassilev at gmail.com Tue Feb 24 09:58:05 2009 From: dimitar.vassilev at gmail.com (Dimitar Vasilev) Date: Tue Feb 24 09:58:11 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: <49A3B2FC.4050601@telus.net> References: <49A10626.8060705@telus.net> <49A3357A.7080008@telus.net> <59adc1a0902232102q6c0f6034r354ff9ad3a2b3222@mail.gmail.com> <49A3B2FC.4050601@telus.net> Message-ID: <59adc1a0902240958o3d3e26f6kdd339c6dbad1aa7c@mail.gmail.com> > Dimitar Vasilev wrote: >> >> How about a soekris board, a USB stick and a null modem cable for >> collecting >> data from the local box? Or a laptop with a USB to serial adapter ? > > I'll take them ;-) > > Sorry, Dimitar, if I was unclear. I've already got equipment connected to > the serial console on my local machine, so capturing output is no problem. > It's all the kernel debugging stuff I've no knowledge of. > > Carl / K0802647 > Well, like most things in life, you should give it a try.
I have made some type of action plan for you:
1) Compile a kernel with the options mentioned by Kostik at http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
2) Set the sio flags to 0x80 of consoles on the local machine per http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-gdb.html
3) get a copy of DDD if you want to
4) start having fun
From jh at saunalahti.fi Tue Feb 24 11:40:05 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Tue Feb 24 11:40:11 2009 Subject: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Message-ID: <200902241940.n1OJe58i080736@freefall.freebsd.org> The following reply was made to PR kern/132068; it has been noted by GNATS. From: Jaakko Heinonen To: Edward Fisk <7ogcg7g02@sneakemail.com> Cc: bug-followup@FreeBSD.org Subject: Re: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Date: Tue, 24 Feb 2009 21:32:42 +0200 Hi, On 2009-02-24, Edward Fisk wrote: > If you require any more information or testing, please let me know.
> (kgdb) frame 8 > #8 0xffffffff80651673 in nfsrv_readdirplus (nfsd=0xffffff0041546700, slp=0xffffff0079a7ad00, td=0xffffff0003793370, mrq=0xffffffffdd9d6b00) at /usr/src/sys/nfsserver/nfs_serv.c:3645 > 3645 nfhp->fh_fsid = Can you give output of these commands in frame 8: p *nvp p *nvp->v_mount If nvp is NULL it's likely a bug in zfs_zget(). Looks like the zfs version in head has updated zfs_zget() which might fix the issue. -- Jaakko From nbari at k9.cx Tue Feb 24 12:36:26 2009 From: nbari at k9.cx (Nicolas de Bari Embriz Garcia Rojas) Date: Tue Feb 24 12:36:33 2009 Subject: jail with zfs and quotas Message-ID: <2284D7E5-BCA9-4B1F-9270-7483176CCE23@k9.cx> Hi, I have a server that runs several jails on ZFS; the main jails have a quota of 10GB, but I would like to know whether it is possible to set per-user quotas inside the jails, for example 100MB per user. thanks From 7ogcg7g02 at sneakemail.com Tue Feb 24 13:00:15 2009 From: 7ogcg7g02 at sneakemail.com (Edward Fisk) Date: Tue Feb 24 13:00:23 2009 Subject: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Message-ID: <200902242100.n1OL0Cum039757@freefall.freebsd.org> The following reply was made to PR kern/132068; it has been noted by GNATS.
From: "Edward Fisk" <7ogcg7g02@sneakemail.com> To: bug-followup@freebsd.org Cc: Subject: Re: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Date: 24 Feb 2009 20:25:13 -0000 (kgdb) frame 8 #8 0xffffffff80651673 in nfsrv_readdirplus (nfsd=0xffffff0041546700, slp=0xffffff0079a7ad00, td=0xffffff0003793370, mrq=0xffffffffdd9d6b00) at /usr/src/sys/nfsserver/nfs_serv.c:3645 3645 nfhp->fh_fsid = (kgdb) p *nvp $1 = {v_type = VBAD, v_tag = 0xffffffff807e5627 "none", v_op = 0xffffffff80a18220, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xffffff00529b2bd0, tqe_prev = 0xffffff000af92028}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0, vu_yield = 0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_hash = 0, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xffffff0051130258}, v_dd = 0x0, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lk_object = {lo_name = 0xffffffffdd978d4d "zfs", lo_type = 0xffffffffdd978d4d "zfs", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xffffffff80ab2c80, lk_flags = 64, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 80, lk_timo = 51, lk_lockholder = 0xffffffffffffffff, lk_newlock = 0x0}, v_interlock = {lock_object = {lo_name = 0xffffffff8085132a "vnode interlock", lo_type = 0xffffffff8085132a "vnode interlock", lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, v_vnlock = 0xffffff0051130290, v_holdcnt = 1, v_usecount = 1, v_iflag = 128, v_vflag = 0, v_writecount = 0, v_freelist = {tqe_next = 0xffffff00529b2bd0, tqe_prev = 0xffffffff80aca090}, v_bufobj = {bo_mtx = 0xffffff00511302e0, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xffffff0051130350}, bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xffffff0051130370}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, 
bo_flag = 0, bo_ops = 0xffffffff80a336a0, bo_bsize = 131072, bo_object = 0x0, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xffffff00511301f8, __bo_vnode = 0xffffff00511301f8}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0} (kgdb) p *nvp->v_mount Cannot access memory at address 0x0 --- > If nvp is NULL it's likely a bug in zfs_zget(). Looks like the zfs > version in head has updated zfs_zget() which might fix the issue. Thanks for the tip. I will try building a kernel with revision 1.15.2.3 of zfs_znode.c and see what happens. From peterjeremy at optushome.com.au Tue Feb 24 22:32:23 2009 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Tue Feb 24 22:32:30 2009 Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state? In-Reply-To: <49A3B2FC.4050601@telus.net> References: <49A10626.8060705@telus.net> <49A3357A.7080008@telus.net> <59adc1a0902232102q6c0f6034r354ff9ad3a2b3222@mail.gmail.com> <49A3B2FC.4050601@telus.net> Message-ID: <20090225063219.GC31601@server.vk2pj.dyndns.org> On 2009-Feb-24 00:42:36 -0800, Carl wrote: > The same truncate sequence has now been tried on a second system which is > hardware- and OS-identical. The same hanging scenario occurs on it. That > should eliminate the possibility of a defective hard disk as the cause. Reproducible problems are _much_ easier to fix.
> The same sequence has also now been tried on a laptop, both natively and in
> a virtual machine on top of WinXP, although both were FreeBSD 7.1 rather
> than 7.0. Neither have failed so far.

So it's possible that the bug you are hitting was fixed between 7.0 and 7.1. Is it possible for you to upgrade to 7.1?

> create/change to a new window and issue the reboot. New SSH connections
> don't work. Commands like ps and date still work, but commands like ls go
> straight into a permanent 'D' state themselves. Bad scene for a remote
> system :-(

This sounds like a filesystem deadlock-to-root. If you do get to DDB, try 'show alllocks' and 'show lockedvnods'.

> It's all the kernel debugging stuff I've no knowledge of.

At this stage, all I can suggest is that it's time for you to expand your knowledge :-)

--
Peter Jeremy

From k0802647 at telus.net Wed Feb 25 01:53:17 2009
From: k0802647 at telus.net (Carl)
Date: Wed Feb 25 01:53:23 2009
Subject: UFS2 and/or sparse file bug causing copy process to land in 'D'' state?
In-Reply-To: <20090225063219.GC31601@server.vk2pj.dyndns.org>
References: <49A10626.8060705@telus.net> <49A3357A.7080008@telus.net> <59adc1a0902232102q6c0f6034r354ff9ad3a2b3222@mail.gmail.com> <49A3B2FC.4050601@telus.net> <20090225063219.GC31601@server.vk2pj.dyndns.org>
Message-ID: <49A51506.8000708@telus.net>

Peter Jeremy wrote:
>> The same sequence has also now been tried on a laptop, both natively and in
>> a virtual machine on top of WinXP, although both were FreeBSD 7.1 rather
>> than 7.0. Neither have failed so far.
>
> So it's possible that the bug you are hitting was fixed between 7.0
> and 7.1. Is it possible for you to upgrade to 7.1?
It is and it will happen, but not for a while yet, especially for the remote system. I should note that the 7.0 systems that fail are entirely different hardware using gmirror and gjournal, whereas the 7.1 laptop and VM instances are not - rather a lot of variables changed at once. Peter, are you aware of some specific fixes between 7.0 and 7.1 that might explain my problem?

> At this stage, all I can suggest is that it's time for you to expand
> your knowledge :-)

Oh I am, Peter, I am. But I need a whole lot more hours in a day to add that to the list right now :-)

Seriously though, are there some FreeBSD developers with even vaguely similar configurations who might be interested enough to inflict the following simple loop on their system to see if they can reproduce this?

#!/bin/csh
set count = 0
while( $count < 20 )
  @ count = $count + 1
  echo "truncate attempt $count..." >> /tmp/ouch.log
  truncate -s 671088640 target
  mdconfig -f target -S 512 -y 255 -x 63 -u 7
  bsdlabel -w /dev/md7 auto
  newfs -O2 -m 0 -o space /dev/md7a
  mount /dev/md7a /mnt
  tar -cvf - -C /usr/src . | tar -xvpof - -C /mnt
  umount /mnt
  mdconfig -d -u 7
  rm target
end

Carl / K0802647

From 7ogcg7g02 at sneakemail.com Wed Feb 25 04:10:05 2009
From: 7ogcg7g02 at sneakemail.com (Edward Fisk)
Date: Wed Feb 25 04:10:12 2009
Subject: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64
Message-ID: <200902251210.n1PCA3Tv064910@freefall.freebsd.org>

The following reply was made to PR kern/132068; it has been noted by GNATS.

From: "Edward Fisk" <7ogcg7g02@sneakemail.com>
To: bug-followup@freebsd.org
Cc:
Subject: Re: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64
Date: 25 Feb 2009 12:03:12 -0000

A GENERIC kernel built from RELENG_7 sources (20090225) doesn't appear to have helped. The system locked up hard this time, though, so I was unfortunately unable to obtain a dump.
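Carl's reproduction loop above begins each pass with `truncate -s 671088640 target`, which creates a 640 MiB file (671088640 = 640 x 1024 x 1024) without writing any data, i.e. a sparse file; with `-S 512`, mdconfig then exposes it as 1310720 sectors. For anyone wanting to check that first step outside csh, here is a minimal C sketch assuming a POSIX system. The helper names make_sparse(), file_size() and allocated_bytes() are hypothetical and not part of truncate(1) or mdconfig(8).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` (discarding any old contents) and
 * extend it to `size` bytes without writing data, like running
 * `truncate -s SIZE path` on a file that does not exist yet. On
 * filesystems with hole support (UFS2 among them) the result is sparse.
 * Returns 0 on success, -1 on error with errno set.
 */
static int make_sparse(const char *path, off_t size)
{
    assert(size >= 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Logical size as reported by stat(2), or -1 on error. */
static long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long long)st.st_size;
}

/*
 * Space actually allocated on disk, or -1 on error. st_blocks counts
 * 512-byte units on FreeBSD and Linux; for a freshly created sparse
 * file this is typically far smaller than file_size().
 */
static long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Comparing file_size() with allocated_bytes() on the target is a quick way to confirm that the test image really is sparse before it is handed to mdconfig.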
From ivoras at freebsd.org Wed Feb 25 05:14:31 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Wed Feb 25 05:14:37 2009
Subject: jail with zfs and quotas
In-Reply-To: <2284D7E5-BCA9-4B1F-9270-7483176CCE23@k9.cx>
References: <2284D7E5-BCA9-4B1F-9270-7483176CCE23@k9.cx>
Message-ID:

Nicolas de Bari Embriz Garcia Rojas wrote:
> Hi, I have a server the one has several jails running with zfs, the main
> jails has a quota of 10GB but would like to know it is possible to set
> quotas per use inside the jails of 100MB per user for example.

ZFS doesn't support per-user (or per-group) quotas at all.

http://opensolaris.org/os/community/zfs/faq/#zfsquotas

From avg at icyb.net.ua Wed Feb 25 09:00:34 2009
From: avg at icyb.net.ua (Andriy Gapon)
Date: Wed Feb 25 09:00:40 2009
Subject: soliciting test reports for udf patches
Message-ID: <49A5792F.6000109@icyb.net.ua>

If you have success using patches posted here
http://docs.FreeBSD.org/cgi/mid.cgi?47AA43B9.1040608
or here
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120989
please reply to me privately.
Thank you.
--
Andriy Gapon

From freebsd at discordia.ch Thu Feb 26 08:10:02 2009
From: freebsd at discordia.ch (Peter Keel)
Date: Thu Feb 26 08:10:09 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902261610.n1QGA2gv073769@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Peter Keel
To: bug-followup@FreeBSD.org, martin@email.aon.at
Cc:
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Thu, 26 Feb 2009 16:50:18 +0100

We experience exactly the same problem.
With FreeBSD 7.0 everything was all right, but with the upgrade to 7.1 performance became abysmal. Normally it seems to work for some time, but after a few hours the load climbs and the 4 nfsd processes each use 100% CPU (on a 4-way system). The system only serves as nfs-root (ro) for about 20 systems, so there's not much NFS traffic going on.

I wasn't able to capture it when every nfsd was using 100% CPU (because, well, some thousands of users depend on it); this is what it looks like when it's half-working (hundreds of timeouts on the clients):

37907 root   1   4   0  4604K  1112K -    2  46:29 33.59% nfsd
37908 root   1   4   0  4604K  1112K -    2  15:29 17.38% nfsd
37909 root   1   4   0  4604K  1112K -    3   5:59  5.66% nfsd
37910 root   1   4   0  4604K  1112K -    2   2:38  0.00% nfsd

We suspect the ULE scheduler is responsible for the mess. Right now we're testing a kernel with the 4BSD scheduler; we'll know more next week. But now it looks like this:

 1040 root   1   4   0  4604K  1068K -    3   5:49  6.49% nfsd
 1039 root   1   4   0  4604K  1068K -    3   2:20  1.86% nfsd
 1042 root   1   4   0  4604K  1068K -    1   0:48  0.05% nfsd
 1041 root   1   4   0  4604K  1068K -    2   0:15  0.00% nfsd

Promising.

Regards
Peter
--
"Those who give up essential liberties for temporary safety deserve neither liberty nor safety." -- Benjamin Franklin
"It's also true that those who would give up privacy for security are likely to end up with neither." -- Bruce Schneier

From jh at saunalahti.fi Thu Feb 26 09:50:03 2009
From: jh at saunalahti.fi (Jaakko Heinonen)
Date: Thu Feb 26 09:50:09 2009
Subject: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64
Message-ID: <200902261750.n1QHo2Vc050101@freefall.freebsd.org>

The following reply was made to PR kern/132068; it has been noted by GNATS.
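Jaakko's reply below backports a zfs_zget() change from head. When the znode found in the cache has no vnode (ZTOV(zp) == NULL, i.e. its vnode is being reclaimed), the old code allocated a new vnode for it with getnewvnode(); the patched code instead drops its holds, sleeps for one tick with tsleep(zp, 0, "zcollide", 1) and retries via `goto again`, and the zfs_vnops.c hunk makes zfs_freebsd_reclaim() clear ZTOV(zp) on both paths. The sketch below is only a single-threaded userland model of that "never revive a dying object, wait and retry" idea, with teardown simulated by a tick counter; struct znode_model, struct vnode_model and zget_model() are hypothetical names, not ZFS or VFS interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real ZFS/VFS structures. */
struct vnode_model {
    int refs;                   /* analogous to v_usecount */
};

struct znode_model {
    bool cached;                /* still reachable by object number */
    struct vnode_model *vnode;  /* cleared to NULL once reclaim starts */
    int teardown_ticks;         /* simulated ticks until reclaim ends */
};

/*
 * Stand-in for tsleep(zp, 0, "zcollide", 1): each tick lets the
 * simulated reclaim progress; when it finishes, the znode leaves the
 * cache.
 */
static void wait_one_tick(struct znode_model *zp)
{
    if (zp->teardown_ticks <= 1) {
        zp->teardown_ticks = 0;
        zp->cached = false;
    } else {
        zp->teardown_ticks--;
    }
}

/*
 * Model of the patched lookup: a live cached znode is reused and its
 * vnode gains a reference (like VN_HOLD()); a dying one (vnode == NULL)
 * is never revived. Instead we wait and retry until it has gone, then
 * attach a fresh vnode. Returns the number of waits taken.
 */
static int zget_model(struct znode_model *zp, struct vnode_model *fresh)
{
    int waits = 0;

    assert(zp != NULL && fresh != NULL);
    for (;;) {
        if (!zp->cached) {
            /* Not found: set up a new znode/vnode pair. */
            fresh->refs = 1;
            zp->vnode = fresh;
            zp->cached = true;
            return waits;
        }
        if (zp->vnode != NULL) {
            zp->vnode->refs++;
            return waits;
        }
        wait_one_tick(zp);      /* dying znode: back off, then retry */
        waits++;
    }
}
```

A live entry returns immediately with one more reference; a dying entry with three ticks of teardown left costs three waits and then gets a fresh vnode rather than a resurrected one.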
From: Jaakko Heinonen To: Edward Fisk <7ogcg7g02@sneakemail.com> Cc: bug-followup@FreeBSD.org Subject: Re: kern/132068: page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Date: Thu, 26 Feb 2009 19:44:21 +0200 On 2009-02-24, Edward Fisk wrote: > (kgdb) p *nvp > $1 = {v_type = VBAD, v_tag = 0xffffffff807e5627 "none", v_op = 0xffffffff80a18220, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = { Thanks for the info. If you can't try 8.0-CURRENT here is an attempt to backport some bits from head to RELENG_7. I have only compile tested the patch so be careful. --- patch begins here --- Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c =================================================================== --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c (revision 189044) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c (working copy) @@ -554,10 +554,10 @@ zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_ dmu_buf_t *db; znode_t *zp; vnode_t *vp; - int err; + int err, first = 1; *zpp = NULL; - +again: ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num); err = dmu_bonus_hold(zfsvfs->z_os, obj_num, NULL, &db); @@ -574,64 +574,60 @@ zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_ return (EINVAL); } - ASSERT(db->db_object == obj_num); - ASSERT(db->db_offset == -1); - ASSERT(db->db_data != NULL); - zp = dmu_buf_get_user(db); - if (zp != NULL) { mutex_enter(&zp->z_lock); + /* + * Since we do immediate eviction of the z_dbuf, we + * should never find a dbuf with a znode that doesn't + * know about the dbuf. 
+ */ + ASSERT3P(zp->z_dbuf, ==, db); ASSERT3U(zp->z_id, ==, obj_num); if (zp->z_unlinked) { - dmu_buf_rele(db, NULL); - mutex_exit(&zp->z_lock); - ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); - return (ENOENT); - } else if (zp->z_dbuf_held) { - dmu_buf_rele(db, NULL); + err = ENOENT; } else { - zp->z_dbuf_held = 1; - VFS_HOLD(zfsvfs->z_vfs); - } - - if (ZTOV(zp) != NULL) - VN_HOLD(ZTOV(zp)); - else { - err = getnewvnode("zfs", zfsvfs->z_vfs, &zfs_vnodeops, - &zp->z_vnode); - ASSERT(err == 0); - vp = ZTOV(zp); - vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread); - vp->v_data = (caddr_t)zp; - vp->v_vnlock->lk_flags |= LK_CANRECURSE; - vp->v_vnlock->lk_flags &= ~LK_NOSHARE; - vp->v_type = IFTOVT((mode_t)zp->z_phys->zp_mode); - if (vp->v_type == VDIR) - zp->z_zn_prefetch = B_TRUE; /* z_prefetch default is enabled */ - vp->v_vflag |= VV_FORCEINSMQ; - err = insmntque(vp, zfsvfs->z_vfs); - vp->v_vflag &= ~VV_FORCEINSMQ; - KASSERT(err == 0, ("insmntque() failed: error %d", err)); - VOP_UNLOCK(vp, 0, curthread); + if (ZTOV(zp) != NULL) + VN_HOLD(ZTOV(zp)); + else { + if (first) { + ZFS_LOG(1, "dying znode detected (zp=%p)", zp); + first = 0; + } + /* + * znode is dying so we can't reuse it, we must + * wait until destruction is completed. 
+ */ + dmu_buf_rele(db, NULL); + mutex_exit(&zp->z_lock); + ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); + tsleep(zp, 0, "zcollide", 1); + goto again; + } + *zpp = zp; + err = 0; } + dmu_buf_rele(db, NULL); mutex_exit(&zp->z_lock); ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); - *zpp = zp; - return (0); + return (err); } /* * Not found create new znode/vnode */ zp = zfs_znode_alloc(zfsvfs, db, obj_num, doi.doi_data_block_size); - ASSERT3U(zp->z_id, ==, obj_num); - zfs_znode_dmu_init(zp); + + vp = ZTOV(zp); + vp->v_vflag |= VV_FORCEINSMQ; + err = insmntque(vp, zfsvfs->z_vfs); + vp->v_vflag &= ~VV_FORCEINSMQ; + KASSERT(err == 0, ("insmntque() failed: error %d", err)); + VOP_UNLOCK(vp, 0, curthread); + ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); *zpp = zp; - if ((vp = ZTOV(zp)) != NULL) - VOP_UNLOCK(vp, 0, curthread); return (0); } Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c =================================================================== --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (revision 189044) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (working copy) @@ -3475,9 +3475,9 @@ zfs_freebsd_reclaim(ap) ASSERT(zp->z_phys); ASSERT(zp->z_dbuf_held); zfsvfs = zp->z_zfsvfs; + ZTOV(zp) = NULL; if (!zp->z_unlinked) { zp->z_dbuf_held = 0; - ZTOV(zp) = NULL; mutex_exit(&zp->z_lock); dmu_buf_rele(zp->z_dbuf, NULL); } else { --- patch ends here --- -- Jaakko From dfilter at FreeBSD.ORG Thu Feb 26 11:00:12 2009 From: dfilter at FreeBSD.ORG (dfilter service) Date: Thu Feb 26 11:00:24 2009 Subject: kern/129084: commit references a PR Message-ID: <200902261900.n1QJ07tq002841@freefall.freebsd.org> The following reply was made to PR kern/129084; it has been noted by GNATS. 
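The commit that follows (r189082, for PR kern/129084) changes how udf_readatoffset() sizes its read. The old code clamped *size to MAXBSIZE and only then added the offset within the block, so the block-rounded request could exceed MAXBSIZE; the new code clamps *size to MAXBSIZE - off first. A small C sketch of both calculations, assuming MAXBSIZE is 65536 (its value in FreeBSD of that era), 2048-byte UDF blocks (bmask = 2047) and that blkoff() is simply `offset & bmask`; the function names are hypothetical.

```c
#include <assert.h>

/* Assumption for this sketch: FreeBSD 7.x MAXBSIZE. */
#define SKETCH_MAXBSIZE 65536L

/* Round n up to a whole number of blocks; bmask = block size - 1. */
static long round_up_blocks(long n, long bmask)
{
    assert(((bmask + 1) & bmask) == 0);  /* block size is a power of 2 */
    return (n + bmask) & ~bmask;
}

/* Old logic: clamp to MAXBSIZE, then add the in-block offset, then round. */
static long old_read_len(long size, long offset, long bmask)
{
    if (size > SKETCH_MAXBSIZE)
        size = SKETCH_MAXBSIZE;
    return round_up_blocks(size + (offset & bmask), bmask);
}

/* New logic: the size promised to the caller leaves room for off. */
static long new_promised_size(long size, long offset, long bmask)
{
    long off = offset & bmask;           /* blkoff(udfmp, offset) */
    if (size > SKETCH_MAXBSIZE - off)
        size = SKETCH_MAXBSIZE - off;
    return size;
}

/* New logic: bytes actually requested from bread() (adj_size). */
static long new_read_len(long size, long offset, long bmask)
{
    long off = offset & bmask;
    return round_up_blocks(new_promised_size(size, offset, bmask) + off,
        bmask);
}
```

Under these assumptions, any read of at least MAXBSIZE bytes that starts off a block boundary makes the old calculation request 67584 bytes (33 blocks), the same size as in the kern/129084 panic message "getblk: size(67584) > MAXBSIZ"; the new calculation promises slightly less data and never asks for more than 65536 bytes.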
From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/129084: commit references a PR Date: Thu, 26 Feb 2009 18:58:59 +0000 (UTC) Author: avg Date: Thu Feb 26 18:58:41 2009 New Revision: 189082 URL: http://svn.freebsd.org/changeset/base/189082 Log: udf_readatoffset: read through directory vnode, do not read > MAXBSIZE Currently bread()-ing through device vnode with (1) VMIO enabled, (2) bo_bsize != DEV_BSIZE (3) more than 1 block results in data being incorrectly cached. So instead a more common approach of using a vnode belonging to fs is now employed. Also, prevent attempt to bread more than MAXBSIZE bytes because of adjustments made to account for offset that doesn't start on block boundary. Add expanded comments to explain the calculations. Also drop unused inline function while here. PR: kern/120967 PR: kern/129084 Reviewed by: scottl, kib Approved by: jhb (mentor) Modified: head/sys/fs/udf/udf.h head/sys/fs/udf/udf_vfsops.c head/sys/fs/udf/udf_vnops.c Modified: head/sys/fs/udf/udf.h ============================================================================== --- head/sys/fs/udf/udf.h Thu Feb 26 18:55:55 2009 (r189081) +++ head/sys/fs/udf/udf.h Thu Feb 26 18:58:41 2009 (r189082) @@ -95,27 +95,12 @@ struct ifid { MALLOC_DECLARE(M_UDFFENTRY); static __inline int -udf_readlblks(struct udf_mnt *udfmp, int sector, int size, struct buf **bp) +udf_readdevblks(struct udf_mnt *udfmp, int sector, int size, struct buf **bp) { return (RDSECTOR(udfmp->im_devvp, sector, (size + udfmp->bmask) & ~udfmp->bmask, bp)); } -static __inline int -udf_readalblks(struct udf_mnt *udfmp, int lsector, int size, struct buf **bp) -{ - daddr_t rablock, lblk; - int rasize; - - lblk = (lsector + udfmp->part_start) << (udfmp->bshift - DEV_BSHIFT); - rablock = (lblk + 1) << udfmp->bshift; - rasize = size; - - return (breadn(udfmp->im_devvp, lblk, - (size + udfmp->bmask) & ~udfmp->bmask, - &rablock, &rasize, 1, NOCRED, bp)); -} - /* * Produce a suitable 
file number from an ICB. The passed in ICB is expected * to be in little endian (meaning that it hasn't been swapped for big Modified: head/sys/fs/udf/udf_vfsops.c ============================================================================== --- head/sys/fs/udf/udf_vfsops.c Thu Feb 26 18:55:55 2009 (r189081) +++ head/sys/fs/udf/udf_vfsops.c Thu Feb 26 18:58:41 2009 (r189082) @@ -476,7 +476,7 @@ udf_mountfs(struct vnode *devvp, struct */ sector = le32toh(udfmp->root_icb.loc.lb_num) + udfmp->part_start; size = le32toh(udfmp->root_icb.len); - if ((error = udf_readlblks(udfmp, sector, size, &bp)) != 0) { + if ((error = udf_readdevblks(udfmp, sector, size, &bp)) != 0) { printf("Cannot read sector %d\n", sector); goto bail; } @@ -794,7 +794,7 @@ udf_find_partmaps(struct udf_mnt *udfmp, * XXX If reading the first Sparing Table fails, should look * for another table. */ - if ((error = udf_readlblks(udfmp, le32toh(pms->st_loc[0]), + if ((error = udf_readdevblks(udfmp, le32toh(pms->st_loc[0]), le32toh(pms->st_size), &bp)) != 0) { if (bp != NULL) brelse(bp); Modified: head/sys/fs/udf/udf_vnops.c ============================================================================== --- head/sys/fs/udf/udf_vnops.c Thu Feb 26 18:55:55 2009 (r189081) +++ head/sys/fs/udf/udf_vnops.c Thu Feb 26 18:58:41 2009 (r189082) @@ -1296,16 +1296,20 @@ static int udf_readatoffset(struct udf_node *node, int *size, off_t offset, struct buf **bp, uint8_t **data) { - struct udf_mnt *udfmp; - struct file_entry *fentry = NULL; + struct udf_mnt *udfmp = node->udfmp; + struct vnode *vp = node->i_vnode; + struct file_entry *fentry; struct buf *bp1; uint32_t max_size; daddr_t sector; + off_t off; + int adj_size; int error; - udfmp = node->udfmp; - - *bp = NULL; + /* + * This call is made *not* only to detect UDF_INVALID_BMAP case, + * max_size is used as an ad-hoc read-ahead hint for "normal" case. 
+ */ error = udf_bmap_internal(node, offset, §or, &max_size); if (error == UDF_INVALID_BMAP) { /* @@ -1323,9 +1327,18 @@ udf_readatoffset(struct udf_node *node, /* Adjust the size so that it is within range */ if (*size == 0 || *size > max_size) *size = max_size; - *size = min(*size, MAXBSIZE); - if ((error = udf_readlblks(udfmp, sector, *size + (offset & udfmp->bmask), bp))) { + /* + * Because we will read starting at block boundary, we need to adjust + * how much we need to read so that all promised data is in. + * Also, we can't promise to read more than MAXBSIZE bytes starting + * from block boundary, so adjust what we promise too. + */ + off = blkoff(udfmp, offset); + *size = min(*size, MAXBSIZE - off); + adj_size = (*size + off + udfmp->bmask) & ~udfmp->bmask; + *bp = NULL; + if ((error = bread(vp, lblkno(udfmp, offset), adj_size, NOCRED, bp))) { printf("warning: udf_readlblks returned error %d\n", error); /* note: *bp may be non-NULL */ return (error); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From bsdgroup.md at gmail.com Thu Feb 26 22:22:53 2009 From: bsdgroup.md at gmail.com (Rusu Silviu) Date: Thu Feb 26 22:22:59 2009 Subject: Extremely slow read/write speed(4Mb/s), 7.1 Release on Intel ICH9 SATA Message-ID: Hello. No idea what could be. Any suggestions please? 
I have 3 HDDs:
- 160G Seagate Serial ATA v1.0, 3 partitions (1 - system, 2 - data, 3 - storage), soft updates on for all partitions
- 750G Samsung Serial ATA II, 1 partition, soft updates on
- 1000G Samsung Serial ATA II, 1 partition, soft updates on

hw.ata.wc=1

The motherboard is an ASUS P5KR (P35/ICH9); I bought it because `man ata' says ICH9 is supported.
There is also a JMicron eSATA/PATA controller, which I have disabled.
No overclocking.

dd if=/dev/ad4 of=/dev/null
iostat -w1 ad4
      tty              ad4               cpu
 tin tout   KB/t    tps  MB/s  us ni  sy  in  id
  47   48   1.65    252  0.40   6  0   1   1  92
   1  251   0.50  12650  6.18   2  0  15  14  69
   0   88   0.50  12583  6.15   3  0  18  12  67
   0   87   0.50  12641  6.17   3  0  18  12  68
   0   87   0.51  12614  6.23   2  0  19  12  67
   0   87   0.50  12619  6.16   3  0  16  13  68
   0   87   0.50  12634  6.17   2  0  18  11  70
   0   87   0.50  12644  6.17   4  0  16  13  67
   0   87   0.52  12545  6.39   3  0  19  12  67
   0   87   0.50  12612  6.16   3  0  15  12  70
   0   87   0.50  12578  6.14   2  0  14  14  71

dd if=/dev/ad6 of=/dev/null
iostat -w1 ad6
      tty              ad6               cpu
 tin tout   KB/t    tps  MB/s  us ni  sy  in  id
  48   47   7.58     15  0.11   6  0   1   1  92
   1  251   0.52   7777  3.93   3  0  12   8  77
   0   86   0.52   7779  3.95   3  0   9  10  79
   0   86   0.51   7786  3.91   0  0   9   7  84
   0   86   0.52   7842  3.98   3  0  11   7  79
   0   86   0.51   8084  4.05   2  0   9   7  82
   0   86   0.52   7395  3.74   2  0  12   9  77
   0   86   0.51   7735  3.88   3  0  11   7  79
   1   86   0.52   7810  3.95   2  0  11  10  77

dd if=/dev/ad12 of=/dev/null
iostat -w1 ad12
      tty              ad12              cpu
 tin tout   KB/t    tps  MB/s  us ni  sy  in  id
  48   47   0.68     12  0.01   5  0   2   1  92
   1  253   0.50   7694  3.76   3  0   9   9  79
   0   85   0.50   7642  3.73   2  0   9   8  81
   0   85   0.50   7575  3.70   1  0  10   6  83
   0   85   0.50   7598  3.71   2  0  10  11  77
   0   86   0.50   7577  3.70   3  0  11   8  79
   0   85   0.50   7606  3.71   3  0   6   9  83
   1   85   0.50   7620  3.72   2  0   9   8  81

df -h
Filesystem      Size  Used  Avail Capacity  Mounted on
/dev/ad4s1a     2.1G  139M   1.8G     7%    /
devfs           1.0K  1.0K     0B   100%    /dev
/dev/ad4s1e     2.1G  1.8M   2.0G     0%    /tmp
/dev/ad4s1f      32G  3.6G    26G    12%    /usr
/dev/ad4s1d     2.1G  212M   1.7G    11%    /var
/dev/ad4s2       21G  1.7G    18G     8%    /data
/dev/ad4s3       82G   29G    47G    38%    /storage
/dev/ufs/hdd1   677G  498G   124G    80%    /hdd1
/dev/ufs/hdd2   902G  445G   385G    54%    /hdd2
========dmesg=========
CPU: Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2208.29-MHz 686-class CPU)
...
real memory = 2146959360 (2047 MB)
...
atapci0: port 0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xb480-0xb483,0xb400-0xb41f mem 0xfe8fe800-0xfe8fefff irq 22 at device 31.2 on pci0
atapci0: [ITHREAD]
atapci0: AHCI Version 01.20 controller with 6 ports detected
ata2: on atapci0
ata2: [ITHREAD]
ata3: on atapci0
ata3: [ITHREAD]
ata4: on atapci0
ata4: [ITHREAD]
ata5: on atapci0
ata5: [ITHREAD]
ata6: on atapci0
ata6: [ITHREAD]
ata7: on atapci0
ata7: executing CLO failed
ata7: [ITHREAD]
ata0 at port 0x1f0-0x1f7,0x3f6 irq 14 on isa0
ata0: [ITHREAD]
ata1 at port 0x170-0x177,0x376 irq 15 on isa0
ata1: [ITHREAD]
ad4: 152626MB at ata2-master SATA150
ad6: 715404MB at ata3-master SATA300
ad12: 953869MB at ata6-master SATA300

=========pciconf===========
atapci0@pci0:0:31:2: class=0x010601 card=0x82771043 chip=0x29228086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801IB/IR/IH (ICH9 Family) 6 port SATA AHCI Controller'
    class      = mass storage
    cap 05[80] = MSI supports 16 messages
    cap 01[70] = powerspec 3 supports D0 D3 current D0
    cap 12[a8] = unknown
    cap 09[b0] = vendor (length 6) Intel cap 2 version 0

============atacontrol=============
atacontrol cap ad4

Protocol              Serial ATA v1.0
device model          ST3160815AS
serial number         5RA737SD
firmware revision     4.AAB
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       312579695 sectors
dma                   supported
overlap               not supported

Feature                        Support  Enable  Value         Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00        208/0xD0

atacontrol cap ad6

Protocol              Serial ATA II
device model          SAMSUNG HD753LJ
serial number         S13UJ1BQ802853
firmware revision     1AA01113
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       1465149168 sectors
dma                   supported
overlap               not supported

Feature                        Support  Enable  Value         Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  yes      yes     254/0xFE      254/0xFE

atacontrol cap ad12

Protocol              Serial ATA II
device model          SAMSUNG HD103UJ
serial number         S13PJ1NQA01054
firmware revision     1AA01113
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       1953525168 sectors
dma                   supported
overlap               not supported

Feature                        Support  Enable  Value         Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  yes      yes     254/0xFE      254/0xFE

atacontrol list
ATA channel 0:
    Master:      no device present
    Slave:       no device present
ATA channel 1:
    Master:      no device present
    Slave:       no device present
ATA channel 2:
    Master:  ad4 Serial ATA v1.0
    Slave:       no device present
ATA channel 3:
    Master:  ad6 Serial ATA II
    Slave:       no device present
ATA channel 4:
    Master:      no device present
    Slave:       no device present
ATA channel 5:
    Master:      no device present
    Slave:       no device present
ATA channel 6:
    Master: ad12 Serial ATA II
    Slave:       no device present
ATA channel 7:
    Master: acd0 Serial ATA v1.0
    Slave:       no device present

Thank you.
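In the iostat output above, every busy sample shows about 0.50 KB/t, which is dd's default 512-byte block size, so throughput is simply KB/t times tps. A small C sketch of that arithmetic; iostat_mbps() and reads_per_mib() are hypothetical helpers, and the divisor of 1024 is an assumption that roughly reproduces the sampled MB/s values (0.50 x 12650 / 1024 = 6.18).

```c
#include <assert.h>

/* MB/s as iostat reports it: KB per transfer * transfers/s / 1024. */
static double iostat_mbps(double kb_per_transfer, double tps)
{
    assert(kb_per_transfer >= 0.0 && tps >= 0.0);
    return kb_per_transfer * tps / 1024.0;
}

/* Number of read(2) calls dd issues per MiB for a given bs in bytes. */
static long reads_per_mib(long bs)
{
    assert(bs > 0);
    return (1024L * 1024L + bs - 1) / bs;   /* round up */
}
```

At bs=512, ad4's roughly 12,600 transfers per second cap the read at about 6 MB/s (2048 reads per MiB), which is the point of the "bs=1024k" suggestion in the follow-up: each request then carries far more data, subject to the driver's maximum I/O size. By the same arithmetic, the 128K per transfer at about 600 tps reported after the fix works out to 75 MB/s, in line with the roughly 80M/s average quoted.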
From andrew at modulus.org Thu Feb 26 22:41:01 2009
From: andrew at modulus.org (Andrew Snow)
Date: Thu Feb 26 22:41:08 2009
Subject: Extremely slow read/write speed(4Mb/s), 7.1 Release on Intel ICH9 SATA
In-Reply-To:
References:
Message-ID: <49A786D0.7020203@modulus.org>

Rusu Silviu wrote:
> dd if=/dev/ad4 of=/dev/null

Try adding "bs=1024k" to that command line.

From linimon at FreeBSD.org Fri Feb 27 09:23:40 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Fri Feb 27 09:23:51 2009
Subject: kern/132145: [panic] File System Hard Crashes
Message-ID: <200902271723.n1RHNTtb053951@freefall.freebsd.org>

Old Synopsis: File System Hard Crashes
New Synopsis: [panic] File System Hard Crashes

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Feb 27 17:22:41 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=132145

From bsdgroup.md at gmail.com Fri Feb 27 16:14:25 2009
From: bsdgroup.md at gmail.com (Rusu Silviu)
Date: Fri Feb 27 16:14:32 2009
Subject: Extremely slow read/write speed(4Mb/s), 7.1 Release on Intel ICH9 SATA
In-Reply-To:
References:
Message-ID:

Solved.

For some reason, the 750G disk was unable to read more than 40K per transfer, and its transfers per second were poor as well. It behaved as if it had 4K bytes per sector, though I thought it was 16K. A default newfs on it solved the problem: it now averages 128K per transfer at about 600 transfers per second, for about 80M/s on average.

On Fri, Feb 27, 2009 at 7:59 AM, Rusu Silviu wrote:
> Hello.
>
> No idea what could be.
> Any suggestions please?

From martin at email.aon.at Sat Feb 28 00:20:04 2009
From: martin at email.aon.at (Martin Birgmeier)
Date: Sat Feb 28 00:20:11 2009
Subject: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Message-ID: <200902280820.n1S8K4xm061486@freefall.freebsd.org>

The following reply was made to PR kern/131360; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131360: [nfs] poor scaling behavior of the NFS server under load
Date: Sat, 28 Feb 2009 09:16:44 +0100 (CET)

To add to what Peter Keel is writing: My kernels *did* still use the 4BSD scheduler, so I am quite sure that Peter will not see an improvement when switching to it from the ULE scheduler.

Next observation: My server, aside from serving NFS, is also serving samba clients. Yesterday, from a single Windows 98 host, a directory on the server containing approx. 100 files was deleted. During this time, the server was completely unresponsive (except that I could still ping it). It was not even possible to contact the DNS server running on it. After a few minutes (and presumably when the Windows 98 host was finished deleting the directory; I did not watch this directly), things returned to normal. However, the "xload" display from the server then refreshed again and indicated a truly gigantic load peak - it must have been greater than 50, as the background of the xload window was completely filled with y axis lines (the horizontal lines dividing load levels).

Something has been messed up horribly with multiprocessing on 7.1.
From bsdgroup.md at gmail.com Sat Feb 28 04:27:34 2009 From: bsdgroup.md at gmail.com (Rusu Silviu) Date: Sat Feb 28 04:27:41 2009 Subject: Extremely slow read/write speed(4Mb/s), 7.1 Release on Intel ICH9 SATA In-Reply-To: References: Message-ID: Also, there were corrupt GPT tables. I used gpt destroy to clear any GPT tables and gpt create to write new, valid ones. This did not increase the speed, but it let me confirm that everything is set up correctly. FreeBSD rocks! On Sat, Feb 28, 2009 at 2:14 AM, Rusu Silviu wrote: > Solved. > > For some reason, the 750G disk was unable to read more than 40K per transfer, > and transfers per second were poor as well. > It behaved as if it had 4K bytes per sector; I thought it was 16K. > A default newfs on it solved the problem. > It now averages 128K per transfer at about 600 transfers per second, resulting > in about 80M/s. > > > On Fri, Feb 27, 2009 at 7:59 AM, Rusu Silviu wrote: > >> Hello. >> >> No idea what it could be. >> Any suggestions, please? >> >> I have 3 HDDs: >> - 160G Seagate Serial ATA v1.0, 3 partitions: 1 - system, 2 - data, 3 - >> storage, soft updates on for all partitions >> - 750G Samsung Serial ATA II, 1 partition, soft updates on >> - 1000G Samsung Serial ATA II, 1 partition, soft updates on >> >> hw.ata.wc=1 >> >> Mobo is an ASUS P5KR, P35/ICH9 >> I bought it because `man ata' says ICH9 is supported >> There is also a JMicron eSATA/PATA controller, which I actually disabled >> No overclocking >> >> dd if=/dev/ad4 of=/dev/null >> iostat -w1 ad4 >> tty ad4 cpu >> tin tout KB/t tps MB/s us ni sy in id >> 47 48 1.65 252 0.40 6 0 1 1 92 >> 1 251 0.50 12650 6.18 2 0 15 14 69 >> 0 88 0.50 12583 6.15 3 0 18 12 67 >> 0 87 0.50 12641 6.17 3 0 18 12 68 >> 0 87 0.51 12614 6.23 2 0 19 12 67 >> 0 87 0.50 12619 6.16 3 0 16 13 68 >> 0 87 0.50 12634 6.17 2 0 18 11 70 >> 0 87 0.50 12644 6.17 4 0 16 13 67 >> 0 87 0.52 12545 6.39 3 0 19 12 67 >> 0 87 0.50 12612 6.16 3 0 15 12 70 >> 0 87 0.50 12578 6.14 2 0 14 14 71 >> >> >> dd if=/dev/ad6 of=/dev/null >> iostat -w1 ad6 >> tty
ad6 cpu >> tin tout KB/t tps MB/s us ni sy in id >> 48 47 7.58 15 0.11 6 0 1 1 92 >> 1 251 0.52 7777 3.93 3 0 12 8 77 >> 0 86 0.52 7779 3.95 3 0 9 10 79 >> 0 86 0.51 7786 3.91 0 0 9 7 84 >> 0 86 0.52 7842 3.98 3 0 11 7 79 >> 0 86 0.51 8084 4.05 2 0 9 7 82 >> 0 86 0.52 7395 3.74 2 0 12 9 77 >> 0 86 0.51 7735 3.88 3 0 11 7 79 >> 1 86 0.52 7810 3.95 2 0 11 10 77 >> >> >> dd if=/dev/ad12 of=/dev/null >> iostat -w1 ad12 >> tty ad12 cpu >> tin tout KB/t tps MB/s us ni sy in id >> 48 47 0.68 12 0.01 5 0 2 1 92 >> 1 253 0.50 7694 3.76 3 0 9 9 79 >> 0 85 0.50 7642 3.73 2 0 9 8 81 >> 0 85 0.50 7575 3.70 1 0 10 6 83 >> 0 85 0.50 7598 3.71 2 0 10 11 77 >> 0 86 0.50 7577 3.70 3 0 11 8 79 >> 0 85 0.50 7606 3.71 3 0 6 9 83 >> 1 85 0.50 7620 3.72 2 0 9 8 81 >> >> >> df -h >> Filesystem Size Used Avail Capacity Mounted on >> /dev/ad4s1a 2.1G 139M 1.8G 7% / >> devfs 1.0K 1.0K 0B 100% /dev >> /dev/ad4s1e 2.1G 1.8M 2.0G 0% /tmp >> /dev/ad4s1f 32G 3.6G 26G 12% /usr >> /dev/ad4s1d 2.1G 212M 1.7G 11% /var >> /dev/ad4s2 21G 1.7G 18G 8% /data >> /dev/ad4s3 82G 29G 47G 38% /storage >> /dev/ufs/hdd1 677G 498G 124G 80% /hdd1 >> /dev/ufs/hdd2 902G 445G 385G 54% /hdd2 >> >> ========dmesg========= >> CPU: Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2208.29-MHz >> 686-class CPU) >> ... >> real memory = 2146959360 (2047 MB) >> ... 
>> atapci0: port >> 0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xb480-0xb483,0xb400-0xb41f mem >> 0xfe8fe800-0xfe8fefff irq 22 at device 31.2 on pci0 >> atapci0: [ITHREAD] >> atapci0: AHCI Version 01.20 controller with 6 ports detected >> ata2: on atapci0 >> ata2: [ITHREAD] >> ata3: on atapci0 >> ata3: [ITHREAD] >> ata4: on atapci0 >> ata4: [ITHREAD] >> ata5: on atapci0 >> ata5: [ITHREAD] >> ata6: on atapci0 >> ata6: [ITHREAD] >> ata7: on atapci0 >> ata7: executing CLO failed >> ata7: [ITHREAD] >> ata0 at port 0x1f0-0x1f7,0x3f6 irq 14 on isa0 >> ata0: [ITHREAD] >> ata1 at port 0x170-0x177,0x376 irq 15 on isa0 >> ata1: [ITHREAD] >> ad4: 152626MB at ata2-master SATA150 >> ad6: 715404MB at ata3-master SATA300 >> ad12: 953869MB at ata6-master SATA300 >> >> >> =========pciconf=========== >> atapci0@pci0:0:31:2: class=0x010601 card=0x82771043 chip=0x29228086 >> rev=0x02 hdr=0x00 >> vendor = 'Intel Corporation' >> device = '82801IB/IR/IH (ICH9 Family) 6 port SATA AHCI Controller' >> class = mass storage >> cap 05[80] = MSI supports 16 messages >> cap 01[70] = powerspec 3 supports D0 D3 current D0 >> cap 12[a8] = unknown >> cap 09[b0] = vendor (length 6) Intel cap 2 version 0 >> >> >> ============atacontrol============= >> atacontrol cap ad4 >> >> Protocol Serial ATA v1.0 >> device model ST3160815AS >> serial number 5RA737SD >> firmware revision 4.AAB >> cylinders 16383 >> heads 16 >> sectors/track 63 >> lba supported 268435455 sectors >> lba48 supported 312579695 sectors >> dma supported >> overlap not supported >> >> Feature Support Enable Value Vendor >> write cache yes yes >> read ahead yes yes >> Native Command Queuing (NCQ) yes - 31/0x1F >> Tagged Command Queuing (TCQ) no no 31/0x1F >> SMART yes yes >> microcode download yes yes >> security yes no >> power management yes yes >> advanced power management no no 65278/0xFEFE >> automatic acoustic management no no 0/0x00 208/0xD0 >> >> atacontrol cap ad6 >> >> Protocol Serial ATA II >> device model SAMSUNG HD753LJ >> 
serial number S13UJ1BQ802853 >> firmware revision 1AA01113 >> cylinders 16383 >> heads 16 >> sectors/track 63 >> lba supported 268435455 sectors >> lba48 supported 1465149168 sectors >> dma supported >> overlap not supported >> >> Feature Support Enable Value Vendor >> write cache yes yes >> read ahead yes yes >> Native Command Queuing (NCQ) yes - 31/0x1F >> Tagged Command Queuing (TCQ) no no 31/0x1F >> SMART yes yes >> microcode download yes yes >> security yes no >> power management yes yes >> advanced power management yes no 0/0x00 >> automatic acoustic management yes yes 254/0xFE 254/0xFE >> >> atacontrol cap ad12 >> >> Protocol Serial ATA II >> device model SAMSUNG HD103UJ >> serial number S13PJ1NQA01054 >> firmware revision 1AA01113 >> cylinders 16383 >> heads 16 >> sectors/track 63 >> lba supported 268435455 sectors >> lba48 supported 1953525168 sectors >> dma supported >> overlap not supported >> >> Feature Support Enable Value Vendor >> write cache yes yes >> read ahead yes yes >> Native Command Queuing (NCQ) yes - 31/0x1F >> Tagged Command Queuing (TCQ) no no 31/0x1F >> SMART yes yes >> microcode download yes yes >> security yes no >> power management yes yes >> advanced power management yes no 0/0x00 >> automatic acoustic management yes yes 254/0xFE 254/0xFE >> >> atacontrol list >> ATA channel 0: >> Master: no device present >> Slave: no device present >> ATA channel 1: >> Master: no device present >> Slave: no device present >> ATA channel 2: >> Master: ad4 Serial ATA v1.0 >> Slave: no device present >> ATA channel 3: >> Master: ad6 Serial ATA II >> Slave: no device present >> ATA channel 4: >> Master: no device present >> Slave: no device present >> ATA channel 5: >> Master: no device present >> Slave: no device present >> ATA channel 6: >> Master: ad12 Serial ATA II >> Slave: no device present >> ATA channel 7: >> Master: acd0 Serial ATA v1.0 >> Slave: no device present >> >> Thank you. 
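The iostat figures in this thread can be cross-checked with simple arithmetic: the MB/s column is KB/t multiplied by tps, divided by 1024. The small sketch below uses a hypothetical helper, iostat_mbps (not part of FreeBSD), to apply that to the first ad4 sample from the dd run and to the averages reported after the newfs. It also makes one detail visible: a KB/t of 0.50 means 512-byte transfers, which is dd's default block size when no bs= operand is given, so those runs may mostly reflect per-request overhead rather than the disks' sequential rate. That reading is an interpretation, not something confirmed in the thread.

```shell
#!/bin/sh
# Hypothetical helper: convert one iostat(8) sample to MiB/s.
# KB/t is KiB per transfer and tps is transfers per second,
# so KB/t * tps / 1024 gives MiB per second.
iostat_mbps() {
    awk -v kbt="$1" -v tps="$2" 'BEGIN { printf "%.2f\n", kbt * tps / 1024 }'
}

iostat_mbps 0.50 12650   # first busy ad4 sample (iostat reported 6.18)
iostat_mbps 128 600      # post-newfs averages quoted in the thread
```

The second figure comes out at 75.00, in line with the "about 80M/s" reported, given that both inputs are rough averages. Re-running the read test with a larger block, for example `dd if=/dev/ad6 of=/dev/null bs=1m count=1000`, would be one way to check the block-size reading; this was not tried in the thread.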
>> >> >> > From james-freebsd-fs2 at jrv.org Sat Feb 28 17:43:33 2009 From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen) Date: Sat Feb 28 17:43:40 2009 Subject: zfs send -R dumps core on -CURRENT Message-ID: <49A9DEC2.90701@jrv.org> svn r189099, amd64 I'm trying to duplicate pool "bigtex" to pool "newtex". /# zfs snapshot -r bigtex@now /# zfs send -Rv bigtex@now | zfs recv newtex cannot receive: failed to read from stream /# zfs send -Rv bigtex@now > x Segmentation fault: 11 (core dumped) /# Is there a way to tell make buildworld to just build everything with -g and not strip anything? From yuri.pankov at gmail.com Sat Feb 28 19:24:43 2009 From: yuri.pankov at gmail.com (Yuri Pankov) Date: Sat Feb 28 19:24:50 2009 Subject: zfs send -R dumps core on -CURRENT In-Reply-To: <49A9DEC2.90701@jrv.org> References: <49A9DEC2.90701@jrv.org> Message-ID: <20090301025446.GA1078@darklight.homeunix.org> On Sat, Feb 28, 2009 at 07:02:58PM -0600, James R. Van Artsdalen wrote: > svn r189099, amd64 > > I'm trying to duplicate pool "bigtex" to pool "newtex". > > /# zfs snapshot -r bigtex@now > /# zfs send -Rv bigtex@now | zfs recv newtex > cannot receive: failed to read from stream > /# zfs send -Rv bigtex@now > x > Segmentation fault: 11 (core dumped) > /# > > Is there a way to tell make buildworld to just build everything with -g > and not strip anything? Set DEBUG_FLAGS=-g in /etc/src.conf. HTH, Yuri
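Following the DEBUG_FLAGS suggestion in the reply above, a minimal sketch of the src.conf change as a POSIX shell function. The function name ensure_debug_flags is hypothetical; it only appends DEBUG_FLAGS=-g when no DEBUG_FLAGS line is already present, so running it twice is harmless. The rebuild itself is left as a comment because it modifies the system and assumes the source tree lives in /usr/src.

```shell
#!/bin/sh
# Hypothetical helper: add DEBUG_FLAGS=-g to a src.conf-style file
# without duplicating an existing DEBUG_FLAGS setting.
ensure_debug_flags() {
    conf="$1"
    # grep -q fails when the pattern (or the file) is missing; in that
    # case append the line, which also creates the file if needed.
    if ! grep -q '^DEBUG_FLAGS=' "$conf" 2>/dev/null; then
        echo 'DEBUG_FLAGS=-g' >> "$conf"
    fi
}

# Example (as root; assumes sources in /usr/src):
# ensure_debug_flags /etc/src.conf && cd /usr/src && make buildworld
```

Once a world built this way is installed, the crash could be reproduced and the core inspected with something like `gdb /sbin/zfs /zfs.core` followed by `bt`, assuming the default kern.corefile pattern (%N.core) and that the command is again run from /.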