From bugmaster at FreeBSD.org Mon Jun 1 11:06:53 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Jun 1 11:07:59 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200906011106.n51B6no8021049@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/135039  fs         [zfs] mkstemp() fails over NFS when server uses ZFS (7
f kern/134496  fs         [zfs] [panic] ZFS pool export occasionally causes a ke
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133980  fs         [panic] [ffs] panic: ffs_valloc: dup alloc
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs         [smbfs] [panic] panic: ffs_truncate: read-only filesys
o kern/133373  fs         [zfs] umass attachment causes ZFS checksum errors, dat
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs         [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/133134  fs         [zfs] Missing ZFS zpool labels
f kern/133020  fs         [zfs] [panic] inappropriate panic caused by zfs. Pani
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132597  fs         [tmpfs] [panic] tmpfs-related panic while interrupting
o kern/132551  fs         [zfs] ZFS locks up on extattr_list_link syscall
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132337  fs         [zfs] [panic] kernel panic in zfs_fuid_create_cred
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
f kern/132068  fs         [zfs] page fault when using ZFS over NFS on 7.1-RELEAS
o kern/131995  fs         [nfs] Failure to mount NFSv4 server
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/131086  fs         [ext2fs] [patch] mkfs.ext2 creates rotten partition
o kern/130979  fs         [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129148  fs         [zfs] [panic] panic on concurrent writing & rollback
o kern/129059  fs         [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127659  fs         [tmpfs] tmpfs memory leak
o kern/127492  fs         [zfs] System hang on ZFS input-output
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/125644  fs         [zfs] [panic] zfs unfixable fs errors caused panic whe
f kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs         [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o kern/122173  fs         [zfs] [panic] Kernel Panic if attempting to replace a
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122047  fs         [ext2fs] [patch] incorrect handling of UF_IMMUTABLE /
o kern/122038  fs         [tmpfs] [panic] tmpfs: panic: tmpfs_alloc_vp: type 0xc
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs         [ufs] snapinfo(8) (and related tools?) only work for t
o kern/121770  fs         [zfs] ZFS on i386, large file or heavy I/O leads to ke
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o bin/120288   fs         zfs(8): "zfs share -a" does not send SIGHUP to mountd
f kern/119735  fs         [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o misc/118855  fs         [zfs] ZFS-related commands are nonfunctional in fixit
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o kern/118320  fs         [zfs] [patch] NFS SETATTR sometimes fails to set file
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o kern/116913  fs         [ffs] [panic] ffs_blkfree: freeing free block
p kern/116608  fs         [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/115645  fs         [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o kern/113180  fs         [zfs] Setting ZFS nfsshare property does not cause inh
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk
o kern/105093  fs         [ext2fs] [patch] ext2fs on read-only media cannot be m
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
f kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs         [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/89991   fs         [ufs] softupdates with mount -ur causes fs UNREFS
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o kern/85326   fs         [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs         [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o kern/77826   fs         [ext2fs] ext2fs usb filesystem will not mount RW
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

137 problems total.

From samankaya at netscape.net Mon Jun 1 12:16:20 2009
From: samankaya at netscape.net (samankaya@netscape.net)
Date: Mon Jun 1 12:16:28 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
Message-ID: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com>

Hello all,

This is my first post and first time on a mailing list for a very long time :)

I am just about to switch my Debian Linux install over to a dual boot between Solaris Express Community Edition (SXCE) and FreeBSD 7.2.

Currently my setup is as follows:

hda1 - ext3
hdb1 - ext3 Linux root /
hdb2 - ext3 Linux home /home

Because of the lack of space in my network and the fact that the master IDE drive is 250GB and the slave IDE drive is 160GB, I wanted to wipe / on hdb1, reformat it to UFS2 and install FreeBSD.

However, would I then be able to write to the Solaris UFS, as I will reformat the master to UFS for SXCE? I read on a forum already that BSD cannot write to ext3, only ext2, and if it did write to ext3 it would be without the journal. So I am not sure if writing to ext3 from BSD is a good idea either?

The plan though at least is to write to UFS so that I can just bounce my data back and forth, so when it comes down to reformatting the ext3 /home partition I won't lose any of my information!

I do not have a SAN or NAS system or even enough space in my servers for NFS transfer, which is why I need to take these measures in the first place.....

I hope someone has a response for my dilemma - many thanks,

Kaya

From samankaya at netscape.net Mon Jun 1 18:04:15 2009
From: samankaya at netscape.net (samankaya@netscape.net)
Date: Mon Jun 1 18:04:21 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <4a2414be.02578c0a.7321.13fc@mx.google.com>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com> <4a2414be.02578c0a.7321.13fc@mx.google.com>
Message-ID: <8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com>

Many thanks for the response!

That solves the ext3 fs issue, how about UFS and Solaris as that is probably more important at this stage?

Bearing in mind Solaris uses UFS1 while BSD is on UFS2!

Regards,

Kaya

From sarawgi.aditya at gmail.com Mon Jun 1 18:14:52 2009
From: sarawgi.aditya at gmail.com (Aditya Sarawgi)
Date: Mon Jun 1 18:15:02 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com>
Message-ID: <4a2414be.02578c0a.7321.13fc@mx.google.com>

On Mon, Jun 01, 2009 at 08:01:56AM -0400, samankaya@netscape.net wrote:
> Hello all,
>
> This is my first post and first time on a mailing list for a very long time :)
>
> I am just about to switch my Debian Linux install over to a dual boot between Solaris Express Community Edition (SXCE) and FreeBSD 7.2.
>
> Currently my setup is as follows:
>
> hda1 - ext3
> hdb1 - ext3 Linux root /
> hdb2 - ext3 Linux home /home
>
> Because of the lack of space in my network and the fact that the master IDE drive is 250GB and the slave IDE drive is 160GB, I wanted to wipe / on hdb1, reformat it to UFS2 and install FreeBSD.
>
> However, would I then be able to write to the Solaris UFS, as I will reformat the master to UFS for SXCE? I read on a forum already that BSD cannot write to ext3, only ext2, and if it did write to ext3 it would be without the journal. So I am not sure if writing to ext3 from BSD is a good idea either?
>
> The plan though at least is to write to UFS so that I can just bounce my data back and forth, so when it comes down to reformatting the ext3 /home partition I won't lose any of my information!
>
> I do not have a SAN or NAS system or even enough space in my servers for NFS transfer, which is why I need to take these measures in the first place.....
>
> I hope someone has a response for my dilemma - many thanks,
>
> Kaya

Yes, journaling is currently not supported by ext2fs, but you can read & write safely if the inode size of your ext2/ext3 partition is 128. If the inode size is different from 128 (which is common nowadays) then you can use the following patch:

http://pflog.net/~floyd/ext2fs.diff

Cheers,
Aditya Sarawgi

From matt at corp.spry.com Mon Jun 1 19:49:39 2009
From: matt at corp.spry.com (Matt Simerson)
Date: Mon Jun 1 19:49:45 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com> <4a2414be.02578c0a.7321.13fc@mx.google.com> <8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com>
Message-ID: <11304BBC-F819-476F-8D9E-CCD622894878@spry.com>

On Jun 1, 2009, at 11:03 AM, samankaya@netscape.net wrote:

> Many thanks for the response!
>
> That solves the ext3 fs issue, how about UFS and Solaris as that is
> probably more important at this stage?
>
> Bearing in mind Solaris uses UFS1 while BSD is on UFS2!

FreeBSD can format disks with UFS1 or UFS2, but that probably won't help you much.

http://en.wikipedia.org/wiki/Unix_File_System

"Vendors of some commercial Unix systems, such as SunOS/Solaris, System V Release 4, HP-UX, and Tru64 UNIX, have adopted UFS. Most of them adapted UFS to their own uses, adding proprietary extensions that may not be recognized by other vendors' versions of Unix. Surprisingly, many have continued to use the original block size and data field widths as the original UFS, so some degree of (read) compatibility remains across platforms."

Consider instead running FreeBSD 8 (or 7.3, if you can wait) with Solaris, and using a ZFS rel 13 partition as the shared medium between them. Many years ago, when I wanted a shared data partition between switch booted OS platforms, I used a FAT32 partition. These days, virtual machines make it much, much easier.

Matt

From samankaya at netscape.net Mon Jun 1 23:45:31 2009
From: samankaya at netscape.net (samankaya@netscape.net)
Date: Mon Jun 1 23:45:38 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <11304BBC-F819-476F-8D9E-CCD622894878@spry.com>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com><4a2414be.02578c0a.7321.13fc@mx.google.com><8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com> <11304BBC-F819-476F-8D9E-CCD622894878@spry.com>
Message-ID: <8CBB12A1F01DD62-11F8-BF3@webmail-mh45.sysops.aol.com>

Many thanks for all responses :-) Sorry for the late reply, I was in a Cisco CCNA class for the evening and took a chapter test too - achieved 93% though, which is not bad!

Matt, thanks for the WikiPedia alert. I discovered that just before writing to the mailing list, taking its advice: "research thoroughly before using a filesystem between OS's"

I am, however, a little disappointed that I cannot use UFS to 'bounce' files between BSD and Solaris. Matt, you also mention ZFS rel 13! Is this the version that comes with Solaris? We may be back at square 1 with the UFS BSD/Solaris adaptation again :-(

I guess in my situation really the alternative seems to be backing things up onto an external ext3 hard drive and reading that information into BSD..... or using NFS, which at the moment isn't the best option as it would be a bit tedious to boot up a VM every time I wanted to swap between Solaris and BSD!

"These days, virtual machines make it much, much easier." Yes, that is true if one has the hardware and software to run them. Unfortunately I am on a Pentium IV with only 1GB of RAM, which won't even support the ZFS file system well, which is why I'm so apprehensive about installing ZFS with my Solaris build in the first place and why I am reverting to the old UFS file system.

Hmm..... the only way may be just to install Sun's VirtualBox with a 'virtual' BSD for transferring files between the hardware-installed BSD and Solaris running an NFS server? Ouch!

Not sure if there are any free hypervisors out there? VMware and Citrix you have to pay for, and even Sun's xVM I think too :-(

What do you guys think is my best solution here? Probably what I've already covered, right?

Kaya

From linimon at FreeBSD.org Tue Jun 2 02:13:29 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Tue Jun 2 02:13:42 2009
Subject: kern/135050: [zfs] ZFS clears/hides disk errors on reboot
Message-ID: <200906020213.n522DS1R023554@freefall.freebsd.org>

Old Synopsis: ZFS clears/hides disk errors on reboot
New Synopsis: [zfs] ZFS clears/hides disk errors on reboot

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Jun 2 02:13:13 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=135050

From 000.fbsd at quip.cz Tue Jun 2 07:44:50 2009
From: 000.fbsd at quip.cz (Miroslav Lachman)
Date: Tue Jun 2 07:44:57 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <8CBB12A1F01DD62-11F8-BF3@webmail-mh45.sysops.aol.com>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com><4a2414be.02578c0a.7321.13fc@mx.google.com><8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com> <11304BBC-F819-476F-8D9E-CCD622894878@spry.com> <8CBB12A1F01DD62-11F8-BF3@webmail-mh45.sysops.aol.com>
Message-ID: <4A24D86C.8040700@quip.cz>

samankaya@netscape.net wrote:

[...]

> Matt, thanks for the WikiPedia alert. I discovered that just before writing to the mailing list, taking its advice: "research thoroughly before using a filesystem between OS's"
>
> I am, however, a little disappointed that I cannot use UFS to 'bounce' files between BSD and Solaris. Matt, you also mention ZFS rel 13! Is this the version that comes with Solaris? We may be back at square 1 with the UFS BSD/Solaris adaptation again :-(

ZFS version 13 is the version used in latest FreeBSD (8-CURRENT and 7-STABLE).
If you will use this version, you can read & write to it from FreeBSD and Solaris / OpenSolaris.

> I guess in my situation really the alternative seems to be backing things up onto an external ext3 hard drive and reading that information into BSD..... or using NFS, which at the moment isn't the best option as it would be a bit tedious to boot up a VM every time I wanted to swap between Solaris and BSD!
>
> "These days, virtual machines make it much, much easier." Yes, that is true if one has the hardware and software to run them. Unfortunately I am on a Pentium IV with only 1GB of RAM, which won't even support the ZFS file system well, which is why I'm so apprehensive about installing ZFS with my Solaris build in the first place and why I am reverting to the old UFS file system.
>
> Hmm..... the only way may be just to install Sun's VirtualBox with a 'virtual' BSD for transferring files between the hardware-installed BSD and Solaris running an NFS server? Ouch!
>
> Not sure if there are any free hypervisors out there? VMware and Citrix you have to pay for, and even Sun's xVM I think too :-(
>
> What do you guys think is my best solution here? Probably what I've already covered, right?

If you need some hypervisor, VMware provides ESXi for free and Citrix has XenServer for free too.

Miroslav Lachman

From davidn04 at gmail.com Tue Jun 2 12:01:12 2009
From: davidn04 at gmail.com (David N)
Date: Tue Jun 2 12:01:19 2009
Subject: Crash with GJournal switcher
In-Reply-To: <4d7dd86f0906020449m43d03311jf7fcae2fbb5339c1@mail.gmail.com>
References: <4d7dd86f0906020449m43d03311jf7fcae2fbb5339c1@mail.gmail.com>
Message-ID: <4d7dd86f0906020501h1439eb92g15ae886f72f4d226@mail.gmail.com>

2009/6/2 David N :
> FreeBSD 7.2-RELEASE
> GPT + gmirror + gjournal
>
> May 31 10:15:48 netserv1 kernel: Fatal trap 9: general protection fault while in kernel mode
> May 31 10:15:48 netserv1 kernel: cpuid = 0; apic id = 00
> May 31 10:15:48 netserv1 kernel: instruction pointer = 0x8:0xffffffff8059f667
> May 31 10:15:48 netserv1 kernel: stack pointer = 0x10:0xfffffffe801e0a60
> May 31 10:15:48 netserv1 kernel: frame pointer = 0x10:0xfffffffe801e0a90
> May 31 10:15:48 netserv1 kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> May 31 10:15:48 netserv1 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> May 31 10:15:48 netserv1 kernel: processor eflags = interrupt enabled, resume, IOPL = 0
> May 31 10:15:48 netserv1 kernel: current process = 39 (g_journal switcher)
>
> This caused one of my mirrors to become stale upon reboot. There
> weren't any crash dumps.
>
> I've got WITNESS compiled in at the moment; hopefully a crash/lockup will
> show something. Would the gjournal fail if one of the gmirror disks
> was faulty?
>
> Regards
> David N
>

lock order reversal:
 1st 0xffffffff80b184c0 sleepq chain (sleepq chain) @ /usr/src/sys/kern/kern_sig.c:2291
 2nd 0xffffffff80afb5b0 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2519
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
witness_checkorder() at witness_checkorder+0x565
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x3d
sc_puts() at sc_puts+0x93
sc_cnputc() at sc_cnputc+0x5a
cnputc() at cnputc+0x49
putchar() at putchar+0x6b
kvprintf() at kvprintf+0x72
printf() at printf+0xa4
witness_checkorder() at witness_checkorder+0x44c
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x3d
wakeup() at wakeup+0x11
tdsignal() at tdsignal+0x526
realitexpire() at realitexpire+0x3e
softclock() at softclock+0x270
ithread_loop() at ithread_loop+0xe7
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe8001cd30, rbp = 0 ---
acquiring duplicate lock of same type: "sleepq chain"
 1st sleepq chain @ /usr/src/sys/kern/kern_sig.c:2291
 2nd sleepq chain @ /usr/src/sys/kern/subr_sleepqueue.c:232
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
witness_checkorder() at witness_checkorder+0x565
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x3d
wakeup() at wakeup+0x11
tdsignal() at tdsignal+0x526
realitexpire() at realitexpire+0x3e
softclock() at softclock+0x270
ithread_loop() at ithread_loop+0xe7
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe8001cd30, rbp = 0 ---

From davidn04 at gmail.com Tue Jun 2 12:21:59 2009
From: davidn04 at gmail.com (David N)
Date: Tue Jun 2 12:22:06 2009
Subject: Crash with GJournal switcher
Message-ID: <4d7dd86f0906020449m43d03311jf7fcae2fbb5339c1@mail.gmail.com>

FreeBSD 7.2-RELEASE
GPT + gmirror + gjournal

May 31 10:15:48 netserv1 kernel: Fatal trap 9: general protection fault while in kernel mode
May 31 10:15:48 netserv1 kernel: cpuid = 0; apic id = 00
May 31 10:15:48 netserv1 kernel: instruction pointer = 0x8:0xffffffff8059f667
May 31 10:15:48 netserv1 kernel: stack pointer = 0x10:0xfffffffe801e0a60
May 31 10:15:48 netserv1 kernel: frame pointer = 0x10:0xfffffffe801e0a90
May 31 10:15:48 netserv1 kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
May 31 10:15:48 netserv1 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
May 31 10:15:48 netserv1 kernel: processor eflags = interrupt enabled, resume, IOPL = 0
May 31 10:15:48 netserv1 kernel: current process = 39 (g_journal switcher)

This caused one of my mirrors to become stale upon reboot. There weren't any crash dumps.

I've got WITNESS compiled in at the moment; hopefully a crash/lockup will show something. Would the gjournal fail if one of the gmirror disks was faulty?

Regards
David N

From samankaya at netscape.net Tue Jun 2 21:51:24 2009
From: samankaya at netscape.net (samankaya@netscape.net)
Date: Tue Jun 2 21:51:31 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <4A24D86C.8040700@quip.cz> References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com><4a2414be.02578c0a.7321.13fc@mx.google.com><8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com> <11304BBC-F819-476F-8D9E-CCD622894878@spry.com><8CBB12A1F01DD62-11F8-BF3@webmail-mh45.sysops.aol.com> <4A24D86C.8040700@quip.cz> Message-ID: <8CBB1E2B0193EB3-124C-ACD@WEBMAIL-MB05.sysops.aol.com> freebsd-fs@freebsd.org Many thanks for all your suggestions and advice!!!! I think I will research into BSD 8 and either use that or wait for BSD 7.3 to come out as I think that BSD 8 might be development although I don't have a problem with that since I am going to install SXCE any way which is testing line of Solaris just under SXDE. I do not like the idea of ZFS file system on my desktop with 1GB or RAM so I guess I will use the larger master drive as ZFS with /home on it and then my smaller slave with UFS and UFS2 on different partitions for Solaris and BSD root / consecutively. Then get them both to use the same swap space and I should be ok....... As for the hyper visors I have researched them and they seem to be really cool with better computer equipment then my desktop, so not for now. Especially since they require 64-bit CPU architecture and 4GB RAM min. I am quite excited as BSD looks really cool and there's of course the great Solaris which I really like too so all is good. I only need to burn about 60GB of DVD's before hand to make this thing happen but it's not a problem, since I have no good backup solution I am always burning DVD's lol. :-) Best regards, Kaya -----Original Message----- From: Miroslav Lachman <000.fbsd@quip.cz> To: samankaya@netscape.net Cc: freebsd-fs@freebsd.org Sent: Tue, 2 Jun 2009 10:44 am Subject: Re: Want to install FreeBSD - need advice on Writable filesystems? samankaya@netscape.net wrote:? [...]? ? 
> Matt, thanks for the WikiPedia alert I discovered that just before writing to the mailing list taking its advice: "research thoroughly before using a filesystem between OS's"? > > I however a little disappointed that I cannot use UFS to 'bounce' files between BSD and Solaris. Matt, you also mention ZFS rel 13! Is this teh version that comes with Solaris? We maybe back at square 1 with the UFS BSD/Solaris adaptation again :-(? ? ZFS version 13 is the version used in latest FreeBSD (8-CURRENT and 7-STABLE). If you will use this version, you can read & write to it from FreeBSD and Solaris / OpenSolaris.? ? > I guess in my situation really the alternative seems to be backing things up onto external ext3 hard drive and reading that information into BSD..... or using NFS which at the moment isn't the best option as it would be a bit tedious to boot up a VM every time I wanted to swap between Solaris and BSD!? > > "These days, virtual machines make it much, much easier. " yes that is true if one has the hardware and software to run them. Unfortunately I am on a Pentium IV with only 1GB or RAM which won't even support ZFS file system well, which is why I'm so apprehensive to install ZFS with my Solaris build in the first place and why I revert to the old UFS file system.? > > Hmm..... the only way maybe just to install Sun's Virual Box with 'virtual' BSD for the transferring of files between the hardware installed BSD and Solaris running NFS server? Ouch!? > > Not sure if there are any free Hypervisors out there? VMware and Citrix you have to pay for and even Sun's xVM I think too :-(? > > What do you guys think is my best solution here? Probably what I've already covered right?? ? If you need some hypervisor, VMware provides ESXi for free and Citrix has XenServer for free too.? ? Miroslav Lachman? _______________________________________________? freebsd-fs@freebsd.org mailing list? http://lists.freebsd.org/mailman/listinfo/freebsd-fs? 
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From peterjeremy at optushome.com.au Wed Jun 3 07:20:14 2009
From: peterjeremy at optushome.com.au (peterjeremy@optushome.com.au)
Date: Wed Jun 3 07:20:21 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <4A24D86C.8040700@quip.cz>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com> <4a2414be.02578c0a.7321.13fc@mx.google.com> <8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com> <11304BBC-F819-476F-8D9E-CCD622894878@spry.com> <8CBB12A1F01DD62-11F8-BF3@webmail-mh45.sysops.aol.com> <4A24D86C.8040700@quip.cz>
Message-ID: <20090603072009.GA27800@server.vk2pj.dyndns.org>

On 2009-Jun-02 09:44:44 +0200, Miroslav Lachman <000.fbsd@quip.cz> wrote:
>ZFS version 13 is the version used in latest FreeBSD (8-CURRENT and
>7-STABLE). If you will use this version, you can read & write to it from
>FreeBSD and Solaris / OpenSolaris.

If you are using Solaris (rather than OpenSolaris), I'd verify exactly what version of ZFS is supported. After installing a fairly recent jumbo patch, my Sol10 server went from ZFS version 4 to version 10 - but that is still well behind FreeBSD.

--
Peter Jeremy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090603/4ff7eb3d/attachment.pgp

From samankaya at netscape.net Wed Jun 3 07:47:12 2009
From: samankaya at netscape.net (samankaya@netscape.net)
Date: Wed Jun 3 07:47:19 2009
Subject: Want to install FreeBSD - need advice on Writable filesystems?
In-Reply-To: <20090603072009.GA27800@server.vk2pj.dyndns.org>
References: <8CBB0C7E503F8B8-BAC-6135@webmail-md07.sysops.aol.com><4a2414be.02578c0a.7321.13fc@mx.google.com><8CBB0FA79040061-162C-197@WEBMAIL-MZ02.sysops.aol.com><11304BBC-F819-476F-8D9E-CCD622894878@spry.com><8CBB12A1F01DD62-11F8-BF3@webmail-mh45.sysops.aol.com><4A24D86C.8040700@quip.cz> <20090603072009.GA27800@server.vk2pj.dyndns.org>
Message-ID: <8CBB2369FB15CA4-A38-2AEC@WEBMAIL-MB07.sysops.aol.com>

Peter,

I will be using SXCE build 111, so I'm hoping that I will be able to use either OS (BSD or Solaris SXCE) to create the ZFS file system and have it be readable and writable by each. I think SXCE is current enough for this!

Kaya

-----Original Message-----
From: peterjeremy@optushome.com.au
To: samankaya@netscape.net
Cc: freebsd-fs@freebsd.org
Sent: Wed, 3 Jun 2009 10:20 am
Subject: Re: Want to install FreeBSD - need advice on Writable filesystems?

On 2009-Jun-02 09:44:44 +0200, Miroslav Lachman <000.fbsd@quip.cz> wrote:
>ZFS version 13 is the version used in latest FreeBSD (8-CURRENT and
>7-STABLE). If you will use this version, you can read & write to it from
>FreeBSD and Solaris / OpenSolaris.

If you are using Solaris (rather than OpenSolaris), I'd verify exactly what version of ZFS is supported. After installing a fairly recent jumbo patch, my Sol10 server went from ZFS version 4 to version 10 - but that is still well behind FreeBSD.

--
Peter Jeremy

From sergiorr at yahoo.com Wed Jun 3 23:57:52 2009
From: sergiorr at yahoo.com (Sergio Rodriguez)
Date: Wed Jun 3 23:57:58 2009
Subject: File devfs_vfsops.c and opt_mac.h
Message-ID: <601593.45852.qm@web30204.mail.mud.yahoo.com>

Hi all,

Is there any reason why the file sys/fs/devfs/devfs_vfsops.c does not include the opt_mac.h file? I see it using the MAC option in the code, in the devfs_mount function, when setting the multilabel flag, but if I set this option in the kernel configuration file that code never gets compiled.
Regards
Sergio

From rwatson at FreeBSD.org Thu Jun 4 10:22:39 2009
From: rwatson at FreeBSD.org (Robert Watson)
Date: Thu Jun 4 10:22:45 2009
Subject: File devfs_vfsops.c and opt_mac.h
In-Reply-To: <601593.45852.qm@web30204.mail.mud.yahoo.com>
References: <601593.45852.qm@web30204.mail.mud.yahoo.com>
Message-ID:

On Wed, 3 Jun 2009, Sergio Rodriguez wrote:

> Is there any reason why the file sys/fs/devfs/devfs_vfsops.c does not
> includes the opt_mac.h file? I see it using the MAC option in the code , in
> the devfs_mount function, when setting the multilabel flag, but if I set
> this option on the configuration files never gets compiled.

Indeed -- I removed this in error in r160133, when I removed the include of mac.h (which wasn't needed). I'll re-add it, thanks!

Robert N M Watson
Computer Laboratory
University of Cambridge

From cynix at cynix.org Thu Jun 4 23:20:09 2009
From: cynix at cynix.org (cynix)
Date: Thu Jun 4 23:20:16 2009
Subject: Booting from ZFS raidz
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090201072432.GA25276@server.vk2pj.dyndns.org> <246ecf0c87f944d70c5562eeed4165c9@mail.rabson.org> <9cc826f0720e1624489dd6e6d384babc.squirrel@www.noacks.org>
Message-ID:

Jonathan Noack alumni.rice.edu> writes:

>
> Getting this working from scratch was tedious but not too complicated. I
> followed lulf's instructions
> (http://blogs.freebsdish.org/lulf/2008/12/16/setting-up-a-zfs-only-system/)
> using the May snapshot fixit CD. Only differences were that I set up all
> 4 disks with gpart (identically), created a raidz1 pool, and used a
> patched gptzfsboot that I cross-compiled on my 7.2 i386 box for the
> bootcode (applied to all 4 disks).
>

I couldn't get it to work. I keep getting this "ZFS: out of temporary buffer space" message right at the beginning. I pretty much followed lulf's post too, with 6x1TB drives in raidz2.
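
For readers trying to reproduce this, the setup Jonathan describes (an identical gpart layout on every disk, boot code written to every disk, then a single raidz pool) can be sketched as a dry run that only prints the commands. This is an illustrative sketch and not the procedure from lulf's post: the disk names, the pool name "tank", the 64k boot partition size and the helper names plan_disks and plan_pool are all assumptions, and older gpart versions may need explicit -b/-s sector values instead of the size shown here.

```sh
#!/bin/sh
# Dry run: print (do NOT execute) the per-disk commands.
# Disk names, partition size and pool name are assumptions for illustration.

plan_disks() {
    for d in "$@"; do
        echo "gpart create -s gpt $d"
        echo "gpart add -t freebsd-boot -s 64k $d"
        echo "gpart add -t freebsd-zfs $d"
        # Protective MBR plus the ZFS-aware GPT boot code, on every disk.
        echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d"
    done
}

plan_pool() {
    pool=$1
    level=$2
    shift 2
    vdevs=""
    # The second partition on each disk is the freebsd-zfs one created above.
    for d in "$@"; do
        vdevs="$vdevs ${d}p2"
    done
    echo "zpool create $pool $level$vdevs"
}

plan_disks ad4 ad6 ad8 ad10
plan_pool tank raidz1 ad4 ad6 ad8 ad10
```

Printing the plan first makes it easy to confirm that every disk gets the same layout and the same boot code before anything destructive is run.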
From amdmi3 at amdmi3.ru Fri Jun 5 01:53:27 2009
From: amdmi3 at amdmi3.ru (Dmitry Marakasov)
Date: Fri Jun 5 01:53:34 2009
Subject: [ZFS] still kmem_too_small on recent current
Message-ID: <20090605015317.GA23952@hades.panopticon>

Hi!

Just got a kmem_too_small panic on yesterday's current after writing a 30GB disk image via NFS. This is an amd64 system with 8GB of memory and these tunables:

vm.kmem_size_max="2G"
vm.kmem_size="2G"
vfs.zfs.arc_max="1G"

ZFS is 6x1TB raidz2. I have a coredump of it, so just ask if you need additional details.

---
#0  doadump () at pcpu.h:223
#1  0xffffffff8054bec6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420
#2  0xffffffff8054c356 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:576
#3  0xffffffff806ed86e in kmem_malloc (map=0xffffff00010000e0, size=131072, flags=2) at /usr/src/sys/vm/vm_kern.c:304
#4  0xffffffff806e5e75 in uma_large_malloc (size=131072, wait=2) at /usr/src/sys/vm/uma_core.c:3001
#5  0xffffffff8053b271 in malloc (size=131072, mtp=0xffffffff80e1a1e0, flags=2) at /usr/src/sys/kern/kern_malloc.c:391
#6  0xffffffff80d90ca6 in vdev_queue_io_to_issue (vq=0xffffff00065bf420, pending_limit=Variable "pending_limit" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:227
#7  0xffffffff80d90e1c in vdev_queue_io_done (zio=0xffffff018e1995a0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:313
#8  0xffffffff80da2370 in zio_vdev_io_done (zio=0xffffff013bc26870) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1845
#9  0xffffffff80da0a10 in zio_execute (zio=0xffffff013bc26870) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:996
#10 0xffffffff80d49f1c in taskq_thread (arg=Variable "arg" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/os/taskq.c:854
#11 0xffffffff80525fad in fork_exit (callout=0xffffffff80d49d48 , arg=0xffffff000188fd80, frame=0xffffff80c516cc90) at /usr/src/sys/kern/kern_fork.c:829
#12 0xffffffff8076d56e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:552
---

I've monitored kstat.zfs.misc.arcstats.size, top and vmstat during it, and here are the observations:

- the normal copying process uses ~1GB arc (sysctl) and ~1GB wired (top), which I assume includes arc.
- actual disk writes are not continuous; ZFS writes data in >512MB packs.
- usually those writes are pretty uniform (a write every ~10 seconds, while throughput is 50MB/s; the array can handle 400MB/s writes), but sometimes the write is either larger, or the disks can't keep up, or both - those moments are visible as much higher CPU load for 3-5 seconds, wired kicks up to 1700MB and more, and arc decreases to ~800MB. I assume the latter is the recently committed VM backpressure, and the former means a panic if it rolls over kmem_max.

Seems so, as I've just got another panic. Just curious, what uses that much memory in those moments? Here are the last contents of the ssh consoles:

---
kstat.zfs.misc.arcstats.size: 804244704
kstat.zfs.misc.arcstats.size: 804244704
kstat.zfs.misc.arcstats.size: 804244704
---
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad8 ad10   in   sy   cs us sy id
 0 0 0   1017M  5348M   308   0   0   0   253   0   0   0    12  411  655  0  0 100
 0 0 0   1017M  5348M   241   0   0   0   257   0   0   0    18  381  748  0  0 100
 0 0 0   1017M  5348M   308   0   0   0   253   0   0   0    16  411  645  0  0 100
---
last pid:  7620;  load averages:  1.91,  1.80,  1.42    up 0+00:48:56  05:44:11
83 processes:  1 running, 82 sleeping
CPU:  0.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.8% idle
Mem: 149M Active, 60M Inact, 2317M Wired, 1292K Cache, 821M Buf, 5347M Free
Swap: 10G Total, 10G Free
---

Will give it another go with 512MB arc and 4GB kmem.

--
Dmitry Marakasov   .
55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru  ..:  jabber: amdmi3@jabber.ru    http://www.amdmi3.ru

From yan.batuto at gmail.com Sat Jun 6 11:54:35 2009
From: yan.batuto at gmail.com (Yan V. Batuto)
Date: Sat Jun 6 11:54:41 2009
Subject: Strange ZFS pool failure after updating kernel v6->v13
Message-ID:

Hello!

RAID-Z v6 works OK with 7.2-RELEASE, but it fails with recent 7.2-STABLE.
--------------------------------------------------
# zpool status bigstore
  pool: bigstore
 state: ONLINE
 scrub: scrub completed with 0 errors on Fri Jun 5 22:28:19 2009
config:

        NAME        STATE     READ WRITE CKSUM
        bigstore    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

errors: No known data errors
--------------------------------------------------
After a cvsup to 7-STABLE, the usual procedure of rebuilding kernel and
world, and a reboot, the pool has failed.
It's quite strange that the pool now consists of ad8, ad10, and again ad8,
ad10 drives instead of ad4, ad6, ad8, ad10.

I removed an additional disk controller a few weeks ago, so the raid-z
originally was created as ad8+ad10+ad12+ad14, and then
it appeared to be ad4+ad6+ad8+ad10. It was not a problem for zfs v6,
but, probably, something is wrong here in zfs v13.
--------------------------------------------------
# zpool status bigstore
  pool: bigstore
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bigstore    UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ad8     FAULTED      0     0     0  corrupted data
            ad10    FAULTED      0     0     0  corrupted data
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

From mcdouga9 at egr.msu.edu Sat Jun 6 14:32:15 2009
From: mcdouga9 at egr.msu.edu (Adam McDougall)
Date: Sat Jun 6 14:32:23 2009
Subject: Strange ZFS pool failure after updating kernel v6->v13
In-Reply-To:
References:
Message-ID: <4A2A7DE4.1080008@egr.msu.edu>

Yan V. Batuto wrote:
> Hello!
>
> RAID-Z v6 works OK with 7.2-RELEASE, but it fails with recent 7.2-STABLE.
> --------------------------------------------------
> # zpool status bigstore
>   pool: bigstore
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Fri Jun 5 22:28:19 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         bigstore    ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     0
>             ad8     ONLINE       0     0     0
>             ad10    ONLINE       0     0     0
>
> errors: No known data errors
> --------------------------------------------------
> After a cvsup to 7-STABLE, the usual procedure of rebuilding kernel and
> world, and a reboot, the pool has failed.
> It's quite strange that the pool now consists of ad8, ad10, and again ad8,
> ad10 drives instead of ad4, ad6, ad8, ad10.
>
> I removed an additional disk controller a few weeks ago, so the raid-z
> originally was created as ad8+ad10+ad12+ad14, and then
> it appeared to be ad4+ad6+ad8+ad10. It was not a problem for zfs v6,
> but, probably, something is wrong here in zfs v13.
> --------------------------------------------------
> # zpool status bigstore
>   pool: bigstore
>  state: UNAVAIL
> status: One or more devices could not be used because the label is missing
>         or invalid. There are insufficient replicas for the pool to continue
>         functioning.
> action: Destroy and re-create the pool from a backup source.
>    see: http://www.sun.com/msg/ZFS-8000-5E
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         bigstore    UNAVAIL      0     0     0  insufficient replicas
>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>             ad8     FAULTED      0     0     0  corrupted data
>             ad10    FAULTED      0     0     0  corrupted data
>             ad8     ONLINE       0     0     0
>             ad10    ONLINE       0     0     0
>

Please try:

zpool export bigstore
zpool import bigstore

This should make it find the right hard drives if they are present, otherwise should give a more informative error.

From linimon at FreeBSD.org Sat Jun 6 16:09:41 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Sat Jun 6 16:09:52 2009
Subject: bin/135314: [zfs] assertion failed for zdb(8) usage
Message-ID: <200906061609.n56G9eGh089047@freefall.freebsd.org>

Synopsis: [zfs] assertion failed for zdb(8) usage

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Jun 6 16:09:30 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=135314

From gallasch at free.de Sat Jun 6 17:17:03 2009
From: gallasch at free.de (Kai Gallasch)
Date: Sat Jun 6 17:17:11 2009
Subject: ZFS v13 performance drops with low memory on FreeBSD-7 STABLE
Message-ID: <4A2AA48B.20803@free.de>

Hi.

I upgraded a server with 7-STABLE-amd64 and the MFC'd ZFS v13 about 8 days ago. Since then the machine is running stable and this without manually tuning vm.kmem_size, vfs.zfs.arc, etc. in loader.conf - so far so good :-)

In the last few days I noticed some performance issues with zfs, as some customers complained about slow mysql database responses.

MySQL is running in a database jail on a zfs v13 zpool, websites using the mysql database are also running on zfs on the same server.

The server is running about 30 in production jails, has 16GB RAM and 8GB swap. Swap usage is about only 1% currently.

After debugging the mysql settings for a while I found out that when I stopped some processes on the server that were using high amounts of RAM, the database response times for queries were almost back to normal again.

So for me this looks like when running applications and ZFS compete for free RAM, ZFS loses. Is that so?

Is there anything I can do (besides buying more RAM :) to help ZFS to secure its share of RAM, to prevent a performance drop?

I was thinking about setting vm.kmem_size_min to about 2 GB, would that help zfs performance?

BTW: Are the zfs related sysctls documented somewhere?

--Kai.

# /root/kmem.sh
TEXT=10170727, 9.69956 MB
DATA=1091940352, 1041.36 MB
TOTAL=1102111079, 1051.06 MB

I find the following zfs related sysctl values:

vm.kmem_size_scale: 3
vm.kmem_size_max: 329853485875
vm.kmem_size_min: 0
vm.kmem_size: 5496406016
kern.maxvnodes: 200000
kern.minvnodes: 25000
vfs.freevnodes: 25004
vfs.wantfreevnodes: 25000
vfs.numvnodes: 170965
vfs.zfs.arc_meta_limit: 1105666048
vfs.zfs.arc_meta_used: 598675456
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 552833024
vfs.zfs.arc_max: 4422664192
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 0
vfs.zfs.recover: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 35
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.version.zpl: 3
vfs.zfs.version.vdev_boot: 1
vfs.zfs.version.spa: 13
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.arcstats.hits: 1145784907
kstat.zfs.misc.arcstats.misses: 111745603
kstat.zfs.misc.arcstats.demand_data_hits: 824346468
kstat.zfs.misc.arcstats.demand_data_misses: 44758436
kstat.zfs.misc.arcstats.demand_metadata_hits: 239559360
kstat.zfs.misc.arcstats.demand_metadata_misses: 26547668
kstat.zfs.misc.arcstats.prefetch_data_hits: 12999868
kstat.zfs.misc.arcstats.prefetch_data_misses: 21907841
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 68879211
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 18531658
kstat.zfs.misc.arcstats.mru_hits: 220554732
kstat.zfs.misc.arcstats.mru_ghost_hits: 24332697
kstat.zfs.misc.arcstats.mfu_hits: 847474912
kstat.zfs.misc.arcstats.mfu_ghost_hits: 26834361
kstat.zfs.misc.arcstats.deleted: 62523518
kstat.zfs.misc.arcstats.recycle_miss: 52718050
kstat.zfs.misc.arcstats.mutex_miss: 450373
kstat.zfs.misc.arcstats.evict_skip: 2822045644
kstat.zfs.misc.arcstats.hash_elements: 80450
kstat.zfs.misc.arcstats.hash_elements_max: 934929
kstat.zfs.misc.arcstats.hash_collisions: 25344131
kstat.zfs.misc.arcstats.hash_chains: 10124
kstat.zfs.misc.arcstats.hash_chain_max: 14
kstat.zfs.misc.arcstats.p: 863165963
kstat.zfs.misc.arcstats.c: 1044841750
kstat.zfs.misc.arcstats.c_min: 552833024
kstat.zfs.misc.arcstats.c_max: 4422664192
kstat.zfs.misc.arcstats.size: 1044917760
kstat.zfs.misc.arcstats.hdr_size: 18033120
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 379
kstat.zfs.misc.vdev_cache_stats.delegations: 21285135
kstat.zfs.misc.vdev_cache_stats.hits: 41347938
kstat.zfs.misc.vdev_cache_stats.misses: 33373407

From nhoyle at hoyletech.com Sat Jun 6 17:52:55 2009
From: nhoyle at hoyletech.com (Nathanael Hoyle)
Date: Sat Jun 6 17:53:01 2009
Subject: ZFS v13 performance drops with low memory on FreeBSD-7 STABLE
In-Reply-To: <4A2AA48B.20803@free.de>
References: <4A2AA48B.20803@free.de>
Message-ID: <4A2AACF1.6070303@hoyletech.com>

Kai Gallasch wrote:
> Hi.
>
> I upgraded a server with 7-STABLE-amd64 and the MFC'd ZFS v13 about 8
> days ago. Since then the machine is running stable and this without
> manually tuning vm.kmem_size, vfs.zfs.arc, etc. in loader.conf - so far
> so good :-)
>
> In the last few days I noticed some performance issues with zfs, as some
> customers complained about slow mysql database responses.
>
> MySQL is running in a database jail on a zfs v13 zpool, websites using
> the mysql database are also running on zfs on the same server.
>
>
Last I knew, ZFS on FreeBSD still wasn't considered production. I'd be careful using it on customer-facing systems.

> The server is running about 30 in production jails, has 16GB RAM and 8GB
> swap. Swap usage is about only 1% currently.
>
More important than usage is swap activity. Is the box actually swapping under load? A little disk I/O thrown in can destroy performance.

> After debugging the mysql settings for a while I found out that when I
> stopped some processes on the server that were using high amounts of
> RAM, the database response times for queries were almost back to normal
> again.
>
> So for me this looks like when running applications and ZFS compete for
> free RAM, ZFS loses. Is that so?
>
Yes, the ARC cache is designed to back off and give the other applications RAM to run.
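
The counters Kai posted can be used to check this back-off. The following is a minimal sketch, assuming the "name: value" format that sysctl prints and using only four arcstats values copied from his listing; the arc_summary helper name is made up for illustration.

```sh
#!/bin/sh
# Summarise ARC counters read from "name: value" lines on stdin.
# arc_summary is a hypothetical helper, not part of any FreeBSD tool.

arc_summary() {
    awk -F': ' '
        {
            # Strip the common prefix so values can be looked up by short name.
            sub(/^kstat\.zfs\.misc\.arcstats\./, "", $1)
            v[$1] = $2
        }
        END {
            total = v["hits"] + v["misses"]
            if (total > 0)
                printf "hit ratio: %.1f%%\n", 100 * v["hits"] / total
            if (v["c_max"] > 0)
                printf "ARC size: %.1f%% of c_max\n", 100 * v["size"] / v["c_max"]
        }'
}

# Sample values copied from the sysctl listing earlier in this thread.
arc_summary <<'EOF'
kstat.zfs.misc.arcstats.hits: 1145784907
kstat.zfs.misc.arcstats.misses: 111745603
kstat.zfs.misc.arcstats.size: 1044917760
kstat.zfs.misc.arcstats.c_max: 4422664192
EOF
```

With those sample values the ARC sits at roughly a quarter of c_max, which is consistent with it having shrunk under memory pressure. On a live system the same function could be fed from `sysctl kstat.zfs.misc.arcstats`.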
Potentially, rather than tuning the ZFS ARC size (which may not be appropriate, if other applications are really using up that RAM legitimately), if the MySQL performance is the issue, try increasing the shared pool memory buffers for the MySQL instance (I use postgres, I'm expecting MySQL to have similar options).

> Is there anything I can do (besides buying more RAM :) to help ZFS to
> secure its share of RAM, to prevent a performance drop?
>
> I was thinking about setting vm.kmem_size_min to about 2 GB, would that
> help zfs performance?
>
That tunable sets the minimum size of the kernel memory map rather than the ARC itself; the minimum ARC size is vfs.zfs.arc_min in your listing. Raising the ARC minimum would guarantee more RAM to the ZFS ARC, which might or might not increase performance (it would speed up ZFS, potentially at the expense of other apps), depending on working-set memory pressure versus disk I/O patterns.

-Nathanael

From kostikbel at gmail.com Sat Jun 6 18:07:46 2009
From: kostikbel at gmail.com (Kostik Belousov)
Date: Sat Jun 6 18:07:53 2009
Subject: [georg@dts.su: Re[2]: fatal trap 12]
In-Reply-To: <20090606161600.GB61928@dchagin.static.corbina.ru>
References: <20090606161600.GB61928@dchagin.static.corbina.ru>
Message-ID: <20090606175033.GJ1927@deviant.kiev.zoral.com.ua>

[Please, remove the questions@ on the reply, this is the topic for fs@].

On Sat, Jun 06, 2009 at 08:16:00PM +0400, Chagin Dmitry wrote:
> ----- Forwarded message from georg@dts.su -----
>
> Date: Sat, 6 Jun 2009 10:58:11 +0400
> From: georg@dts.su
> To: freebsd-questions@freebsd.org
> Subject: Re[2]: fatal trap 12
>
> Hello, Freebsd-questions.
>
> After one of the new crashes I have this:
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address = 0x0
> fault code = supervisor read data, page not present
> instruction pointer = 0x8:0xffffffff804c4eb8
> stack pointer = 0x10:0xffffff807a0478f0
> frame pointer = 0x10:0xffffff807a047930
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 32668 (perl5.10.0)
> Physical memory: 4082 MB
> Dumping 1647 MB: 1632 1616 1600 1584 1568 1552 1536 1520 1504 1488 1472 1456 1440 1424 1408 1392 1376 1360 1344 1328 1312 1296 1280 1264 1248 1232 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 1040 1024 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
>
> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/accf_http.ko
> #0 doadump () at pcpu.h:195
> 195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> (kgdb) list *0xffffffff804c4eb8
> 0xffffffff804c4eb8 is in pfs_ioctl (/usr/src/sys/fs/pseudofs/pseudofs_vnops.c:265).
> 260 static int
> 261 pfs_ioctl(struct vop_ioctl_args *va)
> 262 {
> 263 struct vnode *vn = va->a_vp;
> 264 struct pfs_vdata *pvd = vn->v_data;
> 265 struct pfs_node *pn = pvd->pvd_pn;
> 266 struct proc *proc;
> 267 int error;
> 268
> 269 PFS_TRACE(("%s: %lx", pn->pn_name, va->a_command));
> (kgdb) backtrace
> #0 doadump () at pcpu.h:195
> #1 0xffffffff801c8dac in db_fncall (dummy1=Variable "dummy1" is not available.
> ) at /usr/src/sys/ddb/db_command.c:516
> #2 0xffffffff801c92df in db_command (last_cmdp=0xffffffff80b30c88, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413
> #3 0xffffffff801c94f0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
> #4 0xffffffff801cb0d9 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:228
> #5 0xffffffff80554e55 in kdb_trap (type=12, code=0, tf=0xffffff807a047840) at /usr/src/sys/kern/subr_kdb.c:524
> #6 0xffffffff807fae80 in trap_fatal (frame=0xffffff807a047840, eva=Variable "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:752
> #7 0xffffffff807fb254 in trap_pfault (frame=0xffffff807a047840, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:673
> #8 0xffffffff807fbc02 in trap (frame=0xffffff807a047840) at /usr/src/sys/amd64/amd64/trap.c:444
> #9 0xffffffff807df35e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209
> #10 0xffffffff804c4eb8 in pfs_ioctl (va=0xffffff807a047a10) at /usr/src/sys/fs/pseudofs/pseudofs_vnops.c:264
> #11 0xffffffff805bb1d3 in vn_ioctl (fp=Variable "fp" is not available.
> ) at vnode_if.h:437
> #12 0xffffffff80562d02 in kern_ioctl (td=0xffffff0006682000, fd=3, com=1076655123, data=0xffffff00ad2b7d40 "") at file.h:269
> #13 0xffffffff80563029 in ioctl (td=0xffffff0006682000, uap=0xffffff807a047bf0) at /usr/src/sys/kern/sys_generic.c:571
> #14 0xffffffff807fb4d6 in syscall (frame=0xffffff807a047c80) at /usr/src/sys/amd64/amd64/trap.c:900
> #15 0xffffffff807df56b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:330
> #16 0x0000000800c9c0ec in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
>
> Can you help me? What can I do? The server crashes periodically...

The issue is that the VOP_IOCTL interface takes an unlocked vnode, which may be reclaimed at any moment. The right thing to do is to fix this before 8.0 freezes the KPI.

Please, try the patch below.

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
index a7f47b2..018e6bd 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
@@ -258,7 +258,7 @@ zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred,
     int *rvalp, caller_context_t *ct)
 {
 	offset_t off;
-	int error;
+	int error, locked;
 	zfsvfs_t *zfsvfs;
 	znode_t *zp;
 
@@ -276,6 +276,8 @@ zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred,
 
 	case _FIO_SEEK_DATA:
 	case _FIO_SEEK_HOLE:
+		locked = VOP_ISLOCKED(vp);
+		VOP_UNLOCK(vp, 0);
 		if (ddi_copyin((void *)data, &off, sizeof (off), flag))
 			return (EFAULT);
 
@@ -287,10 +289,15 @@ zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred,
 		/* offset parameter is in/out */
 		error = zfs_holey(vp, com, &off);
 		ZFS_EXIT(zfsvfs);
-		if (error)
+		if (error) {
+			vn_lock(vp, locked | LK_RETRY);
 			return (error);
-		if (ddi_copyout(&off, (void *)data, sizeof (off), flag))
+		}
+		if (ddi_copyout(&off, (void *)data, sizeof (off), flag)) {
+			vn_lock(vp, locked | LK_RETRY);
 			return (EFAULT);
+		}
+		vn_lock(vp, locked | LK_RETRY);
 		return (0);
 	}
 	return (ENOTTY);
diff --git a/sys/fs/coda/coda_vnops.c b/sys/fs/coda/coda_vnops.c
index c5c6cb1..0bf84c4 100644
--- a/sys/fs/coda/coda_vnops.c
+++ b/sys/fs/coda/coda_vnops.c
@@ -420,7 +420,7 @@ coda_ioctl(struct vop_ioctl_args *ap)
 	struct ucred *cred = ap->a_cred;
 	struct thread *td = ap->a_td;
 	/* locals */
-	int error;
+	int error, locked;
 	struct vnode *tvp;
 	struct nameidata ndp;
 	struct PioctlData *iap = (struct PioctlData *)data;
@@ -440,6 +440,8 @@ coda_ioctl(struct vop_ioctl_args *ap)
 		    "ctlvp")););
 		return (EOPNOTSUPP);
 	}
+	locked = VOP_ISLOCKED(vp);
+	VOP_UNLOCK(vp, 0);
 
 	/*
 	 * Look up the pathname.
@@ -455,7 +457,7 @@ coda_ioctl(struct vop_ioctl_args *ap)
 		MARK_INT_FAIL(CODA_IOCTL_STATS);
 		CODADEBUG(CODA_IOCTL, myprintf(("coda_ioctl error: lookup "
 		    "returns %d\n", error)););
-		return (error);
+		goto out;
 	}
 
 	/*
@@ -469,11 +471,13 @@ coda_ioctl(struct vop_ioctl_args *ap)
 		CODADEBUG(CODA_IOCTL,
 		myprintf(("coda_ioctl error: %s not a coda object\n",
 		    iap->path)););
-		return (EINVAL);
+		error = EINVAL;
+		goto out;
 	}
 	if (iap->vi.in_size > VC_MAXDATASIZE) {
 		NDFREE(&ndp, 0);
-		return (EINVAL);
+		error = EINVAL;
+		goto out;
 	}
 	error = venus_ioctl(vtomi(tvp), &((VTOC(tvp))->c_fid), com, flag,
 	    data, cred, td->td_proc);
@@ -484,6 +488,8 @@ coda_ioctl(struct vop_ioctl_args *ap)
 		    error)););
 	vrele(tvp);
 	NDFREE(&ndp, NDF_ONLY_PNBUF);
+out:
+	vn_lock(vp, locked | LK_RETRY);
 	return (error);
 }
 
diff --git a/sys/fs/deadfs/dead_vnops.c b/sys/fs/deadfs/dead_vnops.c
index 7a07b38..22029a7 100644
--- a/sys/fs/deadfs/dead_vnops.c
+++ b/sys/fs/deadfs/dead_vnops.c
@@ -180,8 +180,8 @@ dead_ioctl(ap)
 		struct proc *a_p;
 	} */ *ap;
 {
-	/* XXX: Doesn't this just recurse back here ? */
-	return (VOP_IOCTL_AP(ap));
+
+	return (EIO);
 }
 
 /*
diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c
index 66963bc..8d20297 100644
--- a/sys/fs/fifofs/fifo_vnops.c
+++ b/sys/fs/fifofs/fifo_vnops.c
@@ -89,8 +89,6 @@ struct fifoinfo {
 static vop_print_t	fifo_print;
 static vop_open_t	fifo_open;
 static vop_close_t	fifo_close;
-static vop_ioctl_t	fifo_ioctl;
-static vop_kqfilter_t	fifo_kqfilter;
 static vop_pathconf_t	fifo_pathconf;
 static vop_advlock_t	fifo_advlock;
 
@@ -116,8 +114,8 @@ struct vop_vector fifo_specops = {
 	.vop_close =		fifo_close,
 	.vop_create =		VOP_PANIC,
 	.vop_getattr =		VOP_EBADF,
-	.vop_ioctl =		fifo_ioctl,
-	.vop_kqfilter =		fifo_kqfilter,
+	.vop_ioctl =		VOP_PANIC,
+	.vop_kqfilter =		VOP_PANIC,
 	.vop_link =		VOP_PANIC,
 	.vop_mkdir =		VOP_PANIC,
 	.vop_mknod =		VOP_PANIC,
@@ -300,42 +298,6 @@ fail1:
 	return (0);
 }
 
-/*
- * Now unused vnode ioctl routine.
- */
-/* ARGSUSED */
-static int
-fifo_ioctl(ap)
-	struct vop_ioctl_args /* {
-		struct vnode *a_vp;
-		u_long a_command;
-		caddr_t a_data;
-		int a_fflag;
-		struct ucred *a_cred;
-		struct thread *a_td;
-	} */ *ap;
-{
-
-	printf("WARNING: fifo_ioctl called unexpectedly\n");
-	return (ENOTTY);
-}
-
-/*
- * Now unused vnode kqfilter routine.
- */
-/* ARGSUSED */
-static int
-fifo_kqfilter(ap)
-	struct vop_kqfilter_args /* {
-		struct vnode *a_vp;
-		struct knote *a_kn;
-	} */ *ap;
-{
-
-	printf("WARNING: fifo_kqfilter called unexpectedly\n");
-	return (EINVAL);
-}
-
 static void
 filt_fifordetach(struct knote *kn)
 {
diff --git a/sys/fs/unionfs/union_vnops.c b/sys/fs/unionfs/union_vnops.c
index 8505cac..6f5d555 100644
--- a/sys/fs/unionfs/union_vnops.c
+++ b/sys/fs/unionfs/union_vnops.c
@@ -913,12 +913,10 @@ unionfs_ioctl(struct vop_ioctl_args *ap)
 
 	KASSERT_UNIONFS_VNODE(ap->a_vp);
 
-	vn_lock(ap->a_vp, LK_EXCLUSIVE | LK_RETRY);
 	unp = VTOUNIONFS(ap->a_vp);
 	unionfs_get_node_status(unp, ap->a_td, &unsp);
 	ovp = (unsp->uns_upper_opencnt ? unp->un_uppervp : unp->un_lowervp);
 	unionfs_tryrem_node_status(unp, unsp);
-	VOP_UNLOCK(ap->a_vp, 0);
 
 	if (ovp == NULLVP)
 		return (EBADF);
diff --git a/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c b/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c
index 6d8d4eb..9b2c4b0 100644
--- a/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c
+++ b/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c
@@ -1163,17 +1163,20 @@ _xfs_ioctl(
 		struct thread *a_td;
 	} */ *ap)
 {
-/*	struct vnode *vp = ap->a_vp; */
+	struct vnode *vp = ap->a_vp;
 /*	struct thread *p = ap->a_td; */
 /*	struct file *fp; */
-	int error;
+	int error, locked;
 
-	xfs_vnode_t *xvp = VPTOXFSVP(ap->a_vp);
+	xfs_vnode_t *xvp = VPTOXFSVP(vp);
 
 	printf("_xfs_ioctl cmd 0x%lx data %p\n",ap->a_command,ap->a_data);
 
+	locked = VOP_ISLOCKED(vp);
+	VOP_UNLOCK(vp, 0);
 //	XVOP_IOCTL(xvp,(void *)NULL,(void *)NULL,ap->a_fflag,ap->a_command,ap->a_data,error);
 	error = xfs_ioctl(xvp->v_bh.bh_first,NULL,NULL,ap->a_fflag,ap->a_command,ap->a_data);
+	vn_lock(vp, locked | LK_RETRY);
 
 	return error;
 }
diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 702faae..e48f81f 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -817,13 +817,12 @@ vn_ioctl(fp, com, data, active_cred, td)
 
 	vfslocked = VFS_LOCK_GIANT(vp->v_mount);
 	error = ENOTTY;
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 	switch (vp->v_type) {
 	case VREG:
 	case VDIR:
 		if (com == FIONREAD) {
-			vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 			error = VOP_GETATTR(vp, &vattr, active_cred);
-			VOP_UNLOCK(vp, 0);
 			if (!error)
 				*(int *)data = vattr.va_size - fp->f_offset;
 		}
@@ -837,6 +836,7 @@ vn_ioctl(fp, com, data, active_cred, td)
 	default:
 		break;
 	}
+	VOP_UNLOCK(vp, 0);
 	VFS_UNLOCK_GIANT(vfslocked);
 	return (error);
 }
diff --git a/sys/kern/vnode_if.src b/sys/kern/vnode_if.src
index 81c0dff..81ef11c 100644
--- a/sys/kern/vnode_if.src
+++ b/sys/kern/vnode_if.src
@@ -209,7 +209,7 @@ vop_write {
 };
 
 
-%% ioctl	vp	U U U
+%% ioctl	vp	L L L
 
 vop_ioctl {
 	IN struct vnode *vp;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090606/e62ce630/attachment.pgp

From james-freebsd-fs2 at jrv.org Sun Jun 7 06:39:49 2009
From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen)
Date: Sun Jun 7 06:39:56 2009
Subject: Reproducible ZFS checksum error, svn192136 (05/14/2009)
Message-ID: <4A2B609E.4060702@jrv.org>

I am able to reproduce a ZFS checksum error. I believe I have ruled out
the hard disks, controllers, cables, etc - the usual suspects. I have
not ruled out the computer itself as I don't have anything else similar
to test with. No other errors are seen on that computer.

A Dell 435MT (Core i7) at 2.66 GHz with 12GB of RAM
2 Silicon Image 3132 PCI-e cards with 2 eSATA ports each
1 Addonics PCI-e card with 4 eSATA ports that identifies itself as a
  Silicon Image 3124
Samsung 1TB disk

Error happens with any of the eSATA cards or the onboard Intel chipset
eSATA controller.
Error happens with any hard disk, enclosure or cabling.
There are no I/O errors in the logs, and when I use an external hardware
RAID it reports no errors from the disks or reported to the host.

svn 192136 (Thu, 14 May 2009) amd64, GENERIC config

The disk is partitioned like this, with a UFS work area at the end and
the area up front being Mac OSX compatible.
It boots into UFS land, not ZFS.

# gpart show
=>        34  1953525101  ad12  GPT  (932G)
          34           6        - free -  (3.0K)
          40      409600     1  efi  (200M)
      409640  1869229256     2  !6a898cc3-1dd2-11b2-99a6-080020736631  (891G)
  1869638896         128     3  freebsd-boot  (64K)
  1869639024     4194304     4  freebsd-ufs  (2.0G)
  1873833328    33554432     5  freebsd-swap  (16G)
  1907387760     4194304     6  freebsd-ufs  (2.0G)
  1911582064    33554432     7  freebsd-ufs  (16G)
  1945136496     8388608     8  freebsd-ufs  (4.0G)
  1953525104          31        - free -  (16K)

For ease of moving the disk between SATA ports each UFS and swap
partition is labeled with gmirror:

# gmirror status
        Name    Status  Components
mirror/sroot  COMPLETE  ad12p4
mirror/sswap  COMPLETE  ad12p5
 mirror/stmp  COMPLETE  ad12p6
 mirror/susr  COMPLETE  ad12p7
 mirror/svar  COMPLETE  ad12p8

/boot/loader.conf contains

zfs_load="YES"
vm.kmem_size="1536M"
vm.kmem_size_min="1536M"
vfs.root.mountfrom="ufs:mirror/sroot"
kern.maxfiles="32K"
kern.ktrace.request_pool="512"
geom_mirror_load="YES"  # RAID1 disk driver (see gmirror(8))
vfs.zfs.debug=1
#vfs.zfs.prefetch_disable=1
loader_logo="beastie"   # Desired logo: fbsdbw, beastiebw, beastie, none
boot_verbose="YES"      # -v: Causes extra debugging information to be printed

1. Start one buildworld loop thusly on UFS.

    cd /usr/src
    while true
    do
        make clean
        make buildworld
        touch "done-`date`"
    done

2. Start writes to ZFS with rsync.

    Make a clean pool: zpool create pool ad12p2

    Start an rsync copying data to ZFS. I'm copying from a Mac-mini
    over the network, which gets about 20 MB/s when the systems are
    not loaded.

3. Run "zpool scrub pool". As each scrub completes start a new one.

At some point a scrub will report one or more checksum errors, usually
within the first 500GB of the rsync; sometimes it takes a few TB.

I'm wondering if anyone else is able to try something similar, with I/O
to UFS and ZFS, and scrubs, to one disk, on a system with much more
than 4GB of RAM.

PS. we need a debug sysctl to make zfs return data from a block with a
checksum error so we can easily see what data is on disk.

From georg at dts.su Sun Jun 7 12:01:29 2009
From: georg at dts.su (georg@dts.su)
Date: Sun Jun 7 12:01:37 2009
Subject: fatal trap 12
In-Reply-To: <20090606175033.GJ1927@deviant.kiev.zoral.com.ua>
References: <20090606161600.GB61928@dchagin.static.corbina.ru>
	<20090606175033.GJ1927@deviant.kiev.zoral.com.ua>
Message-ID: <49009886.20090607153452@dts.su>

Hello.

After applying the patch, building the kernel fails with:
/usr/src/sys/kern/vfs_vnops.c:750:37: error: macro "vn_lock" requires 3 arguments, but only 2 given
/usr/src/sys/kern/vfs_vnops.c: In function 'vn_ioctl':
/usr/src/sys/kern/vfs_vnops.c:750: error: 'vn_lock' undeclared (first use in this function)
/usr/src/sys/kern/vfs_vnops.c:750: error: (Each undeclared identifier is reported only once
/usr/src/sys/kern/vfs_vnops.c:750: error: for each function it appears in.)
/usr/src/sys/kern/vfs_vnops.c:769: error: too few arguments to function 'VOP_UNLOCK'
*** Error code 1

I have had this problem with crashes since cPanel updated perl to
version 5.10.0. Can somebody tell me whether this is a general FreeBSD
problem or only a problem with my installation?

> [Please, remove the questions@ on the reply, this is the topic for
> fs@].

> On Sat, Jun 06, 2009 at 08:16:00PM +0400, Chagin Dmitry wrote:
>> ----- Forwarded message from georg@dts.su -----
>>
>> Date: Sat, 6 Jun 2009 10:58:11 +0400
>> From: georg@dts.su
>> To: freebsd-questions@freebsd.org
>> Subject: Re[2]: fatal trap 12
>>
>> Hello, Freebsd-questions.
>>
>> After one of new crash I have this:
>>
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>> >> Unread portion of the kernel message buffer: >> >> >> Fatal trap 12: page fault while in kernel mode >> cpuid = 2; apic id = 02 >> fault virtual address = 0x0 >> fault code = supervisor read data, page not present >> instruction pointer = 0x8:0xffffffff804c4eb8 >> stack pointer = 0x10:0xffffff807a0478f0 >> frame pointer = 0x10:0xffffff807a047930 >> code segment = base 0x0, limit 0xfffff, type 0x1b >> = DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags = interrupt enabled, resume, IOPL = 0 >> current process = 32668 (perl5.10.0) >> Physical memory: 4082 MB >> Dumping 1647 MB: 1632 1616 1600 1584 1568 1552 1536 1520 1504 1488 1472 1456 1440 1424 1408 1392 1376 1360 1344 1328 1312 1296 1280 1264 1248 1232 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 1040 1024 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 >> >> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /boot/kernel/accf_http.ko.symbols...done. >> done. >> Loaded symbols for /boot/kernel/accf_http.ko >> #0 doadump () at pcpu.h:195 >> 195 __asm __volatile("movq %%gs:0,%0" : "=r" (td)); >> (kgdb) list *0xffffffff804c4eb8 >> 0xffffffff804c4eb8 is in pfs_ioctl (/usr/src/sys/fs/pseudofs/pseudofs_vnops.c:265). >> 260 static int >> 261 pfs_ioctl(struct vop_ioctl_args *va) >> 262 { >> 263 struct vnode *vn = va->a_vp; >> 264 struct pfs_vdata *pvd = vn->v_data; >> 265 struct pfs_node *pn = pvd->pvd_pn; >> 266 struct proc *proc; >> 267 int error; >> 268 >> 269 PFS_TRACE(("%s: %lx", pn->pn_name, va->a_command)); >> (kgdb) backtrace >> #0 doadump () at pcpu.h:195 >> #1 0xffffffff801c8dac in db_fncall (dummy1=Variable "dummy1" is not available. 
>> ) at /usr/src/sys/ddb/db_command.c:516 >> #2 0xffffffff801c92df in db_command (last_cmdp=0xffffffff80b30c88, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413 >> #3 0xffffffff801c94f0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:466 >> #4 0xffffffff801cb0d9 in db_trap (type=Variable "type" is not available. >> ) at /usr/src/sys/ddb/db_main.c:228 >> #5 0xffffffff80554e55 in kdb_trap (type=12, code=0, tf=0xffffff807a047840) at /usr/src/sys/kern/subr_kdb.c:524 >> #6 0xffffffff807fae80 in trap_fatal (frame=0xffffff807a047840, eva=Variable "eva" is not available. >> ) at /usr/src/sys/amd64/amd64/trap.c:752 >> #7 0xffffffff807fb254 in trap_pfault (frame=0xffffff807a047840, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:673 >> #8 0xffffffff807fbc02 in trap (frame=0xffffff807a047840) at /usr/src/sys/amd64/amd64/trap.c:444 >> #9 0xffffffff807df35e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209 >> #10 0xffffffff804c4eb8 in pfs_ioctl (va=0xffffff807a047a10) at /usr/src/sys/fs/pseudofs/pseudofs_vnops.c:264 >> #11 0xffffffff805bb1d3 in vn_ioctl (fp=Variable "fp" is not available. >> ) at vnode_if.h:437 >> #12 0xffffffff80562d02 in kern_ioctl (td=0xffffff0006682000, fd=3, com=1076655123, data=0xffffff00ad2b7d40 "") at file.h:269 >> #13 0xffffffff80563029 in ioctl (td=0xffffff0006682000, uap=0xffffff807a047bf0) at /usr/src/sys/kern/sys_generic.c:571 >> #14 0xffffffff807fb4d6 in syscall (frame=0xffffff807a047c80) at /usr/src/sys/amd64/amd64/trap.c:900 >> #15 0xffffffff807df56b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:330 >> #16 0x0000000800c9c0ec in ?? () >> Previous frame inner to this frame (corrupt stack?) >> (kgdb) >> >> >> Can You help me? What can I do? Server crash periodicaly... > The issue is that VOP_IOCTL interface takes unlocked vnode, which > may be reclaimed at any moment. The right thing to do is to fix > this before 8.0 freezed KPI. Please, try the patch below. 
> diff --git > a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c > b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c > index a7f47b2..018e6bd 100644 > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c > @@ -258,7 +258,7 @@ zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred, > int *rvalp, caller_context_t *ct) > { > offset_t off; > - int error; > + int error, locked; > zfsvfs_t *zfsvfs; > znode_t *zp; > > @@ -276,6 +276,8 @@ zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred, > > case _FIO_SEEK_DATA: > case _FIO_SEEK_HOLE: > + locked = VOP_ISLOCKED(vp); > + VOP_UNLOCK(vp, 0); > if (ddi_copyin((void *)data, &off, sizeof (off), flag)) > return (EFAULT); > > @@ -287,10 +289,15 @@ zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred, > /* offset parameter is in/out */ > error = zfs_holey(vp, com, &off); > ZFS_EXIT(zfsvfs); > - if (error) > + if (error) { > + vn_lock(vp, locked | LK_RETRY); > return (error); > - if (ddi_copyout(&off, (void *)data, sizeof (off), flag)) > + } > + if (ddi_copyout(&off, (void *)data, sizeof (off), flag)) { > + vn_lock(vp, locked | LK_RETRY); > return (EFAULT); > + } > + vn_lock(vp, locked | LK_RETRY); > return (0); > } > return (ENOTTY); > diff --git a/sys/fs/coda/coda_vnops.c b/sys/fs/coda/coda_vnops.c > index c5c6cb1..0bf84c4 100644 > --- a/sys/fs/coda/coda_vnops.c > +++ b/sys/fs/coda/coda_vnops.c > @@ -420,7 +420,7 @@ coda_ioctl(struct vop_ioctl_args *ap) > struct ucred *cred = ap->a_cred; > struct thread *td = ap->a_td; > /* locals */ > - int error; > + int error, locked; > struct vnode *tvp; > struct nameidata ndp; > struct PioctlData *iap = (struct PioctlData *)data; > @@ -440,6 +440,8 @@ coda_ioctl(struct vop_ioctl_args *ap) > "ctlvp"));); > return (EOPNOTSUPP); > } > + locked = VOP_ISLOCKED(vp); > + VOP_UNLOCK(vp, 0); > > /* > * Look up the pathname. 
> @@ -455,7 +457,7 @@ coda_ioctl(struct vop_ioctl_args *ap) > MARK_INT_FAIL(CODA_IOCTL_STATS); > CODADEBUG(CODA_IOCTL, myprintf(("coda_ioctl error: lookup " > "returns %d\n", error));); > - return (error); > + goto out; > } > > /* > @@ -469,11 +471,13 @@ coda_ioctl(struct vop_ioctl_args *ap) > CODADEBUG(CODA_IOCTL, > myprintf(("coda_ioctl error: %s not a coda object\n", > iap->path));); > - return (EINVAL); > + error = EINVAL; > + goto out; > } > if (iap->vi.in_size > VC_MAXDATASIZE) { > NDFREE(&ndp, 0); > - return (EINVAL); > + error = EINVAL; > + goto out; > } > error = venus_ioctl(vtomi(tvp), &((VTOC(tvp))->c_fid), com, flag, > data, cred, td->td_proc); > @@ -484,6 +488,8 @@ coda_ioctl(struct vop_ioctl_args *ap) > error));); > vrele(tvp); > NDFREE(&ndp, NDF_ONLY_PNBUF); > + out: > + vn_lock(vp, locked | LK_RETRY); > return (error); > } > > diff --git a/sys/fs/deadfs/dead_vnops.c b/sys/fs/deadfs/dead_vnops.c > index 7a07b38..22029a7 100644 > --- a/sys/fs/deadfs/dead_vnops.c > +++ b/sys/fs/deadfs/dead_vnops.c > @@ -180,8 +180,8 @@ dead_ioctl(ap) > struct proc *a_p; > } */ *ap; > { > - /* XXX: Doesn't this just recurse back here ? 
*/ > - return (VOP_IOCTL_AP(ap)); > + > + return (EIO); > } > > /* > diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c > index 66963bc..8d20297 100644 > --- a/sys/fs/fifofs/fifo_vnops.c > +++ b/sys/fs/fifofs/fifo_vnops.c > @@ -89,8 +89,6 @@ struct fifoinfo { > static vop_print_t fifo_print; > static vop_open_t fifo_open; > static vop_close_t fifo_close; > -static vop_ioctl_t fifo_ioctl; > -static vop_kqfilter_t fifo_kqfilter; > static vop_pathconf_t fifo_pathconf; > static vop_advlock_t fifo_advlock; > > @@ -116,8 +114,8 @@ struct vop_vector fifo_specops = { > .vop_close = fifo_close, > .vop_create = VOP_PANIC, > .vop_getattr = VOP_EBADF, > - .vop_ioctl = fifo_ioctl, > - .vop_kqfilter = fifo_kqfilter, > + .vop_ioctl = VOP_PANIC, > + .vop_kqfilter = VOP_PANIC, > .vop_link = VOP_PANIC, > .vop_mkdir = VOP_PANIC, > .vop_mknod = VOP_PANIC, > @@ -300,42 +298,6 @@ fail1: > return (0); > } > > -/* > - * Now unused vnode ioctl routine. > - */ > -/* ARGSUSED */ > -static int > -fifo_ioctl(ap) > - struct vop_ioctl_args /* { > - struct vnode *a_vp; > - u_long a_command; > - caddr_t a_data; > - int a_fflag; > - struct ucred *a_cred; > - struct thread *a_td; > - } */ *ap; > -{ > - > - printf("WARNING: fifo_ioctl called unexpectedly\n"); > - return (ENOTTY); > -} > - > -/* > - * Now unused vnode kqfilter routine. 
> - */ > -/* ARGSUSED */ > -static int > -fifo_kqfilter(ap) > - struct vop_kqfilter_args /* { > - struct vnode *a_vp; > - struct knote *a_kn; > - } */ *ap; > -{ > - > - printf("WARNING: fifo_kqfilter called unexpectedly\n"); > - return (EINVAL); > -} > - > static void > filt_fifordetach(struct knote *kn) > { > diff --git a/sys/fs/unionfs/union_vnops.c b/sys/fs/unionfs/union_vnops.c > index 8505cac..6f5d555 100644 > --- a/sys/fs/unionfs/union_vnops.c > +++ b/sys/fs/unionfs/union_vnops.c > @@ -913,12 +913,10 @@ unionfs_ioctl(struct vop_ioctl_args *ap) > > KASSERT_UNIONFS_VNODE(ap->a_vp); > - vn_lock(ap->>a_vp, LK_EXCLUSIVE | LK_RETRY); > unp = VTOUNIONFS(ap->a_vp); > unionfs_get_node_status(unp, ap->a_td, &unsp); > ovp = (unsp->uns_upper_opencnt ? unp->un_uppervp : unp->un_lowervp); > unionfs_tryrem_node_status(unp, unsp); > - VOP_UNLOCK(ap->a_vp, 0); > > if (ovp == NULLVP) > return (EBADF); > diff --git a/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c > b/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c > index 6d8d4eb..9b2c4b0 100644 > --- a/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c > +++ b/sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c > @@ -1163,17 +1163,20 @@ _xfs_ioctl( > struct thread *a_td; > } */ *ap) > { > -/* struct vnode *vp = ap->a_vp; */ > + struct vnode *vp = ap->a_vp; > /* struct thread *p = ap->a_td; */ > /* struct file *fp; */ > - int error; > + int error, locked; > > - xfs_vnode_t *xvp = VPTOXFSVP(ap->a_vp); > + xfs_vnode_t *xvp = VPTOXFSVP(vp); > > printf("_xfs_ioctl cmd 0x%lx data %p\n",ap->a_command,ap->a_data); > > + locked = VOP_ISLOCKED(vp); > + VOP_UNLOCK(vp, 0); > // XVOP_IOCTL(xvp,(void *)NULL,(void > *)NULL,ap->a_fflag,ap->a_command,ap->a_data,error); > error = > xfs_ioctl(xvp->v_bh.bh_first,NULL,NULL,ap->a_fflag,ap->a_command,ap->a_data); > + vn_lock(vp, locked | LK_RETRY); > > return error; > } > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c > index 702faae..e48f81f 100644 > --- a/sys/kern/vfs_vnops.c > +++ b/sys/kern/vfs_vnops.c > @@ -817,13 +817,12 @@ vn_ioctl(fp, 
com, data, active_cred, td)
>
>  	vfslocked = VFS_LOCK_GIANT(vp->v_mount);
>  	error = ENOTTY;
> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>  	switch (vp->v_type) {
>  	case VREG:
>  	case VDIR:
>  		if (com == FIONREAD) {
> -			vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>  			error = VOP_GETATTR(vp, &vattr, active_cred);
> -			VOP_UNLOCK(vp, 0);
>  			if (!error)
>  				*(int *)data = vattr.va_size - fp->f_offset;
>  		}
> @@ -837,6 +836,7 @@ vn_ioctl(fp, com, data, active_cred, td)
>  	default:
>  		break;
>  	}
> +	VOP_UNLOCK(vp, 0);
>  	VFS_UNLOCK_GIANT(vfslocked);
>  	return (error);
>  }
> diff --git a/sys/kern/vnode_if.src b/sys/kern/vnode_if.src
> index 81c0dff..81ef11c 100644
> --- a/sys/kern/vnode_if.src
> +++ b/sys/kern/vnode_if.src
> @@ -209,7 +209,7 @@ vop_write {
>  };
>
>
> -%% ioctl	vp	U U U
> +%% ioctl	vp	L L L
>
>  vop_ioctl {
>  	IN struct vnode *vp;

--
Regards
Yura                          mailto:georg@dts.su

From yan.batuto at gmail.com Sun Jun 7 12:55:45 2009
From: yan.batuto at gmail.com (Yan V. Batuto)
Date: Sun Jun 7 12:55:51 2009
Subject: Strange ZFS pool failure after updating kernel v6->v13
In-Reply-To: <4A2A7DE4.1080008@egr.msu.edu>
References: <4A2A7DE4.1080008@egr.msu.edu>
Message-ID:

2009/6/6 Adam McDougall :
> Yan V. Batuto wrote:
>>
>> Hello!
>>
>> RAID-Z v6 works OK with 7.2-RELEASE, but it fails with recent 7.2-STABLE.
>> --------------------------------------------------
>> # zpool status bigstore
>>   pool: bigstore
>>  state: ONLINE
>>  scrub: scrub completed with 0 errors on Fri Jun  5 22:28:19 2009
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         bigstore    ONLINE       0     0     0
>>           raidz1    ONLINE       0     0     0
>>             ad4     ONLINE       0     0     0
>>             ad6     ONLINE       0     0     0
>>             ad8     ONLINE       0     0     0
>>             ad10    ONLINE       0     0     0
>>
>> errors: No known data errors
>> --------------------------------------------------
>> After cvsup to 7-STABLE, usual procedure of rebuilding kernel and
>> world, and reboot pool is failed.
>> It's quite strange that now pool consists of ad8, ad10, and again ad8,
>> ad10 drives instead of ad4, ad6, ad8, ad10.
>>
>> I removed additional disk controller few weeks ago, so raid-z
>> originally was created as ad8+ad10+ad12+ad14, and then
>> it appeared to be ad4+ad6+ad8+ad10. It was not a trouble for zfs v6,
>> but, probably, something is wrong here in zfs v13.
>> --------------------------------------------------
>> # zpool status bigstore
>>   pool: bigstore
>>  state: UNAVAIL
>> status: One or more devices could not be used because the label is missing
>>         or invalid.  There are insufficient replicas for the pool to continue
>>         functioning.
>> action: Destroy and re-create the pool from a backup source.
>>    see: http://www.sun.com/msg/ZFS-8000-5E
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         bigstore    UNAVAIL      0     0     0  insufficient replicas
>>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>>             ad8     FAULTED      0     0     0  corrupted data
>>             ad10    FAULTED      0     0     0  corrupted data
>>             ad8     ONLINE       0     0     0
>>             ad10    ONLINE       0     0     0
>>
>
> Please try:
> zpool export bigstore
> zpool import bigstore
>
> This should make it find the right hard drives if they are present,
> otherwise should give a more informative error.
>

Thank you! I exported pool on 7.2-release, then upgraded to 7.2-stable
and imported the pool back. All works OK.

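Why an export followed by an import can help here may be easier to see
with a small model. The sketch below is only an illustration under stated
assumptions, not ZFS code: it supposes that each pool member is remembered
both by a cached device path and by a GUID stored in the disk's label, and
that an import rescans the disks and matches on the GUID. The names
member, disk, find_by_guid() and count_stale() are made up for the example.

```c
/*
 * Simplified model of "cached device path vs. GUID in the on-disk label".
 * Hypothetical structures and names; this is not ZFS code.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What the pool remembers about one member disk. */
struct member {
	const char *cached_path;	/* device name at the time it was recorded */
	uint64_t guid;			/* identity stored in the disk label */
};

/* What a rescan finds on one attached disk. */
struct disk {
	const char *path;		/* current device name */
	uint64_t label_guid;		/* GUID read from its label */
};

/* Return the current path of the disk carrying guid, or NULL if absent. */
const char *
find_by_guid(const struct disk *disks, size_t ndisks, uint64_t guid)
{
	assert(disks != NULL || ndisks == 0);
	for (size_t i = 0; i < ndisks; i++)
		if (disks[i].label_guid == guid)
			return (disks[i].path);
	return (NULL);
}

/* Count members whose cached path no longer names the disk with their GUID. */
size_t
count_stale(const struct member *m, size_t nm,
    const struct disk *disks, size_t ndisks)
{
	size_t stale = 0;

	for (size_t i = 0; i < nm; i++) {
		const char *now = find_by_guid(disks, ndisks, m[i].guid);

		if (now == NULL || strcmp(now, m[i].cached_path) != 0)
			stale++;
	}
	return (stale);
}
```

With the device names from the report (recorded as ad8/ad10/ad12/ad14, now
attached as ad4/ad6/ad8/ad10), every cached path in this model is stale,
while a lookup by GUID still finds each disk. That would be consistent
with the pool importing cleanly after an export.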
From kostikbel at gmail.com Sun Jun 7 13:40:45 2009
From: kostikbel at gmail.com (Kostik Belousov)
Date: Sun Jun 7 13:40:52 2009
Subject: fatal trap 12
In-Reply-To: <49009886.20090607153452@dts.su>
References: <20090606161600.GB61928@dchagin.static.corbina.ru>
	<20090606175033.GJ1927@deviant.kiev.zoral.com.ua>
	<49009886.20090607153452@dts.su>
Message-ID: <20090607134038.GL1927@deviant.kiev.zoral.com.ua>

I asked you to remove questions@ from the reply, didn't I?

On Sun, Jun 07, 2009 at 03:34:52PM +0400, georg@dts.su wrote:
> Hello.
>
> After applying the patch, building the kernel fails with:
> /usr/src/sys/kern/vfs_vnops.c:750:37: error: macro "vn_lock" requires 3 arguments, but only 2 given
> /usr/src/sys/kern/vfs_vnops.c: In function 'vn_ioctl':
> /usr/src/sys/kern/vfs_vnops.c:750: error: 'vn_lock' undeclared (first use in this function)
> /usr/src/sys/kern/vfs_vnops.c:750: error: (Each undeclared identifier is reported only once
> /usr/src/sys/kern/vfs_vnops.c:750: error: for each function it appears in.)
> /usr/src/sys/kern/vfs_vnops.c:769: error: too few arguments to function 'VOP_UNLOCK'
> *** Error code 1
>

The patch is for HEAD. You did not specify the version of your system.
For RELENG_7, the patch has to be adapted by adding a curthread parameter
to several calls, among them vn_lock(), VOP_ISLOCKED() and VOP_UNLOCK().

The patch probably cannot be merged to RELENG_7 due to KBI breakage.
There, I think the following workaround for pseudofs might be enough,
but it would also be needed for cd9660 and devfs at least.

Try this.

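The ordering that the workaround relies on can be shown in userland. The
following is a minimal sketch, assuming a pthread mutex in place of the
vnode lock and a flag in place of VI_DOOMED; struct node, NODE_DOOMED,
node_reclaim() and node_ioctl() are hypothetical names, not kernel
interfaces. The point is only the sequence: take the lock, test the flag,
and only then dereference the private data, dropping the lock on every
return path.

```c
/*
 * Userland model of an object whose private data can be torn down
 * ("reclaimed") while another thread is about to use it.
 * All names are hypothetical stand-ins, not kernel APIs.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define	NODE_DOOMED	0x1		/* stands in for VI_DOOMED */

struct node_data {
	int value;
};

struct node {
	pthread_mutex_t lock;		/* stands in for the vnode lock */
	int flags;			/* NODE_DOOMED once reclaimed */
	struct node_data *data;		/* stands in for v_data; NULL after reclaim */
};

/* Reclaim: mark the node dead and detach its private data, under the lock. */
void
node_reclaim(struct node *n)
{
	pthread_mutex_lock(&n->lock);
	n->flags |= NODE_DOOMED;
	n->data = NULL;
	pthread_mutex_unlock(&n->lock);
}

/*
 * ioctl-like handler: lock, check the doomed flag, then use the data.
 * Without the lock and the check, n->data could be NULL by the time it
 * is dereferenced.
 */
int
node_ioctl(struct node *n, int *out)
{
	int error = 0;

	pthread_mutex_lock(&n->lock);
	if (n->flags & NODE_DOOMED) {
		error = EBADF;
		goto out;
	}
	/* A live node always has private data attached. */
	assert(n->data != NULL);
	*out = n->data->value;
out:
	pthread_mutex_unlock(&n->lock);
	return (error);
}
```

In this model a call made after node_reclaim() returns EBADF instead of
following a NULL pointer, which is the same outcome the VI_DOOMED test is
meant to give.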
Index: fs/pseudofs/pseudofs_vnops.c
===================================================================
--- fs/pseudofs/pseudofs_vnops.c	(revision 193634)
+++ fs/pseudofs/pseudofs_vnops.c	(working copy)
@@ -260,34 +260,51 @@
 static int
 pfs_ioctl(struct vop_ioctl_args *va)
 {
-	struct vnode *vn = va->a_vp;
-	struct pfs_vdata *pvd = vn->v_data;
-	struct pfs_node *pn = pvd->pvd_pn;
+	struct vnode *vn;
+	struct pfs_vdata *pvd;
+	struct pfs_node *pn;
 	struct proc *proc;
+	struct thread *td;
 	int error;

+	vn = va->a_vp;
+	td = curthread;
+	vn_lock(vn, LK_SHARED | LK_RETRY, td);
+	if (vn->v_iflag & VI_DOOMED) {
+		VOP_UNLOCK(vn, 0, td);
+		return (EBADF);
+	}
+	pvd = vn->v_data;
+	pn = pvd->pvd_pn;
 	PFS_TRACE(("%s: %lx", pn->pn_name, va->a_command));
 	pfs_assert_not_owned(pn);

-	if (vn->v_type != VREG)
+	if (vn->v_type != VREG) {
+		VOP_UNLOCK(vn, 0, td);
 		PFS_RETURN (EINVAL);
+	}
 	KASSERT_PN_IS_FILE(pn);

-	if (pn->pn_ioctl == NULL)
+	if (pn->pn_ioctl == NULL) {
+		VOP_UNLOCK(vn, 0, td);
 		PFS_RETURN (ENOTTY);
+	}

 	/*
 	 * This is necessary because process' privileges may
 	 * have changed since the open() call.
 	 */
-	if (!pfs_visible(curthread, pn, pvd->pvd_pid, &proc))
+	if (!pfs_visible(curthread, pn, pvd->pvd_pid, &proc)) {
+		VOP_UNLOCK(vn, 0, td);
 		PFS_RETURN (EIO);
+	}

 	error = pn_ioctl(curthread, proc, pn, va->a_command, va->a_data);

 	if (proc != NULL)
 		PROC_UNLOCK(proc);

+	VOP_UNLOCK(vn, 0, td);
 	PFS_RETURN (error);
 }

Index: fs/devfs/devfs_vnops.c
===================================================================
--- fs/devfs/devfs_vnops.c	(revision 193634)
+++ fs/devfs/devfs_vnops.c	(working copy)
@@ -1240,11 +1240,21 @@
 static int
 devfs_rioctl(struct vop_ioctl_args *ap)
 {
+	struct vnode *vp;
+	struct devfs_mount *dmp;
+	struct thread *td;
 	int error;
-	struct devfs_mount *dmp;

+	vp = ap->a_vp;
+	td = ap->a_td;
+	vn_lock(vp, LK_SHARED | LK_RETRY, td);
+	if (vp->v_iflag & VI_DOOMED) {
+		VOP_UNLOCK(vp, 0, td);
+		return (EBADF);
+	}
 	dmp = VFSTODEVFS(ap->a_vp->v_mount);
 	sx_xlock(&dmp->dm_lock);
+	VOP_UNLOCK(vp, 0, td);
 	DEVFS_DMP_HOLD(dmp);
 	devfs_populate(dmp);
 	if (DEVFS_DMP_DROP(dmp)) {
@@ -1252,7 +1262,7 @@
 		devfs_unmount_final(dmp);
 		return (ENOENT);
 	}
-	error = devfs_rules_ioctl(dmp, ap->a_command, ap->a_data, ap->a_td);
+	error = devfs_rules_ioctl(dmp, ap->a_command, ap->a_data, td);
 	sx_xunlock(&dmp->dm_lock);
 	return (error);
 }
Index: fs/cd9660/cd9660_vnops.c
===================================================================
--- fs/cd9660/cd9660_vnops.c	(revision 193634)
+++ fs/cd9660/cd9660_vnops.c	(working copy)
@@ -253,20 +253,37 @@
 		struct thread *a_td;
 	} */ *ap;
 {
-	struct vnode *vp = ap->a_vp;
-	struct iso_node *ip = VTOI(vp);
+	struct vnode *vp;
+	struct iso_node *ip;
+	struct thread *td;
+	int error;

-	if (vp->v_type == VCHR || vp->v_type == VBLK)
-		return (EOPNOTSUPP);
+	vp = ap->a_vp;
+	td = ap->a_td;
+	vn_lock(vp, LK_SHARED | LK_RETRY, td);
+	if (vp->v_iflag & VI_DOOMED) {
+		error = EBADF;
+		goto out;
+	}
+	ip = VTOI(vp);
+	if (vp->v_type == VCHR || vp->v_type == VBLK) {
+		error = EOPNOTSUPP;
+		goto out;
+	}

+	error = 0;
 	switch (ap->a_command) {
-
 	case FIOGETLBA:
 		*(int *)(ap->a_data) = ip->iso_start;
-		return 0;
+		break;
 	default:
-		return (ENOTTY);
+		error = ENOTTY;
+		break;
 	}
+
+out:
+	VOP_UNLOCK(vp, 0, td);
+	return (error);
 }

 /*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090607/f3274c56/attachment.pgp

From georg at dts.su Sun Jun 7 15:21:51 2009
From: georg at dts.su (georg@dts.su)
Date: Sun Jun 7 15:21:58 2009
Subject: fatal trap 12
In-Reply-To: <20090607134038.GL1927@deviant.kiev.zoral.com.ua>
References: <20090606161600.GB61928@dchagin.static.corbina.ru>
	<20090606175033.GJ1927@deviant.kiev.zoral.com.ua>
	<49009886.20090607153452@dts.su>
	<20090607134038.GL1927@deviant.kiev.zoral.com.ua>
Message-ID: <1186509222.20090607192146@dts.su>

Hello, Kostik.

I get this after applying the patch:
cc1: warnings being treated as errors
/usr/src/sys/fs/pseudofs/pseudofs_vnops.c: In function 'pfs_ioctl':
/usr/src/sys/fs/pseudofs/pseudofs_vnops.c:265: warning: 'pn' is used uninitialized in this function
/usr/src/sys/fs/pseudofs/pseudofs_vnops.c:295: warning: 'pvd' is used uninitialized in this function
*** Error code 1

> I asked you to remove questions@ from the reply, didn't I?

> On Sun, Jun 07, 2009 at 03:34:52PM +0400, georg@dts.su wrote:
>> Hello.
>>
>> After applying the patch, building the kernel fails with:
>> /usr/src/sys/kern/vfs_vnops.c:750:37: error: macro "vn_lock" requires 3 arguments, but only 2 given
>> /usr/src/sys/kern/vfs_vnops.c: In function 'vn_ioctl':
>> /usr/src/sys/kern/vfs_vnops.c:750: error: 'vn_lock' undeclared (first use in this function)
>> /usr/src/sys/kern/vfs_vnops.c:750: error: (Each undeclared identifier is reported only once
>> /usr/src/sys/kern/vfs_vnops.c:750: error: for each function it appears in.)
>> /usr/src/sys/kern/vfs_vnops.c:769: error: too few arguments to function 'VOP_UNLOCK'
>> *** Error code 1
>>
> The patch is for HEAD.
> You did not specify the version of your system.
> For RELENG_7, the patch has to be adapted by adding a curthread parameter
> to several calls, among them vn_lock(), VOP_ISLOCKED() and VOP_UNLOCK().
> The patch probably cannot be merged to RELENG_7 due to KBI breakage.
> There, I think the following workaround for pseudofs might be enough,
> but it would also be needed for cd9660 and devfs at least.
> Try this.
> Index: fs/pseudofs/pseudofs_vnops.c
> ===================================================================
> --- fs/pseudofs/pseudofs_vnops.c	(revision 193634)
> +++ fs/pseudofs/pseudofs_vnops.c	(working copy)
> @@ -260,34 +260,51 @@
>  static int
>  pfs_ioctl(struct vop_ioctl_args *va)
>  {
> -	struct vnode *vn = va->a_vp;
> -	struct pfs_vdata *pvd = vn->v_data;
> -	struct pfs_node *pn = pvd->pvd_pn;
> +	struct vnode *vn;
> +	struct pfs_vdata *pvd;
> +	struct pfs_node *pn;
>  	struct proc *proc;
> +	struct thread *td;
>  	int error;
>
> +	vn = va->a_vp;
> +	td = curthread;
> +	vn_lock(vn, LK_SHARED | LK_RETRY, td);
> +	if (vn->v_iflag & VI_DOOMED) {
> +		VOP_UNLOCK(vn, 0, td);
> +		return (EBADF);
> +	}
> +	pvd = vn->v_data;
> +	pn = pvd->pvd_pn;
>  	PFS_TRACE(("%s: %lx", pn->pn_name, va->a_command));
>  	pfs_assert_not_owned(pn);
>
> -	if (vn->v_type != VREG)
> +	if (vn->v_type != VREG) {
> +		VOP_UNLOCK(vn, 0, td);
>  		PFS_RETURN (EINVAL);
> +	}
>  	KASSERT_PN_IS_FILE(pn);
>
> -	if (pn->pn_ioctl == NULL)
> +	if (pn->pn_ioctl == NULL) {
> +		VOP_UNLOCK(vn, 0, td);
>  		PFS_RETURN (ENOTTY);
> +	}
>
>  	/*
>  	 * This is necessary because process' privileges may
>  	 * have changed since the open() call.
>  	 */
> -	if (!pfs_visible(curthread, pn, pvd->pvd_pid, &proc))
> +	if (!pfs_visible(curthread, pn, pvd->pvd_pid, &proc)) {
> +		VOP_UNLOCK(vn, 0, td);
>  		PFS_RETURN (EIO);
> +	}
>
>  	error = pn_ioctl(curthread, proc, pn, va->a_command, va->a_data);
>
>  	if (proc != NULL)
>  		PROC_UNLOCK(proc);
>
> +	VOP_UNLOCK(vn, 0, td);
>  	PFS_RETURN (error);
>  }
>
> Index: fs/devfs/devfs_vnops.c
> ===================================================================
> --- fs/devfs/devfs_vnops.c	(revision 193634)
> +++ fs/devfs/devfs_vnops.c	(working copy)
> @@ -1240,11 +1240,21 @@
>  static int
>  devfs_rioctl(struct vop_ioctl_args *ap)
>  {
> +	struct vnode *vp;
> +	struct devfs_mount *dmp;
> +	struct thread *td;
>  	int error;
> -	struct devfs_mount *dmp;
>
> +	vp = ap->a_vp;
> +	td = ap->a_td;
> +	vn_lock(vp, LK_SHARED | LK_RETRY, td);
> +	if (vp->v_iflag & VI_DOOMED) {
> +		VOP_UNLOCK(vp, 0, td);
> +		return (EBADF);
> +	}
>  	dmp = VFSTODEVFS(ap->a_vp->v_mount);
>  	sx_xlock(&dmp->dm_lock);
> +	VOP_UNLOCK(vp, 0, td);
>  	DEVFS_DMP_HOLD(dmp);
>  	devfs_populate(dmp);
>  	if (DEVFS_DMP_DROP(dmp)) {
> @@ -1252,7 +1262,7 @@
>  		devfs_unmount_final(dmp);
>  		return (ENOENT);
>  	}
> -	error = devfs_rules_ioctl(dmp, ap->a_command, ap->a_data, ap->a_td);
> +	error = devfs_rules_ioctl(dmp, ap->a_command, ap->a_data, td);
>  	sx_xunlock(&dmp->dm_lock);
>  	return (error);
>  }
> Index: fs/cd9660/cd9660_vnops.c
> ===================================================================
> --- fs/cd9660/cd9660_vnops.c	(revision 193634)
> +++ fs/cd9660/cd9660_vnops.c	(working copy)
> @@ -253,20 +253,37 @@
>  		struct thread *a_td;
>  	} */ *ap;
>  {
> -	struct vnode *vp = ap->a_vp;
> -	struct iso_node *ip = VTOI(vp);
> +	struct vnode *vp;
> +	struct iso_node *ip;
> +	struct thread *td;
> +	int error;
>
> -	if (vp->v_type == VCHR || vp->v_type == VBLK)
> -		return (EOPNOTSUPP);
> +	vp = ap->a_vp;
> +	td = ap->a_td;
> +	vn_lock(vp, LK_SHARED | LK_RETRY, td);
> +	if (vp->v_iflag & VI_DOOMED) {
> +		error = EBADF;
> +		goto out;
> +	}
> +	ip = VTOI(vp);
> +	if (vp->v_type == VCHR || vp->v_type == VBLK) {
> +		error = EOPNOTSUPP;
> +		goto out;
> +	}
>
> +	error = 0;
>  	switch (ap->a_command) {
> -
>  	case FIOGETLBA:
>  		*(int *)(ap->a_data) = ip->iso_start;
> -		return 0;
> +		break;
>  	default:
> -		return (ENOTTY);
> +		error = ENOTTY;
> +		break;
>  	}
> +
> +out:
> +	VOP_UNLOCK(vp, 0, td);
> +	return (error);
>  }
>
>  /*

--
Regards,
Yura                          mailto:georg@dts.su

From kostikbel at gmail.com Sun Jun 7 15:30:17 2009
From: kostikbel at gmail.com (Kostik Belousov)
Date: Sun Jun 7 15:30:24 2009
Subject: fatal trap 12
In-Reply-To: <1186509222.20090607192146@dts.su>
References: <20090606161600.GB61928@dchagin.static.corbina.ru>
	<20090606175033.GJ1927@deviant.kiev.zoral.com.ua>
	<49009886.20090607153452@dts.su>
	<20090607134038.GL1927@deviant.kiev.zoral.com.ua>
	<1186509222.20090607192146@dts.su>
Message-ID: <20090607153000.GM1927@deviant.kiev.zoral.com.ua>

On Sun, Jun 07, 2009 at 07:21:46PM +0400, georg@dts.su wrote:
> Hello, Kostik.
>
> I get this after applying the patch:
> cc1: warnings being treated as errors
> /usr/src/sys/fs/pseudofs/pseudofs_vnops.c: In function 'pfs_ioctl':
> /usr/src/sys/fs/pseudofs/pseudofs_vnops.c:265: warning: 'pn' is used uninitialized in this function
> /usr/src/sys/fs/pseudofs/pseudofs_vnops.c:295: warning: 'pvd' is used uninitialized in this function
> *** Error code 1

I already asked about the version of your source tree. Please answer.
I do not see how these warnings could occur on RELENG_7, and the line
numbers do not correspond to those of RELENG_7 plus the patch.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090607/c6124f23/attachment.pgp From georg at dts.su Sun Jun 7 19:24:18 2009 From: georg at dts.su (georg@dts.su) Date: Sun Jun 7 19:24:24 2009 Subject: fatal trap 12 In-Reply-To: <20090607153000.GM1927@deviant.kiev.zoral.com.ua> References: <20090606161600.GB61928@dchagin.static.corbina.ru> <20090606175033.GJ1927@deviant.kiev.zoral.com.ua> <49009886.20090607153452@dts.su> <20090607134038.GL1927@deviant.kiev.zoral.com.ua> <1186509222.20090607192146@dts.su> <20090607153000.GM1927@deviant.kiev.zoral.com.ua> Message-ID: <954682534.20090607232412@dts.su> Hello, Kostik Belousov. > On Sun, Jun 07, 2009 at 07:21:46PM +0400, georg@dts.su wrote: >> Hello, Kostik. >> >> Have this after apply patch: >> cc1: warnings being treated as errors >> /usr/src/sys/fs/pseudofs/pseudofs_vnops.c: In function 'pfs_ioctl': >> /usr/src/sys/fs/pseudofs/pseudofs_vnops.c:265: warning: 'pn' is used uninitialized in this function >> /usr/src/sys/fs/pseudofs/pseudofs_vnops.c:295: warning: 'pvd' is used uninitialized in this function >> *** Error code 1 > I already asked about the version of your source tree. Please answer. > I do not see how these warnings might happen on the RELENG_7, > and line numbers do not correspond to line numbers of RELENG_7 + patch. FreeBSD 7.2-STABLE amd64 RELENG_7 -- Regards, Yura mailto:georg@dts.su From ivoras at freebsd.org Mon Jun 8 00:56:05 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Mon Jun 8 00:56:11 2009 Subject: ZFS v13 performance drops with low memory on FreeBSD-7 STABLE In-Reply-To: <4A2AA48B.20803@free.de> References: <4A2AA48B.20803@free.de> Message-ID: Kai Gallasch wrote: > Hi. > > I upgraded a server with 7-STABLE-amd64 and the MFC'd ZFS v13 about 8 > days ago. Since then the machine is running stable and this without > manually tuning vm.kmem_size, vfs.zfs.arc, etc. 
in loader.conf - so far > so good :-) > > In the last few days I noticed some performance issues with zfs, as some > customers complained about slow mysql database responses. > > MySQL is running in a database jail on a zfs v13 zpool, websites using > the mysql database are also running on zfs on the same server. > > The server is running about 30 in production jails, has 16GB RAM and 8GB > swap. Swap usage is about only 1% currently. > > After debugging the mysql settings for a while I found out, that when I > stopped some processes on the server that were using high amounts of > RAM, the datbase response times for queries were almost back to normal > again.. > > So for me this looks like when running applications and ZFS compete for > free RAM, ZFS looses. Is that so? Yes, that was the point of recent work in stabilizing ZFS - without it you would probably panic. With it, ZFS's memory is shrunk down. Unless there are other factors (like swapping; are you swapping on ZFS?), this is the probable reason for what you're seeing. It's kind of bad when the file system competes this directly with applications :( -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 258 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090608/b2c4c1a4/signature.pgp From kmacy at freebsd.org Mon Jun 8 01:19:36 2009 From: kmacy at freebsd.org (Kip Macy) Date: Mon Jun 8 01:19:43 2009 Subject: ZFS v13 performance drops with low memory on FreeBSD-7 STABLE In-Reply-To: References: <4A2AA48B.20803@free.de> Message-ID: <3c1674c90906071819k589e93d5u27ff4652c77e09ff@mail.gmail.com> d of bad when the file system competes this directly with > applications :( > The backpressure will too aggressively shrink the ARC. I have a bunch of changes in my private branch that I will push back in to HEAD when I've managed to address all the current bottlenecks. 
-Kip From giffunip at tutopia.com Mon Jun 8 00:41:50 2009 From: giffunip at tutopia.com (giffunip@tutopia.com) Date: Mon Jun 8 01:44:18 2009 Subject: hpfs success report Message-ID: <20090608004149.D77E38FC08@mx1.freebsd.org> Hello; Just FYI I thought I'd mention: I ran an experiment with Virtualbox and I have confirmed that, even though it's not built by default, the hpfs support is still working on FreeBSD-7.2-Release. It's basically read-only, mounting it rw causes trouble. cheers, Pedro. From gallasch at free.de Mon Jun 8 07:01:35 2009 From: gallasch at free.de (Kai Gallasch) Date: Mon Jun 8 07:01:42 2009 Subject: ZFS v13 performance drops with low memory on FreeBSD-7 STABLE In-Reply-To: References: <4A2AA48B.20803@free.de> Message-ID: <20090608090123.13c21d5f@boiler.free.de> On Mon, 08 Jun 2009 02:55:22 +0200 wrote Ivan Voras : > Kai Gallasch wrote: > > Hi. > > > > So for me this looks like when running applications and ZFS compete > > for free RAM, ZFS looses. Is that so? > > Yes, that was the point of recent work in stabilizing ZFS - without it > you would probably panic. With it, ZFS's memory is shrunk down. Unless > there are other factors (like swapping; are you swapping on ZFS?), > this is the probable reason for what you're seeing. No. Swapspace is on a standard swap partition outside the zpool. # swapinfo Device 1K-blocks Used Avail Capacity /dev/da0s3b 8388608 11636 8376972 0% There's barely swapping taking place on the server. --Kai. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090608/76c762c4/signature.pgp

From kmacy at freebsd.org Mon Jun 8 07:46:41 2009
From: kmacy at freebsd.org (Kip Macy)
Date: Mon Jun 8 07:46:47 2009
Subject: ZFS v13 performance drops with low memory on FreeBSD-7 STABLE
In-Reply-To: <20090608090123.13c21d5f@boiler.free.de>
References: <4A2AA48B.20803@free.de> <20090608090123.13c21d5f@boiler.free.de>
Message-ID: <3c1674c90906080046q1bdcafci7dc1b30df7a4a169@mail.gmail.com>

The inactive queue can cause the ARC to be shrunk down to almost
nothing. This is in need of fixing.

Cheers,
Kip

On Mon, Jun 8, 2009 at 12:01 AM, Kai Gallasch wrote:
> On Mon, 08 Jun 2009 02:55:22 +0200
> wrote Ivan Voras :
>
>> Kai Gallasch wrote:
>> > Hi.
>> >
>> > So for me this looks like when running applications and ZFS compete
>> > for free RAM, ZFS looses. Is that so?
>>
>> Yes, that was the point of recent work in stabilizing ZFS - without it
>> you would probably panic. With it, ZFS's memory is shrunk down. Unless
>> there are other factors (like swapping; are you swapping on ZFS?),
>> this is the probable reason for what you're seeing.
>
> No. Swapspace is on a standard swap partition outside the zpool.
> # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/da0s3b       8388608    11636  8376972     0%
>
> There's barely swapping taking place on the server.
>
> --Kai.
>
>

--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke

From bugmaster at FreeBSD.org Mon Jun 8 11:06:52 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Jun 8 11:08:06 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200906081106.n58B6ptK020616@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).
The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o bin/135314 fs [zfs] assertion failed for zdb(8) usage o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/135039 fs [zfs] mkstemp() fails over NFS when server uses ZFS (7 f kern/134496 fs [zfs] [panic] ZFS pool export occasionally causes a ke o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133980 fs [panic] [ffs] panic: ffs_valloc: dup alloc o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133614 fs [smbfs] [panic] panic: ffs_truncate: read-only filesys o kern/133373 fs [zfs] umass attachment causes ZFS checksum errors, dat o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int f kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/133134 fs [zfs] Missing ZFS zpool labels f kern/133020 fs [zfs] [panic] inappropriate panic caused by zfs. 
Pani o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132597 fs [tmpfs] [panic] tmpfs-related panic while interrupting o kern/132551 fs [zfs] ZFS locks up on extattr_list_link syscall o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132337 fs [zfs] [panic] kernel panic in zfs_fuid_create_cred o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes f kern/132068 fs [zfs] page fault when using ZFS over NFS on 7.1-RELEAS o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] [patch] mkfs.ext2 creates rotten partition o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129148 fs [zfs] [panic] panic on concurrent writing & rollback o kern/129059 fs [zfs] [patch] ZFS bootloader whitelistable via WITHOUT f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127659 fs [tmpfs] tmpfs memory leak o kern/127492 fs [zfs] System hang on ZFS input-output o kern/127420 fs [gjournal] [panic] Journal 
overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/125644 fs [zfs] [panic] zfs unfixable fs errors caused panic whe f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition f bin/124424 fs [zfs] zfs(8): zfs list -r shows strange snapshots' siz o kern/123939 fs [msdosfs] corrupts new files o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o kern/122173 fs [zfs] [panic] Kernel Panic if attempting to replace a o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122047 fs [ext2fs] [patch] incorrect handling of UF_IMMUTABLE / o kern/122038 fs [tmpfs] [panic] tmpfs: panic: tmpfs_alloc_vp: type 0xc o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121779 fs [ufs] snapinfo(8) (and related tools?) 
only work for t o kern/121770 fs [zfs] ZFS on i386, large file or heavy I/O leads to ke o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha f kern/120991 fs [panic] [fs] [snapshot] System crashes when manipulati o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o bin/120288 fs zfs(8): "zfs share -a" does not send SIGHUP to mountd f kern/119735 fs [zfs] geli + ZFS + samba starting on boot panics 7.0-B o kern/118912 fs [2tb] disk sizing/geometry problem with large array o misc/118855 fs [zfs] ZFS-related commands are nonfunctional in fixit o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118320 fs [zfs] [patch] NFS SETATTR sometimes fails to set file o bin/118249 fs mv(1): moving a directory changes its mtime o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o kern/116913 fs [ffs] [panic] ffs_blkfree: freeing free block p kern/116608 fs [msdosfs] [patch] msdosfs fails to check mount options o kern/116583 fs [ffs] [hang] System freezes for short time when using o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/115645 fs [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] 
smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o kern/113180 fs [zfs] Setting ZFS nfsshare property does not cause inh o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/105093 fs [ext2fs] [patch] ext2fs on read-only media cannot be m o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist f kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [iso9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna f kern/91568 fs [ufs] [panic] writing to UFS/softupdates DVD media in o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/89991 fs [ufs] softupdates with mount -ur causes fs UNREFS o kern/88657 fs [smbfs] windows 
client hang when browsing a samba shar o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o kern/85326 fs [smbfs] [panic] saving a file via samba to an overquot o kern/84589 fs [2TB] 5.4-STABLE unresponsive during background fsck 2 o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o kern/77826 fs [ext2fs] ext2fs usb filesystem will not mount RW o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 139 problems total. From mlstarling31 at hotmail.com Mon Jun 8 19:31:17 2009 From: mlstarling31 at hotmail.com (Michael Starling) Date: Mon Jun 8 19:31:24 2009 Subject: Mounting ext3 under Freebsd Message-ID: Hello...This problem is driving me crazy as I thought it was fixed with the release of 7.2. I can mount the ext3 filesystem from a previous linux drive but I can't access the data. 
it was my understanding that a patch has been incorporated into the 7.2 release ..Any ides as to why I might still be seeing this issue?.Thanks

uname -a
FreeBSD BSD 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Fri May 1 08:49:13 UTC 2009

OK..So here's what's happening after shutting down the linux box and placing the disk inside the BSD box.

tune2fs -l /dev/ad6s1
tune2fs 1.41.4 (27-Jan-2009)
Filesystem volume name:
Last mounted on:
Filesystem UUID:          fbb12204-b8fc-4f29-aab8-d2d9dd1ccbce
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              15269888
Block count:              61049000
Reserved block count:     3052450
Free blocks:              36030598
Free inodes:              15269372
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1009
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Fri May 29 10:44:59 2009
Last mount time:          Mon Jun 8 09:57:59 2009
Last write time:          Mon Jun 8 13:33:00 2009
Mount count:              5
Maximum mount count:      21
Last checked:             Fri May 29 10:44:59 2009
Check interval:           15552000 (6 months)
Next check after:         Wed Nov 25 09:44:59 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group wheel)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      0904fd61-260b-467c-ae33-ba484e5d9f64
Journal backup:           inode blocks

Now it says the filesystem is clean so I mount it with.

mount -t ext2fs /dev/ad6s1 /mnt

ls /mnt

ls: /mnt: Bad file descriptor

OK so we look at tune2fs again and the filesystem is "not clean" now.

tune2fs -l /dev/ad6s1
tune2fs 1.41.4 (27-Jan-2009)
Filesystem volume name:
Last mounted on:
Filesystem UUID:          fbb12204-b8fc-4f29-aab8-d2d9dd1ccbce
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              15269888
Block count:              61049000
Reserved block count:     3052450
Free blocks:              36030598
Free inodes:              15269372
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1009
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Fri May 29 10:44:59 2009
Last mount time:          Mon Jun 8 09:57:59 2009
Last write time:          Mon Jun 8 13:37:47 2009
Mount count:              5
Maximum mount count:      21
Last checked:             Fri May 29 10:44:59 2009
Check interval:           15552000 (6 months)
Next check after:         Wed Nov 25 09:44:59 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group wheel)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      0904fd61-260b-467c-ae33-ba484e5d9f64
Journal backup:           inode blocks

So I umount the filesystem and run e2fsck with.

e2fsck /dev/ad6s1
e2fsck 1.41.4 (27-Jan-2009)
/dev/ad6s1: clean, 516/15269888 files, 25018402/61049000 blocks

The filesystem now reports as "clean" again....So this is just a vicious cycle which I can't break..aaaaaaaaaaahhhhhhhhhhhhhhhhhhh....Please help...Losing sanity..

From scjamorim at bsd.com.br Mon Jun 8 20:27:55 2009
From: scjamorim at bsd.com.br (Sylvio César Teixeira Amorim)
Date: Mon Jun 8 20:28:02 2009
Subject: Mounting ext3 under Freebsd
In-Reply-To:
References:
Message-ID: <5859850b0906081259m2ba43694n9aa952641563c618@mail.gmail.com>

Linux:
# tune2fs -s off /dev/sda1
# e2fsck -y /dev/sda1

FreeBSD:
# cd /usr/ports/sysutils/e2fsprogs
# make install all
# mount -t ext2fs /dev/ad4s1 /media/linux

Att
Sylvio Cesar

2009/6/8 Michael Starling

>
> Hello...This problem is driving me crazy as I thought it was fixed with the
> release of 7.2. I can mount the ext3 filesystem from a previous linux drive
> but I can't access the data. it was my understanding that a patch has been
> incorporated into the 7.2 release ..Any ides as to why I might still be
> seeing this issue?.Thanks
>
>
>
> uname -a
> FreeBSD BSD 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Fri May 1 08:49:13 UTC
> 2009
>
> OK..So here's what's happening after shutting down the linux box and
> placing the disk inside the BSD box.
> > > > tune2fs -l /dev/ad6s1 > > tune2fs 1.41.4 (27-Jan-2009) > > Filesystem volume name: > > Last mounted on: > > Filesystem UUID: fbb12204-b8fc-4f29-aab8-d2d9dd1ccbce > > Filesystem magic number: 0xEF53 > > Filesystem revision #: 1 (dynamic) > > Filesystem features: has_journal ext_attr resize_inode dir_index > filetype sparse_super large_file > > Filesystem flags: signed_directory_hash > > Default mount options: (none) > > Filesystem state: clean > > Errors behavior: Continue > > Filesystem OS type: Linux > > Inode count: 15269888 > > Block count: 61049000 > > Reserved block count: 3052450 > > Free blocks: 36030598 > > Free inodes: 15269372 > > First block: 0 > > Block size: 4096 > > Fragment size: 4096 > > Reserved GDT blocks: 1009 > > Blocks per group: 32768 > > Fragments per group: 32768 > > Inodes per group: 8192 > > Inode blocks per group: 512 > > Filesystem created: Fri May 29 10:44:59 2009 > > Last mount time: Mon Jun 8 09:57:59 2009 > > Last write time: Mon Jun 8 13:33:00 2009 > > Mount count: 5 > > Maximum mount count: 21 > > Last checked: Fri May 29 10:44:59 2009 > > Check interval: 15552000 (6 months) > > Next check after: Wed Nov 25 09:44:59 2009 > > Reserved blocks uid: 0 (user root) > > Reserved blocks gid: 0 (group wheel) > > First inode: 11 > > Inode size: 256 > > Required extra isize: 28 > > Desired extra isize: 28 > > Journal inode: 8 > > Default directory hash: half_md4 > > Directory Hash Seed: 0904fd61-260b-467c-ae33-ba484e5d9f64 > > Journal backup: inode blocks > > > > > > Now it says the filesystem is clean so I mount it with. > > > > mount -t ext2fs /dev/ad6s1 /mnt > > > > ls /mnt > > > > ls: /mnt: Bad file descriptor > > > > > > OK so we look at tune2fs again and the filesystem is "not clean" now. 
> > > > tune2fs -l /dev/ad6s1 > > tune2fs 1.41.4 (27-Jan-2009) > > Filesystem volume name: > > Last mounted on: > > Filesystem UUID: fbb12204-b8fc-4f29-aab8-d2d9dd1ccbce > > Filesystem magic number: 0xEF53 > > Filesystem revision #: 1 (dynamic) > > Filesystem features: has_journal ext_attr resize_inode dir_index > filetype sparse_super large_file > > Filesystem flags: signed_directory_hash > > Default mount options: (none) > > Filesystem state: not clean > > Errors behavior: Continue > > Filesystem OS type: Linux > > Inode count: 15269888 > > Block count: 61049000 > > Reserved block count: 3052450 > > Free blocks: 36030598 > > Free inodes: 15269372 > > First block: 0 > > Block size: 4096 > > Fragment size: 4096 > > Reserved GDT blocks: 1009 > > Blocks per group: 32768 > > Fragments per group: 32768 > > Inodes per group: 8192 > > Inode blocks per group: 512 > > Filesystem created: Fri May 29 10:44:59 2009 > > Last mount time: Mon Jun 8 09:57:59 2009 > > Last write time: Mon Jun 8 13:37:47 2009 > > Mount count: 5 > > Maximum mount count: 21 > > Last checked: Fri May 29 10:44:59 2009 > > Check interval: 15552000 (6 months) > > Next check after: Wed Nov 25 09:44:59 2009 > > Reserved blocks uid: 0 (user root) > > Reserved blocks gid: 0 (group wheel) > > First inode: 11 > > Inode size: 256 > > Required extra isize: 28 > > Desired extra isize: 28 > > Journal inode: 8 > > Default directory hash: half_md4 > > Directory Hash Seed: 0904fd61-260b-467c-ae33-ba484e5d9f64 > > Journal backup: inode blocks > > > > So I umount the filesystem and run e2fsck with. > > > > e2fsck /dev/ad6s1 > > e2fsck 1.41.4 (27-Jan-2009) > > /dev/ad6s1: clean, 516/15269888 files, 25018402/61049000 blocks > > > > The filesystem now reports as "clean" again....So this is just a > vicious cycle which I can't > break..aaaaaaaaaaahhhhhhhhhhhhhhhhhhh....Please help...Losing sanity.. > > _________________________________________________________________ > Hotmail? has ever-growing storage! 
Don?t worry about storage limits. > > http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage_062009_______________________________________________ > freebsd-fs@freebsd.orgmailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- -=-=-=-=-=-=-=- Live free or die - UNIX* -=-=-=-=-=-=-= From sarawgi.aditya at gmail.com Tue Jun 9 15:35:09 2009 From: sarawgi.aditya at gmail.com (Aditya Sarawgi) Date: Tue Jun 9 15:35:41 2009 Subject: Mounting ext3 under Freebsd In-Reply-To: References: Message-ID: <20090609100446.GA1095@aditya> On Mon, Jun 08, 2009 at 03:18:33PM -0400, Michael Starling wrote: > > Hello...This problem is driving me crazy as I thought it was fixed with the release of 7.2. I can mount the ext3 filesystem from a previous linux drive > but I can't access the data. it was my understanding that a patch has been incorporated into the 7.2 release ..Any ides as to why I might still be seeing this issue?.Thanks > > > > uname -a > FreeBSD BSD 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Fri May 1 08:49:13 UTC 2009 > > OK..So here's what's happening after shutting down the linux box and placing the disk inside the BSD box. 
> > > > tune2fs -l /dev/ad6s1 > > tune2fs 1.41.4 (27-Jan-2009) > > Filesystem volume name: > > Last mounted on: > > Filesystem UUID: fbb12204-b8fc-4f29-aab8-d2d9dd1ccbce > > Filesystem magic number: 0xEF53 > > Filesystem revision #: 1 (dynamic) > > Filesystem features: has_journal ext_attr resize_inode dir_index filetype sparse_super large_file > > Filesystem flags: signed_directory_hash > > Default mount options: (none) > > Filesystem state: clean > > Errors behavior: Continue > > Filesystem OS type: Linux > > Inode count: 15269888 > > Block count: 61049000 > > Reserved block count: 3052450 > > Free blocks: 36030598 > > Free inodes: 15269372 > > First block: 0 > > Block size: 4096 > > Fragment size: 4096 > > Reserved GDT blocks: 1009 > > Blocks per group: 32768 > > Fragments per group: 32768 > > Inodes per group: 8192 > > Inode blocks per group: 512 > > Filesystem created: Fri May 29 10:44:59 2009 > > Last mount time: Mon Jun 8 09:57:59 2009 > > Last write time: Mon Jun 8 13:33:00 2009 > > Mount count: 5 > > Maximum mount count: 21 > > Last checked: Fri May 29 10:44:59 2009 > > Check interval: 15552000 (6 months) > > Next check after: Wed Nov 25 09:44:59 2009 > > Reserved blocks uid: 0 (user root) > > Reserved blocks gid: 0 (group wheel) > > First inode: 11 > > Inode size: 256 The 7.2 Release doesn't support inode size other than 128 but this problem is fixed in 8.0. If you want support for inode size other than 128 in 7.2 you can use the following patch http://pflog.net/~floyd/ext2fs.diff. Cheers, Aditya Sarawgi From kmacy at freebsd.org Wed Jun 10 01:22:42 2009 From: kmacy at freebsd.org (Kip Macy) Date: Wed Jun 10 01:22:49 2009 Subject: heads up on prefetch tunable in ZFS Message-ID: <3c1674c90906091822x56be5f2bg5b6156618847cc21@mail.gmail.com> As far as I can tell systems that have less than 4GB are more often hurt by prefetched than helped. On i386 systems and systems with less than 4GB, as of 193878 prefetch is now disabled by default. 
I've added a prefetch enable tunable, to enable prefetching for those systems. The prefetch disable tunable will continue to unconditionally disable prefetching. Cheers, Kip From linimon at FreeBSD.org Wed Jun 10 05:16:11 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Wed Jun 10 05:16:16 2009 Subject: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Message-ID: <200906100516.n5A5GAJM036067@freefall.freebsd.org> Old Synopsis: zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error New Synopsis: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jun 10 05:15:55 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=135412 From fbsd at microhost.su Wed Jun 10 11:48:36 2009 From: fbsd at microhost.su (fbsd@microhost.su) Date: Wed Jun 10 11:48:48 2009 Subject: trouble adding hard disk Message-ID: Hello everybody! I have a trouble to add 500G new disk. uname: FreeBSD 7.2-RELEASE #0 harddisk connected via SATA controller ====== atapci0: port 0xb000-0xb00f,0xb400-0xb40f,0xb800-0xb80f,0xbc00-0xbc0f,0xc000-0xc01f,0xc400-0xc4ff irq 18 at device 0.0 on pci1 ====== dmesg | grep ad ===== ad0: 76319MB at ata0-master UDMA100 ad4: 476940MB at ata2-master SATA150 ===== I use sysinstall to setup slice using fdisk and bsdlabel to create partition. When I choose to write changes in bsdlabel i got error mounting: no such file or directory. before use sysinstall i create directory. mkdir /newdisk after sysinstall i got: ======= ls | grep ad4 ad4 ad4s1 ======= But when i doing "newfs -O2 /dev/ad4s1" i got an error: ===== skipped. 976633600 internal error: can't find block in cyl 0 ===== What i'm doing wrong? 
Tnanks for all answers :) Sorry about my english :) From kmacy at freebsd.org Wed Jun 10 21:37:16 2009 From: kmacy at freebsd.org (Kip Macy) Date: Wed Jun 10 21:37:24 2009 Subject: prefetch change in ZFS Message-ID: <3c1674c90906101437t5ff9fa72kcad59ff4afe4d1e3@mail.gmail.com> Prefetch is still enabled by default on non-i386 systems with more than 4GB. To unconditionally enable or disable, set "vfs.zfs.prefetch_enable" in loader.conf. Cheers, Kip From gtodd at bellanet.org Wed Jun 10 22:22:26 2009 From: gtodd at bellanet.org (Graham Todd) Date: Wed Jun 10 22:22:32 2009 Subject: prefetch change in ZFS In-Reply-To: <3c1674c90906101437t5ff9fa72kcad59ff4afe4d1e3@mail.gmail.com> References: <3c1674c90906101437t5ff9fa72kcad59ff4afe4d1e3@mail.gmail.com> Message-ID: <4A302C1A.4030803@bellanet.org> Kip Macy wrote: > Prefetch is still enabled by default on non-i386 systems with more > than 4GB.
To unconditionally enable or disable, set > "vfs.zfs.prefetch_enable" in loader.conf. So is this a change of OID as well (i.e. to "vfs.zfs.prefetch_enable")? Here prefetch is enabled by default on a 2GB RAM 7.2-RELEASE system (ZFS version 6) by setting *vfs.zfs.prefetch_disable* to 0. ninga# sysctl vfs.zfs | grep fetch vfs.zfs.prefetch_disable: 0 cheers, From kmacy at freebsd.org Wed Jun 10 22:42:34 2009 From: kmacy at freebsd.org (Kip Macy) Date: Wed Jun 10 22:42:41 2009 Subject: prefetch change in ZFS In-Reply-To: <4A302C1A.4030803@bellanet.org> References: <3c1674c90906101437t5ff9fa72kcad59ff4afe4d1e3@mail.gmail.com> <4A302C1A.4030803@bellanet.org> Message-ID: <3c1674c90906101542m13c93aedx89c6b03eb188c033@mail.gmail.com> This is on HEAD until MFC. -Kip On Wed, Jun 10, 2009 at 2:56 PM, Graham Todd wrote: > Kip Macy wrote: >> Prefetch is still enabled by default on non-i386 systems with more >> than 4GB. To unconditionally enable or disable, set >> "vfs.zfs.prefetch_enable" in loader.conf. > > So is this a change of OID as well (i.e. to "vfs.zfs.prefetch_enable")? > > Here prefetch is enabled by default on a 2GB RAM 7.2-RELEASE system (ZFS > version 6) by setting *vfs.zfs.prefetch_disable* to 0. > > ninga# sysctl vfs.zfs | grep fetch > vfs.zfs.prefetch_disable: 0 > > cheers, > -- When bad men combine, the good must associate; else they will fall one by one, an unpitied sacrifice in a contemptible struggle. 
Edmund Burke

From linimon at FreeBSD.org Thu Jun 11 20:30:05 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Thu Jun 11 20:31:01 2009
Subject: kern/135480: [zfs] panic: lock &arg.lock already initialized
Message-ID: <200906112030.n5BKU3Rj022351@freefall.freebsd.org>

Old Synopsis: (zfs) panic: lock &arg.lock already initialized
New Synopsis: [zfs] panic: lock &arg.lock already initialized

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Jun 11 20:29:44 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=135480

From linimon at FreeBSD.org Fri Jun 12 05:39:30 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Fri Jun 12 05:39:37 2009
Subject: kern/135469: [ufs] [panic] kernel crash on md operation in ufs_dirbad
Message-ID: <200906120539.n5C5dTjd052780@freefall.freebsd.org>

Old Synopsis: kernel crash on md operation in ufs_dirbad
New Synopsis: [ufs] [panic] kernel crash on md operation in ufs_dirbad

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Jun 12 05:39:12 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=135469

From ilgiz at reid.ru Fri Jun 12 06:09:07 2009
From: ilgiz at reid.ru (Ильгиз Янузаков)
Date: Fri Jun 12 06:09:14 2009
Subject: ZFS performance
Message-ID: <262c68149962de9309da268bea23ecaa.squirrel@reid.ru>

Hi all. Sorry for my English.
I tested zpool versions 6 (on 7.1) and 13 (after upgrading to 7.2) (results here - http://www.reid.ru/freebsd/?p=1644 [in Russian]) and see that performance has degraded (by 20-30%).
Thread in the FreeBSD forums - http://forums.freebsd.org/showthread.php?t=4663
What happened?

--
???????? ??????, ????????? ????????????? ??? "????" ?. ???, ?????????????? ?????
44/1 E-mail: ilgiz@reid.ru http://www.reid.ru/ ----------------------------- FreeBSD - The Power to Serve! From avg at icyb.net.ua Fri Jun 12 13:56:51 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Fri Jun 12 13:57:04 2009 Subject: zfs related panic Message-ID: <4A325E9F.2080802@icyb.net.ua> This is on a recent stable/7 amd64, with zpool and filesystems upgraded to the latest version. I did zfs rollback xxx@yyy And then did ls on a directory in the rolled-back fs. I have the core file if it can be of any help. Sleeping thread (tid 100263, pid 2432) owns a non-sleepable lock sched_switch() at 0xffffffff8031d0ef = sched_switch+0x47d mi_switch() at 0xffffffff80302a59 = mi_switch+0x1bf sleepq_switch() at 0xffffffff8032f645 = sleepq_switch+0xd8 sleepq_catch_signals() at 0xffffffff8032f925 = sleepq_catch_signals+0x2db sleepq_wait_sig() at 0xffffffff80330219 = sleepq_wait_sig+0xc _sleep() at 0xffffffff80302eba = _sleep+0x2b5 kern_sigsuspend() at 0xffffffff802fc567 = kern_sigsuspend+0xeb sigsuspend() at 0xffffffff802fc5e9 = sigsuspend+0x34 syscall() at 0xffffffff80491d2d = syscall+0x347 Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab --- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80092ce3c, rsp = 0x7fffffffdee8, rbp = 0x8011e5a60 --- panic: sleeping thread cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at 0xffffffff80192dd5 = db_trace_self_wrapper+0x2a kdb_backtrace() at 0xffffffff80327ea7 = kdb_backtrace+0x32 panic() at 0xffffffff802fb70c = panic+0x1b0 propagate_priority() at 0xffffffff80332e92 = propagate_priority+0x122 turnstile_wait() at 0xffffffff80333e29 = turnstile_wait+0x358 _mtx_lock_sleep() at 0xffffffff802ed64a = _mtx_lock_sleep+0x117 cache_lookup() at 0xffffffff8036a52a = cache_lookup+0x632 vfs_cache_lookup() at 0xffffffff8036a69f = vfs_cache_lookup+0xab VOP_LOOKUP_APV() at 0xffffffff804c86f3 = VOP_LOOKUP_APV+0x51 lookup() at 0xffffffff80370a71 = lookup+0x5d8 namei() at 0xffffffff8037168f = namei+0x320 kern_lstat() at 
0xffffffff8037f6ca = kern_lstat+0x5e lstat() at 0xffffffff8037f8c9 = lstat+0x25 syscall() at 0xffffffff80491d2d = syscall+0x347 Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab --- syscall (190, FreeBSD ELF64, lstat), rip = 0x80095afbc, rsp = 0x7fffffffdde8, rbp = 0x800b50270 --- -- Andriy Gapon From serenity at exscape.org Fri Jun 12 17:49:00 2009 From: serenity at exscape.org (Thomas Backman) Date: Fri Jun 12 17:49:07 2009 Subject: ZFS: Silent/hidden errors, nothing logged anywhere Message-ID: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> OK, so I filed a PR late May (kern/135050): http://www.freebsd.org/cgi/query-pr.cgi?pr=135050 . I don't know if this is a "feature" or a bug, but it really should be considered the latter. The data could be repaired in the background without the user ever knowing - until the disk dies completely. I'd prefer to have warning signs (i.e. checksum errors) so that I can buy a replacement drive *before* that. Not only does this mean that errors can go unnoticed, but also that it's impossible to figure out which disk is broken, if ZFS has *temporarily* repaired the broken data! THAT is REALLY bad! Is this something that we can expect to see changed before 8.0-RELEASE? BTW, note that the md5sums always check out (good!), and that it never mentions "x MB repaired" when repairing silent damage (bad!), but only when scrubbing. Scrubbing may be a hard task with a dying disk - I haven't tried it, but I'd guess so. Regards, Thomas PS. I'm not subscribed to fs@, so please CC me if you read this message over there. 
[root@clone ~]# uname -a
FreeBSD clone.exscape.org 8.0-CURRENT FreeBSD 8.0-CURRENT #0 r194059M: Fri Jun 12 18:25:05 CEST 2009 root@clone.exscape.org:/usr/obj/usr/src/sys/DTRACE amd64
[root@clone ~]# sysctl kern.geom.debugflags=0x10 ### To allow overwriting of the disk
kern.geom.debugflags: 0 -> 16
[root@clone ~]# zpool create test raidz da1 da2 da3
[root@clone ~]# dd if=/dev/random of=/test/testfile bs=1000k
dd: /test/testfile: No space left on device
188+0 records in
187+1 records out
192413696 bytes transferred in 105.004322 secs (1832436 bytes/sec)
[root@clone ~]# dd if=/dev/random of=/dev/da3 bs=1000k count=10 seek=80
10+0 records in
10+0 records out
10240000 bytes transferred in 0.838391 secs (12213871 bytes/sec)
[root@clone ~]# cat /test/testfile > /dev/null
[root@clone ~]# zpool status -xv
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0    92

errors: No known data errors
[root@clone ~]# reboot

--- immediately after reboot ---

[root@clone ~]# zpool status -xv
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     1

errors: No known data errors
[root@clone ~]# zpool scrub test
(...)
[root@clone ~]# zpool status -xv
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Fri Jun 12 19:11:36 2009
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0    88  2.72M repaired

errors: No known data errors
[root@clone ~]# reboot

--- immediately after reboot, again ---

[root@clone ~]# zpool status -xv
all pools are healthy
[root@clone ~]# zpool status -v test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors
[root@clone ~]#

----------------- even more testing, no scrub this time -----------------

[root@clone ~]# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
[root@clone ~]# md5 /test/testfile && dd if=/dev/random of=/dev/da2 bs=1000k count=10 seek=40 ; md5 /test/testfile
MD5 (/test/testfile) = 510479f16592bf66e7ba63c0a4dda0b6
10+0 records in
10+0 records out
10240000 bytes transferred in 0.901645 secs (11357020 bytes/sec)
MD5 (/test/testfile) = 510479f16592bf66e7ba63c0a4dda0b6
[root@clone ~]# zpool status -xv
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0   104
            da3     ONLINE       0     0     0

errors: No known data errors
[root@clone ~]# reboot

--- immediately after reboot, yet again ---

[root@clone ~]# md5 /test/testfile
MD5 (/test/testfile) = 510479f16592bf66e7ba63c0a4dda0b6
[root@clone ~]# zpool status -xv
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     3
            da3     ONLINE       0     0     0

errors: No known data errors
[root@clone ~]# reboot

--- immediately after reboot, yet *again* ---

[root@clone ~]# md5 /test/testfile
MD5 (/test/testfile) = 510479f16592bf66e7ba63c0a4dda0b6
[root@clone ~]# zpool status -xv
all pools are healthy
[root@clone ~]# zpool status -v test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors
[root@clone ~]# zpool history -il test
History for 'test':
2009-06-12.19:03:43 zpool create test raidz da1 da2 da3 [user root on clone.exscape.org:global]
2009-06-12.19:10:42 [internal pool scrub txg:160] func=1 mintxg=0 maxtxg=160 [user root on clone.exscape.org]
2009-06-12.19:10:44 zpool scrub test [user root on clone.exscape.org:global]
2009-06-12.19:11:36 [internal pool scrub done txg:162] complete=1 [user root on clone.exscape.org]

From kmacy at freebsd.org Fri Jun 12 20:54:43 2009
From: kmacy at freebsd.org (Kip Macy)
Date: Fri Jun 12 20:54:49 2009
Subject: zfs related panic
In-Reply-To:
<4A325E9F.2080802@icyb.net.ua> References: <4A325E9F.2080802@icyb.net.ua> Message-ID: <3c1674c90906121354s6d6ae7ben5082708b1586e94f@mail.gmail.com> show sleepchain show thread 100263 On Fri, Jun 12, 2009 at 6:56 AM, Andriy Gapon wrote: > > This is on a recent stable/7 amd64, with zpool and filesystems upgraded to the > latest version. > I did zfs rollback xxx@yyy > And then did ls on a directory in the rolled-back fs. > > I have the core file if it can be of any help. > > Sleeping thread (tid 100263, pid 2432) owns a non-sleepable lock > sched_switch() at 0xffffffff8031d0ef = sched_switch+0x47d > mi_switch() at 0xffffffff80302a59 = mi_switch+0x1bf > sleepq_switch() at 0xffffffff8032f645 = sleepq_switch+0xd8 > sleepq_catch_signals() at 0xffffffff8032f925 = sleepq_catch_signals+0x2db > sleepq_wait_sig() at 0xffffffff80330219 = sleepq_wait_sig+0xc > _sleep() at 0xffffffff80302eba = _sleep+0x2b5 > kern_sigsuspend() at 0xffffffff802fc567 = kern_sigsuspend+0xeb > sigsuspend() at 0xffffffff802fc5e9 = sigsuspend+0x34 > syscall() at 0xffffffff80491d2d = syscall+0x347 > Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab > --- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80092ce3c, rsp = > 0x7fffffffdee8, rbp = 0x8011e5a60 --- > panic: sleeping thread > cpuid = 0 > KDB: stack backtrace: > db_trace_self_wrapper() at 0xffffffff80192dd5 = db_trace_self_wrapper+0x2a > kdb_backtrace() at 0xffffffff80327ea7 = kdb_backtrace+0x32 > panic() at 0xffffffff802fb70c = panic+0x1b0 > propagate_priority() at 0xffffffff80332e92 = propagate_priority+0x122 > turnstile_wait() at 0xffffffff80333e29 = turnstile_wait+0x358 > _mtx_lock_sleep() at 0xffffffff802ed64a = _mtx_lock_sleep+0x117 > cache_lookup() at 0xffffffff8036a52a = cache_lookup+0x632 > vfs_cache_lookup() at 0xffffffff8036a69f = vfs_cache_lookup+0xab > VOP_LOOKUP_APV() at 0xffffffff804c86f3 = VOP_LOOKUP_APV+0x51 > lookup() at 0xffffffff80370a71 = lookup+0x5d8 > namei() at 0xffffffff8037168f = namei+0x320 > kern_lstat() at 
0xffffffff8037f6ca = kern_lstat+0x5e > lstat() at 0xffffffff8037f8c9 = lstat+0x25 > syscall() at 0xffffffff80491d2d = syscall+0x347 > Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab > --- syscall (190, FreeBSD ELF64, lstat), rip = 0x80095afbc, rsp = 0x7fffffffdde8, > rbp = 0x800b50270 --- > > -- > Andriy Gapon > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- When bad men combine, the good must associate; else they will fall one by one, an unpitied sacrifice in a contemptible struggle. Edmund Burke From kip.macy at gmail.com Fri Jun 12 21:01:59 2009 From: kip.macy at gmail.com (Kip Macy) Date: Fri Jun 12 21:02:11 2009 Subject: ZFS: Silent/hidden errors, nothing logged anywhere In-Reply-To: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> References: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> Message-ID: <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> On Fri, Jun 12, 2009 at 10:32 AM, Thomas Backman wrote: > OK, so I filed a PR late May (kern/135050): > http://www.freebsd.org/cgi/query-pr.cgi?pr=135050?. > I don't know if this is a "feature" or a bug, but it really should be > considered the latter. The data could be repaired in the background without > the user ever knowing - until the disk dies completely. I'd prefer to have > warning signs (i.e. checksum errors) so that I can buy a replacement drive > *before* that. > > Not only does this mean that errors can go unnoticed, but also that it's > impossible to figure out which disk is broken, if ZFS has *temporarily* > repaired the broken data! THAT is REALLY bad! > Is this something that we can expect to see changed before 8.0-RELEASE? I'm fairly certain that we've discussed this already. Solaris uses FMA - I don't think that I'll get to a "real fix" any time soon. 
The time that I do have will go to addressing stability problems (memory over-allocation, NFS interaction, control directory mounts) all of which cause panics. Maintaining them persistently in the label doesn't make sense - when do you drop them? Would a simple log message about the number of checksum errors suffice? Cheers, Kip From mlists at pmade.com Fri Jun 12 21:05:07 2009 From: mlists at pmade.com (Peter Jones) Date: Fri Jun 12 21:05:13 2009 Subject: Logical Disk to Physical Drive Mapping Message-ID: <86ljnxyy01.fsf@pmade.com> Given the situation where you have several identical physical drives, what is the best way to turn logical labels such as da5 into a physical identifier like "the drive in slot 4"? It looks like I could use dmesg, some assumptions, and glabel to label the logical disks. However, I plan to use ZFS and as far as I can tell glabel doesn't support ZFS. What is the de facto way of doing this? I'll be using FreeBSD-CURRENT for this, btw. -- Peter Jones, http://pmade.com pmade inc. Louisville, CO US From kmacy at freebsd.org Fri Jun 12 21:39:30 2009 From: kmacy at freebsd.org (Kip Macy) Date: Fri Jun 12 21:39:37 2009 Subject: ZFS performance In-Reply-To: <262c68149962de9309da268bea23ecaa.squirrel@reid.ru> References: <262c68149962de9309da268bea23ecaa.squirrel@reid.ru> Message-ID: <3c1674c90906121439id0aaffdk8eb410a1870ad3a5@mail.gmail.com> On Thu, Jun 11, 2009 at 10:55 PM, ?????? ???????? wrote: > Hi all. Sorry for my english. > I'm test ZFS 6 (7.1) and 13 (upgraded to 7.2) version of zpools (results > here - http://www.reid.ru/freebsd/?p=1644 [in russian]) and see what > performance is degraded (20-30%). > Thread in FreeBSD forums - http://forums.freebsd.org/showthread.php?t=4663 > What happen? I don't know. I have a private branch where I've done a lot of work on read and write performance. When these changes make it back, they should at the very least recover the 30% loss. 
Cheers, From bp at barryp.org Fri Jun 12 22:15:25 2009 From: bp at barryp.org (Barry Pederson) Date: Fri Jun 12 22:15:31 2009 Subject: Logical Disk to Physical Drive Mapping In-Reply-To: <86ljnxyy01.fsf@pmade.com> References: <86ljnxyy01.fsf@pmade.com> Message-ID: <4A32CF01.4010004@barryp.org> Peter Jones wrote: > Given the situation where you have several identical physical drives, > what is the best way to turn logical labels such as da5 into a physical > identifier like "the drive in slot 4"? > > It looks like I could use dmesg, some assumptions, and glabel to label > the logical disks. However, I plan to use ZFS and as far as I can tell > glabel doesn't support ZFS. > > What is the de facto way of doing this? I'll be using FreeBSD-CURRENT > for this, btw. I've glabeled disks and then added them to ZFS pools, seems to work fine. Here's a raidz2 setup of 8 identical glabeled drives on 7.2 -------- # zpool status pool: tank state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 raidz2 ONLINE 0 0 0 label/15_01 ONLINE 0 0 0 label/15_02 ONLINE 0 0 0 label/15_03 ONLINE 0 0 0 label/15_04 ONLINE 0 0 0 label/15_05 ONLINE 0 0 0 label/15_06 ONLINE 0 0 0 label/15_07a ONLINE 0 0 0 label/15_08 ONLINE 0 0 0 errors: No known data errors ------ Barry From morganw at chemikals.org Fri Jun 12 23:40:05 2009 From: morganw at chemikals.org (Wes Morgan) Date: Fri Jun 12 23:40:13 2009 Subject: Logical Disk to Physical Drive Mapping In-Reply-To: <86ljnxyy01.fsf@pmade.com> References: <86ljnxyy01.fsf@pmade.com> Message-ID: On Fri, 12 Jun 2009, Peter Jones wrote: > Given the situation where you have several identical physical drives, > what is the best way to turn logical labels such as da5 into a physical > identifier like "the drive in slot 4"? > > It looks like I could use dmesg, some assumptions, and glabel to label > the logical disks. However, I plan to use ZFS and as far as I can tell > glabel doesn't support ZFS. 
> > What is the de facto way of doing this? I'll be using FreeBSD-CURRENT > for this, btw. Use ATA_STATIC_ID for the ATA subsystem to prevent unit numbers from changing when devices are added or removed. For SCSI devices, you can wire down the naming scheme with something like this in /boot/device.hints: hint.scbus.0.at="mpt0" hint.da.0.at="scbus0" hint.da.0.target="0" hint.da.1.at="scbus0" hint.da.1.target="1" hint.da.2.at="scbus0" hint.da.2.target="2" hint.da.3.at="scbus0" hint.da.3.target="3" From serenity at exscape.org Sat Jun 13 07:32:18 2009 From: serenity at exscape.org (Thomas Backman) Date: Sat Jun 13 07:32:25 2009 Subject: ZFS: Silent/hidden errors, nothing logged anywhere In-Reply-To: <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> References: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> Message-ID: On Jun 12, 2009, at 11:01 PM, Kip Macy wrote: > On Fri, Jun 12, 2009 at 10:32 AM, Thomas > Backman wrote: >> OK, so I filed a PR late May (kern/135050): >> http://www.freebsd.org/cgi/query-pr.cgi?pr=135050 . >> I don't know if this is a "feature" or a bug, but it really should be >> considered the latter. The data could be repaired in the background >> without >> the user ever knowing - until the disk dies completely. I'd prefer >> to have >> warning signs (i.e. checksum errors) so that I can buy a >> replacement drive >> *before* that. >> >> Not only does this mean that errors can go unnoticed, but also that >> it's >> impossible to figure out which disk is broken, if ZFS has >> *temporarily* >> repaired the broken data! THAT is REALLY bad! >> Is this something that we can expect to see changed before 8.0- >> RELEASE? > > > I'm fairly certain that we've discussed this already. Solaris uses FMA > - I don't think that I'll get to a "real fix" any time soon. 
The time > that I do have will go to addressing stability problems (memory > over-allocation, NFS interaction, control directory mounts) all of > which cause panics. Maintaining them persistently in the label doesn't > make sense - when do you drop them? Would a simple log message about > the number of checksum errors suffice? > > Cheers, > Kip Yes, I suppose a log message would be OK, especially if there's a semi- simple way of mailing root automatically (either by the ZFS libs themselves, or by a simple log analyzer daemon that I'm sure there are plenty of already). I do think that storing them in the label does make sense, though, but if Solaris doesn't do it, I suppose we shouldn't, either. IF stored that way, they should IMHO remain until a "zpool clear" is executed on device (a device that causes errors is a device that causes errors - most of the time, this is a great way for the disk to say "hey, I'm dying here!"). In practice, this clearing is already done on reboot (although the relevant functions are of course never called). Regards, Thomas From pjd at FreeBSD.org Sat Jun 13 15:06:33 2009 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Sat Jun 13 15:06:39 2009 Subject: ZFS: Silent/hidden errors, nothing logged anywhere In-Reply-To: <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> References: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> Message-ID: <20090613150627.GB1848@garage.freebsd.pl> On Fri, Jun 12, 2009 at 02:01:57PM -0700, Kip Macy wrote: > On Fri, Jun 12, 2009 at 10:32 AM, Thomas Backman wrote: > > OK, so I filed a PR late May (kern/135050): > > http://www.freebsd.org/cgi/query-pr.cgi?pr=135050?. > > I don't know if this is a "feature" or a bug, but it really should be > > considered the latter. The data could be repaired in the background without > > the user ever knowing - until the disk dies completely. I'd prefer to have > > warning signs (i.e. 
checksum errors) so that I can buy a replacement drive > > *before* that. > > > > Not only does this mean that errors can go unnoticed, but also that it's > > impossible to figure out which disk is broken, if ZFS has *temporarily* > > repaired the broken data! THAT is REALLY bad! > > Is this something that we can expect to see changed before 8.0-RELEASE? > > > I'm fairly certain that we've discussed this already. Solaris uses FMA > - I don't think that I'll get to a "real fix" any time soon. The time > that I do have will go to addressing stability problems (memory > over-allocation, NFS interaction, control directory mounts) all of > which cause panics. Maintaining them persistently in the label doesn't > make sense - when do you drop them? Would a simple log message about > the number of checksum errors suffice? We do log such errors. Solaris uses FMA and for FreeBSD I use devd. You can find the following entry in /etc/devd.conf: notify 10 { match "system" "ZFS"; match "type" "checksum"; action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'"; }; If you see nothing in your logs, there must be a bug with reporting the problem somewhere or devd is not running (it should be enabled by default). -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090613/7dbb55ba/attachment.pgp From serenity at exscape.org Sat Jun 13 15:13:14 2009 From: serenity at exscape.org (Thomas Backman) Date: Sat Jun 13 15:13:26 2009 Subject: ZFS: Silent/hidden errors, nothing logged anywhere In-Reply-To: <20090613150627.GB1848@garage.freebsd.pl> References: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> <20090613150627.GB1848@garage.freebsd.pl> Message-ID: On Jun 13, 2009, at 05:06 PM, Pawel Jakub Dawidek wrote: > On Fri, Jun 12, 2009 at 02:01:57PM -0700, Kip Macy wrote: >> On Fri, Jun 12, 2009 at 10:32 AM, Thomas >> Backman wrote: >>> OK, so I filed a PR late May (kern/135050): >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=135050 . >>> I don't know if this is a "feature" or a bug, but it really should >>> be >>> considered the latter. The data could be repaired in the >>> background without >>> the user ever knowing - until the disk dies completely. I'd prefer >>> to have >>> warning signs (i.e. checksum errors) so that I can buy a >>> replacement drive >>> *before* that. >>> >>> Not only does this mean that errors can go unnoticed, but also >>> that it's >>> impossible to figure out which disk is broken, if ZFS has >>> *temporarily* >>> repaired the broken data! THAT is REALLY bad! >>> Is this something that we can expect to see changed before 8.0- >>> RELEASE? >> >> >> I'm fairly certain that we've discussed this already. Solaris uses >> FMA >> - I don't think that I'll get to a "real fix" any time soon. The time >> that I do have will go to addressing stability problems (memory >> over-allocation, NFS interaction, control directory mounts) all of >> which cause panics. Maintaining them persistently in the label >> doesn't >> make sense - when do you drop them? 
Would a simple log message >> about >> the number of checksum errors suffice? > > We do log such errors. Solaris uses FMA and for FreeBSD I use devd. > You > can find the following entry in /etc/devd.conf: > ... > If you see nothing in your logs, there must be a bug with reporting > the > problem somewhere or devd is not running (it should be enabled by > default). Awesome! After checking further I did indeed find a bunch of such messages in messages.0.bz2. One thing less to worry about, I guess. :) Regards, Thomas From linimon at FreeBSD.org Sat Jun 13 16:33:08 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Sat Jun 13 16:33:19 2009 Subject: kern/135546: [zfs] zfs.ko module doesn't ignore zpool.cache filename supplied by loader Message-ID: <200906131633.n5DGX7mk014571@freefall.freebsd.org> Old Synopsis: zfs.ko module doesn't ignore zpool.cache filename supplied by loader New Synopsis: [zfs] zfs.ko module doesn't ignore zpool.cache filename supplied by loader Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Jun 13 16:32:52 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=135546 From stas at FreeBSD.org Sat Jun 13 16:55:44 2009 From: stas at FreeBSD.org (Stanislav Sedov) Date: Sat Jun 13 16:55:51 2009 Subject: Logical Disk to Physical Drive Mapping In-Reply-To: <86ljnxyy01.fsf@pmade.com> References: <86ljnxyy01.fsf@pmade.com> Message-ID: <20090613205648.9840e240.stas@FreeBSD.org> On Fri, 12 Jun 2009 14:53:50 -0600 Peter Jones mentioned: > Given the situation where you have several identical physical drives, > what is the best way to turn logical labels such as da5 into a physical > identifier like "the drive in slot 4"? > > It looks like I could use dmesg, some assumptions, and glabel to label > the logical disks. However, I plan to use ZFS and as far as I can tell > glabel doesn't support ZFS. 
> If you're using ZFS you probably don't need labels at all. AFAIK, ZFS stores all of its information in the on-disk metadata, and you always access data via ZFS volume labels. -- Stanislav Sedov ST4096-RIPE -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090613/95b9481a/attachment.pgp From dan.naumov at gmail.com Sat Jun 13 21:40:03 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sat Jun 13 21:40:09 2009 Subject: misc/118855: [zfs] ZFS-related commands are nonfunctional in fixit shell. Message-ID: <200906132140.n5DLe2KG069797@freefall.freebsd.org> The following reply was made to PR misc/118855; it has been noted by GNATS. From: Dan Naumov To: bug-followup@FreeBSD.org, erik.swanson@gmail.com Cc: Subject: Re: misc/118855: [zfs] ZFS-related commands are nonfunctional in fixit shell. Date: Sun, 14 Jun 2009 00:32:54 +0300 I am also having this very same problem on my 7.2-RELEASE/amd64, kinda hard to debug ZFS issues when you cannot interact with ZFS from Fixit... - Dan Naumov From james-freebsd-current at jrv.org Sat Jun 13 21:42:08 2009 From: james-freebsd-current at jrv.org (James R. Van Artsdalen) Date: Sat Jun 13 21:42:15 2009 Subject: ZFS: Silent/hidden errors, nothing logged anywhere In-Reply-To: <20090613150627.GB1848@garage.freebsd.pl> References: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com> <20090613150627.GB1848@garage.freebsd.pl> Message-ID: <4A3411EF.5000307@jrv.org> Pawel Jakub Dawidek wrote: > > We do log such errors. Solaris uses FMA and for FreeBSD I use devd. 
You
> can find the following entry in /etc/devd.conf:
>
> notify 10 {
>         match "system" "ZFS";
>         match "type" "checksum";
>         action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
> };
>
> If you see nothing in your logs, there must be a bug with reporting the
> problem somewhere or devd is not running (it should be enabled by
> default).
>

Looking at vsyslog(3), I don't think logger(1) can ever log with facility KERN. LOG_KERN is 0, so this in vsyslog

        /* Set default facility if none specified. */
        if ((pri & LOG_FACMASK) == 0)
                pri |= LogFacility;

will always change the KERN facility to LogFacility, which defaults to LOG_USER.

So the devd output is really going to user.warn, and a syslog.conf line like

        kern.* /var/log/kernel.log

will capture kernel messages, but not the devd logger output, and if you look in kernel.log you won't find the checksum errors.

From dan.naumov at gmail.com Sat Jun 13 22:07:25 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Sat Jun 13 22:07:32 2009
Subject: misc/118855: [zfs] ZFS-related commands are nonfunctional in fixit shell
Message-ID:

Hello list.

From http://www.freebsd.org/cgi/query-pr.cgi?pr=misc/118855 :

"The zfs and zpool commands are nonfunctional in the fixit shell of the 7.0-BETA4-i386-livefs.iso cd image. When either command is run, the result is "internal error: failed to initialize ZFS library"."

This PR was submitted on Dec 19 2007 and the same issue still persists on 7.2-RELEASE. Is there anything being done to fix this? I am evaluating a small-time deployment of ZFS, but not being able to debug anything ZFS-related from within a Fixit environment if/when things go south is a huge "NO-GO" sign.

Sincerely,
- Dan Naumov

From noackjr at alumni.rice.edu  Sun Jun 14 05:19:30 2009
From: noackjr at alumni.rice.edu (Jonathan Noack)
Date: Sun Jun 14 05:19:38 2009
Subject: Booting from ZFS raidz
In-Reply-To: <9cc826f0720e1624489dd6e6d384babc.squirrel@www.noacks.org>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<20090201072432.GA25276@server.vk2pj.dyndns.org>
	<246ecf0c87f944d70c5562eeed4165c9@mail.rabson.org>
	<9cc826f0720e1624489dd6e6d384babc.squirrel@www.noacks.org>
Message-ID:

On Fri, May 15, 2009 19:07, Jonathan Noack wrote:
> On Thu, May 14, 2009 10:25, Doug Rabson wrote:
>> I fixed a bug in the patch. Try this version:
>> http://people.freebsd.org/~dfr/raidzboot-14052009.diff
>
> I know the bug fix was for booting from degraded pools, but I can at least
> give you a "no regression" report. I just set up a new amd64 box and was
> able to boot from a raidz1 pool using your latest patch.
>
> Getting this working from scratch was tedious but not too complicated. I
> followed lulf's instructions
> (http://blogs.freebsdish.org/lulf/2008/12/16/setting-up-a-zfs-only-system/)
> using the May snapshot fixit CD. Only differences were that I set up all
> 4 disks with gpart (identically), created a raidz1 pool, and used a
> patched gptzfsboot that I cross-compiled on my 7.2 i386 box for the
> bootcode (applied to all 4 disks).
>
> If only I had remembered to patch my /usr/src tree before rebuilding world
> and rebooting... *sigh* Once more unto the fixit breach... :)

This (and the committed version) had been working fine for me on my stock
amd64 CURRENT system until I rebuilt world/kernel on 5/30 and rebooted.
I get the following error on boot (hand transcribed so hopefully I didn't
screw it up):

************************************************************
ZFS: i/o error - all block copies unavailable
ZFS: can't read object set for dataset lld
Can't find root filesystem - giving up
ZFS: unexpected object set type lld
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank:/boot/kernel/kernerl
boot:
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank:/boot/kernel/kernel
boot:
************************************************************

The previously working world/kernel was from 5/26.

I haven't had much time to troubleshoot until today. I can use the fixit
CD to access the ZFS pool with no issues; the problem appears to just be
the boot code. I cross-built a fresh world on my i386 system today,
reinstalled everything in /boot, reinstalled gptzfsboot, and still got the
same results.

What steps should I take to troubleshoot and resolve this?

Thanks,
-Jon

From serenity at exscape.org  Sun Jun 14 07:50:04 2009
From: serenity at exscape.org (Thomas Backman)
Date: Sun Jun 14 07:50:10 2009
Subject: kern/135050: [zfs] ZFS clears/hides disk errors on reboot
Message-ID: <200906140750.n5E7o2bN069089@freefall.freebsd.org>

The following reply was made to PR kern/135050; it has been noted by GNATS.

From: Thomas Backman
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/135050: [zfs] ZFS clears/hides disk errors on reboot
Date: Sun, 14 Jun 2009 09:25:01 +0200

 Apparently, errors like these are actually logged to syslog, and thus
 not completely hidden at all. By adding a line to your /etc/devd.conf
 you can even get an email notification automatically the instant an
 error is logged. Very nice.
 See this post:
 http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008149.html

 Regards,
 Thomas

From dan.naumov at gmail.com  Sun Jun 14 09:50:03 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Sun Jun 14 09:50:13 2009
Subject: misc/118855: [zfs] ZFS-related commands are nonfunctional in fixit shell.
Message-ID: <200906140950.n5E9o3no066824@freefall.freebsd.org>

The following reply was made to PR misc/118855; it has been noted by GNATS.

From: Dan Naumov
To: bug-followup@FreeBSD.org, erik.swanson@gmail.com
Cc:
Subject: Re: misc/118855: [zfs] ZFS-related commands are nonfunctional in fixit shell.
Date: Sun, 14 Jun 2009 12:43:20 +0300

 This should be moved to -docs, here is why:

 I managed to figure it out after having some of my hair go gray: when
 you are in FIXIT, you have to do "kldload
 /dist/boot/kernel/opensolaris.ko; kldload /dist/boot/kernel/zfs.ko" in
 that particular order (because automatic loading of kernel module
 dependencies does not work in FIXIT). After this, "zpool" and "zfs"
 will start working. The ZFS part of the Handbook (
 http://www.freebsd.org/doc/en/books/handbook/filesystems-zfs.html )
 makes no mention of this; I think a small note in there is in
 order.

 Sincerely,
 Dan Naumov

From morganw at chemikals.org  Sun Jun 14 12:21:58 2009
From: morganw at chemikals.org (Wes Morgan)
Date: Sun Jun 14 12:22:04 2009
Subject: Logical Disk to Physical Drive Mapping
In-Reply-To: <20090613205648.9840e240.stas@FreeBSD.org>
References: <86ljnxyy01.fsf@pmade.com>
	<20090613205648.9840e240.stas@FreeBSD.org>
Message-ID:

On Sat, 13 Jun 2009, Stanislav Sedov wrote:

> On Fri, 12 Jun 2009 14:53:50 -0600
> Peter Jones mentioned:
>
>> Given the situation where you have several identical physical drives,
>> what is the best way to turn logical labels such as da5 into a physical
>> identifier like "the drive in slot 4"?
>>
>> It looks like I could use dmesg, some assumptions, and glabel to label
>> the logical disks. However, I plan to use ZFS and as far as I can tell
>> glabel doesn't support ZFS.
>>
>
> If you're using ZFS you probably don't need labels at all. AFAIK, ZFS
> stores all of its information in the on-disk metadata, and you always
> access data via ZFS volume labels.

It does, but even in -current I have to export/import a pool if the device
numbering shifts, and "zpool status" output could make your heart skip a
beat if you didn't know how to fix it :)

It might be kludgy (a chicken/egg type problem), but couldn't glabel be
extended to read ZFS labels and create something like /dev/zpools/, and
then zfs look there first for devices to import?

From pjd at FreeBSD.org  Sun Jun 14 15:49:44 2009
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Sun Jun 14 15:49:56 2009
Subject: ZFS: Silent/hidden errors, nothing logged anywhere
In-Reply-To: <4A3411EF.5000307@jrv.org>
References: <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org>
	<3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com>
	<20090613150627.GB1848@garage.freebsd.pl>
	<4A3411EF.5000307@jrv.org>
Message-ID: <20090614154938.GC1848@garage.freebsd.pl>

On Sat, Jun 13, 2009 at 03:54:07PM -0500, James R. Van Artsdalen wrote:
> Pawel Jakub Dawidek wrote:
> >
> > We do log such errors. Solaris uses FMA and for FreeBSD I use devd. You
> > can find the following entry in /etc/devd.conf:
> >
> >     notify 10 {
> >         match "system" "ZFS";
> >         match "type" "checksum";
> >         action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
> >     };
> >
> > If you see nothing in your logs, there must be a bug with reporting the
> > problem somewhere or devd is not running (it should be enabled by
> > default).
> >
>
> Looking at vsyslog(3), I don't think logger(1) can ever log with
> facility KERN. LOG_KERN is 0, so this in vsyslog
>
>     /* Set default facility if none specified. */
>     if ((pri & LOG_FACMASK) == 0)
>         pri |= LogFacility;
>
> will always change the KERN facility to LogFacility, which defaults
> to LOG_USER.
>
> So the devd output is really going to user.warn and a syslog.conf line like
>
>     kern.*    /var/log/kernel.log
>
> will capture kernel messages, but not the devd logger output, and if you
> look in kernel.log you won't find the checksum errors.

Could be; most of the time I just use *.* /var/log/all.log.

We could easily log directly from inside the kernel, but this is just an
example devd entry, so one can replace it with, e.g., mailing the problem
to the system administrator or whatever.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090614/a32b59b9/attachment.pgp

From dan.naumov at gmail.com  Sun Jun 14 16:16:24 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Sun Jun 14 16:16:31 2009
Subject: Does this disk/filesystem layout look sane to you?
Message-ID:

Hello list.

I just wanted to have an extra pair (or a dozen) of eyes look this
configuration over before I commit to it (tested it in VMWare just in
case, it works, so I am considering doing this on real hardware soon).
I drew a nice diagram: http://www.pastebin.ca/1460089 Since it doesn't
show on the diagram, let me clarify that the geom mirror consumers as
well as the vdevs for ZFS RAIDZ are going to be partitions (raw disk
=> full disk slice => swap partition | mirror provider partition | zfs
vdev partition | unused).

Is there any actual downside to having a 5-way mirror vs a 2-way or a
3-way one?

- Sincerely,
Dan Naumov

From bugmaster at FreeBSD.org  Mon Jun 15 11:06:53 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Jun 15 11:07:56 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200906151106.n5FB6rWr076889@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/135546  fs     [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135480  fs     [zfs] panic: lock &arg.lock already initialized
o kern/135469  fs     [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135412  fs     [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREA
o bin/135314   fs     [zfs] assertion failed for zdb(8) usage
o kern/135050  fs     [zfs] ZFS clears/hides disk errors on reboot
o kern/135039  fs     [zfs] mkstemp() fails over NFS when server uses ZFS (7
f kern/134496  fs     [zfs] [panic] ZFS pool export occasionally causes a ke
o kern/134491  fs     [zfs] Hot spares are rather cold...
o kern/133980  fs     [panic] [ffs] panic: ffs_valloc: dup alloc
o kern/133676  fs     [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs     [smbfs] [panic] panic: ffs_truncate: read-only filesys
o kern/133373  fs     [zfs] umass attachment causes ZFS checksum errors, dat
o kern/133174  fs     [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs     [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/133134  fs     [zfs] Missing ZFS zpool labels
f kern/133020  fs     [zfs] [panic] inappropriate panic caused by zfs. Pani
o kern/132960  fs     [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132597  fs     [tmpfs] [panic] tmpfs-related panic while interrupting
o kern/132551  fs     [zfs] ZFS locks up on extattr_list_link syscall
o kern/132397  fs     reboot causes filesystem corruption (failure to sync b
o kern/132337  fs     [zfs] [panic] kernel panic in zfs_fuid_create_cred
o kern/132331  fs     [ufs] [lor] LOR ufs and syncer
o kern/132237  fs     [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs     [panic] File System Hard Crashes
f kern/132068  fs     [zfs] page fault when using ZFS over NFS on 7.1-RELEAS
o kern/131995  fs     [nfs] Failure to mount NFSv4 server
o kern/131360  fs     [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs     [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs     makefs: error "Bad file descriptor" on the mount poin
o kern/131086  fs     [ext2fs] [patch] mkfs.ext2 creates rotten partition
o kern/130979  fs     [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs     [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs     [iconv] usermount fails on fs that need iconv
o kern/130210  fs     [nullfs] Error by check nullfs
o kern/129760  fs     [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs     [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs     [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs     [panic] non-userfriendly panic when trying to mount(8)
o kern/129148  fs     [zfs] [panic] panic on concurrent writing & rollback
o kern/129059  fs     [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs     smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs     [zfs] [lor] lock order reversal in zfs
o kern/128514  fs     [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs     [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127659  fs     [tmpfs] tmpfs memory leak
o kern/127492  fs     [zfs] System hang on ZFS input-output
o kern/127420  fs     [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs     [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs     [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs     [ufs] [panic] Kernel panics while mounting an UFS file
s kern/125738  fs     [zfs] [request] SHA256 acceleration in ZFS
o kern/125644  fs     [zfs] [panic] zfs unfixable fs errors caused panic whe
f kern/125536  fs     [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs     [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs     [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs     [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs     [msdosfs] corrupts new files
o kern/122888  fs     [zfs] zfs hang w/ prefetch on, zil off while running t
o kern/122380  fs     [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o kern/122173  fs     [zfs] [panic] Kernel Panic if attempting to replace a
o bin/122172   fs     [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122047  fs     [ext2fs] [patch] incorrect handling of UF_IMMUTABLE /
o kern/122038  fs     [tmpfs] [panic] tmpfs: panic: tmpfs_alloc_vp: type 0xc
o bin/121898   fs     [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs     [ufs] snapinfo(8) (and related tools?) only work for t
o kern/121770  fs     [zfs] ZFS on i386, large file or heavy I/O leads to ke
o bin/121366   fs     [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs     [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs     [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs     [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs     [ntfs] [patch] Sync style changes between NetBSD and F
o bin/120288   fs     zfs(8): "zfs share -a" does not send SIGHUP to mountd
f kern/119735  fs     [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs     [2tb] disk sizing/geometry problem with large array
o misc/118855  fs     [zfs] ZFS-related commands are nonfunctional in fixit
o kern/118713  fs     [minidump] [patch] Display media size required for a k
o kern/118320  fs     [zfs] [patch] NFS SETATTR sometimes fails to set file
o bin/118249   fs     mv(1): moving a directory changes its mtime
o kern/118107  fs     [ntfs] [panic] Kernel panic when accessing a file at N
o bin/117315   fs     [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs     [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs     [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs     [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o kern/116913  fs     [ffs] [panic] ffs_blkfree: freeing free block
p kern/116608  fs     [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs     [ffs] [hang] System freezes for short time when using
o kern/116170  fs     [panic] Kernel panic when mounting /tmp
o kern/115645  fs     [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs     [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs     [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs     [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs     [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs     [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs     [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs     [patch] [request] mount(8): add support for relative p
o kern/113180  fs     [zfs] Setting ZFS nfsshare property does not cause inh
o bin/113049   fs     [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs     [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs     [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs     [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs     [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs     [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not
o kern/109010  fs     [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs     [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106030  fs     [ufs] [panic] panic in ufs from geom when a dead disk
o kern/105093  fs     [ext2fs] [patch] ext2fs on read-only media cannot be m
o kern/104406  fs     [ufs] Processes get stuck in "ufs" state under persist
f kern/104133  fs     [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs     [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs     [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs     [ntfs] mount_ntfs ignorant of cluster sizes
o kern/97377   fs     [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs     [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs     [ufs] rename on UFS filesystem is not atomic
o kern/94769   fs     [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs     [smbfs] smbfs may cause double unlock
o kern/93942   fs     [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs     [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs     [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs     [smbfs] [patch] Preserve access and modification time
a kern/90815   fs     [smbfs] [patch] SMBFS with character conversions somet
o kern/89991   fs     [ufs] softupdates with mount -ur causes fs UNREFS
o kern/88657   fs     [smbfs] windows client hang when browsing a samba shar
o kern/88266   fs     [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o kern/87859   fs     [smbfs] System reboot while umount smbfs.
o kern/86587   fs     [msdosfs] rm -r /PATH fails with lots of small files
o kern/85326   fs     [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs     [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs     [smbfs] Incorrect file time setting on NTFS mounted vi
o kern/77826   fs     [ext2fs] ext2fs usb filesystem will not mount RW
o kern/73484   fs     [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs     [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs     [ntfs] NTFS cannot "see" files on a WinXP filesystem
o kern/68978   fs     [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs     [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs     [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs     [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs     [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs     [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs     [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs     [smbfs] System reboot with dead smb mount and umount
o kern/18874   fs     [2TB] 32bit NFS servers export wrong negative values t

143 problems total.

From linimon at FreeBSD.org  Mon Jun 15 17:24:31 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Mon Jun 15 17:24:43 2009
Subject: kern/135594: [zfs] Single dataset unresponsive with Samba
Message-ID: <200906151724.n5FHOUIW077304@freefall.freebsd.org>

Synopsis: [zfs] Single dataset unresponsive with Samba

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Jun 15 17:24:18 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=135594

From avg at icyb.net.ua  Tue Jun 16 15:00:41 2009
From: avg at icyb.net.ua (Andriy Gapon)
Date: Tue Jun 16 15:00:50 2009
Subject: zfs related panic
In-Reply-To: <3c1674c90906121354s6d6ae7ben5082708b1586e94f@mail.gmail.com>
References: <4A325E9F.2080802@icyb.net.ua>
	<3c1674c90906121354s6d6ae7ben5082708b1586e94f@mail.gmail.com>
Message-ID: <4A37B395.20506@icyb.net.ua>

on 12/06/2009 23:54 Kip Macy said the following:
> show sleepchain

I can only do post-mortem using jhb's scripts for kgdb:
(kgdb) sleepchain 2432
thread 100263 (pid 2432, tcsh) non-lock sleep
lockchain 2432
thread 100263 (pid 2432, tcsh) inhibited

Not sure if this is correct though, or what it means.

> show thread 100263

(kgdb) thr 250
[Switching to thread 250 (Thread 100263)]#0  sched_switch (td=0xffffff000cfad720, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
1944            cpuid = PCPU_GET(cpuid);
(kgdb) backtrace
#0  sched_switch (td=0xffffff000cfad720, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xffffffff80302a59 in mi_switch (flags=1, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:444
#2  0xffffffff8032f645 in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xffffffff8032f925 in sleepq_catch_signals (wchan=0xffffff011440e548) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xffffffff80330219 in sleepq_wait_sig (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:594
#5  0xffffffff80302eba in _sleep (ident=0xffffff011440e548, lock=0xffffff011440e5a0, priority=360, wmesg=0xffffffff80508788 "pause", timo=0)
    at /usr/src/sys/kern/kern_synch.c:228
#6  0xffffffff802fc567 in kern_sigsuspend (td=Variable "td" is not available.
) at /usr/src/sys/kern/kern_sig.c:1474
#7  0xffffffff802fc5e9 in sigsuspend (td=0xffffff000cfad720, uap=Variable "uap" is not available.
) at /usr/src/sys/kern/kern_sig.c:1453
#8  0xffffffff80491d2d in syscall (frame=0xffffff8076db8c80) at /usr/src/sys/amd64/amd64/trap.c:899
#9  0xffffffff8047d00b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:339
#10 0x000000080092ce3c in ?? ()
Previous frame inner to this frame (corrupt stack?)

--
Andriy Gapon

From avg at freebsd.org  Tue Jun 16 15:20:45 2009
From: avg at freebsd.org (Andriy Gapon)
Date: Tue Jun 16 15:20:51 2009
Subject: zfs related panic
In-Reply-To: <4A37B395.20506@icyb.net.ua>
References: <4A325E9F.2080802@icyb.net.ua>
	<3c1674c90906121354s6d6ae7ben5082708b1586e94f@mail.gmail.com>
	<4A37B395.20506@icyb.net.ua>
Message-ID: <4A37B596.4090607@freebsd.org>

on 16/06/2009 18:00 Andriy Gapon said the following:
> on 12/06/2009 23:54 Kip Macy said the following:
>> show sleepchain
>
> I can only do post-mortem using jhb's scripts for kgdb:
> (kgdb) sleepchain 2432
> thread 100263 (pid 2432, tcsh) non-lock sleep

I think that this was reported because td_wchan is not 'struct lock' in this case.

(kgdb) fr 6
#6  0xffffffff802fc567 in kern_sigsuspend (td=Variable "td" is not available.
) at /usr/src/sys/kern/kern_sig.c:1474
(kgdb) list
1469            td->td_oldsigmask = td->td_sigmask;
1470            td->td_pflags |= TDP_OLDMASK;
1471            SIG_CANTMASK(mask);
1472            td->td_sigmask = mask;
1473            signotify(td);
1474            while (msleep(&p->p_sigacts, &p->p_mtx, PPAUSE|PCATCH, "pause", 0) == 0)
1475                    /* void */;
1476            PROC_UNLOCK(p);
1477            /* always return EINTR rather than ERESTART... */
1478            return (EINTR);
(kgdb) p &p->p_sigacts
$10 = (struct sigacts **) 0xffffff011440e548
(kgdb) fr 0
#0  sched_switch (td=0xffffff000cfad720, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
(kgdb) p td->td_wchan
$11 = (void *) 0xffffff011440e548
(kgdb) p td->td_wmesg
$12 = 0xffffffff80508788 "pause"
(kgdb) backtrace
#0  sched_switch (td=0xffffff000cfad720, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xffffffff80302a59 in mi_switch (flags=1, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:444
#2  0xffffffff8032f645 in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xffffffff8032f925 in sleepq_catch_signals (wchan=0xffffff011440e548) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xffffffff80330219 in sleepq_wait_sig (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:594
#5  0xffffffff80302eba in _sleep (ident=0xffffff011440e548, lock=0xffffff011440e5a0, priority=360, wmesg=0xffffffff80508788 "pause", timo=0)
    at /usr/src/sys/kern/kern_synch.c:228
#6  0xffffffff802fc567 in kern_sigsuspend (td=Variable "td" is not available.
) at /usr/src/sys/kern/kern_sig.c:1474
#7  0xffffffff802fc5e9 in sigsuspend (td=0xffffff000cfad720, uap=Variable "uap" is not available.
) at /usr/src/sys/kern/kern_sig.c:1453
#8  0xffffffff80491d2d in syscall (frame=0xffffff8076db8c80) at /usr/src/sys/amd64/amd64/trap.c:899
#9  0xffffffff8047d00b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:339
#10 0x000000080092ce3c in ?? ()

--
Andriy Gapon

From peter at vk2pj.dyndns.org  Tue Jun 16 18:53:12 2009
From: peter at vk2pj.dyndns.org (Peter Jeremy)
Date: Tue Jun 16 18:53:19 2009
Subject: Does this disk/filesystem layout look sane to you?
In-Reply-To:
References:
Message-ID: <20090616185221.GI9529@server.vk2pj.dyndns.org>

On 2009-Jun-14 19:16:22 +0300, Dan Naumov wrote:
>Is there any actual downside to having a 5-way mirror vs a 2-way or a
>3-way one?

Only write performance to the UFS root filesystem.

I run a system using a similar approach (though across 3 disks). My
only suggestion would be that instead of a single 5-way mirrored root,
you have a 2- or 3-way mirrored root and an off-line root backup using
the remaining disks - if you accidentally trash your active root, you
can just boot off one of the other disks to recover.

--
Peter Jeremy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090616/d54e08d6/attachment.pgp

From dan.naumov at gmail.com  Wed Jun 17 07:34:04 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Wed Jun 17 07:34:10 2009
Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
Message-ID:

I am wondering whether the numbers I am seeing are expected or whether
something is broken somewhere. Output of bonnie -s 1024:

on UFS2 + SoftUpdates:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
         1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3

on ZFS:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
         1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8  94595 28.0   102.2   1.2


atom# cat /boot/loader.conf
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="96M"

The test isn't completely fair in that the test on UFS2 is done on a
partition that resides on the first 16gb of a 2tb disk while the zfs
test is done on the enormous 1,9tb zfs pool that comes after that
partition (same disk). Can this difference in layout account for the
huge difference in performance, or is there something else in play? The
system is an Intel Atom 330 dualcore, 2gb ram, Western Digital Green
2tb disk. Also, what would be another good way to get good numbers for
comparing the performance of UFS2 vs ZFS on the same system?


Sincerely,
- Dan Naumov

From ivoras at freebsd.org  Wed Jun 17 10:53:52 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Wed Jun 17 10:53:59 2009
Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
In-Reply-To:
References:
Message-ID:

Dan Naumov wrote:
> I am wondering if the numbers I am seeing is something expected or is
> something broken somewhere. Output of bonnie -s 1024:

Unless you have 512 MB of memory in the machine or you're trying to test
caching, the benchmark you did is useless. In your environment, you need
at least "-s 4096".

Even with those issues solved, it's semi-useless since you did both
tests on the same drive, on different parts of it (see "diskinfo -vt
ad0" or whatever your drive is to see how different parts of the drive
have different performance). To make an objective comparison you need
two identical drives, and create a new empty small-ish partition (e.g.
15 GB) on the same position on both (e.g. at the start), then use this
partition only for benchmarking (not for the OS, etc).

> on UFS2 + SoftUpdates:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
>          1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3
>
> on ZFS:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
>          1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8  94595 28.0   102.2   1.2

I did my own testing on the early import of ZFS. The results in bonnie++
were that read and rewrite speeds are significantly better on ZFS than
on UFS+SU (50%+), while write speed is a bit slower (~10%).

There are of course other workloads than the sequential that need to be
reviewed.
For example, blogbench places ZFS again at about 50% better than
UFS+SU, while randomio makes it 50% slower. Untarring the ports tree on
ZFS is about 3x faster than on UFS+SU.

From andrew at modulus.org  Wed Jun 17 11:26:11 2009
From: andrew at modulus.org (Andrew Snow)
Date: Wed Jun 17 11:26:17 2009
Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
In-Reply-To:
References:
Message-ID: <4A38D1F9.6020105@modulus.org>

Further to this, the gap between ZFS and UFS grows even larger when you
compare ZFS software RAID with UFS on hardware RAID. (with ZFS beating
UFS rather soundly)

From joe at osoft.us  Wed Jun 17 16:07:58 2009
From: joe at osoft.us (Joe Koberg)
Date: Wed Jun 17 16:08:11 2009
Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
In-Reply-To:
References:
Message-ID: <4A390E57.9010701@osoft.us>

The difference in layout can easily explain a 2x difference in
sequential transfer performance.

I seriously doubt your disk is really getting 23K seeks/s done in the
UFS case - 100/s sounds much more reasonable for real hardware. Perhaps
the results of caching?


Joe Koberg


Dan Naumov wrote:
> I am wondering if the numbers I am seeing is something expected or is
> something broken somewhere. Output of bonnie -s 1024:
>
> on UFS2 + SoftUpdates:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
>          1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3
>
> on ZFS:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
>          1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8  94595 28.0   102.2   1.2
>
>
> atom# cat /boot/loader.conf
> vm.kmem_size="1024M"
> vm.kmem_size_max="1024M"
> vfs.zfs.arc_max="96M"
>
> The test isn't completely fair in that the test on UFS2 is done on a
> partition that resides on the first 16gb of a 2tb disk while the zfs
> test is done on the enormous 1,9tb zfs pool that comes after that
> partition (same disk). Can this difference in layout make up for the
> huge difference in performance or is there something else in play? The
> system is an Intel Atom 330 dualcore, 2gb ram, Western Digital Green
> 2tb disk. Also what would be another good way to get good numbers for
> comparing the performance of UFS2 vs ZFS on the same system.
> > > Sincerely, > - Dan Naumov > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > > From dnelson at allantgroup.com Wed Jun 17 17:15:11 2009 From: dnelson at allantgroup.com (Dan Nelson) Date: Wed Jun 17 17:15:17 2009 Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates In-Reply-To: References: Message-ID: <20090617161109.GA12966@dan.emsphone.com> In the last episode (Jun 17), Dan Naumov said: > I am wondering if the numbers I am seeing is something expected or is > something broken somewhere. Output of bonnie -s 1024: > > on UFS2 + SoftUpdates: > > -------Sequential Output-------- ---Sequential Input-- --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3 The insane sequential input K/sec and random seeks/sec values indicate that your entire test file was cached in memory. Try a larger file (at least 2x your installed RAM). 
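A small sketch of the sizing rule above, assuming the 2 GB of RAM mentioned for this machine and assuming bonnie's -s argument is in megabytes (as the 1024 in the MB column suggests). The variable names are only illustrative, and the script prints the command rather than running it:

```shell
#!/bin/sh
# Assumption: 2048 MB of installed RAM, as described for the Atom 330 box.
ram_mb=2048

# Use at least twice the installed RAM so the test file cannot sit in cache.
size_mb=$((ram_mb * 2))

# Print the benchmark command instead of running it.
echo "bonnie -s $size_mb"
```

On a machine with a different amount of memory, only ram_mb needs changing.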
> on ZFS: > > -------Sequential Output-------- ---Sequential Input-- --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8 94595 28.0 102.2 1.2 > -- Dan Nelson dnelson@allantgroup.com From mlists at pmade.com Wed Jun 17 20:48:11 2009 From: mlists at pmade.com (Peter Jones) Date: Wed Jun 17 20:48:18 2009 Subject: Logical Disk to Physical Drive Mapping References: <86ljnxyy01.fsf@pmade.com> <4A32CF01.4010004@barryp.org> Message-ID: <8663eu8u4l.fsf@pmade.com> Barry Pederson writes: > Peter Jones wrote: >> Given the situation where you have several identical physical drives, >> what is the best way to turn logical labels such as da5 into a physical >> identifier like "the drive in slot 4"? >> >> It looks like I could use dmesg, some assumptions, and glabel to label >> the logical disks. However, I plan to use ZFS and as far as I can tell >> glabel doesn't support ZFS. >> >> What is the de facto way of doing this? I'll be using FreeBSD-CURRENT >> for this, btw. > > I've glabeled disks and then added them to ZFS pools, seems to work > fine. Here's a raidz2 setup of 8 identical glabeled drives on 7.2 I'm not exactly sure how file system labels differ from disk labels, but the man page suggests that they both write meta-data to the last sector of the disk. Wouldn't that indicate that once ZFS wrote to the last sector of the disk you'd loose that meta-data? -- Peter Jones, http://pmade.com pmade inc. Louisville, CO US From dan.naumov at gmail.com Wed Jun 17 20:58:43 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Wed Jun 17 20:58:50 2009 Subject: Logical Disk to Physical Drive Mapping In-Reply-To: <8663eu8u4l.fsf@pmade.com> References: <86ljnxyy01.fsf@pmade.com> <4A32CF01.4010004@barryp.org> <8663eu8u4l.fsf@pmade.com> Message-ID: You could use ZFS on a slice/partition taking up 99,9% of the disk's size to avoid this. 
Contrary to how it works in Solaris/OpenSolaris, in FreeBSD you don't lose the ability to use write cache if you choose to use a slice or partition as a vdev for a ZFS pool instead of giving it the full disk. Additionally, you get some room to play if one disk in your raidz drops dead and your replacement drive ends up being a few sectors smaller than the disk you are replacing. - Dan Naumov On Wed, Jun 17, 2009 at 11:47 PM, Peter Jones wrote: > I'm not exactly sure how file system labels differ from disk labels, but > the man page suggests that they both write meta-data to the last sector > of the disk. > > Wouldn't that indicate that once ZFS wrote to the last sector of the > disk you'd lose that meta-data? From ronald-freebsd8 at klop.yi.org Thu Jun 18 00:07:41 2009 From: ronald-freebsd8 at klop.yi.org (Ronald Klop) Date: Thu Jun 18 00:07:52 2009 Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates In-Reply-To: References: Message-ID: On Wed, 17 Jun 2009 09:34:02 +0200, Dan Naumov wrote: > I am wondering if the numbers I am seeing is something expected or is > something broken somewhere. Output of bonnie -s 1024: > > on UFS2 + SoftUpdates: > > -------Sequential Output-------- ---Sequential Input-- > --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- > --Seeks--- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU > /sec %CPU > 1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 > 23603.8 243.3 > > on ZFS: > > -------Sequential Output-------- ---Sequential Input-- > --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- > --Seeks--- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU > /sec %CPU > 1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8 94595 28.0 > 102.2 1.2 > > > atom# cat /boot/loader.conf > vm.kmem_size="1024M" > vm.kmem_size_max="1024M" > vfs.zfs.arc_max="96M" Isn't 96M for ARC really small? Mine is 860M.
vfs.zfs.arc_max: 860072960 kstat.zfs.misc.arcstats.size: 657383376 I think the UFS2 cache is much bigger which makes a difference in your test. Ronald. From dan.naumov at gmail.com Thu Jun 18 00:07:56 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Thu Jun 18 00:08:03 2009 Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates In-Reply-To: References: Message-ID: All the ZFS tuning guides for FreeBSD (including one on the FreeBSD ZFS wiki) have recommended values between 64M and 128M to improve stability, so that what I went with. How much of my max kmem is it safe to give to ZFS? - Dan Naumov On Thu, Jun 18, 2009 at 2:51 AM, Ronald Klop wrote: > Isn't 96M for ARC really small? > Mine is 860M. > vfs.zfs.arc_max: 860072960 > kstat.zfs.misc.arcstats.size: 657383376 > > I think the UFS2 cache is much bigger which makes a difference in your test. > > Ronald. > From chengjin at fastsoft.com Thu Jun 18 04:57:26 2009 From: chengjin at fastsoft.com (Cheng Jin) Date: Thu Jun 18 04:57:34 2009 Subject: 7.0 RELEASE panic: zero vnode ref count (with VFS_BIO_DEBUG on) Message-ID: All, While I was testing various kernel debug options, I ran into the kernel panic. I am not much of filesystem/vm person. I also didn't find any recent report of a similar crash so I am hoping one of you would provide some pointers on what this is. I am running 7.0 Release on a Dell 860 with a WD 80G SATA disk using the following kernel config file. I do have a few other debug option turned on, but as far as fs is concerned, VFS_BIO_DEBUG is the only thing. I do have the core file (about 181 MB) so if anyone needs additional information, please let me know. 
cpu HAMMER ident NOFP options IPFIREWALL options DUMMYNET options IPFIREWALL_DEFAULT_TO_ACCEPT options KDB options KDB_TRACE options KDB_UNATTENDED options GDB options DDB options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_KDB options VFS_BIO_DEBUG options HZ=1000 makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols options SCHED_ULE # ULE scheduler options PREEMPTION # Enable kernel thread preemption options INET # InterNETworking options FFS # Berkeley Fast Filesystem options SOFTUPDATES # Enable FFS soft updates support options UFS_ACL # Support for access control lists options UFS_DIRHASH # Improve performance on big directories options UFS_GJOURNAL # Enable gjournal-based UFS journaling options MD_ROOT # MD is a potential root device options CD9660 # ISO 9660 Filesystem options PROCFS # Process filesystem (requires PSEUDOFS) options PSEUDOFS # Pseudo-filesystem framework options GEOM_PART_GPT # GUID Partition Tables. options GEOM_LABEL # Provides labelization options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!] options COMPAT_IA32 # Compatible with i386 binaries options COMPAT_FREEBSD4 # Compatible with FreeBSD4 options COMPAT_FREEBSD5 # Compatible with FreeBSD5 options COMPAT_FREEBSD6 # Compatible with FreeBSD6 options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI options KTRACE # ktrace(1) support options SYSVSHM # SYSV-style shared memory options SYSVMSG # SYSV-style message queues options SYSVSEM # SYSV-style semaphores options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions options KBD_INSTALL_CDEV # install a CDEV entry in /dev options ADAPTIVE_GIANT # Giant mutex is adaptive. options STOP_NMI # Stop CPUS using NMI instead of IPI options AUDIT # Security event auditing # Make an SMP-capable kernel by default options SMP # Symmetric MultiProcessor Kernel # Bus support. 
device acpi device pci # Floppy drives #device fdc # ATA and ATAPI devices device ata device atadisk # ATA disk drives device ataraid # ATA RAID drives device atapicd # ATAPI CDROM drives #device atapifd # ATAPI floppy drives #device atapist # ATAPI tape drives options ATA_STATIC_ID # Static device numbering # SCSI Controllers device ahc # AHA2940 and onboard AIC7xxx devices options AHC_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~128k to driver. device ahd # AHA39320/29320 and onboard AIC79xx devices options AHD_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~215k to driver. device amd # AMD 53C974 (Tekram DC-390(T)) device hptiop # Highpoint RocketRaid 3xxx series device isp # Qlogic family device mpt # LSI-Logic MPT-Fusion device sym # NCR/Symbios Logic (newer chipsets + those of `ncr') device trm # Tekram DC395U/UW/F DC315U adapters device adv # Advansys SCSI adapters device adw # Advansys wide SCSI adapters device aic # Adaptec 15[012]x SCSI adapters, AIC-6[23]60. 
device bt # Buslogic/Mylex MultiMaster SCSI adapters # SCSI peripherals device scbus # SCSI bus (required for SCSI) device ch # SCSI media changers device da # Direct Access (disks) device sa # Sequential Access (tape etc) device cd # CD device pass # Passthrough device (direct SCSI access) device ses # SCSI Environmental Services (and SAF-TE) # RAID controllers interfaced to the SCSI subsystem device amr # AMI MegaRAID device arcmsr # Areca SATA II RAID device ciss # Compaq Smart RAID 5* device dpt # DPT Smartcache III, IV - See NOTES for options device hptmv # Highpoint RocketRAID 182x device hptrr # Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx device iir # Intel Integrated RAID device ips # IBM (Adaptec) ServeRAID device mly # Mylex AcceleRAID/eXtremeRAID device twa # 3ware 9000 series PATA/SATA RAID # RAID controllers device aac # Adaptec FSA RAID device aacp # SCSI passthrough for aac (requires CAM) device ida # Compaq Smart RAID device mfi # LSI MegaRAID SAS device mlx # Mylex DAC960 family device twe # 3ware ATA RAID # atkbdc0 controls both the keyboard and the PS/2 mouse device atkbdc # AT keyboard controller device atkbd # AT keyboard device psm # PS/2 mouse device kbdmux # keyboard multiplexer device vga # VGA video card driver device splash # Splash screen and screen saver support # syscons is the default console driver, resembling an SCO console device sc # IPMI support device ipmi # Serial (COM) ports device sio # 8250, 16[45]50 based serial ports # Parallel port device ppbus # Parallel port bus (required) # PCI Ethernet NICs that use the common MII bus controller code. # NOTE: Be sure to keep the 'device miibus' line in order to use these NICs! device miibus # MII bus support device bce # Broadcom BCM5706/BCM5708 Gigabit Ethernet device bge # Broadcom BCM570xx Gigabit Ethernet # Pseudo devices. 
device loop # Network loopback device random # Entropy device device ether # Ethernet support device sl # Kernel SLIP device ppp # Kernel PPP device tun # Packet tunnel. device pty # Pseudo-ttys (telnet etc) device md # Memory "disks" device gif # IPv6 and IPv4 tunneling device faith # IPv6-to-IPv4 relaying (translation) device firmware # firmware assist module device if_bridge #Bridge interface # The `bpf' device enables the Berkeley Packet Filter. # Be aware of the administrative consequences of enabling this! # Note that 'bpf' is required for DHCP. device bpf # Berkeley packet filter # USB support device uhci # UHCI PCI->USB interface device ohci # OHCI PCI->USB interface device ehci # EHCI PCI->USB interface (USB 2.0) device usb # USB Bus (required) device ugen # Generic device ukbd # Keyboard The backtrace of the stack is the following: #0 doadump () at pcpu.h:194 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td)); (kgdb) bt #0 doadump () at pcpu.h:194 #1 0xffffffff80351af5 in boot (howto=260) at ../../../kern/kern_shutdown.c:409 #2 0xffffffff80351f77 in panic (fmt=Variable "fmt" is not available.) at ../../../kern/kern_shutdown.c:563 #3 0xffffffff803baa00 in bufdone_finish (bp=0xffffffffa0209b00) at ../../../kern/vfs_bio.c:3202 #4 0xffffffff803baaa8 in bufdone (bp=0xffffffffa0209b00) at ../../../kern/vfs_bio.c:3173 #5 0xffffffff8030a2b1 in g_io_schedule_up (tp=Variable "tp" is not available.) 
at ../../../geom/geom_io.c:587 #6 0xffffffff8030a99f in g_up_procbody () at ../../../geom/geom_kern.c:95 #7 0xffffffff80334dca in fork_exit (callout=0xffffffff8030a930 , arg=0x0, frame=0xffffffffab9fdc80) at ../../../kern/kern_fork.c:781 #8 0xffffffff804ee63e in fork_trampoline () at ../../../amd64/amd64/exception.S:415 From numisemis at yahoo.com Thu Jun 18 07:31:54 2009 From: numisemis at yahoo.com (Simun Mikecin) Date: Thu Jun 18 07:32:01 2009 Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates In-Reply-To: References: Message-ID: <270394.95537.qm@web37305.mail.mud.yahoo.com> Dan Naumov wrote: > All the ZFS tuning guides for FreeBSD (including one on the FreeBSD > ZFS wiki) have recommended values between 64M and 128M to improve > stability, so that what I went with. How much of my max kmem is it > safe to give to ZFS? On amd64, since 7.2-RELEASE, manually adjusting the kmem map or ARC size is no longer necessary for stability (see /usr/src/UPDATING). But if you like you can still do it. If you want to use ZFS a lot I suggest using the latest 7-STABLE (which has ZFS v13, more stable, more bugs resolved). amd64 of course. For i386 it would be better to use UFS+SU (for SCSI) or UFS+gjournal (for ATA). btw. turning on compression on ZFS filesystems might actually increase its performance as seen by benchmark programs. From petefrench at ticketswitch.com Thu Jun 18 08:29:04 2009 From: petefrench at ticketswitch.com (Pete French) Date: Thu Jun 18 08:29:16 2009 Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates In-Reply-To: Message-ID: > All the ZFS tuning guides for FreeBSD (including one on the FreeBSD > ZFS wiki) have recommended values between 64M and 128M to improve > stability, so that what I went with. How much of my max kmem is it > safe to give to ZFS? If you are on amd64 then don't tune it, it will tune itself.
If you are on i386 (or an earlier version of amd64) then 128M on a 2 gig machine
should be OK, assuming you have kmem_size_max set to the full 1500 odd.
Those are numbers which come up time and time again - I ran reliably with
them for ages, until the latest -STABLE.

-pete.

From fjwcash at gmail.com Thu Jun 18 15:47:25 2009
From: fjwcash at gmail.com (Freddie Cash)
Date: Thu Jun 18 15:47:31 2009
Subject: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
In-Reply-To:
References:
Message-ID:

On Thu, Jun 18, 2009 at 1:29 AM, Pete French wrote:

> > All the ZFS tuning guides for FreeBSD (including one on the FreeBSD
> > ZFS wiki) have recommended values between 64M and 128M to improve
> > stability, so that what I went with. How much of my max kmem is it
> > safe to give to ZFS?
>
> If you are on amd64 then don't tune it, it will tune itself. If you
> are on i386 (or an earlier version of amd64) then 128M on a 2 gig machine
> should be OK, assuming you have kmem_size_max set to the full 1500 odd.
> Those are numbers which come up time and time again - I ran reliably with
> them for ages, until the latest -STABLE.
>

My "rule of thumb" for 32-bit i386 systems has been to:
 - assign half of RAM to kmem (up to the max of ~1500 on 7.0/7.1)
 - assign half of kmem to zfs_arc_max

So far, for my workloads (nfs/cifs file servers, cups print servers, rsync
servers, kde4 desktop), it's worked well.

--
Freddie Cash
fjwcash@gmail.com

From randy at psg.com Thu Jun 18 22:19:06 2009
From: randy at psg.com (Randy Bush)
Date: Thu Jun 18 22:19:13 2009
Subject: adding drive to raidz1
Message-ID:

so, i made the jet-lagged mistake of saying

    # zpool add tank ad7s1

and got the following

    # zpool status
      pool: tank
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            tank        ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                ad4s3   ONLINE       0     0     0
                ad5s3   ONLINE       0     0     0
                ad6s1   ONLINE       0     0     0
              ad7s1     ONLINE       0     0     0

when i wanted to add it to the raidz1.
    # zpool remove tank ad7s1
    cannot remove ad7s1: only inactive hot spares or cache devices can be removed
    # zpool offline tank ad7s1
    cannot offline ad7s1: no valid replicas

how do i pry it off of the pool and stick it into the raidz1?

thanks

randy

From andrew at modulus.org Thu Jun 18 22:29:46 2009
From: andrew at modulus.org (Andrew Snow)
Date: Thu Jun 18 22:29:53 2009
Subject: adding drive to raidz1
In-Reply-To:
References:
Message-ID: <4A3ABF76.3020905@modulus.org>

Randy Bush wrote:
> so, i made the jet-lagged mistake of saying
>
> # zpool add tank ad7s1
> when i wanted to add it to the raidz1.
>
> # zpool remove tank ad7s1
> cannot remove ad7s1: only inactive hot spares or cache devices can be removed
> # zpool offline tank ad7s1
> cannot offline ad7s1: no valid replicas
>
> how do i pry it off of the pool and stick it into the raidz1?

*braces* You can't, without recreating the whole zpool.

- Andrew

From dan.naumov at gmail.com Thu Jun 18 22:40:19 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Thu Jun 18 22:40:25 2009
Subject: adding drive to raidz1
In-Reply-To: <4A3ABF76.3020905@modulus.org>
References: <4A3ABF76.3020905@modulus.org>
Message-ID:

To reiterate, you can't just add a single disk drive to a raidz1 or
raidz2 pool. This is a known limitation (you can check with SUN ZFS
docs). If you have an existing raidz and you MUST increase that
particular pool's storage capabilities, you have 3 options:

1) Add a raidz of the same configuration to the pool (think 3 disk
raidz + 3 disk raidz or 5 + 5, for example)

2) Replace each (and every) disk in your raidz pool one by one,
letting it resilver after inserting each upgraded disk

3) Back up your data, destroy your pool and create a new raidz pool
with a bigger amount of disks.

- Dan Naumov

On Fri, Jun 19, 2009 at 1:28 AM, Andrew Snow wrote:
> Randy Bush wrote:
>>
>> so, i made the jet-lagged mistake of saying
>>
>>    # zpool add tank ad7s1
>> when i wanted to add it to the raidz1.
>>
>>    # zpool remove tank ad7s1
>>    cannot remove ad7s1: only inactive hot spares or cache devices can be removed
>>    # zpool offline tank ad7s1
>>    cannot offline ad7s1: no valid replicas
>>
>> how do i pry it off of the pool and stick it into the raidz1?
>
>
> *braces* You can't, without recreating the whole zpool.

From randy at psg.com Thu Jun 18 22:52:28 2009
From: randy at psg.com (Randy Bush)
Date: Thu Jun 18 22:52:36 2009
Subject: adding drive to raidz1
In-Reply-To: <4A3ABF76.3020905@modulus.org>
References: <4A3ABF76.3020905@modulus.org>
Message-ID:

> 2) Replace each (and every) disk in your raidz pool one by one,
> letting it resilver after inserting each upgraded disk

ok. given

    # zpool status
      pool: tank
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            tank        ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                ad4s3   ONLINE       0     0     0
                ad5s3   ONLINE       0     0     0
                ad6s1   ONLINE       0     0     0
              ad7s1     ONLINE       0     0     0

how do i get ad7s1 offline? i can't detach it. will using it in a
replace do the trick?

and then how do i replace the four slices one by one?

sorry, but this is a distant system and after screwing up once, i am a
bit cautious.

just

From andrew at modulus.org Thu Jun 18 22:56:21 2009
From: andrew at modulus.org (Andrew Snow)
Date: Thu Jun 18 22:56:27 2009
Subject: adding drive to raidz1
In-Reply-To:
References: <4A3ABF76.3020905@modulus.org>
Message-ID: <4A3AC5B2.9010607@modulus.org>

> NAME        STATE     READ WRITE CKSUM
> tank        ONLINE       0     0     0
>   raidz1    ONLINE       0     0     0
>     ad4s3   ONLINE       0     0     0
>     ad5s3   ONLINE       0     0     0
>     ad6s1   ONLINE       0     0     0
>   ad7s1     ONLINE       0     0     0

Here you have created a non-redundant stripe with two vdev members:
1. a 3-disk RAIDZ1 and
2. a single disk.

So you can't ever remove the ad7s1 without data loss.
If you haven't written anything to the pool since adding ad7s1, you can probably yank the disk out and ignore any errors but the error messages will never go away until you recreate the whole pool from scratch From morganw at chemikals.org Fri Jun 19 00:45:06 2009 From: morganw at chemikals.org (Wes Morgan) Date: Fri Jun 19 00:45:13 2009 Subject: adding drive to raidz1 In-Reply-To: <4A3AC5B2.9010607@modulus.org> References: <4A3ABF76.3020905@modulus.org> <4A3AC5B2.9010607@modulus.org> Message-ID: On Fri, 19 Jun 2009, Andrew Snow wrote: > >> NAME STATE READ WRITE CKSUM >> tank ONLINE 0 0 0 >> raidz1 ONLINE 0 0 0 >> ad4s3 ONLINE 0 0 0 >> ad5s3 ONLINE 0 0 0 >> ad6s1 ONLINE 0 0 0 >> ad7s1 ONLINE 0 0 0 > > Here you have created a non-redundant stripe with two vdev members: > 1. a 3-disk RAIDZ1 and > 2.a single disk. > > So you can't ever remove the ad7s1 without data loss. > > If you haven't written anything to the pool since adding ad7s1, you can > probably yank the disk out and ignore any errors but the error messages will > never go away until you recreate the whole pool from scratch If you yank ad7s1 the pool will become unavailable. You could remove one of the slices in the raidz, though. The only way to "fix" this is just what everyone has said... Back up the data, destroy the pool and recreate. When you do this, if you don't want to be using slices, just "don't" -- use zpool create raidz somethingbesidestankfortheloveofgod ad4 ad5 ad6 ad7 And you'll be set. But you're using ad5s3 and ad6s3, are the first two slices in use? From randy at psg.com Fri Jun 19 01:04:21 2009 From: randy at psg.com (Randy Bush) Date: Fri Jun 19 01:04:28 2009 Subject: adding drive to raidz1 In-Reply-To: References: <4A3ABF76.3020905@modulus.org> <4A3AC5B2.9010607@modulus.org> Message-ID: > The only way to "fix" this is just what everyone has said... Back up the > data, destroy the pool and recreate. done. worked. luckily this was a system in build. 
> When you do this, if you don't want to be using slices i have nothing against slices > zpool create raidz somethingbesidestankfortheloveofgod ad4 ad5 ad6 ad7 she really does not care about the pool name. :) > And you'll be set. But you're using ad5s3 and ad6s3, are the first two > slices in use? on the two bootables s1 is a small gmirroed boot s2 is a non-mirrored swap s3 is pool randy From morganw at chemikals.org Fri Jun 19 01:10:38 2009 From: morganw at chemikals.org (Wes Morgan) Date: Fri Jun 19 01:10:44 2009 Subject: adding drive to raidz1 In-Reply-To: References: <4A3ABF76.3020905@modulus.org> <4A3AC5B2.9010607@modulus.org> Message-ID: On Thu, 18 Jun 2009, Randy Bush wrote: >> The only way to "fix" this is just what everyone has said... Back up the >> data, destroy the pool and recreate. > > done. worked. luckily this was a system in build. > >> When you do this, if you don't want to be using slices > > i have nothing against slices > >> zpool create raidz somethingbesidestankfortheloveofgod ad4 ad5 ad6 ad7 > > she really does not care about the pool name. :) > >> And you'll be set. But you're using ad5s3 and ad6s3, are the first two >> slices in use? > > on the two bootables > s1 is a small gmirroed boot > s2 is a non-mirrored swap > s3 is pool Just out of sheer curiosity, are all the slices and devices in the raidz the same size? From randy at psg.com Fri Jun 19 01:21:01 2009 From: randy at psg.com (Randy Bush) Date: Fri Jun 19 01:21:07 2009 Subject: adding drive to raidz1 In-Reply-To: References: <4A3ABF76.3020905@modulus.org> <4A3AC5B2.9010607@modulus.org> Message-ID: >> on the two bootables >> s1 is a small gmirroed boot >> s2 is a non-mirrored swap >> s3 is pool > Just out of sheer curiosity, are all the slices and devices in the raidz > the same size? no they were not. 
and now it is not a raidz, but rather

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s3   ONLINE       0     0     0
            ad5s3   ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad6s1   ONLINE       0     0     0
            ad7s1   ONLINE       0     0     0

randy

From james-freebsd-fs2 at jrv.org Fri Jun 19 04:12:39 2009
From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen)
Date: Fri Jun 19 04:12:46 2009
Subject: adding drive to raidz1
In-Reply-To:
References:
Message-ID: <4A3B1020.2010305@jrv.org>

As a feature suggestion why not reject an "zpool add" of a non-redundant
vdev to a pool of redundant vdev's unless -f is given? A command of
that sort is almost always a mistake so requiring -f would seem no
hardship for anyone...

Randy Bush wrote:
> NAME        STATE     READ WRITE CKSUM
> tank        ONLINE       0     0     0
>   raidz1    ONLINE       0     0     0
>     ad4s3   ONLINE       0     0     0
>     ad5s3   ONLINE       0     0     0
>     ad6s1   ONLINE       0     0     0
>   ad7s1     ONLINE       0     0     0
>

As was said, a vdev (ad7s1) cannot be removed from a pool, and a device
cannot be added to a raidz. However, I believe it is possible to attach
a device to a single-device vdev such as ad7s1 and turn that vdev into a
mirror, regaining redundancy without recreating the pool, perhaps
something like:

# zpool attach tank ad7s1 ad8s1

to get

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    ad4s3   ONLINE       0     0     0
    ad5s3   ONLINE       0     0     0
    ad6s1   ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    ad7s1   ONLINE       0     0     0
    ad8s1   ONLINE       0     0     0

(hand edited, not actual zpool output)

Even if the pool is to be rebuilt I suggest converting the naked vdevs
to mirrors in the meantime to avoid disaster...

PS. I prefer pools of mirrors over raidz anyway with such a small number
of devices since it's easier to protect against many more kinds of
system faults (i.e., power supply, cable, device firmware, host
controller, driver, etc).
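
The -f guard suggested in the message above could be approximated outside of zpool with a small wrapper. The following is only a sketch under stated assumptions: safe_zpool_add is a hypothetical helper, not part of zpool, it looks only at the first word after the pool name, and it prints the command it would run instead of executing it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of zpool): refuse to add a bare device
# to a pool unless -f is given. It never touches a pool; it only prints
# the zpool command that would be run.
safe_zpool_add() {
    force=0
    if [ "${1:-}" = "-f" ]; then
        force=1
        shift
    fi
    if [ "$#" -lt 2 ]; then
        echo "usage: safe_zpool_add [-f] pool vdev ..." >&2
        return 2
    fi
    pool=$1
    shift
    case "$1" in
        mirror|raidz|raidz1|raidz2|spare|log|cache)
            ;;  # vdev type keyword given: let it through
        *)
            # Bare device name: allow only when -f was passed.
            if [ "$force" -ne 1 ]; then
                echo "refusing to add bare device(s) '$*' to $pool; use -f to override" >&2
                return 1
            fi
            ;;
    esac
    if [ "$force" -eq 1 ]; then
        echo "zpool add -f $pool $*"
    else
        echo "zpool add $pool $*"
    fi
}
```

With this sketch, "safe_zpool_add tank ad7s1" is refused with exit status 1, while "safe_zpool_add tank mirror ad7s1 ad8s1" prints the corresponding zpool add command. It does not inspect the existing pool layout, so it is stricter than the suggestion (it also refuses a bare device on a pool that has no redundancy to begin with).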
From jh at saunalahti.fi Fri Jun 19 09:43:19 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Fri Jun 19 09:43:26 2009 Subject: VOP_WRITE & read-only file system In-Reply-To: <20090527150258.GA3666@a91-153-125-115.elisa-laajakaista.fi> References: <20090527150258.GA3666@a91-153-125-115.elisa-laajakaista.fi> Message-ID: <20090619094316.GA805@a91-153-125-115.elisa-laajakaista.fi> On 2009-05-27, Jaakko Heinonen wrote: > I found a few ways to get VOP_WRITE called for a read-only system. > > Ways I found: > > 1) mmap(2) > > 2) ktrace(2) > - start ktracing a process > - remount file-system as read-only While kib@ has a patch for mmap(2) I took a look at ktrace(2). ktrace too has a problem with writecount. ktrace uses vn_open() to open the trace file but immediately after that it calls vn_close() which decreases the writecount. As far as I can tell it does this because the same vnode may be associated with several processes and there is no easy and efficient way to know when it is disassociated from last process. Ideas how to fix it? Some thoughts: - Fiddle with writecount. IMHO it wouldn't fix the real bug (write after vn_close()). - Walk through all processes when disconnecting a vnode from process to find out if it was the last process using the vnode. Inefficient. - Keep track of vnodes which are used for tracing and have reference count for them. -- Jaakko From fb-fs at psconsult.nl Fri Jun 19 19:32:14 2009 From: fb-fs at psconsult.nl (Paul Schenkeveld) Date: Fri Jun 19 19:32:21 2009 Subject: adding drive to raidz1 In-Reply-To: References: <4A3ABF76.3020905@modulus.org> <4A3AC5B2.9010607@modulus.org> Message-ID: <20090619192158.GA78254@psconsult.nl> On Thu, Jun 18, 2009 at 06:04:17PM -0700, Randy Bush wrote: > on the two bootables > s1 is a small gmirroed boot > s2 is a non-mirrored swap If the system swaps, a read error on the swap device will panic the system. 
Although swap data is always transient and after a reboot generally not interesting anymore, I ALWAYS put swap on a mirror/raid3/raid5/raidz just to make sure the system survives a read error, especially with remote systems. > s3 is pool Paul Schenkeveld From peterjeremy at optushome.com.au Sat Jun 20 04:54:08 2009 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Sat Jun 20 04:54:15 2009 Subject: adding drive to raidz1 In-Reply-To: <4A3B1020.2010305@jrv.org> References: <4A3B1020.2010305@jrv.org> Message-ID: <20090620045357.GB22846@server.vk2pj.dyndns.org> On 2009-Jun-18 23:12:16 -0500, "James R. Van Artsdalen" wrote: >As a feature suggestion why not reject an "zpool add" of a non-redundant >vdev to a pool of redundant vdev's unless -f is given? A command of >that sort is almost always a mistake so requiring -f would seem no >hardship for anyone... Agreed. >As was said, a vdev (ad7s1) cannot be removed from a pool, and a device >cannot be added to a raidz. Both these are unfortunate restrictions. I can understand that expanding a RAIDZ would be a fairly complex operation but it's probably the most requested feature. I'm surprised that Sun don't allow removing vdevs from a pool - it's orthogonal to adding a vdev to a pool and (eg) HP AdvFS allows both. -- Peter Jeremy From kmacy at freebsd.org Sat Jun 20 05:43:31 2009 From: kmacy at freebsd.org (Kip Macy) Date: Sat Jun 20 05:43:38 2009 Subject: adding drive to raidz1 In-Reply-To: <20090620045357.GB22846@server.vk2pj.dyndns.org> References: <4A3B1020.2010305@jrv.org> <20090620045357.GB22846@server.vk2pj.dyndns.org> Message-ID: <3c1674c90906192243g2ea0781ne66fb67a520d56bf@mail.gmail.com> > Both these are unfortunate restrictions.
I can understand that
> expanding a RAIDZ would be a fairly complex operation but it's
> probably the most requested feature. I'm surprised that Sun don't
> allow removing vdevs from a pool - it's orthogonal to adding a vdev to
> a pool and (eg) HP AdvFS allows both.

http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

This has a very good discussion of how it could be done along with why
it hasn't been done.

Cheers,
Kip

From shopsite-user-bounces at mailman.dca.net Sat Jun 20 08:24:04 2009
From: shopsite-user-bounces at mailman.dca.net (shopsite-user-bounces@mailman.dca.net)
Date: Sat Jun 20 08:24:11 2009
Subject: Your message to Shopsite-user awaits moderator approval
Message-ID:

Your mail to 'Shopsite-user' with the subject

    Mail System Error - Returned Mail

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Post by non-member to a members-only list

Either the message will get posted to the list, or you will receive
notification of the moderator's decision. If you would like to cancel
this posting, please visit the following URL:

    http://mailman.dca.net/mailman/confirm/shopsite-user/ec53249e000defe6a525f6261fadae3875751119

From kmacy at freebsd.org Sat Jun 20 19:32:34 2009
From: kmacy at freebsd.org (Kip Macy)
Date: Sat Jun 20 19:32:41 2009
Subject: Unable to delete files on ZFS volume
In-Reply-To: <1245525965.26909.69.camel@phoenix.blechhirn.net>
References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com> <1245525965.26909.69.camel@phoenix.blechhirn.net>
Message-ID: <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com>

This is a known issue with write allocate file systems and snapshots.
I haven't seen this before on v13 without any snapshots.

A few questions:
- How many file systems?
- How old are the file systems?
- How much churn has there been on the file system?
- Was this an upgraded v6 or created as v13?
- How many files on test?
...
as well as any other things that occur to you to characterize the
file system.

Cheers,
Kip


On Sat, Jun 20, 2009 at 12:26 PM, Mister Olli wrote:
> Hi,
>
>> Do you have snapshots or run ZFS v6?
> neither one or the other. Here are my pool/ ZFS details.
>
> [root@template-8_CURRENT /test/data2]# zpool get all test
> NAME  PROPERTY       VALUE       SOURCE
> test  size           2.98G       -
> test  used           2.94G       -
> test  available      47.9M       -
> test  capacity       98%         -
> test  altroot        -           default
> test  health         ONLINE      -
> test  guid           5305090209740383945  -
> test  version        13          default
> test  bootfs         -           default
> test  delegation     on          default
> test  autoreplace    off         default
> test  cachefile      -           default
> test  failmode       wait        default
> test  listsnapshots  off         default
> [root@template-8_CURRENT /test/data2]# zfs get all test
> NAME  PROPERTY              VALUE                  SOURCE
> test  type                  filesystem             -
> test  creation              Fri Jun 19 21:01 2009  -
> test  used                  1.96G                  -
> test  available             0                      -
> test  referenced            26.6K                  -
> test  compressratio         1.00x                  -
> test  mounted               yes                    -
> test  quota                 none                   default
> test  reservation           none                   default
> test  recordsize            128K                   default
> test  mountpoint            /test                  default
> test  sharenfs              off                    default
> test  checksum              on                     default
> test  compression           off                    default
> test  atime                 on                     default
> test  devices               on                     default
> test  exec                  on                     default
> test  setuid                on                     default
> test  readonly              off                    default
> test  jailed                off                    default
> test  snapdir               hidden                 default
> test  aclmode               groupmask              default
> test  aclinherit            restricted             default
> test  canmount              on                     default
> test  shareiscsi            off                    default
> test  xattr                 off                    temporary
> test  copies                1                      default
> test  version               3                      -
> test  utf8only              off                    -
> test  normalization         none                   -
> test  casesensitivity       sensitive              -
> test  vscan                 off                    default
> test  nbmand                off                    default
> test  sharesmb              off                    default
> test  refquota              none                   default
> test  refreservation        none                   default
> test  primarycache          all                    default
> test  secondarycache        all                    default
> test  usedbysnapshots       0                      -
> test  usedbydataset         26.6K                  -
> test  usedbychildren        1.96G                  -
> test  usedbyrefreservation  0                      -
> [root@template-8_CURRENT /test/data2]# zfs list -t snapshot
> no datasets available
>
>
>
>> Confirm that you've deleted your snapshots and are running pool v13.
>>
>> Future ZFS mail should be directed to freebsd-fs@
> Sorry for that. fixed now ;-))
>
> Regards,
> ---
> Mr. Olli
>
>
>>
>>
>> On Sat, Jun 20, 2009 at 10:36 AM, Mister Olli wrote:
>> > Hi,
>> >
>> > after filling up a ZFS volume until the last byte, I'm unable to delete
>> > files, with error 'No space left on the device'.
>> >
>> >
>> >
>> > [root@template-8_CURRENT /test/data2]# df -h
>> > Filesystem     Size    Used   Avail Capacity  Mounted on
>> > /dev/ad0s1a    8.7G    5.2G    2.8G    65%    /
>> > devfs          1.0K    1.0K      0B   100%    /dev
>> > test             0B      0B      0B   100%    /test
>> > test/data1     1.6G    1.6G      0B   100%    /test/data1
>> > test/data2     341M    341M      0B   100%    /test/data2
>> > [root@template-8_CURRENT /test/data2]# zfs list
>> > NAME         USED  AVAIL  REFER  MOUNTPOINT
>> > test        1.96G      0  26.6K  /test
>> > test/data1  1.62G      0  1.62G  /test/data1
>> > test/data2   341M      0   341M  /test/data2
>> > [root@template-8_CURRENT /test/data2]# ls -l data1 |tail -n 20          <-- there are quite a lot of files, so I truncated ;-))
>> > -rw-r--r--  1 root  wheel      3072 Jun 20 17:13 20090620165743
>> > -rw-r--r--  1 root  wheel   9771008 Jun 20 17:11 20090620165803
>> > -rw-r--r--  1 root  wheel    624640 Jun 20 17:12 20090620165809
>> > -rw-r--r--  1 root  wheel   1777664 Jun 20 17:14 20090620165810
>> > -rw-r--r--  1 root  wheel   4059136 Jun 20 17:15 20090620165817
>> > -rw-r--r--  1 root  wheel  23778304 Jun 20 17:13 20090620165925
>> > -rw-r--r--  1 root  wheel  20318208 Jun 20 17:13 20090620165952
>> > -rw-r--r--  1 root  wheel  28394496 Jun 20 17:10 20090620170013
>> > -rw-r--r--  1 root  wheel  23698432 Jun 20 17:12 20090620170021
>> > -rw-r--r--  1 root  wheel  26476544 Jun 20 17:19 20090620170100
>> > -rw-r--r--  1 root  wheel  19904512 Jun 20 17:15 20090620170132
>> > -rw-r--r--  1 root  wheel  23815168 Jun 20 17:14 20090620170142
>> > -rw-r--r--  1 root  wheel   6683648 Jun 20 17:11 20090620170225
>> > -rw-r--r--  1 root  wheel  19619840 Jun 20 17:11 20090620170322
>> > -rw-r--r--  1 root  wheel  13902848 Jun 20 17:13 20090620170331
>> > -rw-r--r--  1 root  wheel  28981248 Jun 20 17:13 20090620170346
>> > -rw-r--r--  1 root  wheel  18287616 Jun 20 17:11 20090620170355
>> > -rw-r--r--  1 root  wheel  16762880 Jun 20 17:16 20090620170405
>> > -rw-r--r--  1 root  wheel  26966016 Jun 20 17:10 20090620170429
>> > -rw-r--r--  1 root  wheel   5252096 Jun 20 17:14 20090620170502
>> > [root@template-8_CURRENT /test/data2]# rm -rf data1
>> > rm: data1/20090620141524: No space left on device
>> > rm: data1/20090620025202: No space left on device
>> > rm: data1/20090620014926: No space left on device
>> > rm: data1/20090620075405: No space left on device
>> > rm: data1/20090620155124: No space left on device
>> > rm: data1/20090620105723: No space left on device
>> > rm: data1/20090620170100: No space left on device
>> > rm: data1/20090620040149: No space left on device
>> > rm: data1/20090620002512: No space left on device
>> > rm: data1/20090620052315: No space left on device
>> > rm: data1/20090620083750: No space left on device
>> > rm: data1/20090620063831: No space left on device
>> > rm: data1/20090620155029: No space left on device
>> > rm: data1/20090619234313: No space left on device
>> > rm: data1/20090620115346: No space left on device
>> > rm: data1/20090620075508: No space left on device
>> > rm: data1/20090620145541: No space left on device
>> > rm: data1/20090620093335: No space left on device
>> > rm: data1/20090620101846: No space left on device
>> > rm: data1/20090620132456: No space left on device
>> > rm: data1/20090620040044: No space left on device
>> > rm: data1/20090620091401: No space left on device
>> > rm: data1/20090620162251: No space left on device
>> > rm: data1/20090619220813: No space left on device
>> > rm: data1/20090620010643: No space left on device
>> > rm: data1/20090620052218: No space left on device
>> >
>> > Regards,
>> > ---
>> > Mr. Olli
>> >
>> > _______________________________________________
>> > freebsd-current@freebsd.org mailing list
>> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>> >

--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke

From dan.naumov at gmail.com Sat Jun 20 19:42:42 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Sat Jun 20 19:42:48 2009
Subject: Unable to delete files on ZFS volume
In-Reply-To: <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com>
References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com> <1245525965.26909.69.camel@phoenix.blechhirn.net> <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com>
Message-ID:

Hi.

As Kip pointed out, this is a known issue with write allocate
filesystems in general (not just ZFS). This is one of the several
reasons why SUN recommends you do not completely fill up a zpool (they
actually recommend to stay at or below 80% utilization). I have a
workaround for you, however:

Pick a file you don't need on the filled up ZFS volume. "Empty" the
file contents in a way of your choosing. This should give you some disk
space needed to use "rm" and further empty up your filesystem and
allow for normal operation. This is a bit ugly, but it works.

- Sincerely,
Dan Naumov

From mister.olli at googlemail.com Sat Jun 20 19:49:54 2009
From: mister.olli at googlemail.com (Mister Olli)
Date: Sat Jun 20 19:50:02 2009
Subject: Unable to delete files on ZFS volume
In-Reply-To: <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com>
References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com> <1245525965.26909.69.camel@phoenix.blechhirn.net> <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com>
Message-ID: <1245527381.26909.82.camel@phoenix.blechhirn.net>

Hi,

> This is a known issue with write allocate file systems and snapshots.
great so it's not something completely unknown...

> I haven't seen this before on v13 without any snapshots.
maybe I should mention that ZFS is running in a xen domU with 786MB ram,
on i386 (as I already read that i386 can be troublesome)..

> A few questions:
some, yeah ;-))

> - How many file systems?
I'm not sure how to count correctly, but the 'zfs list' output is
complete, with filesystems
- test
- test/data1
- test/data2
nothing more

> - How old are the file systems?
as in 'zpool get all' not older than 48 hours.

> - How much churn has there been on the file system?
not sure what you mean with 'churn' (there seems to be no translation to
German that makes sense ;-))

> - Was this an upgraded v6 or created as v13?
no.

> - How many files on test?
quite a lot, as I started with a bash loop that created 3k large files
for 1/2 day, then switched to randomized sizes.
- test/data1 has 57228
- test/data2 has 9024
(measured with 'ls -l /test/data2/data1 | cat -n | tail -n 10' -1)

> ... as well as any other things that occur to you to characterize the
> file system.
all data on test/data1 was created using an endless bash loop to test if
the system crashes, using

while ( true ) ; do dd if=/dev/random of=/test/data1/`date +%Y%m%d%H%M%S` bs=1k count=3 ; sleep 1s; done

where 'count=3' was replaced by 'count=$RANDOM' after approx. 16 hours.

test/data2 is a copy of test/data1 which started when data1 used 1.62GB
and ran until all space in the pool was filled up. This led to the
remaining copy processes aborting with a 'no space left on device' failure.
as the dir listing of test/data1 is too long for the shell (sh/ bash) I
did the copying like this:

cp -r /test/data1 /test/data2

That's pretty much everything I did. Let me know if you need further
details.

Regards,
---
Mr. Olli

From mister.olli at googlemail.com Sat Jun 20 19:50:43 2009
From: mister.olli at googlemail.com (Mister Olli)
Date: Sat Jun 20 19:50:49 2009
Subject: Unable to delete files on ZFS volume
In-Reply-To: <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com>
References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com>
Message-ID: <1245525965.26909.69.camel@phoenix.blechhirn.net>

Hi,

> Do you have snapshots or run ZFS v6?
neither one or the other. Here are my pool/ ZFS details.

[root@template-8_CURRENT /test/data2]# zpool get all test
NAME  PROPERTY       VALUE       SOURCE
test  size           2.98G       -
test  used           2.94G       -
test  available      47.9M       -
test  capacity       98%         -
test  altroot        -           default
test  health         ONLINE      -
test  guid           5305090209740383945  -
test  version        13          default
test  bootfs         -           default
test  delegation     on          default
test  autoreplace    off         default
test  cachefile      -           default
test  failmode       wait        default
test  listsnapshots  off         default
[root@template-8_CURRENT /test/data2]# zfs get all test
NAME  PROPERTY              VALUE                  SOURCE
test  type                  filesystem             -
test  creation              Fri Jun 19 21:01 2009  -
test  used                  1.96G                  -
test  available             0                      -
test  referenced            26.6K                  -
test  compressratio         1.00x                  -
test  mounted               yes                    -
test  quota                 none                   default
test  reservation           none                   default
test  recordsize            128K                   default
test  mountpoint            /test                  default
test  sharenfs              off                    default
test  checksum              on                     default
test  compression           off                    default
test  atime                 on                     default
test  devices               on                     default
test  exec                  on                     default
test  setuid                on                     default
test  readonly              off                    default
test  jailed                off                    default
test  snapdir               hidden                 default
test  aclmode               groupmask              default
test  aclinherit            restricted             default
test  canmount              on                     default
test  shareiscsi            off                    default
test  xattr                 off                    temporary
test  copies                1                      default
test  version               3                      -
test  utf8only              off                    -
test  normalization         none                   -
test  casesensitivity       sensitive              -
test  vscan                 off                    default
test  nbmand                off                    default
test  sharesmb              off                    default
test  refquota              none                   default
test  refreservation        none                   default
test  primarycache          all                    default
test  secondarycache        all                    default
test  usedbysnapshots       0                      -
test  usedbydataset         26.6K                  -
test  usedbychildren        1.96G                  -
test  usedbyrefreservation  0                      -
[root@template-8_CURRENT /test/data2]# zfs list -t snapshot
no datasets available

> Confirm that you've deleted your snapshots and are running pool v13.
>
> Future ZFS mail should be directed to freebsd-fs@
Sorry for that. fixed now ;-))

Regards,
---
Mr. Olli

From mister.olli at googlemail.com Sat Jun 20 19:55:13 2009
From: mister.olli at googlemail.com (Mister Olli)
Date: Sat Jun 20 19:55:19 2009
Subject: Unable to delete files on ZFS volume
In-Reply-To:
References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com> <1245525965.26909.69.camel@phoenix.blechhirn.net> <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com>
Message-ID: <1245527700.26909.86.camel@phoenix.blechhirn.net>

Hi,

sounds like a great idea, I'm gonna try that as soon as Kip Macy does
not need further information.

The reason I filled up the pool was that I just got ZFS to work and
started playing around to see how stable it works. As I wanna deploy it
on my home FS (which has no heavy usage) I tried to simulate some work
on the FS and came up with the bash loops (as described in my other
mail). Filling up the pool happened 'accidentally'.

btw I'm pretty much impressed how well it works. From the readings I
assumed the first crash within minutes. Great job.

Regards,
---
Mr. Olli

On Sat, 2009-06-20 at 22:42 +0300, Dan Naumov wrote:
> Hi.
>
> As Kip pointed out, this is a known issue with write allocate
> filesystems in general (not just ZFS). This is one of the several
> reasons why SUN recommends you do not completely fill up a zpool (they
> actually recommend to stay at or below 80% utilization). I have a
> workaround for you, however:
>
> Pick a file you don't need on the filled up ZFS volume. "Empty" the
> file contents in a way of your choosing.
> This should give you some disk
> space needed to use "rm" and further empty up your filesystem and
> allow for normal operation. This is a bit ugly, but it works.
>
> - Sincerely,
> Dan Naumov

From dan.naumov at gmail.com Sat Jun 20 21:29:28 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Sat Jun 20 21:29:39 2009
Subject: ufs2 / softupdates / ZFS / disk write cache
Message-ID:

I have the following setup:

A single consumer-grade 2 TB SATA disk: Western Digital Green (model
WDC WD20EADS-00R6B0). This disk is set up like this:

16 GB root partition with UFS2 + softupdates, containing mostly static things:
/bin /boot /etc /root /sbin /usr /var and such

a 1.9 TB non-redundant zfs pool on top of a slice, it hosts things like:
/DATA, /home, /usr/local, /var/log and such.

What should I do to ensure (as much as possible) filesystem
consistency of the root filesystem in case of a power loss?
I know there have been a lot of discussions on the subject of consumer-level disks literally lying about the state of files in transit (disks telling the system that files have been written to disk while in reality they are still in the disk's write cache), in turn throwing softupdates off balance (since softupdates assumes the disks don't lie about such things), in turn sometimes resulting in severe data losses in the case of a system power loss during heavy disk IO. One of the solutions that was often brought up in the mailing lists is disabling the actual disk write cache via adding hw.ata.wc=0 to /boot/loader.conf. FreeBSD 4.3 actually even had this setting by default, but this was apparently reverted because some people reported a write performance regression to the tune of becoming 4-6 times slower. So what should I do in my case? Should I disable disk write cache via the hw.ata.wc tunable? As far as I know, ZFS has a write cache of its own and since the ufs2 root filesystem in my case is mostly static data, I am guessing I "shouldn't" notice that big of a performance hit. Or am I completely in the wrong here and setting hw.ata.wc=0 is going to adversely affect the write performance on both the root partition AND the zfs pool despite zfs using its own write cache? Another thing I have been pondering is: I do have 2gb of space left unused on the system (currently being used as swap, I have 2 swap slices, one 1gb at the very beginning of the disk, the other being 2gb at the end), which I could turn into a GJOURNAL for the root filesystem...
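As an aside on the hw.ata.wc question above, here is a minimal sketch in Python (not part of the original post) of checking whether a loader.conf-style file explicitly disables the ATA write cache. The function names and the sample file contents are hypothetical, and the parser is deliberately simplified: it treats everything after a '#' as a comment and accepts values with or without double quotes.

```python
def parse_loader_conf(text: str) -> dict:
    """Parse name=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        # Simplification: anything after '#' is treated as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def write_cache_disabled(text: str) -> bool:
    """Return True only if the file explicitly sets hw.ata.wc to 0."""
    return parse_loader_conf(text).get("hw.ata.wc") == "0"


# Hypothetical /boot/loader.conf contents, for illustration only.
SAMPLE = '''
# example settings
zfs_load="YES"
hw.ata.wc="0"
'''

if __name__ == "__main__":
    print(write_cache_disabled(SAMPLE))
```

This only inspects the configuration text. It says nothing about what the drive is actually doing at runtime, which would have to be checked on the running system.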
Sincerely, - Dan Naumov From ertr1013 at student.uu.se Sat Jun 20 23:26:55 2009 From: ertr1013 at student.uu.se (Erik Trulsson) Date: Sat Jun 20 23:27:01 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: Message-ID: <20090620231130.GA88907@owl.midgard.homeip.net> On Sun, Jun 21, 2009 at 12:29:26AM +0300, Dan Naumov wrote: > I have the following setup: > > A single consumer-grade 2tb SATA disk: Western Digital Green (model > WDC WD20EADS-00R6B0). This disk is setup like this: > > 16gb root partition with UFS2 + softupdates, containing mostly static things: > /bin /boot /etc /root /sbin /usr /var and such > > a 1,9tb non-redundant zfs pool on top of a slice, it hosts things like: > /DATA, /home, /usr/local, /var/log and such. > > What should I do to ensure (as much as possible) filesystem > consistency of the root filesystem in the case of the power loss? I > know there have been a lot of discussions on the subject of > consumer-level disks literally lying about the state of files in > transit (disks telling the system that files have been written to disk > while in reality they are still in disk's write cache), in turn > throwing softupdates off balance (since softupdates assumes the disks > don't lie about such things), in turn sometimes resulting in severe > data losses in the case of a system power loss during heavy disk IO. Note that this is not something specific to softupdates, but applies when you are not using softupdates as well. > > One of the solutions that was often brought up in the mailing lists is > disabling the actual disk write cache via adding hw.ata.wc=0 to > /boot/loader.conf, FreeBSD 4.3 actually even had this setting by > default, but this was apparently reverted back because some people > have reported a write performance regression on the tune of becoming > 4-6 times slower. So what should I do in my case? Should I disable > disk write cache via the hw.ata.wc tunable? 
As far as I know, ZFS has > a write cache of its own and since the ufs2 root filesystem in my > case is mostly static data, I am guessing I "shouldn't" notice that > big of a performance hit. Or am I completely in the wrong here and > setting hw.ata.wc=0 is going to adversely affect the write performance > on both the root partition AND the zfs pool despite zfs using its own > write cache? Why don't you try it and see if you notice the performance hit? You will almost certainly see some reduced write performance if you disable the disk's cache, but how noticeable this will be for your setup and your disk usage is something only you can answer. My guess is that it will be quite noticeable, but that is only a guess. (Keep in mind that UFS+softupdates does quite a bit of write-caching on its own, so just switching to ZFS is unlikely to improve write performance significantly compared to using UFS.) -- Erik Trulsson ertr1013@student.uu.se From kmacy at freebsd.org Sun Jun 21 00:37:01 2009 From: kmacy at freebsd.org (Kip Macy) Date: Sun Jun 21 00:37:06 2009 Subject: Unable to delete files on ZFS volume In-Reply-To: <1245527381.26909.82.camel@phoenix.blechhirn.net> References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com> <1245525965.26909.69.camel@phoenix.blechhirn.net> <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com> <1245527381.26909.82.camel@phoenix.blechhirn.net> Message-ID: <3c1674c90906201736o335b4a3dv862139dc55f2211f@mail.gmail.com> > >> - How much churn has there been on the file system? > not sure what you mean with 'churn' (there seems to be no translation to > German that makes sense ;-)) > The closest I can come to defining it would be "fluctuation of contents". By this I was asking if you repeatedly created and deleted files. I didn't think it would be this easy to provoke this. I've tried provoking this with a handful of large files without success.
I'm guessing that ZFS isn't as careful about bounding metadata creation as it is data creation. My focus is, and for the foreseeable future will be, fixing FreeBSD integration issues. As this is a fundamental ZFS space management issue I don't foresee fixing it any time soon if at all unless it is addressed by upgrading to v15 or v16. I hope that truncating some of the files will free up space for deletion. Cheers, Kip From kip.macy at gmail.com Sun Jun 21 01:08:36 2009 From: kip.macy at gmail.com (Kip Macy) Date: Sun Jun 21 01:08:43 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <20090620231130.GA88907@owl.midgard.homeip.net> References: <20090620231130.GA88907@owl.midgard.homeip.net> Message-ID: <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> > > My guess is that it will be quite noticable, but that is only a guess. > (Keep in mind that UFS+softupdates does quite a bit of write-caching on its > own, so just switching to ZFS is unlikely to improve write performance > significantly compared to using UFS.) That all depends on how much the drive relies on the write cache for batching writes to disk. Soft updates does a lot of small random writes for metadata updates which will likely be heavily penalized by the absence of write caching. On my SSD, which unfortunately turned out to be camera grade flash, with FFS the system was unusable when doing large numbers of metadata updates, svn checkouts would take hours. I postulated that ZFS would map well to the large erase blocks, so I destroyed /usr and recreated a zpool in its place. I now get random write performance better than FFS, "I lived happily ever after." I don't know if ZFS will provide the same benefit in your situation. My point is just that FFS+SU and ZFS are "apples and oranges." 
Please note that I've taken -stable off of the CC. ZFS has been getting a lot of mailing list traffic lately and I've been hearing groans from certain quarters about it drowning out other discussions. Let's try to keep the discussions to freebsd-fs. Thanks, Kip From dan.naumov at gmail.com Sun Jun 21 02:18:41 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 21 02:18:47 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> Message-ID: I decided to do some performance tests of my own; "bonnie -s 4096" was used to obtain the results. Note that these results should be used to compare "write cache on" to "write cache off" and not to compare UFS2 vs ZFS, as the testing was done on different parts of the same physical disk (the UFS2 partition resides on the first 16gb of disk and ZFS pool takes the remaining ~1,9tb) and I am also using rather conservative ZFS tunables.

UFS2 with write cache:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         4096 55457 95.9 91630 46.7 36264 37.5 46565 74.0 84751 33.7 164.3 10.3

UFS2 without write cache:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         4096  4938 46.9  4685 18.0  4288 21.8 17453 34.0 74232 31.6 165.0  9.9

As we can clearly see, the performance difference between having the disk cache enabled and disabled is _ENORMOUS_. In the case of sequential block write on UFS2, the performance loss is a staggering 94,89%.
More surprisingly, even reading seems to be affected in a noticeable way: per char reads suffer a 62,52% penalty while block reads take a 12,41% hit. Moving on to testing ZFS with and without disk cache enabled:

ZFS with write cache (384M ARC, 1GB max kmem):
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         4096 25972 66.1 45026 40.6 34269 36.0 46371 86.5 93973 34.6  84.5  8.5

ZFS without write cache (384M ARC, 1GB max kmem):
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         4096  2399  6.7  2258  3.5  2290  3.9 34380 66.1 85971 32.8  56.7  6.1

Uh oh.... After some digging around, I found the following quote: "ZFS is designed to work with storage devices that manage a disk-level cache. ZFS commonly asks the storage device to ensure that data is safely placed on stable storage by requesting a cache flush." at http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide I guess this might be somewhat related to why in the "disk cache disabled" scenario, ZFS suffers bigger losses than UFS2. It is quite obvious at this point that disabling disk cache in order to have softupdates live in harmony with disks "lying" about whether disk cache contents have actually been committed to the disk is not in any way, shape or form a viable solution to the problem. On a side note, is there any way I can test whether *MY* disk is truthful about writing cache to disk or not?
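The write-side losses can be re-derived from the bonnie tables above. Here is a minimal sketch in Python (not part of the original post; the dictionary layout and function names are made up for illustration) that computes the relative throughput loss for the three sequential output columns of each filesystem:

```python
# Sequential-output throughput in K/sec, copied from the bonnie tables above.
# Each entry is (with write cache, without write cache).
RESULTS = {
    "UFS2": {
        "per_char": (55457, 4938),
        "block": (91630, 4685),
        "rewrite": (36264, 4288),
    },
    "ZFS": {
        "per_char": (25972, 2399),
        "block": (45026, 2258),
        "rewrite": (34269, 2290),
    },
}


def relative_loss(cached: float, uncached: float) -> float:
    """Return the throughput lost when the cache is disabled, in percent."""
    return (cached - uncached) / cached * 100.0


def loss_table(results: dict) -> dict:
    """Map each filesystem and test to its loss, rounded to two decimals."""
    return {
        fs: {
            test: round(relative_loss(on, off), 2)
            for test, (on, off) in tests.items()
        }
        for fs, tests in results.items()
    }


if __name__ == "__main__":
    for fs, tests in loss_table(RESULTS).items():
        for test, loss in tests.items():
            print(f"{fs:5s} {test:9s} {loss:6.2f}% throughput lost")
```

For the UFS2 block-write column this reproduces the 94,89% figure quoted above; the other write-side columns, for both filesystems, come out at roughly 88% or more.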
In the past (this was during my previous foray into the FreeBSD world, circa 2001/2002) I have suffered severe data corruption (leading to an unbootable system) using UFS2 + softupdates on 2 different occasions due to power losses and this past experience has me very worried about the proper way to configure my system to avoid such incidents in the future. - Sincerely, Dan Naumov On Sun, Jun 21, 2009 at 4:08 AM, Kip Macy wrote: >> >> My guess is that it will be quite noticeable, but that is only a guess. >> (Keep in mind that UFS+softupdates does quite a bit of write-caching on its >> own, so just switching to ZFS is unlikely to improve write performance >> significantly compared to using UFS.) > > > That all depends on how much the drive relies on the write cache for > batching writes to disk. Soft updates does a lot of small random > writes for metadata updates which will likely be heavily penalized by > the absence of write caching. On my SSD, which unfortunately turned > out to be camera grade flash, with FFS the system was unusable when > doing large numbers of metadata updates, svn checkouts would take > hours. I postulated that ZFS would map well to the large erase blocks, > so I destroyed /usr and recreated a zpool in its place. I now get > random write performance better than FFS, "I lived happily ever > after." > > I don't know if ZFS will provide the same benefit in your situation. > My point is just that FFS+SU and ZFS are "apples and oranges." > > Please note that I've taken -stable off of the CC. ZFS has been > getting a lot of mailing list traffic lately and I've been hearing > groans from certain quarters about it drowning out other discussions. > Let's try to keep the discussions to freebsd-fs.
> > > Thanks, > Kip From nhoyle at hoyletech.com Sun Jun 21 02:48:50 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Sun Jun 21 02:48:57 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> Message-ID: <4A3D9F87.5090108@hoyletech.com> Dan Naumov wrote: > I decided to do some performance tests of my own; "bonnie -s 4096" was > used to obtain the results. Note that these results should be used to > compare "write cache on" to "write cache off" and not to compare UFS2 > vs ZFS, as the testing was done on different parts of the same > physical disk (the UFS2 partition resides on the first 16gb of disk > and ZFS pool takes the remaining ~1,9tb) and I am also using rather > conservative ZFS tunables.
>
> UFS2 with write cache:
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>          4096 55457 95.9 91630 46.7 36264 37.5 46565 74.0 84751 33.7 164.3 10.3
>
> UFS2 without write cache:
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>          4096  4938 46.9  4685 18.0  4288 21.8 17453 34.0 74232 31.6 165.0  9.9
>
> As we can clearly see, the performance difference between having the disk > cache enabled and disabled is _ENORMOUS_. In the case of sequential > block write on UFS2, the performance loss is a staggering 94,89%. More > surprisingly, even reading seems to be affected in a noticeable way: > per char reads suffer a 62,52% penalty while block reads take a 12,41% > hit.
Moving on to testing ZFS with and without disk cache enabled:
>
> ZFS with write cache (384M ARC, 1GB max kmem):
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>          4096 25972 66.1 45026 40.6 34269 36.0 46371 86.5 93973 34.6  84.5  8.5
>
> ZFS without write cache (384M ARC, 1GB max kmem):
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>          4096  2399  6.7  2258  3.5  2290  3.9 34380 66.1 85971 32.8  56.7  6.1
>
> Uh oh.... After some digging around, I found the following quote: "ZFS > is designed to work with storage devices that manage a disk-level > cache. ZFS commonly asks the storage device to ensure that data is > safely placed on stable storage by requesting a cache flush." at > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide I > guess this might be somewhat related to why in the "disk cache > disabled" scenario, ZFS suffers bigger losses than UFS2. > > It is quite obvious at this point that disabling disk cache in order > to have softupdates live in harmony with disks "lying" about whether disk > cache contents have actually been committed to the disk is not in any > way, shape or form a viable solution to the problem. On a side note, is > there any way I can test whether *MY* disk is truthful about writing > cache to disk or not? > > In the past (this was during my previous foray into the FreeBSD world, > circa 2001/2002) I have suffered severe data corruption (leading to an > unbootable system) using UFS2 + softupdates on 2 different occasions > due to power losses and this past experience has me very worried about > the proper way to configure my system to avoid such incidents in the > future.
> > > - Sincerely, > Dan Naumov > > > > Dan, Top posting on mailing lists is bad (and not the preferred convention for this list). The performance numbers are startling, and good to have. You could also try setting the 'sync' flag on the FFS+SU mount to see what that looks like, it should give a small extra measure of protection. Since that mount shouldn't be write-heavy, I wouldn't expect much (perceived) performance hit (though the bonnie numbers may be ugly). As Peter Jeremy responded in your question about whether or not your proposed configuration looked sane (your post from the 14th), one solid strategy is to have an *offline* copy of your root filesystem. This ensures that outstanding disk writes cannot leave that instance in an unusable form, and helps protect you from all the evilness that can occur to online/mounted filesystems. On Linux systems where the kernel image and grub config usually reside in /boot, I usually make that a separate partition and set 'noauto' on the /etc/fstab so that it is never mounted except when I'm installing a new kernel or updating my grub config. -Nathanael From ertr1013 at student.uu.se Sun Jun 21 09:27:41 2009 From: ertr1013 at student.uu.se (Erik Trulsson) Date: Sun Jun 21 09:27:48 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> Message-ID: <20090621092736.GA92656@owl.midgard.homeip.net> On Sun, Jun 21, 2009 at 05:18:39AM +0300, Dan Naumov wrote: > Uh oh.... After some digging around, I found the following quote: "ZFS > is designed to work with storage devices that manage a disk-level > cache. ZFS commonly asks the storage device to ensure that data is > safely placed on stable storage by requesting a cache flush." 
at > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide I > guess this might be somewhat related to why in the "disk cache > disabled" scenario, ZFS suffers bigger losses than UFS2. If that quote is correct (and I have no real reason to doubt it) then it should probably be safe to enable the disk's write cache when used with ZFS. (That would make sense since UFS/FFS was originally designed to work with an older generation of disks that did not do any significant amount of write-caching (partly due to having very little cache on them), while ZFS has been designed to be used on modern hardware, and to be reliable even on cheap consumer-grade disks.) > > It is quite obvious at this point that disabling disk cache in order > have softupdates live in harmony with disks "lying" about whether disk > cache contents have actually been committed to the disk in not in any > way, shape or form a viable solution to the problem. On a sidenote, is > there any way I can test whether *MY* disk is truthful about writing > cache to disk or not? If you have IDE/SATA disks they will "lie". SCSI/SAS disks won't. SATA disks using NCQ should probably also be safe -- too bad FreeBSD does not support NCQ yet. > > In the past (this was during my previous foray into the FreeBSD world, > circa-2001/2002) I have suffered severe data corruption (leading to an > unbootable system) using UFS2 + softupdates on 2 different occasions > due to power losses and this past experience has me very worried about > the proper way to configure my system to avoid such incidents in the > future. 
> > > - Sincerely, > Dan Naumov > > -- Erik Trulsson ertr1013@student.uu.se From remko at FreeBSD.org Sun Jun 21 09:45:32 2009 From: remko at FreeBSD.org (remko@FreeBSD.org) Date: Sun Jun 21 09:45:39 2009 Subject: bin/135710: mount(8): mount -t tmpfs does not follow 'size' option Message-ID: <200906210945.n5L9jVOS009819@freefall.freebsd.org> Synopsis: mount(8): mount -t tmpfs does not follow 'size' option Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: remko Responsible-Changed-When: Sun Jun 21 09:45:21 UTC 2009 Responsible-Changed-Why: reassign to fs team http://www.freebsd.org/cgi/query-pr.cgi?pr=135710 From dan.naumov at gmail.com Sun Jun 21 10:03:18 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 21 10:03:24 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <20090621092736.GA92656@owl.midgard.homeip.net> References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> <20090621092736.GA92656@owl.midgard.homeip.net> Message-ID: On Sun, Jun 21, 2009 at 12:27 PM, Erik Trulsson wrote: > On Sun, Jun 21, 2009 at 05:18:39AM +0300, Dan Naumov wrote: >> Uh oh.... After some digging around, I found the following quote: "ZFS >> is designed to work with storage devices that manage a disk-level >> cache. ZFS commonly asks the storage device to ensure that data is >> safely placed on stable storage by requesting a cache flush." at >> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide I >> guess this might be somewhat related to why in the "disk cache >> disabled" scenario, ZFS suffers bigger losses than UFS2. > > If that quote is correct (and I have no real reason to doubt it) then > it should probably be safe to enable the disk's write cache when used with > ZFS. 
(That would make sense since UFS/FFS was originally designed to work > with an older generation of disks that did not do any significant amount > of write-caching (partly due to having very little cache on them), while > ZFS has been designed to be used on modern hardware, and to be reliable even > on cheap consumer-grade disks.) Actually, now that I think of it, this could be pretty big. If using ZFS on a disk will cause the disk to flush the cache every 5 seconds, wouldn't that mean that the sections of the cache that hold data from the UFS partition get flushed to disk as well, mostly eliminating the entire "disk cache lying = softupdates inconsistent" problem altogether? The most important part of this is, obviously, whether the "ZFS forces cache flushes every 5 seconds" thing works in all cases (like mine, where I use ZFS on a slice) and not only those where ZFS is given direct access to the disk. Anyone knowledgeable in the ways of the FreeBSD ZFS implementation care to chip in? :) Sincerely, Dan Naumov From dan.naumov at gmail.com Sun Jun 21 10:42:00 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 21 10:42:06 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> <20090621092736.GA92656@owl.midgard.homeip.net> Message-ID: On Sun, Jun 21, 2009 at 1:03 PM, Dan Naumov wrote: > On Sun, Jun 21, 2009 at 12:27 PM, Erik Trulsson wrote: >> On Sun, Jun 21, 2009 at 05:18:39AM +0300, Dan Naumov wrote: >>> Uh oh.... After some digging around, I found the following quote: "ZFS >>> is designed to work with storage devices that manage a disk-level >>> cache. ZFS commonly asks the storage device to ensure that data is >>> safely placed on stable storage by requesting a cache flush."
at >>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide I >>> guess this might be somewhat related to why in the "disk cache >>> disabled" scenario, ZFS suffers bigger losses than UFS2. >> >> If that quote is correct (and I have no real reason to doubt it) then >> it should probably be safe to enable the disk's write cache when used with >> ZFS. (That would make sense since UFS/FFS was originally designed to work >> with an older generation of disks that did not do any significant amount >> of write-caching (partly due to having very little cache on them), while >> ZFS has been designed to be used on modern hardware, and to be reliable even >> on cheap consumer-grade disks.) > > Actually, now that I think of it, this could be pretty big. If using > ZFS on a disk will cause the disk to flush the cache every 5 seconds, > wouldn't that mean that the sections of the cache that hold data from > the UFS partition get flushed to disk as well, mostly eliminating the > entire "disk cache lying = softupdates inconsistent" problem > altogether? The most important part of this is, obviously, whether the > "ZFS forces cache flushes every 5 seconds" thing works in all cases > (like mine, where I use ZFS on a slice) and not only those where ZFS > is given direct access to the disk. Anyone knowledgeable in the ways of > the FreeBSD ZFS implementation care to chip in? :) Actually, if it is possible for ZFS to issue "flush the cache NOW" commands directly to disk every 5 seconds by default (value tunable) I see 2 potential options/changes that would make the life of "UFS2+softupdates on SATA disks" users a whole lot easier. One option would be to add this same functionality to softupdates, making softupdates force a disk cache flush to ensure consistency. Another option would be to have a loader.conf tunable where you could enable and manually adjust the time intervals of forced disk cache flushes (without any regard for actual filesystem used).
The latter option is a bit uglier, but still a LOT less ugly than suggesting people disable disk cache altogether and end up with 2-4MB/s write speeds on modern hardware. Should I perhaps do a "proposed change" send-PR regarding either option? - Sincerely, Dan Naumov From andrew at modulus.org Sun Jun 21 11:45:18 2009 From: andrew at modulus.org (Andrew Snow) Date: Sun Jun 21 11:45:25 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> <20090621092736.GA92656@owl.midgard.homeip.net> Message-ID: <4A3E1C72.6000406@modulus.org> Dan Naumov wrote: > Actually, if it is possible for ZFS to issue "flush the cache NOW" > commands directly to disk every 5 seconds by default (value tunable) I > see 2 potential options/changes that would make the life of > "UFS2+softupdates on SATA disks" users a whole lot easier. I believe you might be barking up the wrong tree here. The issue with cache flushing is not that changes are written to disk - that happens anyway, eventually, even if you don't explicitly flush the cache. The big issue is that things get written to disk in the correct order so that the system can recover from an unexpected crash. Even if the cache gets flushed every 5 seconds, things on UFS could still get corrupted between T=0 and 4.9s, if the system writes something and expects it to be written synchronously and it isn't. The only truly safe bet here is to disable the ATA write cache via the tunable. Folks who need to maximize safety and can't afford the performance hit of no write cache need to do what they always have had to do in the past - buy a controller card with battery-backed cache.
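To make the ordering point concrete, here is a minimal sketch in Python (not from the thread) of how an application expresses "A must be on stable storage before B": it writes A, calls fsync, and only then writes B. The file names and function names are hypothetical. Note the assumption that matters for this whole discussion: os.fsync asks the operating system to push the data to the device, but whether the drive has really committed it depends on the drive honoring the flush, which is exactly the trust problem described above.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to flush it to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # returns only after the OS reports the flush as done
    finally:
        os.close(fd)


def ordered_update(directory: str, payload: bytes) -> str:
    """Write the payload first, then a marker that declares it complete."""
    data_path = os.path.join(directory, "data.bin")     # hypothetical name
    marker_path = os.path.join(directory, "data.done")  # hypothetical name
    write_durably(data_path, payload)    # step 1: payload must be stable first
    write_durably(marker_path, b"ok\n")  # step 2: only after step 1 returned
    return marker_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = ordered_update(tmp, b"example payload")
        print(os.path.exists(marker))
```

If the drive acknowledges the flush in step 1 while the payload is still only in its volatile cache, a power loss can leave the marker on disk without the payload, and no periodic flush interval closes that window.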
- Andrew From numisemis at yahoo.com Sun Jun 21 17:33:35 2009 From: numisemis at yahoo.com (Šimun Mikecin) Date: Sun Jun 21 17:34:07 2009 Subject: ufs2 / softupdates / ZFS / disk write cache Message-ID: <570433.20373.qm@web37308.mail.mud.yahoo.com> On 21 Jun 2009, at 13:41, Andrew Snow wrote: > Folks who need to maximize safety and can't afford the performance > hit of no write cache need to do what they always have had to do in > the past - buy a controller card with battery-backed cache. Or: B) use SCSI instead of ATA disks C) use UFS+gjournal instead of UFS+SU D) use ZFS instead of UFS+SU From mister.olli at googlemail.com Sun Jun 21 18:06:04 2009 From: mister.olli at googlemail.com (Mister Olli) Date: Sun Jun 21 18:06:10 2009 Subject: Unable to delete files on ZFS volume In-Reply-To: <3c1674c90906201736o335b4a3dv862139dc55f2211f@mail.gmail.com> References: <1245519413.26909.60.camel@phoenix.blechhirn.net> <3c1674c90906201050w15e4cd5dpae76cd70d64b4e92@mail.gmail.com> <1245525965.26909.69.camel@phoenix.blechhirn.net> <3c1674c90906201232x63ddee19yf91aeac30f3401bb@mail.gmail.com> <1245527381.26909.82.camel@phoenix.blechhirn.net> <3c1674c90906201736o335b4a3dv862139dc55f2211f@mail.gmail.com> Message-ID: <1245607551.4757.18.camel@phoenix.blechhirn.net> hi, On Sat, 2009-06-20 at 17:36 -0700, Kip Macy wrote: > > [snip] > > The closest I can come to defining it would be "fluctuation of > contents". By this I was asking if you repeatedly created and deleted > files. ok, I understand now. There's actually no real fluctuation, since I only add content, but do not delete anything. > I didn't think it would be this easy to provoke this. I've tried > provoking this with a handful of large files without success. I'm > guessing that ZFS isn't as careful about bounding metadata creation as > it is data creation. > > My focus is, and for the foreseeable future will be, fixing FreeBSD > integration issues.
As this is a fundamental ZFS space management > issue I don't foresee fixing it any time soon if at all unless it is > addressed by upgrading to v15 or v16. Is that possible already? From my understanding 8-CURRENT creates pools with version 13 at the moment. > I hope that truncating some of the files will free up space for deletion. It would be nice to test that, but I just realized that my xen play machine is down due to power supply problems. Since I don't have the time to fix that, regard this problem as 'unimportant' ;-)) Anyway I will give you feedback once I have managed to repair it. Regards, --- Mr. Olli From andrew at modulus.org Sun Jun 21 20:54:07 2009 From: andrew at modulus.org (Andrew Snow) Date: Sun Jun 21 20:54:14 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <570433.20373.qm@web37308.mail.mud.yahoo.com> References: <570433.20373.qm@web37308.mail.mud.yahoo.com> Message-ID: <4A3E9D81.1060406@modulus.org> Šimun Mikecin wrote: > On 21 Jun 2009, at 13:41, Andrew Snow wrote: >> Folks who need to maximize safety and can't afford the performance >> hit of no write cache need to do what they always have had to do in >> the past - buy a controller card with battery-backed cache. > > Or: > B) use SCSI instead of ATA disks > C) use UFS+gjournal instead of UFS+SU > D) use ZFS instead of UFS+SU All of these solutions still involve disabling of write cache, with a performance hit of varying degrees. (I have tried all of those except gjournal!)
From kmacy at freebsd.org Sun Jun 21 20:59:49 2009 From: kmacy at freebsd.org (Kip Macy) Date: Sun Jun 21 20:59:56 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <20090621092736.GA92656@owl.midgard.homeip.net> References: <20090620231130.GA88907@owl.midgard.homeip.net> <3c1674c90906201808t1854dd46n82213fbd0c1c254c@mail.gmail.com> <20090621092736.GA92656@owl.midgard.homeip.net> Message-ID: <3c1674c90906211359l78e1e953lfe208067aa673873@mail.gmail.com> > If you have IDE/SATA disks they will "lie". SCSI/SAS disks won't. > SATA disks using NCQ should probably also be safe -- too bad FreeBSD > does not support NCQ yet. > NCQ is working in a private branch. I haven't heard an ETA from the developer. -Kip From ronald-freebsd8 at klop.yi.org Sun Jun 21 22:25:58 2009 From: ronald-freebsd8 at klop.yi.org (Ronald Klop) Date: Sun Jun 21 22:26:09 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: Message-ID: On Sat, 20 Jun 2009 23:29:26 +0200, Dan Naumov wrote: > I have the following setup: > > A single consumer-grade 2TB SATA disk: Western Digital Green (model > WDC WD20EADS-00R6B0). This disk is set up like this: > > 16GB root partition with UFS2 + softupdates, containing mostly static > things: > /bin /boot /etc /root /sbin /usr /var and such > > a 1.9TB non-redundant zfs pool on top of a slice, it hosts things like: > /DATA, /home, /usr/local, /var/log and such. > > What should I do to ensure (as much as possible) filesystem > consistency of the root filesystem in the case of a power loss?
I > know there have been a lot of discussions on the subject of > consumer-level disks literally lying about the state of files in > transit (disks telling the system that files have been written to disk > while in reality they are still in disk's write cache), in turn > throwing softupdates off balance (since softupdates assumes the disks > don't lie about such things), in turn sometimes resulting in severe > data losses in the case of a system power loss during heavy disk IO. > > One of the solutions that was often brought up in the mailing lists is > disabling the actual disk write cache via adding hw.ata.wc=0 to > /boot/loader.conf, FreeBSD 4.3 actually even had this setting by > default, but this was apparently reverted because some people > reported a write performance regression to the tune of writes becoming > 4-6 times slower. So what should I do in my case? Should I disable > disk write cache via the hw.ata.wc tunable? As far as I know, ZFS has > a write cache of its own and since the ufs2 root filesystem in my > case is mostly static data, I am guessing I "shouldn't" notice that > big of a performance hit. Or am I completely in the wrong here and > setting hw.ata.wc=0 is going to adversely affect the write performance > on both the root partition AND the zfs pool despite zfs using its own > write cache? > > Another thing I have been pondering is: I do have 2GB of space left > unused on the system (currently being used as swap, I have 2 swap > slices, one 1GB at the very beginning of the disk, the other being 2GB > at the end), which I could turn into a GJOURNAL for the root > filesystem... Using gjournal is a well-trusted way to get a good balance between consistency and speed. I don't know of any performance impact from having the journal at the opposite 'end' of the disk from the filesystem. You can try, because switching back is possible. Ronald.
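For anyone who wants to put their own number on the "4-6 times slower" figure quoted above, here is a rough sketch in Python. It only compares buffered writes against writes followed by fsync() on every block; it does not toggle hw.ata.wc itself, so the idea (an assumption on my part, not something tested in this thread) is to run it once with hw.ata.wc=1 and once with hw.ata.wc=0 and compare. The file name wc-test.bin and the block counts are arbitrary.

```python
import os
import tempfile
import time


def timed_write(path, block_count, block_size=4096, sync_every_block=False):
    """Write block_count blocks to path and return the elapsed seconds.

    With sync_every_block=True each block is followed by os.fsync(), which
    approximates a workload that waits for stable storage on every write.
    """
    block = b"\0" * block_size
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(block_count):
            os.write(fd, block)
            if sync_every_block:
                os.fsync(fd)
        os.fsync(fd)  # flush once at the end so both runs finish on disk
    finally:
        os.close(fd)
    return time.monotonic() - start


def compare(directory, block_count=256, block_size=4096):
    """Return (buffered_seconds, synced_seconds, bytes_written_per_run)."""
    with tempfile.TemporaryDirectory(dir=directory) as work:
        target = os.path.join(work, "wc-test.bin")  # arbitrary scratch name
        buffered = timed_write(target, block_count, block_size, False)
        synced = timed_write(target, block_count, block_size, True)
    return buffered, synced, block_count * block_size
```

Calling compare("/") on the UFS2 root and compare("/DATA") on the ZFS pool would give one pair of timings per filesystem. Keep in mind that on a disk that "lies", fsync() may return before the data is really on the platters, so the timings say nothing about safety.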
From dan.naumov at gmail.com Sun Jun 21 22:36:28 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 21 22:36:34 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <570433.20373.qm@web37308.mail.mud.yahoo.com> <4A3E9D81.1060406@modulus.org> Message-ID: ---------- Forwarded message ---------- From: Dan Naumov Date: Mon, Jun 22, 2009 at 1:34 AM Subject: Re: ufs2 / softupdates / ZFS / disk write cache To: Andrew Snow On Sun, Jun 21, 2009 at 11:52 PM, Andrew Snow wrote: > > Šimun Mikecin wrote: >> >> On 21 Jun 2009, at 13:41, Andrew Snow wrote: >>> >>> Folks who need to maximize safety and can't afford the performance hit of no write cache need to do what they always have had to do in the past - buy a controller card with battery-backed cache. >> >> Or: >> B) use SCSI instead of ATA disks >> C) use UFS+gjournal instead of UFS+SU >> D) use ZFS instead of UFS+SU > > > All of these solutions still involve disabling of write cache, with a performance hit of varying degrees. (I have tried all of those except gjournal!) Why would using UFS+gjournal, ZFS or SCSI still involve disabling write cache? Disabling write cache is only suggested for systems using UFS + SU on ATA/PATA disks. This in turn actually raises a new point: Why is UFS + SU still the default option provided by the sysinstall installation process? A simple truth is that most FreeBSD users on a modern system are going to be using SATA disks. Another simple truth is that most new users are going to go with what the default installation process suggests and, as shown by the benchmark results a few posts back, no one is going to be disabling the write cache unless they are completely out of their mind (who on earth is going to accept 2-4 MB/s write speeds from a modern disk in 2009?).
- Dan Naumov From andrew at modulus.org Sun Jun 21 22:51:31 2009 From: andrew at modulus.org (Andrew Snow) Date: Sun Jun 21 22:51:38 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <570433.20373.qm@web37308.mail.mud.yahoo.com> <4A3E9D81.1060406@modulus.org> Message-ID: <4A3EB902.8080503@modulus.org> Dan Naumov wrote: >>> Or: >>> B) use SCSI instead of ATA disks >>> C) use UFS+gjournal instead of UFS+SU >>> D) use ZFS instead of UFS+SU >> All of these solutions still involve disabling of write cache, with a performance hit of varying degrees. (I have tried all of those except gjournal!) B) SCSI drives come with write caching disabled by default. But here, the performance loss is partially made up by Tagged Command Queueing and faster spindle speeds. C) gjournal needs to flush the disk cache regularly to maintain consistency. It doesn't need to do it as often but on a write-heavy system it isn't ideal for performance because it flushes everything in the cache and not just the journal. D) ZFS - same as (C) > who on earth is going to accept 2-4 MB/s write speeds from > a modern disk in 2009? E.g. remote headless systems which don't do much (DNS server) :-) - Andrew From dan.naumov at gmail.com Sun Jun 21 22:57:22 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 21 22:57:28 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <570433.20373.qm@web37308.mail.mud.yahoo.com> References: <570433.20373.qm@web37308.mail.mud.yahoo.com> Message-ID: 2009/6/21 Šimun Mikecin > > On 21 Jun 2009, at 13:41, Andrew Snow wrote: > > Folks who need to maximize safety and can't afford the performance > > hit of no write cache need to do what they always have had to do in > > the past - buy a controller card with battery-backed cache.
> > Or: > B) use SCSI instead of ATA disks > C) use UFS+gjournal instead of UFS+SU > D) use ZFS instead of UFS+SU Actually I think I need a few clarifications regarding ZFS: 1) Does FreeBSD honor the "flush the cache to disk now" commands issued by ZFS to the hard drive only when ZFS is used directly on top of a disk device, or does this also work when ZFS is used on top of a slice/partition? 2) If we compare ZFS and UFS+SU on a regular "lying" SATA disk (with write cache enabled) under heavy IO followed by a power loss, which one is going to recover better, and why? Sincerely, - Dan Naumov From nhoyle at hoyletech.com Mon Jun 22 01:17:57 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Mon Jun 22 01:18:04 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <4A3EB902.8080503@modulus.org> References: <570433.20373.qm@web37308.mail.mud.yahoo.com> <4A3E9D81.1060406@modulus.org> <4A3EB902.8080503@modulus.org> Message-ID: <4A3EDBB8.6010402@hoyletech.com> Andrew Snow wrote: > Dan Naumov wrote: >>>> Or: >>>> B) use SCSI instead of ATA disks >>>> C) use UFS+gjournal instead of UFS+SU >>>> D) use ZFS instead of UFS+SU >>> All of these solutions still involve disabling of write cache, with >>> a performance hit of varying degrees. (I have tried all of those >>> except gjournal!) > > B) SCSI drives come with write caching disabled by default. But here, > the performance loss is partially made up by Tagged Command Queueing > and faster spindle speeds > > C) gjournal needs to flush the disk cache regularly to maintain > consistency. It doesn't need to do it as often but on a write-heavy > system it isn't ideal for performance because it flushes everything in > the cache and not just the journal. > > D) ZFS - same as (C) > As a minor nitpick to point D, IIRC it is possible to explicitly place the ZIL on a different device than the pool it is for.
In this case, if the ZIL is on a dedicated device, then it is possible to flush only the ZIL, rather than all data pending in cache for the zpool. I realize it's a minor distinction / special case, but the option is worth mentioning. -Nathanael From nhoyle at hoyletech.com Mon Jun 22 01:22:11 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Mon Jun 22 01:22:17 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <570433.20373.qm@web37308.mail.mud.yahoo.com> Message-ID: <4A3EDCBA.7010204@hoyletech.com> Dan Naumov wrote: > 2009/6/21 Šimun Mikecin > > >> On 21 Jun 2009, at 13:41, Andrew Snow wrote: >> >>> Folks who need to maximize safety and can't afford the performance >>> hit of no write cache need to do what they always have had to do in >>> the past - buy a controller card with battery-backed cache. >>> >> Or: >> B) use SCSI instead of ATA disks >> C) use UFS+gjournal instead of UFS+SU >> D) use ZFS instead of UFS+SU >> > > > Actually I think I need a few clarifications regarding ZFS: > > 1) Does FreeBSD honor the "flush the cache to disk now" commands issued by > ZFS to the hard drive only when ZFS is used directly on top of a disk device, > or does this also work when ZFS is used on top of a > slice/partition? > 2) If we compare ZFS and UFS+SU on a regular "lying" SATA disk (with > write cache enabled) under heavy IO followed by a power loss, which one is > going to recover better, and why? > > > Sincerely, > - Dan Naumov > ZFS should recover better, I believe. Its copy-on-write semantics mean that you always have a valid, intact (even if not the most recent) copy of the data. I believe with soft updates it is still possible to have partially written metadata updates cause problems. I'm not as much of an expert on soft updates semantics however, so I'll defer to those who are to correct me if I'm off-base.
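The copy-on-write point above can be pictured with a toy model in Python. This is only an illustration of the general idea (never overwrite live data, switch a single root pointer last); the class CowStore and its fields are made up for the example and say nothing about how ZFS actually lays out blocks. It also assumes the new block is durable before the root is switched, which is the ordering that cache flushes are there to enforce.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self, initial):
        self.blocks = {0: dict(initial)}  # block id -> snapshot of the data
        self.root = 0                     # id of the current valid snapshot
        self.next_id = 1

    def update(self, changes, crash_before_commit=False):
        # Step 1: write the modified copy to a fresh block.
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = {**self.blocks[self.root], **changes}
        # Step 2: a simulated power loss here leaves the old root untouched.
        if crash_before_commit:
            return self.root
        # Step 3: commit by switching the root pointer in a single step.
        self.root = new_id
        return self.root

    def read(self):
        # Readers only ever follow the root, so they see a complete snapshot.
        return dict(self.blocks[self.root])
```

An update interrupted before step 3 leaves read() returning the previous, intact snapshot: older data, but not half-written data.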
-Nathanael From nhoyle at hoyletech.com Mon Jun 22 01:42:54 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Mon Jun 22 01:43:00 2009 Subject: bin/135710: mount(8): mount -t tmpfs does not follow 'size' option In-Reply-To: <200906210945.n5L9jVOS009819@freefall.freebsd.org> References: <200906210945.n5L9jVOS009819@freefall.freebsd.org> Message-ID: <4A3EE185.9040205@hoyletech.com> remko@FreeBSD.org wrote: > Synopsis: mount(8): mount -t tmpfs does not follow 'size' option > > Responsible-Changed-From-To: freebsd-bugs->freebsd-fs > Responsible-Changed-By: remko > Responsible-Changed-When: Sun Jun 21 09:45:21 UTC 2009 > Responsible-Changed-Why: > reassign to fs team > > http://www.freebsd.org/cgi/query-pr.cgi?pr=135710 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > Quoting Peter Snyder in "tmpfs: A Virtual Memory File System", in which he details the design for Solaris tmpfs, which served as the foundation for NetBSD's tmpfs, which was in turn integrated into FreeBSD: "Instead of allocating a fixed amount of memory for exclusive use as a file system, tmpfs file system size is dynamic depending on use, allowing the system to decide the optimal use of memory." And from the FreeBSD tmpfs(5) man page, "*size* - maximum size (in bytes) for the file system", note the use of the word "maximum" versus say, "initial". I would attempt to actually populate/fill the created file system, and unless it returns as being full with less than the size specified worth of data, I believe the behavior observed is as intended. 
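The "fill it and see" check suggested above could be scripted roughly as follows. This is a sketch in Python, not something taken from the PR: the function name, the chunk size and the max_bytes safety cap are my own choices, and the mount point in the usage note is hypothetical.

```python
import errno
import os
import tempfile


def fill_until_full(directory, chunk_size=1 << 20, max_bytes=None):
    """Write chunks into a scratch file under directory until it is full.

    Stops when the file system reports ENOSPC, or when the optional
    max_bytes safety cap is reached. Returns (bytes_written, hit_full).
    The scratch file is removed afterwards.
    """
    chunk = b"\xa5" * chunk_size  # non-zero filler bytes
    written = 0
    hit_full = False
    fd, path = tempfile.mkstemp(dir=directory, prefix="fill-")
    try:
        while max_bytes is None or written < max_bytes:
            want = chunk_size
            if max_bytes is not None:
                want = min(chunk_size, max_bytes - written)
            try:
                written += os.write(fd, chunk[:want])
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    hit_full = True
                    break
                raise
    finally:
        os.close(fd)
        os.unlink(path)
    return written, hit_full
```

With a tmpfs mounted with size=64m on, say, /mnt/tmpfs-test, calling fill_until_full("/mnt/tmpfs-test") and comparing the returned byte count with the requested size would show whether the limit is honored as a maximum.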
-Nathanael From bugmaster at FreeBSD.org Mon Jun 22 11:06:54 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jun 22 11:07:56 2009 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200906221106.n5MB6r0P018000@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o bin/135710 fs mount(8): mount -t tmpfs does not follow 'size' option o kern/135594 fs [zfs] Single dataset unresponsive with Samba o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135480 fs [zfs] panic: lock &arg.lock already initialized o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135412 fs [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREA o bin/135314 fs [zfs] assertion failed for zdb(8) usage o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/135039 fs [zfs] mkstemp() fails over NFS when server uses ZFS (7 f kern/134496 fs [zfs] [panic] ZFS pool export occasionally causes a ke o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133980 fs [panic] [ffs] panic: ffs_valloc: dup alloc o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133614 fs [smbfs] [panic] panic: ffs_truncate: read-only filesys o kern/133373 fs [zfs] umass attachment causes ZFS checksum errors, dat o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int f kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/133134 fs [zfs] Missing ZFS zpool labels f kern/133020 fs [zfs] [panic] inappropriate panic caused by zfs. 
Pani o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132597 fs [tmpfs] [panic] tmpfs-related panic while interrupting o kern/132551 fs [zfs] ZFS locks up on extattr_list_link syscall o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132337 fs [zfs] [panic] kernel panic in zfs_fuid_create_cred o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes f kern/132068 fs [zfs] page fault when using ZFS over NFS on 7.1-RELEAS o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] [patch] mkfs.ext2 creates rotten partition o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129148 fs [zfs] [panic] panic on concurrent writing & rollback o kern/129059 fs [zfs] [patch] ZFS bootloader whitelistable via WITHOUT f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127659 fs [tmpfs] tmpfs memory leak o kern/127492 fs [zfs] System hang on ZFS input-output o kern/127420 fs [gjournal] [panic] Journal 
overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/125644 fs [zfs] [panic] zfs unfixable fs errors caused panic whe f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition f bin/124424 fs [zfs] zfs(8): zfs list -r shows strange snapshots' siz o kern/123939 fs [msdosfs] corrupts new files o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o kern/122173 fs [zfs] [panic] Kernel Panic if attempting to replace a o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122047 fs [ext2fs] [patch] incorrect handling of UF_IMMUTABLE / o kern/122038 fs [tmpfs] [panic] tmpfs: panic: tmpfs_alloc_vp: type 0xc o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121779 fs [ufs] snapinfo(8) (and related tools?) 
only work for t o kern/121770 fs [zfs] ZFS on i386, large file or heavy I/O leads to ke o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha f kern/120991 fs [panic] [fs] [snapshot] System crashes when manipulati o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o bin/120288 fs zfs(8): "zfs share -a" does not send SIGHUP to mountd f kern/119735 fs [zfs] geli + ZFS + samba starting on boot panics 7.0-B o kern/118912 fs [2tb] disk sizing/geometry problem with large array o misc/118855 fs [zfs] ZFS-related commands are nonfunctional in fixit o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118320 fs [zfs] [patch] NFS SETATTR sometimes fails to set file o bin/118249 fs mv(1): moving a directory changes its mtime o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o kern/116913 fs [ffs] [panic] ffs_blkfree: freeing free block p kern/116608 fs [msdosfs] [patch] msdosfs fails to check mount options o kern/116583 fs [ffs] [hang] System freezes for short time when using o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/115645 fs [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] 
smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o kern/113180 fs [zfs] Setting ZFS nfsshare property does not cause inh o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/105093 fs [ext2fs] [patch] ext2fs on read-only media cannot be m o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist f kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [iso9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna f kern/91568 fs [ufs] [panic] writing to UFS/softupdates DVD media in o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/89991 fs [ufs] softupdates with mount -ur causes fs UNREFS o kern/88657 fs [smbfs] windows 
client hang when browsing a samba shar o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o kern/85326 fs [smbfs] [panic] saving a file via samba to an overquot o kern/84589 fs [2TB] 5.4-STABLE unresponsive during background fsck 2 o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o kern/77826 fs [ext2fs] ext2fs usb filesystem will not mount RW o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 145 problems total. From numisemis at yahoo.com Mon Jun 22 12:19:02 2009 From: numisemis at yahoo.com (Simun Mikecin) Date: Mon Jun 22 12:19:08 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: References: <570433.20373.qm@web37308.mail.mud.yahoo.com> Message-ID: <289445.67836.qm@web37308.mail.mud.yahoo.com> Dan Naumov wrote: > Actually I think a need a few clarifications regarding ZFS: > 1) Does FreeBSD honor the "flush the cache to disk now" commands issued by ZFS to the harrdive only when ZFS is used directly on top of a disk device directly or does this also work when ZFS is used on top of a slice/partition? 
> 2) If we compare ZFS and UFS+SU on a regular "lying" SATA disk (with write cache enabled) under heavy IO followed by a power loss, which one is going to recover better, and why? 1) AFAIK on FreeBSD (in contrast to Solaris) there is no difference whether you use a whole disk or a slice/partition. 2) I wouldn't use UFS+SU on [S]ATA disks because your background fsck will sometimes give up stating something like "unexpected softupdates inconsistency" (unless you have disabled the write cache, which you don't really want to do) and you will have to do a manual foreground fsck yourself. The choice should (in my opinion) be: ZFS for amd64, UFS+gjournal for i386. Neither ZFS nor UFS+gjournal will have any recovery penalty if you have write cache enabled. If you have a controller with battery-backed cache, you could even run ZFS with cache flushing disabled (I don't know whether it can be disabled on gjournal), but I'm not sure that you will get any real-world performance improvement by doing it. From numisemis at yahoo.com Mon Jun 22 12:44:37 2009 From: numisemis at yahoo.com (Simun Mikecin) Date: Mon Jun 22 12:44:44 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <4A3EB902.8080503@modulus.org> References: <570433.20373.qm@web37308.mail.mud.yahoo.com> <4A3E9D81.1060406@modulus.org> <4A3EB902.8080503@modulus.org> Message-ID: <457065.69200.qm@web37305.mail.mud.yahoo.com> Andrew Snow wrote: > B) SCSI drives come with write caching disabled by default. But here, the performance loss is partially made up by Tagged Command Queueing and faster spindle speeds SCSI drives get configured by the controller BIOS and/or kernel driver. It usually means that both TCQ and write cache become enabled. It is considered safe to enable write cache if TCQ is enabled. The same would be valid for SATA with NCQ enabled. But FreeBSD does not (yet) support SATA NCQ.
From numisemis at yahoo.com Mon Jun 22 13:06:44 2009 From: numisemis at yahoo.com (Simun Mikecin) Date: Mon Jun 22 13:06:50 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <4A3EB902.8080503@modulus.org> References: <570433.20373.qm@web37308.mail.mud.yahoo.com> <4A3E9D81.1060406@modulus.org> <4A3EB902.8080503@modulus.org> Message-ID: <303630.93763.qm@web37304.mail.mud.yahoo.com> Andrew Snow wrote: > All of these solutions still involve disabling of write cache, with a performance hit of varying degrees. (I have tried all of those except gjournal!) Can you please elaborate on this statement? AFAIK all of these solutions are absolutely safe to use with write cache enabled: - SCSI disks are safe with TCQ enabled (not sure when TCQ is disabled) - UFS+gjournal is safe because of regular disk cache flushing - ZFS is safe by design From gavin at FreeBSD.org Mon Jun 22 13:21:23 2009 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Mon Jun 22 13:21:29 2009 Subject: kern/135039: [zfs] mkstemp() fails over NFS when server uses ZFS (7-stable only) [regression] Message-ID: <200906221321.n5MDLMh6029508@freefall.freebsd.org> Synopsis: [zfs] mkstemp() fails over NFS when server uses ZFS (7-stable only) [regression] State-Changed-From-To: open->closed State-Changed-By: gavin State-Changed-When: Mon Jun 22 13:18:35 UTC 2009 State-Changed-Why: Close this PR, kern/135412 is a duplicate of this, but contains a simple test case Responsible-Changed-From-To: freebsd-fs->gavin Responsible-Changed-By: gavin Responsible-Changed-When: Mon Jun 22 13:18:35 UTC 2009 Responsible-Changed-Why: Track http://www.freebsd.org/cgi/query-pr.cgi?pr=135039 From dan.naumov at gmail.com Mon Jun 22 13:21:48 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Mon Jun 22 13:21:55 2009 Subject: ufs2 / softupdates / ZFS / disk write cache In-Reply-To: <289445.67836.qm@web37308.mail.mud.yahoo.com> References: <570433.20373.qm@web37308.mail.mud.yahoo.com>
<289445.67836.qm@web37308.mail.mud.yahoo.com> Message-ID: On Mon, Jun 22, 2009 at 3:19 PM, Simun Mikecin wrote: > 2) I wouldn't use UFS+SU on [S]ATA disks because your background fsck will > sometimes give up stating something like "unexpected softupdates > inconsistency" (unless you have disabled the write cache, which you don't really > want to do) and you will have to do a manual foreground fsck yourself. > The choice should (in my opinion) be: ZFS for amd64, UFS+gjournal for i386. > Neither ZFS nor UFS+gjournal will have any recovery penalty if you have > write cache enabled. You seem to be thinking the way I am thinking :) My biggest concern is for the new users coming to FreeBSD, most of whom are going to end up with UFS2+SU on their SATA disks, which is not the best default to have. I wonder if people running away screaming from sysinstall code has anything to do with why gjournal and zfs are not suggested as an option during the system installation procedure. Is the general consensus that adding these options to the install will only happen if/when FreeBSD moves on to a new installer system? Out of curiosity, how many beers would folks have to chip in for somebody knowledgeable enough to implement direct support for gjournal and zfs in sysinstall? :) > If you have a controller with battery-backed cache, you could even run ZFS > with cache flushing disabled (I don't know whether it can be disabled on > gjournal), but I'm not sure that you will get any real-world performance > improvement by doing it. In their own documentation, Sun recommends against disabling cache flushing in most cases, as the performance gain is going to be negligible. However, there is a special case with "smart" SAN devices, where Sun strongly recommends disabling cache flushes because otherwise you are going to suffer serious performance losses.
- Sincerely, Dan Naumov From gavin at FreeBSD.org Mon Jun 22 14:10:04 2009 From: gavin at FreeBSD.org (Gavin Atkinson) Date: Mon Jun 22 14:10:10 2009 Subject: kern/135412: ZFS issue Message-ID: <200906221410.n5MEA3uQ061414@freefall.freebsd.org> The following reply was made to PR kern/135412; it has been noted by GNATS. From: Gavin Atkinson To: bug-followup@FreeBSD.org Cc: Danny Braniss Subject: Re: kern/135412: ZFS issue Date: Mon, 22 Jun 2009 14:30:27 +0100 Other useful information (from PR kern/135039): This appears to only affect 7-STABLE since the ZFS merge, but doesn't affect -HEAD. After the recent import of ZFS v13 into 7-STABLE, an mkstemp() call from an NFS client to a ZFS-backed NFS server will fail: the syscall returns EIO and the server will have created a 0-byte file with 000 permissions. This breaks not just mktemp but also mv, tar, rsync... Kip Macy said there's a flags check that is too strict, in email message <3c1674c90905280025i17039257l573838d33d8493fd@mail.gmail.com> Otherwise, use cp and rm instead of mv, or use scp instead of NFS, or use UFS2 on the server To submitter: did you upgrade your on-disk pools to v13, or is running the new code and v6 pools enough to show the problem? What is the output of "zpool upgrade"? From danny at cs.huji.ac.il Mon Jun 22 15:20:05 2009 From: danny at cs.huji.ac.il (Danny Braniss) Date: Mon Jun 22 15:20:11 2009 Subject: kern/135412: ZFS issue Message-ID: <200906221520.n5MFK4Xr014924@freefall.freebsd.org> The following reply was made to PR kern/135412; it has been noted by GNATS. From: Danny Braniss To: Gavin Atkinson Cc: bug-followup@FreeBSD.org Subject: Re: kern/135412: ZFS issue Date: Mon, 22 Jun 2009 17:53:44 +0300 > Other useful information (from PR kern/135039): > > This appears to only affect 7-STABLE since the ZFS merge, but doesn't > affect -HEAD. 
> > After the recent import of ZFS v13 into 7-STABLE, an mkstemp() call from > an NFS client to a ZFS-backed NFS server will fail: the syscall returns > EIO and the server will have created a 0-byte file with 000 permissions. > This breaks not just mktemp but also mv, tar, rsync... > > Kip Macy said there's a flags check that is too strict, in email message > <3c1674c90905280025i17039257l573838d33d8493fd@mail.gmail.com> > Otherwise, use cp and rm instead of mv, or use scp instead of NFS, or > use UFS2 on the server > > To submitter: did you upgrade your on-disk pools to v13, or is running > the new code and v6 pools enough to show the problem? What is the > output of "zpool upgrade"? it happens ONLY because zfs is v13, irrelevant if the pools have been upgraded, created, or not upgraded. so yes, just running a newer -stable is enough to show the problem. Since I am using ZFS + NFS, and providing service to several hundred users, telling them not to use tar, svn, rsync, etc is not an option :-) also, NFS V2 is broken/un-maintained. danny > From kabaev at gmail.com Mon Jun 22 16:47:18 2009 From: kabaev at gmail.com (Alexander Kabaev) Date: Mon Jun 22 16:47:25 2009 Subject: kern/135412: ZFS issue In-Reply-To: <200906221520.n5MFK4Xr014924@freefall.freebsd.org> References: <200906221520.n5MFK4Xr014924@freefall.freebsd.org> Message-ID: <20090622121543.7bc4204b@kan.dnsalias.net> On Mon, 22 Jun 2009 15:20:04 GMT Danny Braniss wrote: > also, NFS V2 is broken/un-maintained. Hi Danny, could you please tell me what to look for and I will take a stab at reproducing/fixing it. -- Alexander Kabaev -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 188 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090622/d4487335/signature.pgp
From remko at FreeBSD.org Mon Jun 22 20:52:21 2009
From: remko at FreeBSD.org (remko@FreeBSD.org)
Date: Mon Jun 22 20:52:27 2009
Subject: bin/135710: mount(8): mount -t tmpfs does not follow 'size' option
Message-ID: <200906222052.n5MKqLD6078860@freefall.freebsd.org>

Synopsis: mount(8): mount -t tmpfs does not follow 'size' option

State-Changed-From-To: open->closed
State-Changed-By: remko
State-Changed-When: Mon Jun 22 20:52:20 UTC 2009
State-Changed-Why:
Nathanael gave a pretty good explanation which I follow; the behaviour
is intended.

http://www.freebsd.org/cgi/query-pr.cgi?pr=135710

From mikej at paymentallianceintl.com Mon Jun 22 23:35:25 2009
From: mikej at paymentallianceintl.com (Michael Jung)
Date: Mon Jun 22 23:35:31 2009
Subject: bin/135710: mount(8): mount -t tmpfs does not follow 'size' option
In-Reply-To: <200906222052.n5MKqLD6078860@freefall.freebsd.org>
References: <200906222052.n5MKqLD6078860@freefall.freebsd.org>
Message-ID:

Thank you both for your kind responses. I've been reading man pages
since 2.2 and my brain just didn't get what my eyes read.

--mikej

Michael Jung
Payment Alliance International
11857 Commonwealth Drive
Louisville, KY 40299
502-212-4045 Work Voice
502-212-4004 Work Facsimile

-----Original Message-----
From: remko@FreeBSD.org [mailto:remko@FreeBSD.org]
Sent: Monday, June 22, 2009 4:52 PM
To: Michael Jung; remko@FreeBSD.org; freebsd-fs@FreeBSD.org
Subject: Re: bin/135710: mount(8): mount -t tmpfs does not follow 'size' option

Synopsis: mount(8): mount -t tmpfs does not follow 'size' option

State-Changed-From-To: open->closed
State-Changed-By: remko
State-Changed-When: Mon Jun 22 20:52:20 UTC 2009
State-Changed-Why:
Nathanael gave a pretty good explanation which I follow; the behaviour
is intended.
http://www.freebsd.org/cgi/query-pr.cgi?pr=135710 CONFIDENTIALITY NOTE: This message is intended only for the use of the individual or entity to whom it is addressed and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this transmission in error, please notify us by telephone at (502) 212-4001 or notify us at PAI , Dept. 99, 11857 Commonwealth Drive, Louisville, KY 40299. Thank you. From danny at cs.huji.ac.il Tue Jun 23 07:11:53 2009 From: danny at cs.huji.ac.il (Danny Braniss) Date: Tue Jun 23 07:12:01 2009 Subject: kern/135412: ZFS issue In-Reply-To: <20090622121543.7bc4204b@kan.dnsalias.net> References: <200906221520.n5MFK4Xr014924@freefall.freebsd.org> <20090622121543.7bc4204b@kan.dnsalias.net> Message-ID: > On Mon, 22 Jun 2009 15:20:04 GMT > Danny Braniss wrote: > > > also, NFS V2 is broken/un-maintained. > > Hi Danny, > > could you please tell me what to look for and I will take a stab at > reproducing/fixing it. Hi Alexander, removing a file from a read-only NFS/V2 will hang the client. The main reason that V2 is still around, IMHO, is that diskless boot relies on it, so fixing boot to use NFS/v3 would be more productive. Cheers, danny From numisemis at yahoo.com Tue Jun 23 10:01:54 2009 From: numisemis at yahoo.com (Simun Mikecin) Date: Tue Jun 23 10:02:01 2009 Subject: 7.2-STABLE: swap on ZFS v13 Message-ID: <288239.1376.qm@web37303.mail.mud.yahoo.com> Hi! Before ZFS v13 import to the 7-STABLE branch, it was a well known fact that swap on ZFS volume (zvol) will not work. Recently, I tried to use swap on ZFS v13 (7-STABLE/amd64). It seems that it works. I used 4k for the volblocksize as that is equivalent to hardware page size. I tried with compression enabled and disabled. 
Enabling compression reduces number of bytes that need to be swapped in and out to the zvol.

zfs create -o volblocksize=4k -o compression=on -o org.freebsd:swap=on -V 4g tank/swap
swapon /dev/zvol/tank/swap

Is there anybody else who tried 7-STABLE/amd64 with swap on ZFS v13? Does it work or not?

From dan.naumov at gmail.com Tue Jun 23 10:10:47 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Tue Jun 23 10:10:53 2009
Subject: 7.2-STABLE: swap on ZFS v13
In-Reply-To: <288239.1376.qm@web37303.mail.mud.yahoo.com>
References: <288239.1376.qm@web37303.mail.mud.yahoo.com>
Message-ID:

Pardon my perhaps stupid question, but what exactly does the "-o
org.freebsd:swap=on" option do? Couldn't find anything related during
a quick glance at the zfs manpage (on a 7.2-release/amd64 system with
zfsv6).

- Sincerely,
Dan Naumov

On Tue, Jun 23, 2009 at 1:01 PM, Simun Mikecin wrote:
>
> Hi!
>
> Before ZFS v13 import to the 7-STABLE branch, it was a well known fact that swap on ZFS volume (zvol) will not work.
> Recently, I tried to use swap on ZFS v13 (7-STABLE/amd64). It seems that it works.
>
> I used 4k for the volblocksize as that is equivalent to hardware page size.
> I tried with compression enabled and disabled. Enabling compression reduces number of bytes that need to be swapped in and out to the zvol.
>
> zfs create -o volblocksize=4k -o compression=on -o org.freebsd:swap=on -V 4g tank/swap
> swapon /dev/zvol/tank/swap
>
> Is there anybody else who tried 7-STABLE/amd64 with swap on ZFS v13? Does it work or not?
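
[Editorial sketch, not part of the original thread.] The
org.freebsd:swap=on setting in the zfs create command above is a ZFS
user property, so by itself it is only a label; it matters only if
something reads it and calls swapon. The sketch below shows one way a
boot-time script might do that. It assumes that
"zfs list -H -t volume -o name,org.freebsd:swap" prints one
tab-separated name/value pair per zvol; the helper name swap_zvols is
hypothetical, and this is not claimed to be the shipped rc script.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>value" lines on stdin and print the
# device path of every zvol whose org.freebsd:swap value is "on".
swap_zvols() {
    while IFS="$(printf '\t')" read -r name value; do
        case "$value" in
            [oO][nN]) echo "/dev/zvol/$name" ;;
        esac
    done
    return 0
}

# Assumed usage on a live system (left commented out, since it needs root
# and a ZFS pool):
#   zfs list -H -t volume -o name,org.freebsd:swap | swap_zvols |
#       while read -r dev; do swapon "$dev"; done
```

Keeping the filter separate from the swapon call makes it possible to
check which devices would be selected before enabling anything.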

From numisemis at yahoo.com Tue Jun 23 10:19:38 2009
From: numisemis at yahoo.com (Simun Mikecin)
Date: Tue Jun 23 10:19:44 2009
Subject: 7.2-STABLE: swap on ZFS v13
In-Reply-To:
References: <288239.1376.qm@web37303.mail.mud.yahoo.com>
Message-ID: <173835.76209.qm@web37306.mail.mud.yahoo.com>

Dan Naumov wrote:
> Pardon my perhaps stupid question, but what exactly does the "-o
> org.freebsd:swap=on" option do? Couldn't find anything related during
> a quick glance at the zfs manpage (on a 7.2-release/amd64 system with
> zfsv6).

Take a look at /etc/rc.d/zfs.
Since /etc/fstab is not a supported way for adding swap on zvols, /etc/rc.d/zfs script (executed during boot if you have zfs_enable="YES" in your /etc/rc.conf[.local]) will execute "swapon" on all zvols that have org.freebsd:swap=on.

From dan.naumov at gmail.com Tue Jun 23 10:34:38 2009
From: dan.naumov at gmail.com (Dan Naumov)
Date: Tue Jun 23 10:34:45 2009
Subject: 7.2-STABLE: swap on ZFS v13
In-Reply-To: <173835.76209.qm@web37306.mail.mud.yahoo.com>
References: <288239.1376.qm@web37303.mail.mud.yahoo.com> <173835.76209.qm@web37306.mail.mud.yahoo.com>
Message-ID:

On Tue, Jun 23, 2009 at 1:19 PM, Simun Mikecin wrote:
>
> Dan Naumov wrote:
>> Pardon my perhaps stupid question, but what exactly does the "-o
>> org.freebsd:swap=on" option do? Couldn't find anything related during
>> a quick glance at the zfs manpage (on a 7.2-release/amd64 system with
>> zfsv6).
> Take a look at /etc/rc.d/zfs.
> Since /etc/fstab is not a supported way for adding swap on zvols, /etc/rc.d/zfs script (executed during boot if you have zfs_enable="YES" in your /etc/rc.conf[.local]) will execute "swapon" on all zvols that have org.freebsd:swap=on.

Oh, right.
Kinda weird though that this is already included in /etc/rc.d/zfs already if "swap on ZFS" is considered even more experimental than ZFS support itself. Is this documented anywhere or is it just one of the things you learn by reading rc.d scripts? :) Out of curiosity, what kind of horrible breakage should I expect if I try enabling swap on a zvol using zfsv6 that comes with 7.2-release? - Dan Naumov From gary.jennejohn at freenet.de Tue Jun 23 12:24:06 2009 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Tue Jun 23 12:24:18 2009 Subject: 7.2-STABLE: swap on ZFS v13 In-Reply-To: <288239.1376.qm@web37303.mail.mud.yahoo.com> References: <288239.1376.qm@web37303.mail.mud.yahoo.com> Message-ID: <20090623142402.2c8d4b65@ernst.jennejohn.org> On Tue, 23 Jun 2009 03:01:53 -0700 (PDT) Simun Mikecin wrote: > Before ZFS v13 import to the 7-STABLE branch, it was a well known fact that swap on ZFS volume (zvol) will not work. > Recently, I tried to use swap on ZFS v13 (7-STABLE/amd64). It seems that it works. > > I used 4k for the volblocksize as that is equivalent to hardware page size. > I tried with compression enabled and disabled. Enabling compression reduces number of bytes that need to be swapped in and out to the zvol. > > zfs create -o volblocksize=4k -o compression=on -o org.freebsd:swap=on -V 4g tank/swap > swapon /dev/zvol/tank/swap > > Is there anybody else who tried 7-STABLE/amd64 with swap on ZFS v13? Does it work or not? > I doubt that you can generate a crash dump using ZFS as swap, especially if you have compression enabled. But I might be wrong. Does anyone know? 
--- Gary Jennejohn From dan.naumov at gmail.com Tue Jun 23 12:50:07 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Tue Jun 23 12:50:14 2009 Subject: 7.2-STABLE: swap on ZFS v13 In-Reply-To: <20090623142402.2c8d4b65@ernst.jennejohn.org> References: <288239.1376.qm@web37303.mail.mud.yahoo.com> <20090623142402.2c8d4b65@ernst.jennejohn.org> Message-ID: On Tue, Jun 23, 2009 at 3:24 PM, Gary Jennejohn wrote: >> Is there anybody else who tried 7-STABLE/amd64 with swap on ZFS v13? Does it work or not? >> > > I doubt that you can generate a crash dump using ZFS as swap, especially > if you have compression enabled. > > But I might be wrong. > > Does anyone know? If this is true and this is the only serious limitation of using a zvol as swap, I could very well see myself moving to using it for swap in the near future (as it would lead to a lot cleaner filesystem layouts on multidisk systems utilizing zfs). If by any odd chance, I do run into a reproducible system crash I have to debug, I could just attach an USB stick and temporarily move the swap to that. - Dan Naumov From mandrews at bit0.com Tue Jun 23 18:22:58 2009 From: mandrews at bit0.com (Mike Andrews) Date: Tue Jun 23 18:23:05 2009 Subject: weird problem w/ ZFS not reclaiming freed space In-Reply-To: References: Message-ID: On Fri, 19 Jun 2009, Mike Andrews wrote: > Somehow I've managed to get ZFS on one of my machines into a state where it > won't reclaim all space after deleting files AND snapshots off of it: > (this is with 7.2-STABLE amd64, compiled June 10) > > # ls -la /weird > total 4 > drwxr-x--- 2 mysql mysql 2 Jun 19 02:42 . > drwxr-xr-x 29 root wheel 1024 Jun 19 02:44 .. 
>
> # df /weird
> Filesystem    1K-blocks      Used     Avail Capacity  Mounted on
> scotch/weird  282201472 109151232 173050240    39%    /weird
>
> # zfs list scotch/weird
> NAME           USED  AVAIL  REFER  MOUNTPOINT
> scotch/weird   104G   164G   104G  /weird
>
> # zfs list -t snapshot | grep scotch/weird
>
> # zfs get all scotch/weird
> NAME          PROPERTY              VALUE                  SOURCE
> scotch/weird  type                  filesystem             -
> scotch/weird  creation              Wed Jun 17  1:20 2009  -
> scotch/weird  used                  104G                   -
> scotch/weird  available             159G                   -
> scotch/weird  referenced            104G                   -
> scotch/weird  compressratio         1.00x                  -
> scotch/weird  mounted               yes                    -
> scotch/weird  quota                 none                   default
> scotch/weird  reservation           none                   default
> scotch/weird  recordsize            128K                   default
> scotch/weird  mountpoint            /weird                 local
> scotch/weird  sharenfs              off                    default
> scotch/weird  checksum              on                     default
> scotch/weird  compression           off                    default
> scotch/weird  atime                 off                    local
> scotch/weird  devices               on                     default
> scotch/weird  exec                  off                    local
> scotch/weird  setuid                off                    local
> scotch/weird  readonly              off                    default
> scotch/weird  jailed                off                    default
> scotch/weird  snapdir               hidden                 default
> scotch/weird  aclmode               groupmask              default
> scotch/weird  aclinherit            restricted             default
> scotch/weird  canmount              on                     default
> scotch/weird  shareiscsi            off                    default
> scotch/weird  xattr                 off                    temporary
> scotch/weird  copies                1                      default
> scotch/weird  version               3                      -
> scotch/weird  utf8only              off                    -
> scotch/weird  normalization         none                   -
> scotch/weird  casesensitivity       sensitive              -
> scotch/weird  vscan                 off                    default
> scotch/weird  nbmand                off                    default
> scotch/weird  sharesmb              off                    default
> scotch/weird  refquota              none                   default
> scotch/weird  refreservation        none                   default
> scotch/weird  primarycache          all                    default
> scotch/weird  secondarycache        all                    default
> scotch/weird  usedbysnapshots       0                      -
> scotch/weird  usedbydataset         104G                   -
> scotch/weird  usedbychildren        0                      -
> scotch/weird  usedbyrefreservation  0                      -
>
>
> If I then rsync stuff to it, space seems OK, if I continue to rsync to it
> every few hours, the used space grows, even if no snapshots are being taken
>
If I do take snapshots, then change stuff, then delete the snapshots, the > snapshot space does appear to be reclaimed. Also if I 'zfs destroy' the > filesystem, the space is correctly reclaimed, but once I create a new one > and repeat the process, the problem reappears. > > I have not had any luck reproducing this on another machine yet, but > admittedly haven't tried super hard yet. > > Scrubbing the zpool returns no errors. > > I'm guessing zdb is my only hope at debugging this, but as I've never used it > before and as it seems to dump core whenever I try running it, can someone > suggest what I need to check/look for in it? > > I did also have a panic a few days ago that, based on the text, might be > related (I do have the vmdump and core.txt) > > panic: solaris assert: P2PHASE(start, 1ULL << sm->sm_shift) == 0, file: > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, > line: 146 > > ...for which I have a vmdump and a core.txt if anyone wants to look at it. Just to update this and move it to the -fs mailing list, removing the "--sparse" flag from rsync solves this problem, so the bug has something to do with ZFS's handling of sparse files. From kmacy at freebsd.org Wed Jun 24 03:56:02 2009 From: kmacy at freebsd.org (Kip Macy) Date: Wed Jun 24 03:56:09 2009 Subject: ZFS filesystem not showing total size? In-Reply-To: <20090624014855.GB79749@logik.internal.network> References: <20090623003117.GA94466@logik.internal.network> <20090623050707.GB21349@logik.internal.network> <20090624004726.GA25475@blazingdot.com> <20090624014855.GB79749@logik.internal.network> Message-ID: <3c1674c90906232056s2040d6d4u59550b6e5d12a4d0@mail.gmail.com> freebsd-fs is more appropriate - ideally freebsd-questions but I'm not sure how much your mileage will vary there. On Tue, Jun 23, 2009 at 6:48 PM, wrote: > On 2009-06-23 17:47:26, Marcus Reid wrote: >> Hi, >> >> As a side note, freebsd-hackers is not the correct list for this question. 
>
> I considered it a hackers@ question due to the large "ZFS is an experimental
> feature" warning.
>
> xw
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke

From rihad at mail.ru Wed Jun 24 08:42:36 2009
From: rihad at mail.ru (rihad)
Date: Wed Jun 24 08:42:43 2009
Subject: Is it preferable to use the sync command?
Message-ID: <4A41E073.70902@mail.ru>

Hi, all,

Having experienced a FreeBSD 5.1 crash due to power failure (despite
using a UPS) resulting in massive /etc corruption and data loss, in
order to minimize future risks should I:

0) tweak (decrease) these default sysctls:
   kern.filedelay: 30
   kern.dirdelay: 29
   kern.metadelay: 28
1) mount the root FS with soft-updates enabled (left as disabled in
   sysinstall by default due to known reasons)
2) setup a cron job calling /bin/sync every minute

I somehow feel that turning soft-updates on would do the trick (it is
not normally written to and has plenty of free space anyway).

Thanks in advance for any educated tips.

From numisemis at yahoo.com Wed Jun 24 09:28:23 2009
From: numisemis at yahoo.com (Simun Mikecin)
Date: Wed Jun 24 09:28:29 2009
Subject: Is it preferable to use the sync command?
In-Reply-To: <4A41E073.70902@mail.ru> References: <4A41E073.70902@mail.ru> Message-ID: <130804.21855.qm@web37303.mail.mud.yahoo.com> rihad wrote: > Having experienced a FreeBSD 5.1 crash due to power failure (despite using a UPS) resulting in massive /etc corruption and data loss, in order to minimize future risks should I: > 0) tweak (decrease) these default sysctls: > kern.filedelay: 30 > kern.dirdelay: 29 > kern.metadelay: 28 > 1) mount the root FS with soft-updates enabled (left as disabled in sysinstall by default due to known reasons) > 2) setup a cron job calling /bin/sync every minute > I somehow feel that turning soft-updates on would do the trick (it is not normally written to and has plenty of free space anyway). Do you use ATA or SCSI? Turning soft-updates on for SCSI should do the trick. Since there is no support for gjournal and/or ZFS on 5.1, for ATA only real solution would be disabling write-cache (which degrades performance): "sysctl hw.ata.wc=0". From rihad at mail.ru Wed Jun 24 09:48:40 2009 From: rihad at mail.ru (rihad) Date: Wed Jun 24 09:48:47 2009 Subject: Is it preferable to use the sync command? In-Reply-To: <130804.21855.qm@web37303.mail.mud.yahoo.com> References: <4A41E073.70902@mail.ru> <130804.21855.qm@web37303.mail.mud.yahoo.com> Message-ID: <4A41F672.9080900@mail.ru> Simun Mikecin wrote: > rihad wrote: >> Having experienced a FreeBSD 5.1 crash due to power failure (despite using a UPS) resulting in massive /etc corruption and data loss, in order to minimize future risks should I: >> 0) tweak (decrease) these default sysctls: >> kern.filedelay: 30 >> kern.dirdelay: 29 >> kern.metadelay: 28 >> 1) mount the root FS with soft-updates enabled (left as disabled in sysinstall by default due to known reasons) >> 2) setup a cron job calling /bin/sync every minute >> I somehow feel that turning soft-updates on would do the trick (it is not normally written to and has plenty of free space anyway). > > > Do you use ATA or SCSI? ATA. 
> Turning soft-updates on for SCSI should do the trick. But not for ATA? Why I'm asking: other partitions using soft-updates don't seem to have lost any data. > Since there is no support for gjournal and/or ZFS on 5.1, for ATA only real solution would be disabling write-cache (which degrades performance): "sysctl hw.ata.wc=0". > I think this is much easier to do remotely than turning soft-updates on :-) I'll still try both solutions, thanks. From dan.naumov at gmail.com Wed Jun 24 20:27:33 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Wed Jun 24 20:27:40 2009 Subject: read benchmarks: ufs/zfs/ext3 raidz/raid5 Message-ID: Another FreeBSD person on a forum I frequent did some read benchmarks on his system: Athlon64 3500+ with 2GB DDR2 SDRAM, a WD 250GB system drive, and 5 Seagate Barracuda 750GB SATA-II data drives. ZFS and UFS testing was done using FreeBSD 7.2-RELEASE amd64, and ext3 testing was done using Ubuntu Server 8.04-LTS amd64. The used disks do not support NCQ, so there is no "NCQ advantage" on the Linux side. Random Access reads, 5MB chunks: http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-untuned-5mb.png Random Access reads, 1MB chunks: http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-untuned-1mb.png Random Access reads, 5MB chunks (big list): http://virtual.tehinterweb.net/livejournal/raid_performance/raid-diskperf-5mb-all.png Here is the original forum discussion thread: http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/857002910041 Sincerely, - Dan Naumov From fjwcash at gmail.com Wed Jun 24 22:35:26 2009 From: fjwcash at gmail.com (Freddie Cash) Date: Wed Jun 24 22:35:33 2009 Subject: Fail-over SAN setup: ZFS, NFS, and ...? Message-ID: [Not exactly sure which ML this belongs on, as it's related to both clustering and filesystems. If there's a better spot, let me know and I'll update the CC:/reply-to.] 

We're in the planning stages for building a multi-site, fail-over SAN setup
which will be used to provide redundant storage for a virtual machine setup.
The setup will be like so:

 [Server Room 1]     .      [Server Room 2]
-----------------    .    -------------------
                     .
[storage server]     .     [storage server]
        |            .            |
        |            .            |
[storage switch]     .     [storage switch]
        \----fibre----/           |
                     .            |
                     .            |
                     .   [storage aggregator]
                     .            |
                     .            |
                     .     /---[switch]---\
                     .     |      |       |
                     .     |   [VM box]   |
                     .     |      |       |
                     .  [VM box]  |       |
                     .     |      |    [VM box]
                     .     |      |       |
                     .     [network switch]
                     .            |
                     .            |
                     .        [internet]

Server room 1 and server room 2 are on opposite ends of town (about 3 km)
with a dedicated, direct-link, fibre link between them. There will be a set
of VM boxes at each site, that use the shared storage, and will act as
fail-over for each other. In theory, only 1 server room would ever be
active at a time, although we may end up migrating VMs between the two sites
for maintenance purposes.

We've got the storage server side of things figured out (5U rackmounts with
24 drive bays, using FreeBSD 7.x and ZFS). We've got the storage switches
picked out (HP Procurve 2800 or 2900, depending on if we go with 1 GbE or 10
GbE fibre links between them). We're stuck on the storage aggregator.

For a single aggregator box setup, we'd use FreeBSD 7.x with ZFS. The
storage servers would each export a single zvol using iSCSI. The storage
aggregator would use ZFS to create a pool using a mirrored vdev. To expand
the pool, we put in two more storage servers, and add another mirrored vdev
to the pool. No biggie. The storage aggregator then uses NFS and/or iSCSI
to make storage available to the VM boxes. This is the easy part.

However, we'd like to remove the single-point-of-failure that the storage
aggregator represents, and have a duplicate of it running at Server Room 1.
Right now, we can do this using cold-spares that rsync from the live box
every X hours/days. We'd like this to be a live, fail-over spare, though.
And this is where we're stuck.

What can we use to do this? CARP? Heartbeat? ggate? Should we look at
Linux with DRBD or linux-ha or cluster-nfs or similar? Perhaps RedHat
Cluster Suite? (We'd prefer not to, as then storage management becomes a
nightmare again, requiring mdadm, lvm, and more.) Would a cluster
filesystem be needed? AFS or similar?

We have next to no knowledge of fail-over clustering when it comes to
high-availability and fail-over. Any pointers to things to read online, or
tips, or even "don't do that, you're insane" comments greatly appreciated.
:)

Thanks.
--
Freddie Cash
fjwcash@gmail.com

From ivoras at freebsd.org Wed Jun 24 23:03:16 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Wed Jun 24 23:03:49 2009
Subject: read benchmarks: ufs/zfs/ext3 raidz/raid5
In-Reply-To:
References:
Message-ID:

Dan Naumov wrote:
> Another FreeBSD person on a forum I frequent did some read benchmarks
> on his system: Athlon64 3500+ with 2GB DDR2 SDRAM, a WD 250GB system
> drive, and 5 Seagate Barracuda 750GB SATA-II data drives. ZFS and UFS
> testing was done using FreeBSD 7.2-RELEASE amd64, and ext3 testing was
> done using Ubuntu Server 8.04-LTS amd64. The used disks do not support
> NCQ, so there is no "NCQ advantage" on the Linux side.
>
> Random Access reads, 5MB chunks:
> http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-untuned-5mb.png
> Random Access reads, 1MB chunks:
> http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-untuned-1mb.png
> Random Access reads, 5MB chunks (big list):
> http://virtual.tehinterweb.net/livejournal/raid_performance/raid-diskperf-5mb-all.png
>
> Here is the original forum discussion thread:
> http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/857002910041

This looks about consistent with what I see when comparing ext3 to ZFS
and/or UFS on 7.x systems.
I have theories and hunches why IO in FreeBSD is slower than on Linux (and for different RAID tools / GEOM classes the reasons are slightly different) but nothing I can back with proof and measurements. I can only confirm that it is consistently slower. I haven't yet tried testing 8 so if you're looking for ideas for testing, try it (disable debugging before you use 8-CURRENT). I think ZFS has some concurrency-enhancing additions there (which will help in case the benchmark was done with a tool testing concurrency; I don't see what was used in the above pages). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 258 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090624/976dbe72/signature.pgp From efinleywork at efinley.com Thu Jun 25 00:04:12 2009 From: efinleywork at efinley.com (Elliot Finley) Date: Thu Jun 25 00:04:24 2009 Subject: Fail-over SAN setup: ZFS, NFS, and ...? In-Reply-To: References: Message-ID: <4A42B3C7.9000500@efinley.com> Why not take a look at gluster? Freddie Cash wrote: > [Not exactly sure which ML this belongs on, as it's related to both > clustering and filesystems. If there's a better spot, let me know and I'll > update the CC:/reply-to.] > > We're in the planning stages for building a multi-site, fail-over SAN setup > which will be used to provide redundant storage for a virtual machine setup. > The setup will be like so: > [Server Room 1] . [Server Room 2] > ----------------- . ------------------- > . > [storage server] . [storage server] > | . | > | . | > [storage switch] . [storage switch] > \----fibre----/ | > . | > . | > . [storage aggregator] > . | > . | > . /---[switch]---\ > . | | | > . | [VM box] | > . | | | > . [VM box] | | > . | | [VM box] > . | | | > . [network switch] > . | > . | > . 
[internet] > > Server room 1 and server room 2 are on opposite ends of town (about 3 km) > with a dedicated, direct-link, fibre link between them. There will be a set > of VM boxes at each site, that use the shared storage, and will act as > fail-over for each other. In theory, only 1 server room would ever be > active at a time, although we may end up migrating VMs between the two sites > for maintenance purposes. > > We've got the storage server side of things figured out (5U rackmounts with > 24 drive bauys, using FreeBSD 7.x and ZFS). We've got the storage switches > picked out (HP Procurve 2800 or 2900, depending on if we go with 1 GbE or 10 > GbE fibre links between them). We're stuck on the storage aggregator. > > For a single aggregator box setup, we'd use FreeBSD 7.x with ZFS. The > storage servers would each export a single zvol using iSCSI. The storage > aggregator would use ZFS to create a pool using a mirrored vdev. To expand > the pool, we put in two more storage servers, and add another mirrored vdev > to the pool. No biggie. The storage aggregator then uses NFS and/or iSCSI > to make storage available to the VM boxes. This is the easy part. > > However, we'd like to remove the single-point-of-failure that the storage > aggregator represents, and have a duplicate of it running at Server Room 1. > Right now, we can do this using cold-spares that rsync from the live box > every X hours/days. We'd like this to be a live, fail-over spare, though. > And this is where we're stuck. > > What can we use to do this? CARP? Heatbeat? ggate? Should we look at > Linux with DRBD or linux-ha or cluster-nfs or similar? Perhaps RedHat > Cluster Suite? (We'd prefer not to, as then storage management becomes a > nightmare again, requiring mdadm, lvm, and more.) Would a cluster > filessytem be needed? AFS or similar? > > We have next to no knowledge of fail-over clustering when it comes to > high-availability and fail-over. 
Any pointers to things to read online, or > tips, or even "don't do that, you're insane" comments greatly appreciated. > :) From peterjeremy at optushome.com.au Fri Jun 26 07:18:17 2009 From: peterjeremy at optushome.com.au (peterjeremy@optushome.com.au) Date: Fri Jun 26 07:18:23 2009 Subject: read benchmarks: ufs/zfs/ext3 raidz/raid5 In-Reply-To: References: Message-ID: <20090626071812.GA43965@server.vk2pj.dyndns.org> On 2009-Jun-24 23:27:31 +0300, Dan Naumov wrote: >Random Access reads, 5MB chunks: >http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-untuned-5mb.png >Random Access reads, 1MB chunks: >http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-untuned-1mb.png >Random Access reads, 5MB chunks (big list): >http://virtual.tehinterweb.net/livejournal/raid_performance/raid-diskperf-5mb-all.png These benchmarks are all fairly meaningless. As a first order approximation, all I/O to a Unix FS should be writes or you don't have enough RAM for your application. A more meaningful benchmark would check writes or a read/write mix with ~90% writes. -- Peter Jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090626/13e4c5b1/attachment.pgp From neil at hoggarth.me.uk Fri Jun 26 16:53:48 2009 From: neil at hoggarth.me.uk (Neil Hoggarth) Date: Fri Jun 26 16:53:54 2009 Subject: "zfs upgrade" of a mounted filesystem? Message-ID: I have a 7-STABLE system with root-on-ZFS using the recipe from the wiki (http://wiki.freebsd.org/ZFSOnRoot), with a small UFS /boot partition on a flash disk. 
I updated the system a few days ago and have upgraded the zpool to version 13 and most of the filesystems to version 3, but I can't upgrade the filesystem version for the root filesystem as this is always mounted and it seems that "zfs upgrade" needs to unmount a filesystem to work on it? neilhoggarth-2# zfs upgrade This system is currently running ZFS filesystem version 3. The following filesystems are out of date, and can be upgraded. After being upgraded, these filesystems (and any 'zfs send' streams generated from subsequent snapshots) will no longer be accessible by older software versions. VER FILESYSTEM --- ------------ 1 newtank neilhoggarth-2# zfs upgrade newtank cannot unmount '/': Invalid argument Is there any way to work around this? Or will I need alternate boot/livefs media that incorporates ZFS v13 userland utilities to make the change? Regards, Neil. From serenity at exscape.org Fri Jun 26 17:03:35 2009 From: serenity at exscape.org (Thomas Backman) Date: Fri Jun 26 17:03:42 2009 Subject: ZFS send/recv: weird stream size? Message-ID: <25306F0C-437C-4746-98F1-427E335AF8EB@exscape.org> Hi all, First off: not subscribed, please make sure I'm cc:ed or such. I found a minor oddity today when running my backup script. It copies a pool into another, using "zfs send -R -I $LASTSNAP tank@$CURRSNAP | zfs recv -Fvd slave" (the variables should explain themselves). Anyway, I got this in the middle: receiving incremental stream of tank/var/crash@backup-20090626-1505 into slave/var/crash@backup-20090626-1505 received 1.28GB stream in 51 seconds (25.6MB/sec) tank/var/crash is a compressed filesystem, using lzjb. 
Now, the only notable change in var/crash between backups was the addition of vmcore.22: [root@chaos /var/crash]# du -sch *22* 33K core.txt.22 1.0K info.22 34M kernel.debug.22 459M vmcore.22 493M total [root@chaos /var/crash]# du -schA *22* 88K core.txt.22 512B info.22 57M kernel.debug.22 2.2G vmcore.22 2.3G total So, if the new files' compressed size is 493MB, and the real size is 2.3GB, how come zfs send/recv transfers 1.28 GB? Other, possibly interesting (although I doubt it) stats: [root@chaos ~]# zfs get compress,compressratio tank/var/crash NAME PROPERTY VALUE SOURCE tank/var/crash compression lzjb local tank/var/crash compressratio 2.90x - [root@chaos ~]# du -shc /var/crash 1.0G /var/crash 1.0G total [root@chaos ~]# du -shcA /var/crash 6.9G /var/crash 6.9G total Regards, Thomas From nbari at k9.cx Fri Jun 26 17:36:01 2009 From: nbari at k9.cx (Nicolas de Bari Embriz Garcia Rojas) Date: Fri Jun 26 17:36:08 2009 Subject: zfs nfs Message-ID: <143912190906261007q265a92b6gcf8958e635334df8@mail.gmail.com> Hi all, I updated my sources to the latest stable to fix the bce0 (lagg) problem, but in doing so the ZFS sources were also updated, so I had to rebuild and install world. After doing so and restarting the server, the NFS mount points are now read-only and I cannot write to any shared NFS mount point. Any ideas on how to fix this? Regards -- > nbari.tel From gallasch at free.de Fri Jun 26 23:13:31 2009 From: gallasch at free.de (Kai Gallasch) Date: Fri Jun 26 23:13:39 2009 Subject: Fail-over SAN setup: ZFS, NFS, and ...? In-Reply-To: <4A42B3C7.9000500@efinley.com> References: <4A42B3C7.9000500@efinley.com> Message-ID: <4A454FD6.1080005@free.de> > Freddie Cash wrote: >> [Not exactly sure which ML this belongs on, as it's related to both >> clustering and filesystems. If there's a better spot, let me know and >> I'll >> update the CC:/reply-to.]
>> >> We're in the planning stages for building a multi-site, fail-over SAN >> setup Elliot Finley wrote: > Why not take a look at gluster? Quite interesting project - http://www.gluster.org/ But sadly: http://www.gluster.org/docs/index.php/Whats_New_v2.0 [..] Known Issues Some known issues and pending activities stalled for upcoming releases. * Distribute translator: uses 64bit inode numbers, as FreeBSD doesn't support 64bit inodes. Distribute is seen to not work on FreeBSD --Kai. From kmacy at freebsd.org Fri Jun 26 23:34:09 2009 From: kmacy at freebsd.org (Kip Macy) Date: Fri Jun 26 23:34:15 2009 Subject: Fail-over SAN setup: ZFS, NFS, and ...? In-Reply-To: <4A454FD6.1080005@free.de> References: <4A42B3C7.9000500@efinley.com> <4A454FD6.1080005@free.de> Message-ID: <3c1674c90906261634l22910f54r31bbb8ec972d5ebf@mail.gmail.com> On Fri, Jun 26, 2009 at 3:46 PM, Kai Gallasch wrote: >> Freddie Cash wrote: >>> [Not exactly sure which ML this belongs on, as it's related to both >>> clustering and filesystems. If there's a better spot, let me know and >>> I'll >>> update the CC:/reply-to.] >>> >>> We're in the planning stages for building a multi-site, fail-over SAN >>> setup > > Elliot Finley wrote: >> Why not take a look at gluster? > > Quite interesting project - http://www.gluster.org/ > > But sadly: > http://www.gluster.org/docs/index.php/Whats_New_v2.0 > > [..] > Known Issues > Some known issues and pending activities stalled for upcoming releases. > * Distribute translator: uses 64bit inode numbers, as FreeBSD doesn't > support 64bit inodes.
Distribute is seen to not work on FreeBSD ino_t is still a 32-bit type, but it should be safe to make the following change to _types.h: @@ -43,7 +43,7 @@ typedef __uint64_t __fsfilcnt_t; typedef __uint32_t __gid_t; typedef __int64_t __id_t; /* can hold a gid_t, pid_t, or uid_t */ -typedef __uint32_t __ino_t; /* inode number */ +typedef __uint64_t __ino_t; /* inode number */ typedef long __key_t; /* IPC key (for Sys V IPC) */ typedef __int32_t __lwpid_t; /* Thread ID (a.k.a. LWP) */ typedef __uint16_t __mode_t; /* permissions */ Cheers, Kip From dan.naumov at gmail.com Fri Jun 26 23:36:56 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Fri Jun 26 23:37:37 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID Message-ID: To continue the subject of filesystem benchmarking (search the list for READ results posted a few days ago), here are some write and read/write results: Methodology: /data/5M and /data/1M have 5GB of data each in randomly-ordered chunks 5MB and 1MB in size, respectively. /data/zero.bin is a contiguous 8GB file. A process writes a burst of 5MB to a random location in /data/zero.bin once per second; other processes read chunks from /data/1M or /data/5M as appropriate (and as fast as possible) until the entire 5G dataset is read. 
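For anyone wanting to try a similar mixed workload, the methodology above can be sketched roughly in Python. This is not the script that produced the posted graphs (that script was not shared here); the function names, the scaled-down sizes, and the use of os.fsync to push each burst to disk are all assumptions made for illustration.

```python
import os
import random


def write_burst(path, file_size, burst_size, rng):
    """Overwrite burst_size bytes at a random offset inside an existing file.

    Hypothetical helper: models "a burst written to a random location".
    """
    offset = rng.randrange(0, file_size - burst_size + 1)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(os.urandom(burst_size))
        f.flush()
        os.fsync(f.fileno())  # assumption: force the burst out to disk
    return offset


def read_all_chunks(directory, rng):
    """Read every chunk file in directory once, in random order.

    Returns the total number of bytes read.
    """
    names = sorted(os.listdir(directory))
    rng.shuffle(names)
    total = 0
    for name in names:
        with open(os.path.join(directory, name), "rb") as f:
            total += len(f.read())
    return total
```

In the setup described above, write_burst would be called about once per second against /data/zero.bin with a 5 MB burst, while separate reader processes run read_all_chunks over /data/1M or /data/5M.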
Contiguous Write Performance: http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-contig-write.png Random Access Read/Write (5mb read chunks): http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-5MB-readwrite.png Random Access Read/Write (1mb read chunks): http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-1MB-readwrite.png These results are from the following forum thread: http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/857002910041/p/4 Sincerely, - Dan Naumov From nork at FreeBSD.org Sat Jun 27 04:02:34 2009 From: nork at FreeBSD.org (Norikatsu Shigemura) Date: Sat Jun 27 04:02:41 2009 Subject: "zfs upgrade" of a mounted filesystem? In-Reply-To: References: Message-ID: <20090627130224.2dc662ac.nork@FreeBSD.org> Hi Neil. On Fri, 26 Jun 2009 17:24:15 +0100 (BST) Neil Hoggarth wrote: > Or will I need alternate boot/livefs media that incorporates ZFS v13 > userland utilities to make the change? Yes. I did the zfs upgrade from livefs. From rc_lenzi at yahoo.com.br Sat Jun 27 22:25:08 2009 From: rc_lenzi at yahoo.com.br (Rafael Caesar Lenzi) Date: Sat Jun 27 22:25:15 2009 Subject: Adding more disk's on ZFS array Message-ID: <380571.31027.qm@web51002.mail.re2.yahoo.com> Hi! How can I add more disks to a ZFS raid0 or raid5 array? Thanks! Rafael Lenzi From dan.naumov at gmail.com Sat Jun 27 22:27:44 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sat Jun 27 22:27:50 2009 Subject: Adding more disk's on ZFS array In-Reply-To: <380571.31027.qm@web51002.mail.re2.yahoo.com> References: <380571.31027.qm@web51002.mail.re2.yahoo.com> Message-ID: On Sun, Jun 28, 2009 at 12:58 AM, Rafael Caesar Lenzi wrote: > > Hi! > How can I add more disks to a ZFS raid0 or raid5 array? > > Thanks!
> Rafael Lenzi raid0 (stripe): zpool add POOLNAME DEVICENAME raid5 (raidz): you can't From zbeeble at gmail.com Sun Jun 28 06:51:39 2009 From: zbeeble at gmail.com (Zaphod Beeblebrox) Date: Sun Jun 28 06:51:45 2009 Subject: Adding more disk's on ZFS array In-Reply-To: References: <380571.31027.qm@web51002.mail.re2.yahoo.com> Message-ID: <5f67a8c40906272318t2f27822dg3e30f7dc2345cb11@mail.gmail.com> On Sat, Jun 27, 2009 at 6:27 PM, Dan Naumov wrote: > On Sun, Jun 28, 2009 at 12:58 AM, Rafael Caesar > Lenzi wrote: > > > > Hi! > > How can I add more disks to a ZFS raid0 or raid5 array? > > > > Thanks! > > Rafael Lenzi > > raid0 (stripe): zpool add POOLNAME DEVICENAME > raid5 (raidz): you can't > Not entirely true. For a RAID 0 stripe, yes, you can just add a disk. Be clear, however, that existing data is not striped to that disk, but the new disk is used for new data. For RAID 5 (raidz), you have two options. You can replace each disk, in turn, with a larger disk and heal the array each time. I did this, for instance, to move from 5 750G drives to 5 1.5T drives. Another option is to add another bunch of RAID 5 drives. If you have 5 existing drives in RAID 5, you can add another set of drives with zpool add. According to documentation, each pool should be of the same RAID type. It doesn't, however, specify that each set of RAID 5 disks should have the same number of disks in it. This seems to mean that you could add a set of 3 disks (raid 5) to an existing raid 5 array with 5 disks.
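As a back-of-the-envelope illustration of the two raidz growth options described above, here is a small Python sketch. It assumes the usual approximation that a single-parity raidz vdev offers about (number of disks - 1) times its smallest disk, and it ignores metadata and padding overhead, so the figures are rough. The helper names are made up for this example.

```python
def raidz_usable_gb(disk_sizes_gb, parity=1):
    """Approximate usable space of one raidz vdev, in GB.

    Assumption: every disk contributes only as much as the smallest disk,
    and `parity` disks' worth of space is spent on parity.
    """
    if len(disk_sizes_gb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_gb) - parity) * min(disk_sizes_gb)


def pool_usable_gb(vdevs, parity=1):
    """A pool's usable space is the sum over its raidz vdevs."""
    return sum(raidz_usable_gb(v, parity) for v in vdevs)


original = [750] * 5
print(raidz_usable_gb(original))                 # 3000

# Option 1: swap disks one at a time; no gain until the last one is replaced.
print(raidz_usable_gb([1500] * 4 + [750]))       # 3000
print(raidz_usable_gb([1500] * 5))               # 6000

# Option 2: add a second raidz vdev of three 1.5T disks to the pool.
print(pool_usable_gb([original, [1500] * 3]))    # 6000
```

The middle case is the one to keep in mind when replacing disks in turn: the extra space only appears once every disk in the vdev has been swapped.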
From andrew at modulus.org Sun Jun 28 08:16:23 2009 From: andrew at modulus.org (Andrew Snow) Date: Sun Jun 28 08:16:31 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID In-Reply-To: References: Message-ID: <4A4725FA.80505@modulus.org> > Contiguous Write Performance: > http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-contig-write.png What confuses me about these results is that the '5 disk' performance was barely higher than the 'single disk' performance. All figures are also lower than I get from a single modern SATA disk. My own testing with dd from /dev/zero with FreeBSD ZFS on an Intel ICH10 chipset motherboard with Core2duo 2.66 GHz showed RAIDZ performance scaling linearly with number of disks:

What            Write   Read
--------------------------------
7 disk RAIDZ2   220     305
6 disk RAIDZ2   173     260
5 disk RAIDZ2   120     213

Only the on-board controllers were used, with Seagate disks of around 250GB capacity. System had 8GB RAM. These results are so different in absolute terms to your results that I don't know how to interpret your set. - Andrew From james-freebsd-fs2 at jrv.org Sun Jun 28 09:08:22 2009 From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen) Date: Sun Jun 28 09:08:29 2009 Subject: Adding more disk's on ZFS array In-Reply-To: <5f67a8c40906272318t2f27822dg3e30f7dc2345cb11@mail.gmail.com> References: <380571.31027.qm@web51002.mail.re2.yahoo.com> <5f67a8c40906272318t2f27822dg3e30f7dc2345cb11@mail.gmail.com> Message-ID: <4A4732F0.3060802@jrv.org> Zaphod Beeblebrox wrote: > Not entirely true. For a RAID 0 stripe, yes, you can just add a disk. Be > clear, however, that existing data is not striped to that disk, but the new > disk is used for new data. > It's probably best not to think of a ZFS pool as a RAID 0 but instead as a set of vdev storage areas. All of the vdevs are candidates for new data writes, depending on free space, etc.
> According to > documentation, each pool should be of the same RAID type. It doesn't, > however, specify that each set of RAID 5 disks should have the same number > of disks in it. This seems to mean that you could add a set of 3 disks > (raid 5) to an existing raid 5 array with 5 disks. > A pool is a set of vdevs, and different vdevs may be of a different type and have different characteristics. It is perfectly reasonable to create a pool with a single RAIDZ vdev and later add MIRROR vdevs, or any other kind of vdev. I prefer to use MIRRORs as the vdevs since it's easier to control exposure to various failure modes (power supply, enclosure, controller & disk firmware, etc). From dan.naumov at gmail.com Sun Jun 28 10:30:28 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 28 10:30:38 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID In-Reply-To: <4A4725FA.80505@modulus.org> References: <4A4725FA.80505@modulus.org> Message-ID: > What confuses me about these results is that the '5 disk' performance was > barely higher than the 'single disk' performance. All figures are also > lower than I get from a single modern SATA disk. > > My own testing with dd from /dev/zero with FreeBSD ZFS on an Intel ICH10 > chipset motherboard with Core2duo 2.66 GHz showed RAIDZ performance scaling > linearly with number of disks: >
> What            Write   Read
> --------------------------------
> 7 disk RAIDZ2   220     305
> 6 disk RAIDZ2   173     260
> 5 disk RAIDZ2   120     213

What's confusing is that your results are actually out of place with how ZFS numbers are supposed to look, not mine :) When using ZFS RAIDZ, due to the way parity checking works in ZFS, your pool is SUPPOSED to have throughput of the average single disk from that pool and not some numbers growing skyhigh in a linear fashion.
The numbers that did surprise me the most were actually gmirror reads (results posted earlier to this list): a geom gmirror is consistently SLOWER for reading than a single disk (and it only gets progressively worse the more disks you have in your gmirror). Read performance of all other mirroring implementations pretty much scales up linearly with the number of disks present in the mirror. - Sincerely, Dan Naumov From andrew at modulus.org Sun Jun 28 10:39:56 2009 From: andrew at modulus.org (Andrew Snow) Date: Sun Jun 28 10:40:03 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID In-Reply-To: References: <4A4725FA.80505@modulus.org> Message-ID: <4A4747A0.6040902@modulus.org> > What's confusing is that your results are actually out of place with > how ZFS numbers are supposed to look, not mine :) When using ZFS > RAIDZ, due to the way parity checking works in ZFS, your pool is > SUPPOSED to have throughput of the average single disk from that pool > and not some numbers growing skyhigh in a linear fashion. Could you please elaborate on this and explain it? - Andrew From dan.naumov at gmail.com Sun Jun 28 11:02:04 2009 From: dan.naumov at gmail.com (Dan Naumov) Date: Sun Jun 28 11:02:16 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID In-Reply-To: <4A4747A0.6040902@modulus.org> References: <4A4725FA.80505@modulus.org> <4A4747A0.6040902@modulus.org> Message-ID: "Now we come to the crucial decision ZFS has made for raidz and raidz2: in raidz and raidz2, the data block is striped across all of the disks. Instead of a model where a parity stripe is a bunch of data blocks, each with an independent checksum, ZFS stripes a single data block (and its parity), with a single checksum, across all the disks (or as many of them as necessary).
This is a rational implementation decision, but when combined with the need to verify checksums, it has an important consequence: in ZFS, reads always involve all disks, because ZFS always must verify the data block's checksum, which requires reading all of the data block, which is spread across all of the drives. This is unlike normal RAID-5 or RAID-6, in which a small enough read will only touch one drive, and means that adding more disks to a ZFS raidz pool does not increase how many random reads you can do per second. (A normal RAID-5 or RAID-6 array has a (theoretical) random read IO capacity equal to the sum of the random IO operations rate of each of the disks in the array, and so adding another disk adds its IOPs per second to your read capacity. A ZFS raidz or raidz2 pool instead has a capacity equal to the slowest disk's IOPs per second, and adding another disk does nothing to help. Effectively a raidz ZFS gives you a single disk's read IOPs per second rate.)" This was on a blog of a SUN engineer (although a post from a few years ago), unfortunately I don't have the link, I actually had to go through my posting history on the Ars Technica forum to even find this quote in the first place. If the situation has changed and the above quote no longer holds true, it would be nice if someone more knowledgeable on the performance implications could elaborate what kind of performance is to be expected on a raidz system :) - Sincerely, Dan Naumov On Sun, Jun 28, 2009 at 1:36 PM, Andrew Snow wrote: >> What's confusing is that your results are actually out of place with >> how ZFS numbers are supposed to look, not mine :) When using ZFS >> RAIDZ, due to the way parity checking works in ZFS, your pool is >> SUPPOSED to have throughput of the average single disk from that pool >> and not some numbers growing skyhigh in a linear fashion. > > Could you please elaborate on this and explain it? 
> > - Andrew > From andrew at modulus.org Sun Jun 28 11:37:15 2009 From: andrew at modulus.org (Andrew Snow) Date: Sun Jun 28 11:37:21 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID In-Reply-To: References: <4A4725FA.80505@modulus.org> <4A4747A0.6040902@modulus.org> Message-ID: <4A475511.5000700@modulus.org> OK, I thought we were talking about a single-threaded sequential write, which is what my benchmark was. It sounds like the graphs you published were of multi-threaded writers - how many processes were running in parallel in the case of the "Contiguous Write Performance" here? http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-contig-write.png - Andrew From nhoyle at hoyletech.com Sun Jun 28 17:54:28 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Sun Jun 28 17:54:34 2009 Subject: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID In-Reply-To: References: <4A4725FA.80505@modulus.org> <4A4747A0.6040902@modulus.org> Message-ID: <4A47AE4A.6090705@hoyletech.com> The clear distinction between the two sets of performance tests you two have done is that Dan's are highly random short i/o's, and Andrew's are large sequential transfers. Large sequential transfers necessarily engage all of the disks in the pool, regardless of the parity strategy, therefore the implied penalty for ZFS to read the parity data from all drives is mostly theoretical, and RAIDZ actually performs more like RAID 5 typically would. In the case of Dan's highly random, short i/o's, the read itself is trivial, making the overhead of spinning/seeking all the disks to calculate the full checksum and validate it inordinately high. The implication of these two benchmarks is clear as well: ZFS RAIDZ may be an excellent choice for large storage capacity with reasonable performance characteristics for large sequential workloads, but should be avoided where many small transfers will be occurring.
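To put rough numbers on the distinction drawn above, here is a toy Python model. The random-read rules follow the blog passage quoted earlier in the thread (raidz: about the slowest disk's IOPS; conventional RAID-5: about the sum of the disks' IOPS). The sequential estimate is an idealized upper bound added for contrast and is an assumption, not something measured in this thread. The per-disk figures (100 IOPS, 60 MB/s) are hypothetical.

```python
def raidz_random_read_iops(disk_iops):
    """Every random read touches all disks, so the slowest disk sets the rate."""
    return min(disk_iops)


def raid5_random_read_iops(disk_iops):
    """Small reads can land on a single disk, so per-disk rates add up."""
    return sum(disk_iops)


def sequential_read_mb_s(disk_mb_s, parity=1):
    """Idealized streaming rate: data disks times the slowest disk.

    Assumption for illustration only; real pools fall short of this.
    """
    return (len(disk_mb_s) - parity) * min(disk_mb_s)


disks_iops = [100] * 5   # hypothetical: five disks at 100 random reads/s each
disks_mb_s = [60] * 5    # hypothetical: five disks streaming 60 MB/s each

print(raidz_random_read_iops(disks_iops))   # 100
print(raid5_random_read_iops(disks_iops))   # 500
print(sequential_read_mb_s(disks_mb_s))     # 240
```

Under this model a random small-read benchmark sees no benefit from extra raidz disks, while a streaming dd-style test still scales with the number of data disks, which is consistent with both sets of results being plausible.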
-Nathanael From serenity at exscape.org Sun Jun 28 20:14:31 2009 From: serenity at exscape.org (Thomas Backman) Date: Sun Jun 28 20:14:53 2009 Subject: zfs send -R segfault, anyone else? In-Reply-To: References: <08D1E6DF-89D3-4887-9234-C3DB9164D794@exscape.org> <20090514133017.362075dhcdy7o2bs@webmail.leidinger.net> <7CD27FF0-CBFA-48B7-9E18-763D8C3ED9B8@exscape.org> <4A0C9B0C.4050403@jrv.org> Message-ID: <09277772-9C54-4AE6-A147-CB6A4ED38C48@exscape.org> On May 15, 2009, at 11:30 AM, Thomas Backman wrote: > > On May 15, 2009, at 12:28 AM, James R. Van Artsdalen wrote: > >> Thomas Backman wrote: >>> [root@chaos ~]# zfs send -R -I $OLD tank@$NOW > diff-snap >>> [root@chaos ~]# cat diff-snap | zfs recv -Fvd slave >>> Segmentation fault: 11 (core dumped) >>> >>> Same kinda backtrace, but what's up with strcmp()? >>> I suppose the issue stems from libzfs, and is not within libc: >> >> Different problem The SIGSEGV is happening in strcmp because it is >> called with strcmp(0,0) >> and tries to dereference address -4 (probably another bug itself). >> >> This hack gets around the issue but someone familiar with this >> needs to >> decide the correct action. >> >> The first change is actually unrelated (a sorry attempt at fixing the >> previous zfs send bug). >> >> The last change may be unnecessary as that case may never happen >> unless >> the pool can be renamed? >> >> [... patch ...] > > Thanks! This list is pretty impressive. :) > I can't validate how correct the fix is, considering my lacking > knowledge in C (I know the basics, but kernel/related programming? > no way!), but I CAN say that it appears to work just fine! > > Regards, > Thomas > Any news on this? The bug's been around for a long time, and a fix has been around for at least 1.5 months now, and AFAIK the bug still lives. The patch, again (I can't vouch for its correctness, but I can certainly say that it works just fine *for me*) follows. 
Regards, Thomas Index: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c =================================================================== --- cddl/contrib/opensolaris/lib/libzfs/common/ libzfs_sendrecv.c (revision 194851) +++ cddl/contrib/opensolaris/lib/libzfs/common/ libzfs_sendrecv.c (working copy) @@ -239,6 +239,8 @@ char *propname = nvpair_name(elem); zfs_prop_t prop = zfs_name_to_prop(propname); nvlist_t *propnv; + if (prop == ZPROP_INVAL) + continue; if (!zfs_prop_user(propname) && zfs_prop_readonly(prop)) continue; @@ -1126,7 +1128,7 @@ uint64_t originguid = 0; uint64_t stream_originguid = 0; uint64_t parent_fromsnap_guid, stream_parent_fromsnap_guid; - char *fsname, *stream_fsname; + char *fsname, *stream_fsname, *p1, *p2; nextfselem = nvlist_next_nvpair(local_nv, fselem); @@ -1295,10 +1297,13 @@ "parentfromsnap", &stream_parent_fromsnap_guid)); /* check for rename */ + p1 = strrchr(fsname, '/'); + p2 = strrchr(stream_fsname, '/'); + if ((stream_parent_fromsnap_guid != 0 && stream_parent_fromsnap_guid != parent_fromsnap_guid) || - strcmp(strrchr(fsname, '/'), - strrchr(stream_fsname, '/')) != 0) { + (p1 != NULL && p2 != NULL && strcmp (p1, p2) != 0) || + ((p1 == NULL) ^ (p2 == NULL))) { nvlist_t *parent; char tryname[ZFS_MAXNAMELEN]; @@ -1317,7 +1322,7 @@ VERIFY(0 == nvlist_lookup_string(parent, "name", &pname)); (void) snprintf(tryname, sizeof (tryname), - "%s%s", pname, strrchr(stream_fsname, '/')); + "%s%s", pname, p2 ? p2 : ""); } else { tryname[0] = '\0'; if (flags.verbose) { From kmacy at freebsd.org Sun Jun 28 20:41:45 2009 From: kmacy at freebsd.org (Kip Macy) Date: Sun Jun 28 20:41:57 2009 Subject: zfs send -R segfault, anyone else? 
In-Reply-To: <09277772-9C54-4AE6-A147-CB6A4ED38C48@exscape.org> References: <08D1E6DF-89D3-4887-9234-C3DB9164D794@exscape.org> <20090514133017.362075dhcdy7o2bs@webmail.leidinger.net> <7CD27FF0-CBFA-48B7-9E18-763D8C3ED9B8@exscape.org> <4A0C9B0C.4050403@jrv.org> <09277772-9C54-4AE6-A147-CB6A4ED38C48@exscape.org> Message-ID: <3c1674c90906281341w4b235dd7y809e1b23978ad5c3@mail.gmail.com> I'm a bit preoccupied at the moment. Keep reminding me ... -Kip On Sun, Jun 28, 2009 at 1:14 PM, Thomas Backman wrote: > On May 15, 2009, at 11:30 AM, Thomas Backman wrote: >> >> On May 15, 2009, at 12:28 AM, James R. Van Artsdalen wrote: >> >>> Thomas Backman wrote: >>>> >>>> [root@chaos ~]# zfs send -R -I $OLD tank@$NOW > diff-snap >>>> [root@chaos ~]# cat diff-snap | zfs recv -Fvd slave >>>> Segmentation fault: 11 (core dumped) >>>> >>>> Same kinda backtrace, but what's up with strcmp()? >>>> I suppose the issue stems from libzfs, and is not within libc: >>> >>> Different problem The SIGSEGV is happening in strcmp because it is >>> called with strcmp(0,0) >>> and tries to dereference address -4 (probably another bug itself). >>> >>> This hack gets around the issue but someone familiar with this needs to >>> decide the correct action. >>> >>> The first change is actually unrelated (a sorry attempt at fixing the >>> previous zfs send bug). >>> >>> The last change may be unnecessary as that case may never happen unless >>> the pool can be renamed? >>> >>> [... patch ...] >> >> Thanks! This list is pretty impressive. :) >> I can't validate how correct the fix is, considering my lacking knowledge >> in C (I know the basics, but kernel/related programming? no way!), but I CAN >> say that it appears to work just fine! >> >> Regards, >> Thomas >> > Any news on this? The bug's been around for a long time, and a fix has been > around for at least 1.5 months now, and AFAIK the bug still lives.
> The patch, again (I can't vouch for its correctness, but I can certainly say > that it works just fine *for me*) follows. > > Regards, > Thomas > > Index: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c > =================================================================== > --- cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c (revision 194851) > +++ cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c (working copy) > @@ -239,6 +239,8 @@ > char *propname = nvpair_name(elem); > zfs_prop_t prop = zfs_name_to_prop(propname); > nvlist_t *propnv; > + if (prop == ZPROP_INVAL) > + continue; > > if (!zfs_prop_user(propname) && zfs_prop_readonly(prop)) > continue; > @@ -1126,7 +1128,7 @@ > uint64_t originguid = 0; > uint64_t stream_originguid = 0; > uint64_t parent_fromsnap_guid, stream_parent_fromsnap_guid; > - char *fsname, *stream_fsname; > + char *fsname, *stream_fsname, *p1, *p2; > > nextfselem = nvlist_next_nvpair(local_nv, fselem); > > @@ -1295,10 +1297,13 @@ > "parentfromsnap", &stream_parent_fromsnap_guid)); > > /* check for rename */ > + p1 = strrchr(fsname, '/'); > + p2 = strrchr(stream_fsname, '/'); > + > if ((stream_parent_fromsnap_guid != 0 && > stream_parent_fromsnap_guid != parent_fromsnap_guid) || > - strcmp(strrchr(fsname, '/'), > - strrchr(stream_fsname, '/')) != 0) { > + (p1 != NULL && p2 != NULL && strcmp (p1, p2) != 0) || > + ((p1 == NULL) ^ (p2 == NULL))) { > nvlist_t *parent; > char tryname[ZFS_MAXNAMELEN]; > > @@ -1317,7 +1322,7 @@ > VERIFY(0 == nvlist_lookup_string(parent, "name", > &pname)); > (void) snprintf(tryname, sizeof (tryname), > - "%s%s", pname, strrchr(stream_fsname, '/')); > + "%s%s", pname, p2 ? p2 : ""); > } else { > tryname[0] = '\0'; > if (flags.verbose) { > -- When bad men combine, the good must associate; else they will fall one by one, an unpitied sacrifice in a contemptible struggle. Edmund Burke From unixtools at hotmail.com Sun Jun 28 22:13:04 2009 From: unixtools at hotmail.com (Sunil Sunder Raj) Date: Sun Jun 28 22:13:11 2009 Subject: File System performance Message-ID: Does the FreeBSD ports collection have any tool to debug or benchmark I/O performance? systat -iostat just gives me the output below: /0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10 Load Average | /0 /10 /20 /30 /40 /50 /60 /70 /80 /90 /100 cpu user|XX nice| system|X interrupt| idle|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX /0 /10 /20 /30 /40 /50 /60 /70 /80 /90 /100 ad0 MB/s tps| twed0 MB/s tps|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX411.32 How do I get detailed information on which process is using the I/O? Something like perfmon. From rmacklem at uoguelph.ca Mon Jun 29 00:26:35 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Mon Jun 29 00:26:41 2009 Subject: umount -f implementation Message-ID: I just noticed that when I do the following: - start a large write to an NFS mounted fs - network partition the server (unplug a net cable) - do a "umount -f " on the machine that it gets stuck trying to write dirty blocks to the server. I had, in the past, assumed that a "umount -f" of an NFS mount would be used to get rid of an NFS mount on an unresponsive server and that loss of "writes in progress" would be expected to happen. Does that sound correct? (In other words, am I seeing a bug or a feature?) Thanks in advance for any info, rick ps: I have a simple "fix" if this is a bug, but I wanted to check before submitting a patch. From nhoyle at hoyletech.com Mon Jun 29 00:32:21 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Mon Jun 29 00:32:27 2009 Subject: umount -f implementation In-Reply-To: References: Message-ID: <4A480B8C.1060708@hoyletech.com> Rick Macklem wrote: > I just noticed that when I do the following: > - start a large write to an NFS mounted fs > - network partition the server (unplug a net cable) > - do a "umount -f " on the machine > > that it gets stuck trying to write dirty blocks to the server. > > I had, in the past, assumed that a "umount -f" of an NFS mount would be > used to get rid of an NFS mount on an unresponsive server and that loss > of "writes in progress" would be expected to happen. > > Does that sound correct? (In other words, am I seeing a bug or a > feature?) > > Thanks in advance for any info, rick > ps: I have a simple "fix" if this is a bug, but I wanted to check before > submitting a patch. I think the answer is probably "it's a feature, not a bug", but that depends on your NFS mount options which you didn't give.
I'd suggest you read up on NFS soft versus hard mounts. I think you're seeing the latter and expecting the former behavior. The first hit I found Googling seems pretty decent, and although it is taken from Linux docs it should still apply: http://tldp.org/HOWTO/NFS-HOWTO/client.html Under section 4.3.1 "Soft vs. Hard Mounting" there's a basic description. Best of luck, -Nathanael From dg at dglawrence.com Mon Jun 29 04:53:06 2009 From: dg at dglawrence.com (David G Lawrence) Date: Mon Jun 29 04:53:18 2009 Subject: umount -f implementation In-Reply-To: References: Message-ID: <20090629045304.GI39302@tnn.dglawrence.com> > I just noticed that when I do the following: > - start a large write to an NFS mounted fs > - network partition the server (unplug a net cable) > - do a "umount -f " on the machine > > that it gets stuck trying to write dirty blocks to the server. > > I had, in the past, assumed that a "umount -f" of an NFS mount would be > used to get rid of an NFS mount on an unresponsive server and that loss > of "writes in progress" would be expected to happen. > > Does that sound correct? (In other words, am I seeing a bug or a > feature?) > > Thanks in advance for any info, rick > ps: I have a simple "fix" if this is a bug, but I wanted to check before > submitting a patch. I would say that you are seeing a bug. -f is supposed to mean "force", of course. Any buffers or outstanding transactions should be terminated immediately. Oh, and most of us know that you, as one of the NFS developers in the past, know well the difference between hard and soft NFS mounts. ;-) -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 Pave the road of life with opportunities.
From attilio at freebsd.org Mon Jun 29 10:16:37 2009 From: attilio at freebsd.org (Attilio Rao) Date: Mon Jun 29 10:16:44 2009 Subject: umount -f implementation In-Reply-To: References: Message-ID: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> 2009/6/29 Rick Macklem : > I just noticed that when I do the following: > - start a large write to an NFS mounted fs > - network partition the server (unplug a net cable) > - do a "umount -f " on the machine > > that it gets stuck trying to write dirty blocks to the server. > > I had, in the past, assumed that a "umount -f" of an NFS mount would be > used to get rid of an NFS mount on an unresponsive server and that loss > of "writes in progress" would be expected to happen. > > Does that sound correct? (In other words, an I seeing a bug or a feature?) While that should be real in principle (immediate shutdown of the fs operation and unmounting of the partition) it is totally impossible to have it completely unsleeping, so it can happen that also umount -f sleeps / delays for some times (example: vflush). Currently, umount -f is one of the most complicated thing to handle in our VFS because it puts as requirement that vnodes can be reclaimed in any moment, adding complexity and possibility for races. What's the fix for your problem? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From bugmaster at FreeBSD.org Mon Jun 29 11:06:58 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jun 29 11:07:57 2009 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200906291106.n5TB6vTK046307@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/135594 fs [zfs] Single dataset unresponsive with Samba o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135480 fs [zfs] panic: lock &arg.lock already initialized o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135412 fs [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREA o bin/135314 fs [zfs] assertion failed for zdb(8) usage o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot f kern/134496 fs [zfs] [panic] ZFS pool export occasionally causes a ke o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133980 fs [panic] [ffs] panic: ffs_valloc: dup alloc o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133614 fs [smbfs] [panic] panic: ffs_truncate: read-only filesys o kern/133373 fs [zfs] umass attachment causes ZFS checksum errors, dat o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int f kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/133134 fs [zfs] Missing ZFS zpool labels f kern/133020 fs [zfs] [panic] inappropriate panic caused by zfs. 
Pani o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132597 fs [tmpfs] [panic] tmpfs-related panic while interrupting o kern/132551 fs [zfs] ZFS locks up on extattr_list_link syscall o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132337 fs [zfs] [panic] kernel panic in zfs_fuid_create_cred o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes f kern/132068 fs [zfs] page fault when using ZFS over NFS on 7.1-RELEAS o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] [patch] mkfs.ext2 creates rotten partition o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129148 fs [zfs] [panic] panic on concurrent writing & rollback o kern/129059 fs [zfs] [patch] ZFS bootloader whitelistable via WITHOUT f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127659 fs [tmpfs] tmpfs memory leak o kern/127492 fs [zfs] System hang on ZFS input-output o kern/127420 fs [gjournal] [panic] Journal 
overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/125644 fs [zfs] [panic] zfs unfixable fs errors caused panic whe f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition f bin/124424 fs [zfs] zfs(8): zfs list -r shows strange snapshots' siz o kern/123939 fs [msdosfs] corrupts new files o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o kern/122173 fs [zfs] [panic] Kernel Panic if attempting to replace a o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122047 fs [ext2fs] [patch] incorrect handling of UF_IMMUTABLE / o kern/122038 fs [tmpfs] [panic] tmpfs: panic: tmpfs_alloc_vp: type 0xc o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121779 fs [ufs] snapinfo(8) (and related tools?) 
only work for t o kern/121770 fs [zfs] ZFS on i386, large file or heavy I/O leads to ke o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha f kern/120991 fs [panic] [fs] [snapshot] System crashes when manipulati o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o bin/120288 fs zfs(8): "zfs share -a" does not send SIGHUP to mountd f kern/119735 fs [zfs] geli + ZFS + samba starting on boot panics 7.0-B o kern/118912 fs [2tb] disk sizing/geometry problem with large array o misc/118855 fs [zfs] ZFS-related commands are nonfunctional in fixit o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118320 fs [zfs] [patch] NFS SETATTR sometimes fails to set file o bin/118249 fs mv(1): moving a directory changes its mtime o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o kern/116913 fs [ffs] [panic] ffs_blkfree: freeing free block p kern/116608 fs [msdosfs] [patch] msdosfs fails to check mount options o kern/116583 fs [ffs] [hang] System freezes for short time when using o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/115645 fs [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] 
smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o kern/113180 fs [zfs] Setting ZFS nfsshare property does not cause inh o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/105093 fs [ext2fs] [patch] ext2fs on read-only media cannot be m o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist f kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [iso9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna f kern/91568 fs [ufs] [panic] writing to UFS/softupdates DVD media in o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/89991 fs [ufs] softupdates with mount -ur causes fs UNREFS o kern/88657 fs [smbfs] windows 
client hang when browsing a samba shar o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o kern/85326 fs [smbfs] [panic] saving a file via samba to an overquot o kern/84589 fs [2TB] 5.4-STABLE unresponsive during background fsck 2 o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o kern/77826 fs [ext2fs] ext2fs usb filesystem will not mount RW o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 143 problems total. 
From rmacklem at uoguelph.ca Mon Jun 29 14:36:29 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Mon Jun 29 14:36:40 2009 Subject: umount -f implementation In-Reply-To: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Message-ID: On Mon, 29 Jun 2009, Attilio Rao wrote: > 2009/6/29 Rick Macklem : >> I just noticed that when I do the following: >> - start a large write to an NFS mounted fs >> - network partition the server (unplug a net cable) >> - do a "umount -f " on the machine >> >> that it gets stuck trying to write dirty blocks to the server. >> >> I had, in the past, assumed that a "umount -f" of an NFS mount would be >> used to get rid of an NFS mount on an unresponsive server and that loss >> of "writes in progress" would be expected to happen. >> >> Does that sound correct? (In other words, an I seeing a bug or a feature?) > > While that should be real in principle (immediate shutdown of the fs > operation and unmounting of the partition) it is totally impossible to > have it completely unsleeping, so it can happen that also umount -f > sleeps / delays for some times (example: vflush). > Currently, umount -f is one of the most complicated thing to handle in > our VFS because it puts as requirement that vnodes can be reclaimed in > any moment, adding complexity and possibility for races. > Yes, agreed. And I like to leave that stuff to more clever chaps than I:-) > What's the fix for your problem? > Well, when I tested it I found that it got stuck in two places, both calls to VFS_SYNC(). The first was a sync(); right at the beginning of umount.c. - All I did for that one is move it to after the code that handles option processing and change it to if ((fflag & MNT_FORCE) == 0) sync(); so that it isn't done for the "-f" case. 
(I believe the sync(); call at the beginning of umount is only a performance optimization, so I don't think not doing it for "-f" should break anything.)
- the second happened just before the VFS_UNMOUNT() call in the unmount(2) system call. The code looks like:
      if (((mp->mnt_flag & MNT_RDONLY) ||
          (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
- Although it was tempting to reverse the order of VFS_SYNC() and the test for MNT_FORCE, I thought that might have a negative impact on other file systems, since it avoided doing the VFS_SYNC(), so...
- Instead, I just put a check for MNTK_UNMOUNTF at the beginning of nfs_sync(), so that it returns EBUSY for this case instead of getting stuck trying to flush().

Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c, it simply ensures that the umount command thread makes it as far as VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds. It kills RPCs in progress before doing the vflush() and, since no new RPCs can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of a request), the vflush() won't actually flush anything to the server.

As such, "umount -f" is pretty well guaranteed to throw away the dirty buffers. I believe this is correct behaviour, but it would mean that a user/sysadmin that uses "umount -f" for cases where the server is still functioning, but slow, will lose data when they probably don't expect to.

Does this help? rick
ps: During simple testing, it has worked ok. It waits about 1 minute for the RPC threads to shut down, but the "umount -f" does complete after that happens. If the consensus seems to be that patching this is a good idea, I'll get some more testing done.
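
[Illustration, not part of the original post.] The two changes described in the message above can be sketched as a small user-space model in C. This is not the FreeBSD source and not the actual patch: every sk_* name, the flag values, and the use of ETIMEDOUT to stand in for a flush that would hang on a hard mount are assumptions made for illustration. The model also assumes the kernel sets MNTK_UNMOUNTF on the mount before the sync test runs when MNT_FORCE is given. It shows why the early EBUSY return from the sync routine is enough: the existing test still calls the sync first, but the later MNT_FORCE check then lets the unmount proceed without a flush being attempted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags named in the thread.
 * The values are illustrative only and do not match sys/mount.h. */
#define SK_MNT_RDONLY    0x01
#define SK_MNT_FORCE     0x02
#define SK_MNTK_UNMOUNTF 0x04

/* Hypothetical, heavily reduced model of a mount point. */
struct sk_mount {
    int  mnt_flag;          /* e.g. SK_MNT_RDONLY */
    int  mnt_kern_flag;     /* e.g. SK_MNTK_UNMOUNTF */
    bool server_reachable;  /* false models the unplugged cable */
    int  flush_attempts;    /* flushes that would have gone to the server */
};

/* Model of the userland change: skip the global sync() when -f is given. */
static bool sk_umount_cmd_should_sync(int fflag)
{
    return (fflag & SK_MNT_FORCE) == 0;
}

/* Model of a sync routine with the proposed early check: once a forced
 * unmount is in progress, return EBUSY instead of trying to flush. */
static int sk_nfs_sync(struct sk_mount *mp)
{
    assert(mp != NULL);
    if (mp->mnt_kern_flag & SK_MNTK_UNMOUNTF)
        return EBUSY;
    mp->flush_attempts++;
    /* A real hard mount would block here; ETIMEDOUT is only a stand-in. */
    return mp->server_reachable ? 0 : ETIMEDOUT;
}

/* Model of the test quoted in the message, with its evaluation order
 * unchanged: the sync is still called before MNT_FORCE is examined.
 * Returns 1 if the unmount would go on to the VFS_UNMOUNT() step. */
static int sk_unmount_proceeds(struct sk_mount *mp, int flags)
{
    int error = 0;

    assert(mp != NULL);
    if (flags & SK_MNT_FORCE)
        mp->mnt_kern_flag |= SK_MNTK_UNMOUNTF;  /* assumed ordering */

    if (((mp->mnt_flag & SK_MNT_RDONLY) ||
        (error = sk_nfs_sync(mp)) == 0) || (flags & SK_MNT_FORCE) != 0)
        return 1;
    return 0;
}
```

In this model a forced unmount of an unreachable server proceeds with flush_attempts still at zero, while a plain unmount of the same mount makes one flush attempt and is refused. Leaving the order of the test alone, rather than checking MNT_FORCE first, matches the concern in the message that other file systems should still get their sync on a forced unmount.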
From rmacklem at uoguelph.ca Mon Jun 29 15:15:09 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Mon Jun 29 15:15:17 2009 Subject: umount -f implementation In-Reply-To: <4A480B8C.1060708@hoyletech.com> References: <4A480B8C.1060708@hoyletech.com> Message-ID: On Sun, 28 Jun 2009, Nathanael Hoyle wrote: > I think the answer is probably "it's a feature, not a bug", but that depends > on your NFS mount options which you didn't give. I'd suggest you read up on > NFS soft versus hard mounts. I think you're seeing the latter and expecting > the former behavior. > Well, part of the problem is that I'm working on a client that includes NFSv4 and, at least for NFSv4, getting "intr" or "soft" mounts to work correctly is nearly impossible. Since NFSv4 includes lock state operations that must be strictly serialized and the state maintained in a consistent way, you can't just "terminate" an RPC involving these Ops without breaking all state handling. Also, I/O system calls generally aren't expected to fail with EINTR and many (most??) apps. get broken by this happening. Personally, I believe that "hard" mounts plus the use of "umount -f" to get rid of mounts against unresponsive servers is the preferred way to go and the first step in this direction would be getting "umount -f" to work for the above case (plus agreement that the semantics of "umount -f" include "loss of recently written data"). There was a thread on this a few months ago, which I cant find, but there is pr129760 w.r.t. FreeBSD locking up upon a "umount -f". (Btw, I believe that Mac OS X has adopted this concept. It pops up a "disconnect mount" window for unresponsive servers and does essentially a "umount -f" if the user clicks "ok".) > The first hit I found Googling seems pretty decent, though taken from Linux > docs should still apply: > > http://tldp.org/HOWTO/NFS-HOWTO/client.html > > Under section 4.3.1 "Soft vs. Hard Mounting" there's a basic description. 
> There was a time when SunOS/Solaris was considered the "gold standard" for NFS (but I suppose this is the Linux era;-). My recollection might be fuzzy, but I don't think SunOS had a "umount -f" in those days and I think "intr" was introduced after their first release, as an improvement over "soft", since NFS servers got really slow when running on 1985 hardware. Solaris10 does have a "umount -f" and the man page notes that data related to open files can be lost when it is used. (This would basically be the semantic "umount -f" on FreeBSD will have if the "sync"s aren't done.) rick From brde at optusnet.com.au Mon Jun 29 19:06:09 2009 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jun 29 19:06:23 2009 Subject: umount -f implementation In-Reply-To: References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Message-ID: <20090630010035.E37426@delplex.bde.org> On Mon, 29 Jun 2009, Rick Macklem wrote: > On Mon, 29 Jun 2009, Attilio Rao wrote: >> >> While that should be real in principle (immediate shutdown of the fs >> operation and unmounting of the partition) it is totally impossible to >> have it completely unsleeping, so it can happen that also umount -f >> sleeps / delays for some times (example: vflush). >> Currently, umount -f is one of the most complicated thing to handle in >> our VFS because it puts as requirement that vnodes can be reclaimed in >> any moment, adding complexity and possibility for races. >> > Yes, agreed. And I like to leave that stuff to more clever chaps than I:-) > >> What's the fix for your problem? >> > Well, when I tested it I found that it got stuck in two places, both > calls to VFS_SYNC(). The first was a > sync(); > right at the beginning of umount.c. > - All I did for that one is move it to after the code that handles > option processing and change it to > if ((fflag & MNT_FORCE) == 0) > sync(); > so that it isn't done for the "-f" case. 
(I believe the sync(); call > at the beginning of umount is only a performance optimization, so I > don't think not doing it for "-f" should break anything.) OK. This sync() is probably actually a performance pessimization, since it syncs all file systems while the internal sync in umount(2) only syncs the one being unmounted. > - the second happened just before the VFS_UNMOUNT() call in the > umount(2) system call. The code looks like: > if (((mp->mnt_flag & MNT_RDONLY) || > (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != > 0) > - Although it was tempting to reverse the order of VFS_SYNC() and the > test for MNT_FORCE, I thought that might have a negative impact on > other file systems, since it avoided doing the VFS_SYNC(), so... > > - Instead, I just put a check for MNTK_UNMOUNTF at the beginning of > nfs_sync(), so that it returns EBUSY for this case instead of getting > stuck trying to flush(). OK. This sync is probably an optimization for correctness, since it arranges to do as much as possible without forcing. I checked ffs_mount() and found 2 large bugs, one related: - in the only case that tends to cause problems, namely the non-readonly case, ffs_unmount() does a suspend which calls VOP_SYNC(..., MNT_SUSPEND), but after errors from this sync it checks neither MNT_FORCE nor error == ENXIO. I think the usual effect is the same as if the top-level unmount() didn't check MNT_FORCE after suspend failure: in problematic cases, we have an unrecoverable write, due to the device going away or just an i/o error, and this error has probably already occured (only in rare cases will it be triggered by unmount). Then MNT_FORCE is essentially unused, and the ENXIO hack is not reached either, and unmount usually fails. - the UFS_EXTATTR case destroys infrastructure before committing to succeeding. It used to be just broken on failure. 
Now it uses a hack to recover (call a constructor) on failure, but the recovery code is not reached in the usual case of failure -- when the suspension fails. ffs_unmount() still seems to have no support for handling unrecoverable write errors (short of you converting them to ENXIO by removing the media). MNT_FORCE only meant FORCECLOSE for it. I see that old nfs was similar, and you are now making MNT_FORCE stronger. I thought that umount(8)'s man page documented -f being strongly forceful, but checking it shows that it only documents a weak force like that of FORCECLOSE (but not precisely enough). Perhaps a different flag should be used for strong forcefulness. Weak forcefulness is still useful and used for mount -f -u -- for remount we would never want errors in the file system itself ignored. This use also shows that the generic FORCECLOSE code must not ignore errors. > Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c, > it simply ensures that the umount command thread makes it as far as > VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds. It > kills RPCs in progress before doing the vflush() and, since no new RPCs > can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of > a request), the vflush() won't actually flush anything to the server. > > As such, "umount -f" is pretty well guaranteed to throw away the dirty > buffers. I believe this is correct behaviour, This is how I think ffs_mount() should work too -- It should be responsible for throwing away the dirty buffers, while nothing else should discard them. Now the discarding seems to be done by falling through to g_vfs_done(), except g_vfs_done() is not reached in most cases (see above). I don't like this -- at best we lose the opportunity to print ffs-specific details about what was discarded. 
Falling through only works for ENXIO anyway -- on other errors we should discard the unwritable buffers in an fs-specific manner so as to write as many of the writable buffers as possible. > but it would mean that a > user/sysadmin that uses "umount -f" for cases where the server is still > functioning, but slow, will lose data when they probably don't expect to. A new flag would help for this. Bruce From gthiel at smapper.com Mon Jun 29 19:55:04 2009 From: gthiel at smapper.com (Gunther Thiel) Date: Mon Jun 29 19:55:12 2009 Subject: AW: umount -f implementation Message-ID: <0FEF727F15922F4D8349632C35EE6C276FB1A4470A@QHEXMBOX1.hosting.inetserver.de> In practice, there are situations where one does want to get rid of an unreachable mountpoint (specifically for NFS), which is basically not possible today. A fix that applies when -f (or a similar new flag) is supplied would be highly appreciated. Thanks, Gunther -- SmApper Technologies GmbH, +43 5372 6912 640, www.smapper.com ----- Original Message ----- From: owner-freebsd-fs@freebsd.org To: freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org Sent: Mon Jun 29 02:00:13 2009 Subject: umount -f implementation I just noticed that when I do the following: - start a large write to an NFS mounted fs - network partition the server (unplug a net cable) - do a "umount -f " on the machine that it gets stuck trying to write dirty blocks to the server. I had, in the past, assumed that a "umount -f" of an NFS mount would be used to get rid of an NFS mount on an unresponsive server and that loss of "writes in progress" would be expected to happen. Does that sound correct? (In other words, am I seeing a bug or a feature?) Thanks in advance for any info, rick ps: I have a simple "fix" if this is a bug, but I wanted to check before submitting a patch. 
_______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From rick-freebsd2008 at kiwi-computer.com Mon Jun 29 23:53:52 2009 From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty) Date: Mon Jun 29 23:53:59 2009 Subject: umount -f implementation In-Reply-To: <4A480B8C.1060708@hoyletech.com> References: <4A480B8C.1060708@hoyletech.com> Message-ID: <20090629232710.GA24986@keira.kiwi-computer.com> On Sun, Jun 28, 2009 at 08:32:12PM -0400, Nathanael Hoyle wrote: > Rick Macklem wrote: > > > >Does that sound correct? (In other words, an I seeing a bug or a > >feature?) > > > I think the answer is probably "it's a feature, not a bug", but that > depends on your NFS mount options which you didn't give. I'd suggest > you read up on NFS soft versus hard mounts. I'm pretty sure the person working on NFSv4 for fbsd knows this difference. > I think you're seeing the > latter and expecting the former behavior. Not necessarily true. I've experienced similar behavior and I only use soft mounts (actually: "rw,soft,intr,bg,rdirplus"). In fact this bit me last week when I wanted to move the NFS export on a server. I did the move/rename, updated /etc/exports, and did a "killall -HUP mountd" on the server and I attempted variations of "mount -u" and "umount -f" on the clients. Subsequently, I had to restart most of the client machines, since: - "mount -u" returned ESTALE - "umount" returned EBUSY - "umount -f" failed, I believe with ENXIO In any case, "umount -f" absolutely has to work. What other option does an admin have? Yes, expect potential data loss and expect the umount may not return immediately (plain "umount" can take awhile too). 
Instead, I saw a bunch of these messages, when another process continued to write to a geli-mounted md'd file on that stale filesystem: kernel: GEOM_ELI: g_eli_read_done() failed md0.eli[READ(offset=1790541824, length=65536)] -- Rick C. Petty From avg at icyb.net.ua Tue Jun 30 14:30:04 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Jun 30 14:30:10 2009 Subject: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Message-ID: <200906301430.n5UEU31v052203@freefall.freebsd.org> The following reply was made to PR kern/135412; it has been noted by GNATS. From: Andriy Gapon To: bug-followup@FreeBSD.org, danny@cs.huji.ac.il Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Date: Tue, 30 Jun 2009 17:22:47 +0300 Danny, maybe you misunderstood Gavin's question or maybe I misunderstood your reply. I have stable/7 amd64 from Jun 26, I have upgraded zpool to version 13 (as reported by 'zpool upgrade' command) and I have upgraded zfs on-disk to version 3 (as reported by 'zfs upgrade'). And I can _not_ reproduce your problem using the program you provided - it successfully creates a file on the first run and it fails with '17 - File exists' on the subsequent ones. So, I'd like to re-iterate the question - what on-disk versions of zpool and zfs you have? Please provide output of 'zpool upgrade' and 'zfs upgrade' commands to avoid further uncertainties. -- Andriy Gapon From danny at cs.huji.ac.il Tue Jun 30 15:10:05 2009 From: danny at cs.huji.ac.il (Danny Braniss) Date: Tue Jun 30 15:10:15 2009 Subject: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Message-ID: <200906301510.n5UFA4so084603@freefall.freebsd.org> The following reply was made to PR kern/135412; it has been noted by GNATS. 
From: Danny Braniss To: Andriy Gapon Cc: bug-followup@FreeBSD.org, danny@cs.huji.ac.il, danny@cs.huji.ac.il Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Date: Tue, 30 Jun 2009 18:05:28 +0300 > > Danny, > > maybe you misunderstood Gavin's question or maybe I misunderstood your reply. > > I have stable/7 amd64 from Jun 26, I have upgraded zpool to version 13 (as > reported by 'zpool upgrade' command) and I have upgraded zfs on-disk to version 3 > (as reported by 'zfs upgrade'). > And I can _not_ reproduce your problem using the program you provided - it > successfully creates a file on the first run and it fails with '17 - File exists' > on the subsequent ones. > > So, I'd like to re-iterate the question - what on-disk versions of zpool and zfs > you have? > Please provide output of 'zpool upgrade' and 'zfs upgrade' commands to avoid > further uncertainties. > > -- > Andriy Gapon you have to run the program on a client that mounted the zfs volume via nfs. it fails no matter what pool version, either 6 or 13 btw, it works fine if the server is solaris/v13 but just to answer your questions, dev is the server host dev> zfs upgrade This system is currently running ZFS filesystem version 3. All filesystems are formatted with the current version. dev> zpool upgrade This system is currently running ZFS pool version 13. All pools are formatted using this version. 
dev> danny From rmacklem at uoguelph.ca Tue Jun 30 15:59:06 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Jun 30 15:59:18 2009 Subject: umount -f implementation In-Reply-To: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Message-ID: On Mon, 29 Jun 2009, Attilio Rao wrote: > > While that should be real in principle (immediate shutdown of the fs > operation and unmounting of the partition) it is totally impossible to > have it completely unsleeping, so it can happen that also umount -f > sleeps / delays for some times (example: vflush). > Currently, umount -f is one of the most complicated thing to handle in > our VFS because it puts as requirement that vnodes can be reclaimed in > any moment, adding complexity and possibility for races. > > What's the fix for your problem? > From other responses, it does look like pursuing this is appropriate and that current behaviour is considered a bug. I should have noted in the previous email that I suspected that my simple patch didn't handle all cases, which I have just determined via testing. Unfortunately, the thread doing "umount" can also get stuck in an msleep() while waiting for the mnt_lockref to go to 0, which happens before the VFS_UNMOUNT() call. (mnt_lockref gets incremented by various system calls that call vfs_busy().) I think I can fix this in the experimental nfsv4 client, since it has a kernel thread that can check for MNTK_UNMOUNTF being set and then kill off the RPCs in progress, but that won't help the regular client. It's starting to look like too much work for FreeBSD8, but sounds like it is worth pursuing. (Apologies to anyone that thought I would have it all fixed in a day or two.) 
rick From attilio at freebsd.org Tue Jun 30 16:08:02 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Jun 30 16:08:15 2009 Subject: umount -f implementation In-Reply-To: References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Message-ID: <3bbf2fe10906300908p6b0f314di25bab46b03b5933a@mail.gmail.com> 2009/6/30 Rick Macklem : > > > On Mon, 29 Jun 2009, Attilio Rao wrote: > >> >> While that should be real in principle (immediate shutdown of the fs >> operation and unmounting of the partition) it is totally impossible to >> have it completely unsleeping, so it can happen that also umount -f >> sleeps / delays for some times (example: vflush). >> Currently, umount -f is one of the most complicated thing to handle in >> our VFS because it puts as requirement that vnodes can be reclaimed in >> any moment, adding complexity and possibility for races. >> >> What's the fix for your problem? >> > From other responses, it does look like pursuing this is appropriate > and that current behaviour is considered a bug. > > I should have noted in the previous email that I suspected that my simple > patch didn't handle all cases, which I have just determined via testing. > > Unfortunately, the thread doing "umount" can also get stuck in an msleep() > while waiting for the mnt_lockref to go to 0, which happens before the > VFS_UNMOUNT() call. (mnt_lockref gets incremented by various system > calls that call vfs_busy().) Sorry for not answering and I still didn't read this thread at all, I just wanted to let you know that this msleep is skipped for the force unmount, it should just happen in a normal unmount case. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From mandrews at bit0.com Tue Jun 30 19:00:19 2009 From: mandrews at bit0.com (Mike Andrews) Date: Tue Jun 30 19:00:25 2009 Subject: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) 
returns io error
Message-ID: <200906301900.n5UJ0E3g063110@freefall.freebsd.org>

The following reply was made to PR kern/135412; it has been noted by GNATS.

From: Mike Andrews
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error
Date: Tue, 30 Jun 2009 14:40:43 -0400

 I can also confirm that this happens with an un-upgraded v6 pool using the v13 code.

From rmacklem at uoguelph.ca Tue Jun 30 20:01:18 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Tue Jun 30 20:01:30 2009
Subject: umount -f implementation
In-Reply-To: <20090630193248.GY2884@deviant.kiev.zoral.com.ua>
References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> <20090630193248.GY2884@deviant.kiev.zoral.com.ua>
Message-ID:

On Tue, 30 Jun 2009, Kostik Belousov wrote:

>>
>> I think I can fix this in the experimental nfsv4 client, since it has
>> a kernel thread that can check for MNTK_UNMOUNTF being set and then
>> kill off the RPCs in progress, but that won't help the regular client.
> This solution sounds good, but see below.
>
> It may be argued by some people, me included, that umount -f shall not
> override any ownership of kernel resources. In particular, you must
> not ignore the lockref. Instead, the threads that own misc filesystem
> resources, like mount reference counter, locked vnodes etc shall be
> weeded out of the syscalls. E.g., finishing stalled rpc calls with some
> error code that is propagated to return code from vops is good solution.
>
I think that the thread "fix" above would work this way. Right now, nfs_umount() terminates RPCs in progress for the "-f" case and they return RPC_CANTRECV, which just becomes EACCES at the moment. The problem is that, often, the "umount -f" thread never gets as far as nfs_umount(). All I was thinking of doing, above, is having the kernel thread check for MNTK_UNMOUNTF and then do the same thing. (ie.
The NFS VOPs would end up returning EACCES, or whatever Exxx might be preferred.)

> Another problem with forced unmounts is that VFS does not block new
> threads from arriving into VOPs. When finishing the inflight rpcs,
> you may either leave some new rpcs behind or loop infinitely chasing
> rpcs that arrive while you are finishing old rpcs.
>
The NFS clients already handle this by returning ESTALE at the beginning of nfs_request() without attempting the RPC, if MNTK_UNMOUNTF is set. (Why ESTALE?? Who knows, although I suspect that just about any Exxx will get the job done?)

> Umount -f is needed in two different situations, one is normally worked
> filesystem that shall be unmounted by administrative request, detaching
> any resources opened by application. Second is the last-resort action
> when backing storage (server in NFS case, disk for UFS) is misbehaving.
> I think we must not break first case for the second.
>
I think this is what Bruce Evans was referring to. He suggested that there be two flags, like -f and -F, if I understood his post.

rick

From kostikbel at gmail.com Tue Jun 30 20:18:12 2009
From: kostikbel at gmail.com (Kostik Belousov)
Date: Tue Jun 30 20:18:19 2009
Subject: umount -f implementation
In-Reply-To:
References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com>
Message-ID: <20090630193248.GY2884@deviant.kiev.zoral.com.ua>

On Tue, Jun 30, 2009 at 12:01:21PM -0400, Rick Macklem wrote:
>
>
> On Mon, 29 Jun 2009, Attilio Rao wrote:
>
> >
> >While that should be real in principle (immediate shutdown of the fs
> >operation and unmounting of the partition) it is totally impossible to
> >have it completely unsleeping, so it can happen that also umount -f
> >sleeps / delays for some times (example: vflush).
> >Currently, umount -f is one of the most complicated thing to handle in
> >our VFS because it puts as requirement that vnodes can be reclaimed in
> >any moment, adding complexity and possibility for races.
> >
> >What's the fix for your problem?
> >
> From other responses, it does look like pursuing this is appropriate
> and that current behaviour is considered a bug.
>
> I should have noted in the previous email that I suspected that my simple
> patch didn't handle all cases, which I have just determined via testing.
>
> Unfortunately, the thread doing "umount" can also get stuck in an msleep()
> while waiting for the mnt_lockref to go to 0, which happens before the
> VFS_UNMOUNT() call. (mnt_lockref gets incremented by various system
> calls that call vfs_busy().)
>
> I think I can fix this in the experimental nfsv4 client, since it has
> a kernel thread that can check for MNTK_UNMOUNTF being set and then
> kill off the RPCs in progress, but that won't help the regular client.
This solution sounds good, but see below.

>
> It's starting to look like too much work for FreeBSD8, but sounds like
> it is worth pursuing. (Apologies to anyone that thought I would have it
> all fixed in a day or two.)

It may be argued by some people, me included, that umount -f shall not override any ownership of kernel resources. In particular, you must not ignore the lockref. Instead, the threads that own misc filesystem resources, like mount reference counter, locked vnodes etc shall be weeded out of the syscalls. E.g., finishing stalled rpc calls with some error code that is propagated to return code from vops is good solution.

Quite similar problems happen with SIGSTOP and intr NFS mounts. You saw the proposed solution that is quite similar, it forces the threads owning the resources to progress to syscall boundary.

Another problem with forced unmounts is that VFS does not block new threads from arriving into VOPs. When finishing the inflight rpcs, you may either leave some new rpcs behind or loop infinitely chasing rpcs that arrive while you are finishing old rpcs. Half-measure is the filesystem suspension, that keeps operations that modify filesystem from entering VOPs.
UFS uses suspension for unmounts and rw->ro remounts.

Umount -f is needed in two different situations, one is normally worked filesystem that shall be unmounted by administrative request, detaching any resources opened by application. Second is the last-resort action when backing storage (server in NFS case, disk for UFS) is misbehaving. I think we must not break first case for the second.

From mckusick at mckusick.com Tue Jun 30 22:37:43 2009
From: mckusick at mckusick.com (Kirk McKusick)
Date: Tue Jun 30 22:37:48 2009
Subject: umount -f implementation
In-Reply-To: <3bbf2fe10906300908p6b0f314di25bab46b03b5933a@mail.gmail.com>
Message-ID: <200906302158.n5ULwdxk002480@chez.mckusick.com>

Just for the history books, there originally were two forms of forced unmounts. The gentle force (-f) and the brute force (-F) unmount. The -f unmount flushes out all the dirty buffers so that when the unmount completes no data is lost and the filesystem is in a consistent state. The -F unmount invalidates and discards all the dirty buffers without attempting to do any I/O on them. The result is lost data and a possibly inconsistent filesystem. But it will get the job done even if the disk has died or the server has gone away.

For reasons that I never tracked down, the -F unmount option was never incorporated into FreeBSD when they did the merge from 4.4BSD-Lite II, so that functionality never made it into the system. It is actually much easier to do than unmount -f since you just walk through and set B_INVAL and B_ERROR on all the dirty buffers for that filesystem. The problem with unmount -f is that it will hang if the server is gone since it will insist on pushing back all the dirty buffers.

	Kirk McKusick
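
[Illustrative note] The brute-force walk described in the last message can be sketched in user space as below. This is only a toy model under stated assumptions, not kernel code: struct toy_buf, its fields, the TOY_* flag values and toy_discard_dirty() are all hypothetical names invented for this sketch, and the real buffer cache (locking, per-vnode buffer lists, actual flag definitions) is considerably more involved. The only part taken from the message is the idea itself: visit every dirty buffer belonging to the filesystem and mark it invalid and in error instead of writing it out.

```c
#include <stddef.h>

/* Hypothetical flag values for this sketch only; not the kernel's. */
#define TOY_B_DELWRI 0x01u /* buffer holds dirty (delayed-write) data */
#define TOY_B_INVAL  0x02u /* buffer contents are no longer valid */
#define TOY_B_ERROR  0x04u /* I/O on this buffer is considered failed */

/* Toy stand-in for a buffer header; all fields are assumptions. */
struct toy_buf {
    unsigned int    b_flags;    /* TOY_B_* bits */
    int             b_mount_id; /* which filesystem owns the buffer */
    struct toy_buf *b_next;     /* simple singly linked list */
};

/*
 * Walk the list and, for every dirty buffer owned by mount_id, set
 * TOY_B_INVAL and TOY_B_ERROR without attempting any I/O.
 * Returns the number of buffers that were marked.
 */
size_t
toy_discard_dirty(struct toy_buf *head, int mount_id)
{
    size_t marked = 0;

    for (struct toy_buf *bp = head; bp != NULL; bp = bp->b_next) {
        if (bp->b_mount_id != mount_id)
            continue;   /* belongs to another filesystem */
        if ((bp->b_flags & TOY_B_DELWRI) == 0)
            continue;   /* clean buffer, nothing to discard */
        bp->b_flags |= TOY_B_INVAL | TOY_B_ERROR;
        marked++;
    }
    return marked;
}
```

Because the walk never issues a write, it cannot block on a dead disk or an unreachable server, which is the property the message attributes to -F; the price, as stated there, is lost data and a possibly inconsistent filesystem.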