From bugmaster at FreeBSD.org Mon Dec 1 03:06:55 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Dec 1 03:07:59 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200812011106.mB1B6s9N052534@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129174  fs         [nfs][zfs][panic] NFS v3 Panic when under high load ex
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129084  fs         [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
o kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs][panic] changing into .zfs dir from nfs client ca
o kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic
when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D

28 problems total.

From antik at bsd.ee Mon Dec 1 05:57:38 2008
From: antik at bsd.ee (Andrei Kolu)
Date: Mon Dec 1 05:58:06 2008
Subject: Disklabel Editor strange labeling
Message-ID: <4933ED59.5070802@bsd.ee>

Hi,

here is my disklabel editor "screenshot". Where did it get the ad14cs1
partition name from? When I created it, it was named ad14s1...
---------------------------------------------------------------------------
FreeBSD Disklabel Editor

Disk: ad14   Partition name: ad14cs1   Free: 293041602 blocks (139GB)
Disk: ad18   Partition name: ad18s1    Free: 293041602 blocks (139GB)
---------------------------------------------------------------------------

When I try to create a partition on this slice, I get this error message:

"Error mounting /dev/ad14cs1d on /tank/bootdir : No such file or directory"

After I close sysinstall and restart it, the partition disappears...
--------------------------------------------------------------------------- Disk name: ad14 FDISK Partition Editor DISK Geometry: 18241 cyls/255 heads/63 sectors = 293041665 sectors (143086MB) Offset Size(ST) End Name PType Desc Subtype Flags 0 293046768 293046767 - 12 unused 0 --------------------------------------------------------------------------- FreeBSD testiserver 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #1: Mon Dec 1 12:13:41 EET 2008 root@testiserver:/usr/obj/usr/src/sys/KERNEL amd64 From antik at bsd.ee Tue Dec 2 02:16:04 2008 From: antik at bsd.ee (Andrei Kolu) Date: Tue Dec 2 02:16:10 2008 Subject: Disklabel Editor strange labeling In-Reply-To: <4933ED59.5070802@bsd.ee> References: <4933ED59.5070802@bsd.ee> Message-ID: <49350AE0.70408@bsd.ee> Andrei Kolu wrote: > Hi, > > here is my disklabel editor "screenshot"- from where it got ad14cs1 > partition name? When I created it then it was named ad14s1... > --------------------------------------------------------------------------- > > FreeBSD Disklabel Editor > > Disk: ad14 Partition name: ad14cs1 Free: 293041602 blocks (139GB) > Disk: ad18 Partition name: ad18s1 Free: 293041602 blocks (139GB) > --------------------------------------------------------------------------- > > > When I trie to create partition on this slice then I got error message: > > "Error mounting /dev/ad14cs1d on /tank/bootdir : No such file or > directory" > > After I close sysinstall and restart it then partition disappear... 
> ---------------------------------------------------------------------------
>
> Disk name: ad14 FDISK Partition Editor
> DISK Geometry: 18241 cyls/255 heads/63 sectors = 293041665 sectors (143086MB)
>
> Offset Size(ST) End Name PType Desc Subtype Flags
>
> 0 293046768 293046767 - 12 unused 0
> ---------------------------------------------------------------------------
>
>
> FreeBSD testiserver 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #1: Mon Dec 1
> 12:13:41 EET 2008 root@testiserver:/usr/obj/usr/src/sys/KERNEL amd64
>

OK, finally resolved problem by filling drive with zeroes:

# dd if=/dev/zero of=/dev/ad14 bs=10m

From janm at transactionware.com Tue Dec 2 03:05:31 2008
From: janm at transactionware.com (Jan Mikkelsen)
Date: Tue Dec 2 03:05:39 2008
Subject: Areca vs. ZFS performance testing.
In-Reply-To:
Message-ID:

Hi,

Wes Morgan wrote:
> On Sun, 16 Nov 2008, Matt Simerson wrote:
>
> > The Areca cards do NOT have the cache enabled by default. I ordered the
> > optional battery and RAM upgrade for my collection of 1231ML cards. Even
> > with the BBWC, the cache is not enabled by default. I had to go out of my
> > way to enable it, on every single controller.
>
> Are you using these areca cards successfully with large arrays? I found a
> 1680i card for a decent price and installed it this weekend, but since
> then I'm seeing the raidz2 pool that it's running hang so frequently that
> I can't even trust using it. The hangs occur in both 7-stable and
> 8-current with the new ZFS patch. Same exact settings that have been rock
> solid for me before now don't want to work at all. The drives are just set
> as JBOD -- the controller actually defaulted to this, so I didn't have to
> make any real changes in the BIOS.
>
> Any tips on your setup? Did you have any similar problems?

I am seeing I/O related lockups on 7.1-PRE with an Areca ARC-1220 controller
and eight drives in a RAID-6 array. The same hardware works fine with 6.3.
When I run gstat while it is happening I see I/O performance drop and the time to service each write (ms/w) goes up, and then suddenly goes back down to a sensible value. I have seen it get to about 22000ms. The system is essentially unusable for writes, which limits the utility a bit. Reads seem fine. Is this similar to the behaviour you saw? Thanks, Jan Mikkelsen From morganw at chemikals.org Tue Dec 2 04:04:37 2008 From: morganw at chemikals.org (Wes Morgan) Date: Tue Dec 2 04:04:44 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: References: Message-ID: On Tue, 2 Dec 2008, Jan Mikkelsen wrote: > Hi, > > Wes Morgan wrote: >> On Sun, 16 Nov 2008, Matt Simerson wrote: >> >>> The Areca cards do NOT have the cache enabled by default. I >> ordered the >>> optional battery and RAM upgrade for my collection of >> 1231ML cards. Even with >>> the BBWC, the cache is not enabled by default. I had to go >> out of my way to >>> enable it, on every single controller. >> >> Are you using these areca cards successfully with large >> arrays? I found a >> 1680i card for a decent price and installed it this weekend, >> but since >> then I'm seeing the raidz2 pool that it's running hang so >> frequently that >> I can't even trust using it. The hangs occur in both 7-stable and >> 8-current with the new ZFS patch. Same exact settings that >> have been rock >> solid for me before now don't want to work at all. The drives >> are just set >> as JBOD -- the controller actually defaulted to this, so I >> didn't have to >> make any real changes in the BIOS. >> >> Any tips on your setup? Did you have any similar problems? > > I am seeing I/O related lockups on 7.1-PRE with an Areca ARC-1220 controller > and eight drives in a RAID-6 array. The same hardware works fine with 6.3. > > When I run gstat while it is happening I see I/O performance drop and the > time to service each write (ms/w) goes up, and then suddenly goes back down > to a sensible value. 
I have seen it get to about 22000ms.
>
> The system is essentially unusable for writes, which limits the utility a
> bit. Reads seem fine.
>
> Is this similar to the behaviour you saw?

Not quite. The zfs deadlock/hang affected both reads and writes, blocking
either of them indefinitely. They were "fixed" by the most recent set of
patches in -current.

From cattelan at thebarn.com Tue Dec 2 09:33:46 2008
From: cattelan at thebarn.com (Russell Cattelan)
Date: Tue Dec 2 09:33:53 2008
Subject: Will XFS be adopted
In-Reply-To: <20081109174303.GA5146@ourbrains.org>
References: <20081109174303.GA5146@ourbrains.org>
Message-ID: <49356EC7.80706@thebarn.com>

Dan wrote:
> With XFS being adopted by Linux now for a number of years, I wonder why
> it hasn't been by FreeBSD. It's a great FS that can be resized on the
> fly which makes it a perfect journaling FS with volume managers. Anybody
> know?
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

So I find myself with a bit of extra time to actually try and make
progress on XFS for FreeBSD again.

I've started the process of essentially re-porting XFS to FreeBSD
(a lot of stuff has changed in XFS).

It is still quite a ways off, but I hope to have mount and read support
working again shortly.

The latest work can be found at:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=cattelan/FreeBSD_svn/.git;a=shortlog;h=refs/heads/xfs-work

I do realize that I will never have enough time to fully support XFS,
but I hope to at least get it off the ground again and hopefully build
some interest.

-Russell Cattelan

From zbeeble at gmail.com Tue Dec 2 15:04:23 2008
From: zbeeble at gmail.com (Zaphod Beeblebrox)
Date: Tue Dec 2 15:04:29 2008
Subject: ZFS filesystem size anomaly...
Message-ID: <5f67a8c40812021504p1d67fde1x3d9a9ef8d7214dfc@mail.gmail.com>

This is all with ZFS version 6 in FreeBSD-7.1-RC, fyi...

I have a machine with the ports tree mounted in ZFS right out of the
examples:

[3:75:375]root@canoe:/usr/ports> zfs list | grep ports
canoe/ports             3.54G  66.6G  2.35G  /usr/ports
canoe/ports/distfiles   1.19G  66.6G  1.19G  /usr/ports/distfiles

... but the sizes here are curious. 1.2G for distfiles is about correct...
but 2.35G for the rest of ports is unreasonable.

[3:76:376]root@canoe:/usr/ports> du -hs .
1.6G    .

... saying that ports + distfiles are 1.6G.

There are no snapshots for either ports or distfiles. There are no open
work directories. The machine has recently rebooted. The only directory
with more than 20M in it is distfiles --- which is the subfilesystem.

... this number 2.35G is 2G more than I expect. Where is this space?

From rodrigc at crodrigues.org Tue Dec 2 16:49:55 2008
From: rodrigc at crodrigues.org (Craig Rodrigues)
Date: Tue Dec 2 16:50:01 2008
Subject: questions about nmount and nfs
In-Reply-To: <20081127205417.GE58709@elvis.mu.org>
References: <20081127205417.GE58709@elvis.mu.org>
Message-ID: <20081203004955.GA85280@crodrigues.org>

On Thu, Nov 27, 2008 at 12:54:17PM -0800, Alfred Perlstein wrote:
> What do you guys think? Is nmount up for this? Any pointers to
> using nmount? Or should I sysctl?

nmount() is up to the task. Look at the latest mount_nfs code in CURRENT.

--
Craig Rodrigues
rodrigc@crodrigues.org

From ticso at cicely7.cicely.de Wed Dec 3 00:57:55 2008
From: ticso at cicely7.cicely.de (Bernd Walter)
Date: Wed Dec 3 00:58:01 2008
Subject: ZFS filesystem size anomaly...
In-Reply-To: <5f67a8c40812021504p1d67fde1x3d9a9ef8d7214dfc@mail.gmail.com>
References: <5f67a8c40812021504p1d67fde1x3d9a9ef8d7214dfc@mail.gmail.com>
Message-ID: <20081203083734.GC71750@cicely7.cicely.de>

On Tue, Dec 02, 2008 at 06:04:21PM -0500, Zaphod Beeblebrox wrote:
> This is all with ZFS version 6 in FreeBSD-7.1-RC, fyi...
>
> I have a machine with the ports tree mounted in ZFS right out of the
> examples:
>
> [3:75:375]root@canoe:/usr/ports> zfs list | grep ports
> canoe/ports             3.54G  66.6G  2.35G  /usr/ports
> canoe/ports/distfiles   1.19G  66.6G  1.19G  /usr/ports/distfiles
>
> ... but the sizes here are curious. 1.2G for distfiles is about correct...
> but 2.35G for the rest of ports is unreasonable.

It always includes the subvolumes.
So just canoe/ports alone is 2.35G - 1.19G.
You can get more details with "zfs get all canoe/ports".

--
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and much more.

From josh.carroll at gmail.com Wed Dec 3 14:56:06 2008
From: josh.carroll at gmail.com (Josh Carroll)
Date: Wed Dec 3 14:56:14 2008
Subject: ext2 inode size patch - RE: PR kern/124621
In-Reply-To: <20081125150342.GL2042@deviant.kiev.zoral.com.ua>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
	<20081125140601.GH2042@deviant.kiev.zoral.com.ua>
	<8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com>
	<20081125142827.GI2042@deviant.kiev.zoral.com.ua>
	<8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com>
	<20081125150342.GL2042@deviant.kiev.zoral.com.ua>
Message-ID: <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>

> Ok, I describe my concern once more. I do not object against the checking
> of the inode size. But, if inode size is changed, then some data is added
> to the inode, that could (and usually does, otherwise why extend it ?)
> change interpretation of the inode. Thus, we need a verification of the
> fact that simply ignoring added fields does not damage filesystem or
> cause user data corruption. Verification != testing.

Let me take another crack at explaining why I think this patch is not
dangerous.

The inode size is determined by reading the following member:

__u16 s_inode_size;

of the ext2_super_block structure.
I took a look at the Linux 2.6.27.7 kernel source, and indeed they do
something very similar if not the same:

#define EXT2_INODE_SIZE(s) (EXT2_SB(s)->s_inode_size)

If you compare to what I did:

#define EXT2_INODE_SIZE(s) ((s)->u.ext2_sb.s_inode_size)

This is really the same thing, since EXT2_SB is defined (in the Linux
kernel src) as:

#ifdef __KERNEL__
#include
static inline struct ext2_sb_info *EXT2_SB(struct super_block *sb)
{
	return sb->s_fs_info;
}

And struct ext2_sb_info has the following member:

int s_inode_size;

Essentially, the changes I made simply query the existing information
from the filesystem, which is what the Linux kernel does as well.

The inode size, blocks per group, etc. are all defined at filesystem
creation time by mke2fs, and it ensures the sanity of the relationship
between the inodes/blocks/block groups.

The first diagram here:

http://sunsite.nus.sg/LDP/LDP/tlk/node95.html

makes it clear that as long as the number of inodes per block and the
blocks per group are sane during fs creation, looking up the inode
size as my patch does is fine, since the creation of the filesystem
ensures a correct-by-construction situation. We're simply reading the
size of the inode based on the filesystem.

I hope this is sufficient to prompt some further thought about
committing this.

For those interested in the relevant Linux kernel source, you can have
a look at line 105 of include/linux/ext2_fs.h from the linux-2.6.27.7
kernel source.

Thanks,
Josh

From bms at FreeBSD.org Wed Dec 3 23:31:04 2008
From: bms at FreeBSD.org (Bruce M.
Simpson) Date: Wed Dec 3 23:31:11 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> Message-ID: <49378379.5050900@FreeBSD.org> Hi, The inode size for the ext3 filesystem which Gentoo created for my last install defaulted to 256 bytes, so I got bit by this problem. I can't speak for the write path. but the read path looks just fine to me, and the patch should go in ASAP. Josh Carroll wrote: >> Ok, I describe my concern once more. I do not object against the checking >> of the inode size. But, if inode size is changed, then some data is added >> to the inode, that could (and usually does, otherwise why extend it ?) >> change intrerpetation of the inode. Thus, we need a verification of the >> fact that simply ignoring added fields does not damage filesystem or >> cause user data corruption. Verification != testing. >> If folk are paranoid, then add a check for dynamic inode size and disable ext2fs writes by downgrading the mount in that case (We can do that, right? Can someone make sure Josh gets the help he needs here?) As Josh points out, the ext2 inode size is stored in the superblock. Whilst it may vary between ext2 filesystems, *the inode size itself does not appear to be something which one can modify in an existing ext2/3 filesystem*. Older ext2 filesystems may not contain the inode size field in the superblock, and the patch appears to default to 128 for that case. 
The double indirection thus introduced doesn't concern me, our ext2fs is not performance critical code, and the superblock is likely to sit in L2/L3 cache anyway (note: content free argument). Thanks to Josh for fixing this problem. cheers BMS From kostikbel at gmail.com Thu Dec 4 02:51:34 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Thu Dec 4 02:51:58 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <49378379.5050900@FreeBSD.org> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <49378379.5050900@FreeBSD.org> Message-ID: <20081204105129.GA2246@deviant.kiev.zoral.com.ua> On Thu, Dec 04, 2008 at 07:15:05AM +0000, Bruce M. Simpson wrote: > Hi, > > The inode size for the ext3 filesystem which Gentoo created for my last > install defaulted to 256 bytes, so I got bit by this problem. > > I can't speak for the write path. but the read path looks just fine to > me, and the patch should go in ASAP. > > Josh Carroll wrote: > >>Ok, I describe my concern once more. I do not object against the checking > >>of the inode size. But, if inode size is changed, then some data is added > >>to the inode, that could (and usually does, otherwise why extend it ?) > >>change intrerpetation of the inode. Thus, we need a verification of the > >>fact that simply ignoring added fields does not damage filesystem or > >>cause user data corruption. Verification != testing. > >> > > If folk are paranoid, then add a check for dynamic inode size and > disable ext2fs writes by downgrading the mount in that case (We can do > that, right? Can someone make sure Josh gets the help he needs here?) 
> > As Josh points out, the ext2 inode size is stored in the superblock. > Whilst it may vary between ext2 filesystems, *the inode size itself does > not appear to be something which one can modify in an existing ext2/3 > filesystem*. > > Older ext2 filesystems may not contain the inode size field in the > superblock, and the patch appears to default to 128 for that case. The > double indirection thus introduced doesn't concern me, our ext2fs is not > performance critical code, and the superblock is likely to sit in L2/L3 > cache anyway (note: content free argument). > > Thanks to Josh for fixing this problem. Bruce, feel free to commit the patch. I do not want to spend time on ext2 in any form, and due to our (only partly jokingly) rule of the "last committer is the owner", I do not want to analyze ext2 bug reports after. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081204/61a1b7d2/attachment.pgp From bms at FreeBSD.org Thu Dec 4 05:16:19 2008 From: bms at FreeBSD.org (Bruce M. Simpson) Date: Thu Dec 4 05:16:31 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <20081204105129.GA2246@deviant.kiev.zoral.com.ua> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <49378379.5050900@FreeBSD.org> <20081204105129.GA2246@deviant.kiev.zoral.com.ua> Message-ID: <4937D820.8080803@FreeBSD.org> Kostik Belousov wrote: > ... > Bruce, feel free to commit the patch. 
>
> I do not want to spend time on ext2 in any form, and due to our (only
> partly jokingly) rule of the "last committer is the owner", I do not
> want to analyze ext2 bug reports after.
>

Yes, development resource is limited here too, and any involvement on my
part here DOES NOT constitute any commitment, express or implied, to
maintaining the ext2fs module beyond the change being considered right now.

I find that this often has to be reiterated, as people are prone to
confusing the concepts "open source" and "free"; basic economics dictates
infinite demand for free goods, and we've all got lives to live.

As per our off-list discussion:

It's a damned if we do and damned if we don't situation. Take the patch
and it eats someone's lunch, and our reputation suffers. Don't take the
patch and look like patriarchal killjoys, and our reputation suffers.

Your specific objection is that the testing is insufficient to exercise
the patch, and there could be an area of ext2 which this patch doesn't
address. That can never be said with 100% certainty, but I agree with you.

Content free argument: Based on my reading of the code, the patch must be
considered experimental. Whilst the scope of the patch appears to be small,
the symbol space of ext2 is bigger -- a case of feeping creaturism due to
ext2 itself, but hey, that's evolution for you.

If folk are happy with it going in, let it go in, but remember, you get
the system you apply effort to. I myself consider the patch experimental
-- but HEAD is an experiment, is it not? Reality is what you can get away
with.
cheers
BMS

From josh.carroll at gmail.com Thu Dec 4 09:06:49 2008
From: josh.carroll at gmail.com (Josh Carroll)
Date: Thu Dec 4 09:07:01 2008
Subject: ext2 inode size patch - RE: PR kern/124621
In-Reply-To: <200812041747.09040.gnemmi@gmail.com>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
	<20081125150342.GL2042@deviant.kiev.zoral.com.ua>
	<8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>
	<200812041747.09040.gnemmi@gmail.com>
Message-ID: <8cb6106e0812040902g69ec2f84t814c2f2b5cdb33f6@mail.gmail.com>

> Could you please point me to your patch and an explanation on how to apply it
> and test it?

You can grab the patch here:

http://pflog.net/~floyd/ext2fs.diff

To apply it:

cd /usr/src/sys/gnu/fs
patch < /path/to/ext2fs.diff
cd /usr/src/sys/modules/ext2fs
make clean && make
kldload ./ext2fs.ko

Then umount and mount again your ext2 file systems.

This should apply cleanly to RELENG_7_0, RELENG_7_1 and RELENG_7
source. I'm not sure if it'll apply cleanly to -CURRENT or not (I can
provide an updated patch if you need it).

Note: if you have ext2fs built into your kernel, you'll have to build
and install your kernel as usual after patching, instead of building
the module separately. Also, if you already have ext2fs loaded, you'll
need to kldunload it first of course.

If you want to update the ext2fs.ko in your installed kernel in
/boot/kernel, after a make in .../modules/ext2fs, you can "make install".
Thanks, Josh From gnemmi at gmail.com Thu Dec 4 09:13:48 2008 From: gnemmi at gmail.com (Gonzalo Nemmi) Date: Thu Dec 4 09:13:55 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> Message-ID: <200812041747.09040.gnemmi@gmail.com> On Wednesday 03 December 2008 8:53:43 pm Josh Carroll wrote: > > Ok, I describe my concern once more. I do not object against the checking > > of the inode size. But, if inode size is changed, then some data is added > > to the inode, that could (and usually does, otherwise why extend it ?) > > change intrerpetation of the inode. Thus, we need a verification of the > > fact that simply ignoring added fields does not damage filesystem or > > cause user data corruption. Verification != testing. > > Let me take another crack at explaining why I think this patch is not > dangerous. > > The inode size is determined by reading the following member: > > __u16 s_inode_size; > > of the ext2_super_block structure. > > I took a look at the Linux 2.6.27.7 kernel source, and indeed they do > something very similar if not the same: > > #define EXT2_INODE_SIZE(s) (EXT2_SB(s)->s_inode_size) > > If you compare to what I did: > > #define EXT2_INODE_SIZE(s) ((s)->u.ext2_sb.s_inode_size) > > This is really the same thing, since EXT2_SB is defined (in the Linux > kernel src as): > > #ifdef __KERNEL__ > #include > static inline struct ext2_sb_info *EXT2_SB(struct super_block *sb) > { > return sb->s_fs_info; > } > > And struct ext2_sb_info has the following member: > > int s_inode_size; > > Essentially, the changes I made simply query the existing information > from the filesystem, which is what the Linux kernel does as well. 
>
> The inode size, blocks per group, etc are all defined at filesystem
> creation time by mke2fs and it ensures the sanity of the relationship
> between the inodes/blocks/block groups.
>
> The first diagram here:
>
> http://sunsite.nus.sg/LDP/LDP/tlk/node95.html
>
> Makes it clear that as long as the number of inodes per block and the
> blocks per group is is sane during fs creation, looking up the inode
> size as my patch does is fine, since the creation of the filesystem is
> ensures a correct by construction situation. We're simply reading the
> size of the inode based on the filesystem.
>
> I hope this is sufficient to convince some further thought about
> committing this.
>
> For those interested in the relevant Linux kernel source, you can have
> a look at line 105 of include/linux/ext2_fs.h from the linux-2.6.27.7
> kernel source.
>
> Thanks,
> Josh

Could you please point me to your patch and an explanation on how to apply it
and test it?

I've been following your last emails referencing it (on @questions and
@hackers, or current I think it was ...) and I'd like to give it a spin in
here ...

I've followed the "can't mount ext2/3" problem for a time (since I have
that problem) and would like to know to what extent your patch solves the
problem.

Here are some of the references:

mounting linux partitions
Fri May 9 18:05:26 UTC 2008
http://lists.freebsd.org/pipermail/freebsd-questions/2008-May/174588.html

bad file descriptor when mounting an ext2fs.
Tue Jun 10 11:08:46 UTC 2008
http://lists.freebsd.org/pipermail/freebsd-questions/2008-June/176506.html

mounting ext2fs partitions on FBSD7 ( third time a charm?)
Fri Jul 4 23:33:53 UTC 2008
http://lists.freebsd.org/pipermail/freebsd-questions/2008-July/178219.html

My case:

root@inferna:~ # ls -l /dev/ad4s
ad4s1%   ad4s2%   ad4s3%   ad4s3a%  ad4s3b%  ad4s3c%  ad4s3d%
ad4s3e%  ad4s3f%  ad4s4%   ad4s5%   ad4s6%   ad4s7%   ad4s8%
root@inferna:~ # ls -l /dev/ad4s
root@inferna:~ # tune2fs -l /dev/ad4s1 | grep "Inode size"
Inode size: 256
root@inferna:~ # tune2fs -l /dev/ad4s6 | grep "Inode size"
Inode size: 256
root@inferna:~ # tune2fs -l /dev/ad4s7 | grep "Inode size"
Inode size: 256
root@inferna:~ # tune2fs -l /dev/ad4s8 | grep "Inode size"
Inode size: 256
root@inferna:~ # tune2fs -l /dev/ad4s9 | grep "Inode size"

BTW: I'm willing to run any tests you need me to, even if they imply a
serious risk of losing data on the 256 inode partitions.

Regards
--
Blessings
Gonzalo Nemmi

From zbeeble at gmail.com Thu Dec 4 09:57:55 2008
From: zbeeble at gmail.com (Zaphod Beeblebrox)
Date: Thu Dec 4 09:58:01 2008
Subject: ZFS filesystem size anomaly...
In-Reply-To: <20081203083734.GC71750@cicely7.cicely.de>
References: <5f67a8c40812021504p1d67fde1x3d9a9ef8d7214dfc@mail.gmail.com>
	<20081203083734.GC71750@cicely7.cicely.de>
Message-ID: <5f67a8c40812040957h8d90b26t5092fa68c76bafb8@mail.gmail.com>

On Wed, Dec 3, 2008 at 3:37 AM, Bernd Walter wrote:

> On Tue, Dec 02, 2008 at 06:04:21PM -0500, Zaphod Beeblebrox wrote:
> > This is all with ZFS version 6 in FreeBSD-7.1-RC, fyi...
> >
> > I have a machine with the ports tree mounted in ZFS right out of the
> > examples:
> >
> > [3:75:375]root@canoe:/usr/ports> zfs list | grep ports
> > canoe/ports             3.54G  66.6G  2.35G  /usr/ports
> > canoe/ports/distfiles   1.19G  66.6G  1.19G  /usr/ports/distfiles
> >
> > ... but the sizes here are curious. 1.2G for distfiles is about correct...
> > but 2.35G for the rest of ports is unreasonable.
>
> It always includes the subvolumes.
> So just canoe/ports alone is 2.35G - 1.19G.
> You can get more details with "zfs get all canoe/ports".

Yes... I understand that.
There is a 2G discrepancy. Du on distfiles agrees --- 1.2G vs. 1.19. Du on ports says 1.6G --- including the 1.2 gig in distfiles. That means that du thinks that ports itself is 0.4 Gig, not 2.35 Gig. From onemda at gmail.com Thu Dec 4 12:34:02 2008 From: onemda at gmail.com (Paul B. Mahol) Date: Thu Dec 4 12:34:08 2008 Subject: [Call for Test] a patch for kern/121385 - Unionfs cross mount issue In-Reply-To: <4917E0C9.5020105@ongs.co.jp> References: <4917E0C9.5020105@ongs.co.jp> Message-ID: <3a142e750812041221v26520a53q4f7df488af607305@mail.gmail.com> On 11/10/08, Daichi GOTO wrote: > Hi Unionfs users > > About kern/121385 - Unionfs cross mount issue, by discussion at > EuroBSDCon2008, > unionfs does not allow user to do cross mount operation. If you have some > interest > this issue, please get this patch and try with current. I'll commit this > patch after 1 week later. > > PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/121385 > Patch: > http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-cross-mount.diff > > This issue was discussed at EuroBSDCon2008 FreeBSD developer summit. > Thanks for hrs and gnn :) > > -- > Daichi GOTO, http://people.freebsd.org/~daichi > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > I tested CURRENT and was unable to reproduce panic. -- Paul From daichi at freebsd.org Thu Dec 4 16:36:21 2008 From: daichi at freebsd.org (Daichi GOTO) Date: Thu Dec 4 16:36:28 2008 Subject: [Call for Test] a patch for kern/121385 - Unionfs cross mount issue In-Reply-To: <3a142e750812041221v26520a53q4f7df488af607305@mail.gmail.com> References: <4917E0C9.5020105@ongs.co.jp> <3a142e750812041221v26520a53q4f7df488af607305@mail.gmail.com> Message-ID: <49387783.7080503@freebsd.org> Thanks for your test, Paul B. 
Mahol wrote: > On 11/10/08, Daichi GOTO wrote: >> Hi Unionfs users >> >> About kern/121385 - Unionfs cross mount issue, by discussion at >> EuroBSDCon2008, >> unionfs does not allow user to do cross mount operation. If you have some >> interest >> this issue, please get this patch and try with current. I'll commit this >> patch after 1 week later. >> >> PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/121385 >> Patch: >> http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-cross-mount.diff >> >> This issue was discussed at EuroBSDCon2008 FreeBSD developer summit. >> Thanks for hrs and gnn :) >> >> -- >> Daichi GOTO, http://people.freebsd.org/~daichi >> _______________________________________________ >> freebsd-current@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-current >> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" >> > > I tested CURRENT and was unable to reproduce panic. You cannot get the same panic described on kern/121385 as follow: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/121385 How-To-Repeat: # mkdir -p /unionfs/disk1 # mkdir -p /unionfs/disk2 # mount -t unionfs /unionfs/disk1 /unionfs/disk2 # mount -t unionfs /unionfs/disk2 /unionfs/disk1 # touch /unionfs/disk1/foo The r178483 (http://svn.freebsd.org/viewvc/base?view=revision&revision=178483) avoids this panic. Try without r178483 and you will get a panic. Some discussions with some developers, we are testing a new cross-mount issue patch, after some more check, I'll open it. thanks :) -- Daichi GOTO, http://people.freebsd.org/~daichi From bms at FreeBSD.org Fri Dec 5 02:11:19 2008 From: bms at FreeBSD.org (Bruce M. 
Simpson) Date: Fri Dec 5 02:11:25 2008 Subject: ext2fuse: user-space ext2 implementation In-Reply-To: <200812041747.09040.gnemmi@gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <200812041747.09040.gnemmi@gmail.com> Message-ID: <4938FE44.9090608@FreeBSD.org> Hi, I just tested the ext2fuse project on FreeBSD 7.1-PRERELEASE as of today and found that it works for read/write on an ext3 filesystem. The inode size was 128 -- I haven't exercised dynamic inode sizes. ext2fuse project: http://sourceforge.net/projects/ext2fuse Required FreeBSD patches: http://people.freebsd.org/~bms/dump/ext2fuse-freebsd.patch Steps: 1. fetch http://ovh.dl.sourceforge.net/sourceforge/ext2fuse/ext2fuse-src-0.8.1.tar.gz 2. tar xvf ext2fuse-src-0.8.1.tar.gz 3. patch -i ext2fuse-0.8.1-freebsd.patch 4. cd ext2fuse-src-0.8.1 5. aclocal ; automake ; autoconf 6. ./configure && gmake Performance seems quite slow, it could probably benefit from being ported to use UBLIO as ntfs-3g for FreeBSD has. I'm going to leave this thing for others to play with, this was a 20 minute bunk-off from other work. It shouldn't take much effort to create a port around it. cheers BMS From bms at FreeBSD.org Fri Dec 5 03:40:49 2008 From: bms at FreeBSD.org (Bruce M. Simpson) Date: Fri Dec 5 03:40:55 2008 Subject: ext2fuse: user-space ext2 implementation In-Reply-To: <4938FE44.9090608@FreeBSD.org> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <200812041747.09040.gnemmi@gmail.com> <4938FE44.9090608@FreeBSD.org> Message-ID: <4939133E.2000701@FreeBSD.org> Bruce M. Simpson wrote: > ... > Performance seems quite slow, it could probably benefit from being > ported to use UBLIO as ntfs-3g for FreeBSD has. 
>
> I'm going to leave this thing for others to play with, this was a 20
> minute bunk-off from other work. It shouldn't take much effort to
> create a port around it.

P.S. Ale's patch to use libublio for fusefs-ntfs is here, if anyone
picks up on ext2fuse before I do:
http://people.freebsd.org/~alepulver/fusefs-ntfs.diff

This contains the necessary diffs for UBLIO.

cheers
BMS

From avg at icyb.net.ua Fri Dec 5 04:57:25 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Fri Dec 5 04:57:32 2008
Subject: partition covering the whole slice
Message-ID: <49392532.4000300@icyb.net.ua>

I have a disk with two slices and each slice has a single real
partition covering the whole slice, sector-to-sector.
I don't remember how I managed to configure the disk this way, is this
even possible? :-)

$ gpart show
=>       63  781422705  ad12  MBR  (373G)
         63  209712447     1  freebsd  [active]  (100G)
  209712510  571705155     2  freebsd  [active]  (273G)
  781417665       5103        - free -  (2.5M)

=>        0  209712447  ufs/extbackup  BSD  (100G)
          0  209712447              1  freebsd-ufs  (100G)

=>        0  209712447  ad12s1  BSD  (100G)
          0  209712447       1  freebsd-ufs  (100G)

=>        0  571705155  ufs/extstuff  BSD  (273G)
          0  571705155             1  freebsd-ufs  (273G)

=>        0  571705155  ad12s2  BSD  (273G)
          0  571705155       1  freebsd-ufs  (273G)

You can immediately spot another oddity - I never used glabel on this
disk, but I did use tunefs -L to label the UFS filesystems within the
partitions. Now it seems that the label of filesystems is also somehow
recognized as a label for the whole slice. E.g. "ufs/extbackup" is
exactly the same as "ad12s1". Weird.

Here's some additional data:

$ ls -1 /dev/ad12*
/dev/ad12
/dev/ad12s1
/dev/ad12s1a
/dev/ad12s2
/dev/ad12s2a

Looks usual.

$ ls -1 /dev/ufs/
extbackup
extbackupa
extstuff
extstuffa

So there is one "normal" label for each filesystem and the second label
for it as a filesystem in partition "a" of a labeled slice.
There is nothing in /dev/label though.
And a bit more: $ file -s /dev/ad12s1 /dev/ad12s1: Unix Fast File system [v2] (little-endian) last mounted on /automnt/ufs/extbackupa, volume name extbackup, last written at Tue Dec 2 17:47:21 2008, clean flag 1, readonly flag 0, number of blocks 13107027, number of data blocks 13002290, number of cylinder groups 35, block size 65536, fragment size 8192, average file size 16384, average number of files in dir 64, pending blocks to free 0, pending inodes to free 0, system-wide uuid 0, minimum percentage of free blocks 8, TIME optimization $ file -s /dev/ad12s1a /dev/ad12s1a: Unix Fast File system [v2] (little-endian) last mounted on /automnt/ufs/extbackupa, volume name extbackup, last written at Tue Dec 2 17:47:21 2008, clean flag 1, readonly flag 0, number of blocks 13107027, number of data blocks 13002290, number of cylinder groups 35, block size 65536, fragment size 8192, average file size 16384, average number of files in dir 64, pending blocks to free 0, pending inodes to free 0, system-wide uuid 0, minimum percentage of free blocks 8, TIME optimization So it looks like start of ad12s1 is the same as ad12s1a. On some better configured disks I see: $ file -s /dev/ad6s1 /dev/ad6s1: x86 boot sector; partition 4: ID=0xa5, active, starthead 0, startsector 0, 50000 sectors Ultimately I would like to fix this so that I don't see labels on the slices. -- Andriy Gapon From avg at icyb.net.ua Fri Dec 5 05:11:28 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Fri Dec 5 05:11:34 2008 Subject: partition covering the whole slice [repost] Message-ID: <4939287C.3020208@icyb.net.ua> [Repost: I originally cc-ed gnome instead of geom; Sorry.] I have a disk with two slices and each slices has a single real partition covering the whole slice, sector-to-sector. I don't remember how I managed to configure the disk this way, is this even possible? 
:-)

$ gpart show
=>       63  781422705  ad12  MBR  (373G)
         63  209712447     1  freebsd  [active]  (100G)
  209712510  571705155     2  freebsd  [active]  (273G)
  781417665       5103        - free -  (2.5M)

=>        0  209712447  ufs/extbackup  BSD  (100G)
          0  209712447              1  freebsd-ufs  (100G)

=>        0  209712447  ad12s1  BSD  (100G)
          0  209712447       1  freebsd-ufs  (100G)

=>        0  571705155  ufs/extstuff  BSD  (273G)
          0  571705155             1  freebsd-ufs  (273G)

=>        0  571705155  ad12s2  BSD  (273G)
          0  571705155       1  freebsd-ufs  (273G)

You can immediately spot another oddity - I never used glabel on this
disk, but I did use tunefs -L to label the UFS filesystems within the
partitions. Now it seems that the label of filesystems is also somehow
recognized as a label for the whole slice. E.g. "ufs/extbackup" is
exactly the same as "ad12s1". Weird.

Here's some additional data:

$ ls -1 /dev/ad12*
/dev/ad12
/dev/ad12s1
/dev/ad12s1a
/dev/ad12s2
/dev/ad12s2a

Looks usual.

$ ls -1 /dev/ufs/
extbackup
extbackupa
extstuff
extstuffa

So there is one "normal" label for each filesystem and the second label
for it as a filesystem in partition "a" of a labeled slice.
There is nothing in /dev/label though.
And a bit more: $ file -s /dev/ad12s1 /dev/ad12s1: Unix Fast File system [v2] (little-endian) last mounted on /automnt/ufs/extbackupa, volume name extbackup, last written at Tue Dec 2 17:47:21 2008, clean flag 1, readonly flag 0, number of blocks 13107027, number of data blocks 13002290, number of cylinder groups 35, block size 65536, fragment size 8192, average file size 16384, average number of files in dir 64, pending blocks to free 0, pending inodes to free 0, system-wide uuid 0, minimum percentage of free blocks 8, TIME optimization $ file -s /dev/ad12s1a /dev/ad12s1a: Unix Fast File system [v2] (little-endian) last mounted on /automnt/ufs/extbackupa, volume name extbackup, last written at Tue Dec 2 17:47:21 2008, clean flag 1, readonly flag 0, number of blocks 13107027, number of data blocks 13002290, number of cylinder groups 35, block size 65536, fragment size 8192, average file size 16384, average number of files in dir 64, pending blocks to free 0, pending inodes to free 0, system-wide uuid 0, minimum percentage of free blocks 8, TIME optimization So it looks like start of ad12s1 is the same as ad12s1a. On some better configured disks I see: $ file -s /dev/ad6s1 /dev/ad6s1: x86 boot sector; partition 4: ID=0xa5, active, starthead 0, startsector 0, 50000 sectors Ultimately I would like to fix this so that I don't see labels on the slices. 
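
The sector numbers in the gpart output above can be checked mechanically. What follows is only a minimal sketch in POSIX sh: it assumes 512-byte sectors (the output does not state the sector size), and the helper names end_of and to_mib are made up for illustration. It shows that the two slices are contiguous and that the "a" partition has offset 0 and the same size as its slice, which would be consistent with the observation that ad12s1 and ad12s1a start at the same place.

```shell
#!/bin/sh
# Numbers copied from the "gpart show" output for ad12.
# Assumption: 512-byte sectors, i.e. 2048 sectors per MiB.
SECTORS_PER_MIB=2048

# end_of START SIZE -> first sector after the region
end_of() { echo $(( $1 + $2 )); }

# to_mib SECTORS -> size in whole MiB (integer division, rounds down)
to_mib() { echo $(( $1 / SECTORS_PER_MIB )); }

s1_start=63;        s1_size=209712447
s2_start=209712510; s2_size=571705155

echo "ad12s1 ends before sector $(end_of $s1_start $s1_size); ad12s2 starts at $s2_start"
echo "ad12s1 is $(to_mib $s1_size) MiB, ad12s2 is $(to_mib $s2_size) MiB"

# The BSD label in ad12s1 lists partition 1 at offset 0 with the slice's size.
a_offset=0; a_size=209712447
if [ "$a_offset" -eq 0 ] && [ "$a_size" -eq "$s1_size" ]; then
    echo "ad12s1a covers ad12s1 exactly, so both start on the same sector"
fi
```

If the partition started a few sectors into the slice instead, the two providers would no longer expose the same first blocks.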
-- Andriy Gapon From joao.barros at gmail.com Sat Dec 6 07:38:23 2008 From: joao.barros at gmail.com (Joao Barros) Date: Sat Dec 6 07:38:29 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> <70e8236f0811241748w41884a12la50e4e63f83a7542@mail.gmail.com> <70e8236f0811291708h7ece06dcm1bff0081b5b0fde8@mail.gmail.com> Message-ID: <70e8236f0812060738x5e259df6la00529006c07fc23@mail.gmail.com> On Sun, Nov 30, 2008 at 9:05 AM, Doug Rabson wrote: > > On 30 Nov 2008, at 01:08, Joao Barros wrote: > >> On Tue, Nov 25, 2008 at 1:48 AM, Joao Barros >> wrote: >>> >>> On Fri, Nov 21, 2008 at 9:31 PM, Olivier SMEDTS wrote: >>>> >>>> 2008/11/20 Doug Rabson : >>>>> >>>>> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I want to boot off a ZFS pool (version 13) on an USB stick for testing >>>>>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>>>>> I'm using a 2 hours old CURRENT. >>>>>> >>>>>> # kldload usb2_storage_mass >>>>>> # kldload zfs >>>>>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>>>>> # fdisk -BI da0 >>>>>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>>>>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>>>>> bsdlabel: boot code /boot/zfsboot is wrong size >>>>>> >>>>>> Is what I'm trying to do with bsdlabel wrong ? >>>>>> I previously tried with the default bootstrap code but I had an >>>>>> (expected) "boot: Not ufs" error at boot. >>>>>> >>>>>> PS : I'm not subscribed to this list. >>>>> >>>>> The process for install zfsboot is a bit manual (and undocumented). 
Try >>>>> something like this: >>>>> >>>>> # dd if=/boot/zfsboot of=/dev/da0s1 count=1 >>>>> # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 >>>>> >>>>> Alternatively, you might try using the brand new support for GPT that I >>>>> committed yesterday: >>>>> >>>>> # gpt create -f da0 >>>>> # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 >>>>> # gpt add -t freebsd-zfs da0 >>>>> # zpool create mypool da0p2 >>>> >>>> It works ! >>>> >>>> Now I'm stuck at loader(8) prompt. >>> >>> That's a me too. >>> >>> I tried this under vmware with LOADER_ZFS_SUPPORT=yes on make.conf: >>> # gpart create -s gpt ad0 >>> # gpart add -b 34 -s 128 -t freebsd-boot ad0 >>> ad0p1 added >>> # gpart add -b 162 -s 15078327 -t freebsd-zfs ad0 >>> ad0p2 added >>> # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0 >>> # zpool create tank ad0p2 >>> # zpool set bootfs = tank tank >>> >>> lsdev on loader shows: >>> cd devices: >>> disk devices: >>> disk0: BIOS drive c: >>> disk0p1: FreeBSD boot >>> disk0p2: FreeBSD ZFS >>> pxe devices: >>> zfs devices: >>> >>> Any hints? >>> >> >> >> I'm trying to figure out why loader doesn't see my zfs pool and here's >> what I got: >> >> FreeBSD/i386 boot >> Default: tank:/boot/loader >> boot: status pool: tank >> config: >> NAME STATE >> tank ONLINE >> ad0p2 ONLINE >> >> I added some printfs on loader\main.c: >> >> guid = kargs->zfspool; >> unit = zfs_guid_to_unit(guid); >> if (unit >= 0) { >> sprintf(devname, "zfs%d", unit); >> setenv("currdev", devname, 1); >> } >> >> and guid returns the correct guid for my pool but unit returns -1 >> which by looking at zfs_guid_to_unit means something is not right. >> >> Any pointers Doug? > > It looks like loader didn't manage to find the pool for some reason. This > probing process happens in sys/boot/zfs/zfs.c in the function > zfs_dev_init(). Its supposed to taste all the available disks and partitions > for the presence of a ZFS pool. The actual tasting process happens in > vdev_probe(). 
> > Paul Saab just commited this: http://svn.freebsd.org/changeset/base/185711 It's working now! Thank you very much to both! :-D -- Joao Barros From ps at mu.org Sat Dec 6 10:54:02 2008 From: ps at mu.org (Paul Saab) Date: Sat Dec 6 10:54:09 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <70e8236f0812060738x5e259df6la00529006c07fc23@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> <70e8236f0811241748w41884a12la50e4e63f83a7542@mail.gmail.com> <70e8236f0811291708h7ece06dcm1bff0081b5b0fde8@mail.gmail.com> <70e8236f0812060738x5e259df6la00529006c07fc23@mail.gmail.com> Message-ID: <02D2F116-55EF-49D6-91C7-DE71D3AF6F55@mu.org> ya. I spent a while looking at this. when you look at the generated code it was obvious. including the wrong sys/type.h is bad On Dec 6, 2008, at 7:38 AM, "Joao Barros" wrote: > On Sun, Nov 30, 2008 at 9:05 AM, Doug Rabson wrote: >> >> On 30 Nov 2008, at 01:08, Joao Barros wrote: >> >>> On Tue, Nov 25, 2008 at 1:48 AM, Joao Barros >>> wrote: >>>> >>>> On Fri, Nov 21, 2008 at 9:31 PM, Olivier SMEDTS >>>> wrote: >>>>> >>>>> 2008/11/20 Doug Rabson : >>>>>> >>>>>> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I want to boot off a ZFS pool (version 13) on an USB stick for >>>>>>> testing >>>>>>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>>>>>> I'm using a 2 hours old CURRENT. >>>>>>> >>>>>>> # kldload usb2_storage_mass >>>>>>> # kldload zfs >>>>>>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>>>>>> # fdisk -BI da0 >>>>>>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>>>>>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>>>>>> bsdlabel: boot code /boot/zfsboot is wrong size >>>>>>> >>>>>>> Is what I'm trying to do with bsdlabel wrong ? 
>>>>>>> I previously tried with the default bootstrap code but I had an >>>>>>> (expected) "boot: Not ufs" error at boot. >>>>>>> >>>>>>> PS : I'm not subscribed to this list. >>>>>> >>>>>> The process for install zfsboot is a bit manual (and >>>>>> undocumented). Try >>>>>> something like this: >>>>>> >>>>>> # dd if=/boot/zfsboot of=/dev/da0s1 count=1 >>>>>> # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 >>>>>> >>>>>> Alternatively, you might try using the brand new support for >>>>>> GPT that I >>>>>> committed yesterday: >>>>>> >>>>>> # gpt create -f da0 >>>>>> # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 >>>>>> # gpt add -t freebsd-zfs da0 >>>>>> # zpool create mypool da0p2 >>>>> >>>>> It works ! >>>>> >>>>> Now I'm stuck at loader(8) prompt. >>>> >>>> That's a me too. >>>> >>>> I tried this under vmware with LOADER_ZFS_SUPPORT=yes on make.conf: >>>> # gpart create -s gpt ad0 >>>> # gpart add -b 34 -s 128 -t freebsd-boot ad0 >>>> ad0p1 added >>>> # gpart add -b 162 -s 15078327 -t freebsd-zfs ad0 >>>> ad0p2 added >>>> # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0 >>>> # zpool create tank ad0p2 >>>> # zpool set bootfs = tank tank >>>> >>>> lsdev on loader shows: >>>> cd devices: >>>> disk devices: >>>> disk0: BIOS drive c: >>>> disk0p1: FreeBSD boot >>>> disk0p2: FreeBSD ZFS >>>> pxe devices: >>>> zfs devices: >>>> >>>> Any hints? >>>> >>> >>> >>> I'm trying to figure out why loader doesn't see my zfs pool and >>> here's >>> what I got: >>> >>> FreeBSD/i386 boot >>> Default: tank:/boot/loader >>> boot: status pool: tank >>> config: >>> NAME STATE >>> tank ONLINE >>> ad0p2 ONLINE >>> >>> I added some printfs on loader\main.c: >>> >>> guid = kargs->zfspool; >>> unit = zfs_guid_to_unit(guid); >>> if (unit >= 0) { >>> sprintf(devname, "zfs%d", unit); >>> setenv("currdev", devname, 1); >>> } >>> >>> and guid returns the correct guid for my pool but unit returns -1 >>> which by looking at zfs_guid_to_unit means something is not right. 
>>> >>> Any pointers Doug? >> >> It looks like loader didn't manage to find the pool for some >> reason. This >> probing process happens in sys/boot/zfs/zfs.c in the function >> zfs_dev_init(). Its supposed to taste all the available disks and >> partitions >> for the presence of a ZFS pool. The actual tasting process happens in >> vdev_probe(). >> >> > > Paul Saab just commited this: http://svn.freebsd.org/changeset/base/185711 > It's working now! Thank you very much to both! :-D > > > -- > Joao Barros From bms at incunabulum.net Sat Dec 6 22:37:03 2008 From: bms at incunabulum.net (Bruce M. Simpson) Date: Sat Dec 6 22:37:08 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0812040902g69ec2f84t814c2f2b5cdb33f6@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <200812041747.09040.gnemmi@gmail.com> <8cb6106e0812040902g69ec2f84t814c2f2b5cdb33f6@mail.gmail.com> Message-ID: <493B6AB6.2040704@incunabulum.net> FYI: The ext2 IFS driver for Windows v1.11a also appears to have the inode size issue: http://www.fs-driver.org/ I was not able to mount an ext2 filesystem with 256 byte inode size using this driver. Its installer will see that the filesystem exists, that it's ext2, but whenever you try to mount, you get nothing -- a very similar failure mode to the FreeBSD ext2fs driver. Again, it sounds like the author is actively maintaining it, so it might be worth contacting him to pool resources. 
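
A quick way to see whether a given ext2/ext3 filesystem falls into the 256-byte-inode case discussed in this thread is to parse the "Inode size" line that tune2fs -l prints. The sketch below is an illustration only: the function names are hypothetical, the sample text is canned rather than real output, and the "likely affected" wording reflects the reports in this thread, not a tested rule.

```shell
#!/bin/sh
# inode_size: read "tune2fs -l" style output on stdin, print the inode size.
inode_size() {
    awk -F: '/^Inode size/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# inode_hint SIZE -> one-line verdict based on the reports in this thread
inode_hint() {
    if [ "$1" -eq 128 ]; then
        echo "128-byte inodes: ok"
    else
        echo "$1-byte inodes: likely affected"
    fi
}

# Canned example; on a real system use: tune2fs -l /dev/ad4s1 | inode_size
sample='Inode count:              1000
Inode size:               256'
inode_hint "$(printf '%s\n' "$sample" | inode_size)"
```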
cheers,
BMS

From danielgonzalesn at gmail.com Sun Dec 7 11:26:20 2008
From: danielgonzalesn at gmail.com (DanielX)
Date: Sun Dec 7 11:26:26 2008
Subject: Error install FreeBSD 7.0
Message-ID: <993c11f30812071104x3751b4a3g1ba39417eda68a12@mail.gmail.com>

hello, I had problems when trying to install FreeBSD 7.0 (amd and i386)
to my dv2125nr HP Pavilion laptop and other HP Pavilion dv9700t laptop
comes this error, install succeed, but I doubt it comes out that error:

loading required module `pci`
ACPI autoload failed - no search file or directory
int=00000006 en=000000000 efl=00010882 eip=00460da8
?..
BTX halted

Thank you

--
DanielX
H. Gonzales Nuñez
Est. Ing. Sistemas
Uni. Privada del Norte

From bms at FreeBSD.org Mon Dec 8 01:53:23 2008
From: bms at FreeBSD.org (Bruce M. Simpson)
Date: Mon Dec 8 01:53:28 2008
Subject: ext2fuse: user-space ext2 implementation
In-Reply-To: <4939133E.2000701@FreeBSD.org>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
 <20081125150342.GL2042@deviant.kiev.zoral.com.ua>
 <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>
 <200812041747.09040.gnemmi@gmail.com>
 <4938FE44.9090608@FreeBSD.org>
 <4939133E.2000701@FreeBSD.org>
Message-ID: <493CEE90.7050104@FreeBSD.org>

I have rolled a port for ext2fuse:
http://people.freebsd.org/~bms/dump/fusefs-ext2fs.tar

I look forward to your feedback.

thanks,
BMS

From bugmaster at FreeBSD.org Mon Dec 8 03:06:55 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Dec 8 03:07:49 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200812081106.mB8B6t0L014247@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129174  fs         [nfs][zfs][panic] NFS v3 Panic when under high load ex
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129084  fs         [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
o kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs][panic] changing into .zfs dir from nfs client ca
o kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic:
ufs_dirbad: bad dir (patch from D 28 problems total. From bryanalves at gmail.com Mon Dec 8 22:22:53 2008 From: bryanalves at gmail.com (Bryan Alves) Date: Mon Dec 8 22:22:59 2008 Subject: ZFS resize disk vdev Message-ID: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> I'm thinking about using a hardware raid array with ZFS, using a single disk vdev zpool. I want the ability to add/remove disks to an array, and I'm still unsure of the stability of zfs as a whole. I'm looking for an easy way to resize and manage disks that are greater than 2 terabytes. If I have a single block device, /dev/da0, on my system that is represented by a zfs disk vdev, and the size of this block device grows (because the underlying hardware raid expands), will zfs correctly expand? And will it correctly expand in place? From andrew at modulus.org Mon Dec 8 22:43:58 2008 From: andrew at modulus.org (Andrew Snow) Date: Mon Dec 8 22:44:04 2008 Subject: ZFS resize disk vdev In-Reply-To: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> Message-ID: <493E12EC.4050801@modulus.org> Bryan Alves wrote: > I'm thinking about using a hardware raid array with ZFS, using a single disk > vdev zpool. I want the ability to add/remove disks to an array, and I'm > still unsure of the stability of zfs as a whole. I'm looking for an easy > way to resize and manage disks that are greater than 2 terabytes. > If I have a single block device, /dev/da0, on my system that is represented > by a zfs disk vdev, and the size of this block device grows (because the > underlying hardware raid expands), will zfs correctly expand? And will it > correctly expand in place? In theory, this works fine - I have never tried it myself. The only other way to expand a zpool is by adding more vdevs: You cannot change a vdev once it is created other than to take it from a single disk to a mirror. 
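
For reference, the two operations described above (adding another vdev to the pool, and turning a single-disk vdev into a mirror) might look roughly like this. It is a dry-run sketch that only prints the commands; the pool name tank and the devices da1, da2 and da3 are placeholders that do not come from this thread, and the run helper is purely illustrative.

```shell
#!/bin/sh
# Dry-run helper: print the command instead of executing it.
# To run for real, replace the body with "$@" (as root, on a scratch pool).
run() { echo "+ $*"; }

POOL=tank   # hypothetical pool name

# Grow the pool by adding another top-level vdev, here a two-disk mirror.
# Note: zpool may ask for -f if the new vdev's redundancy differs from
# that of the existing vdevs.
run zpool add "$POOL" mirror da2 da3

# Turn the existing single-disk vdev da0 into a mirror by attaching da1.
run zpool attach "$POOL" da0 da1
```

Neither command reshapes an existing RAID-Z vdev, which matches the point that a vdev cannot otherwise be changed once created.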
Sun's ZFS best practice guide states that you should avoid a single disk vdev because performance on the whole suffers and is worse than UFS. I am going to publish some benchmark figures soon to back this up, based on testing I did with a 16 disk hardware RAID6. ZFS was *alot* faster when I gave it the disks in a RAIDZ2 vdev. - Andrew From james-freebsd-fs2 at jrv.org Tue Dec 9 00:53:53 2008 From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen) Date: Tue Dec 9 00:54:00 2008 Subject: ZFS resize disk vdev In-Reply-To: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> Message-ID: <493E2AD2.8070704@jrv.org> Bryan Alves wrote: > I'm thinking about using a hardware raid array with ZFS, using a single disk > vdev zpool. I want the ability to add/remove disks to an array, and I'm > still unsure of the stability of zfs as a whole. I'm looking for an easy > way to resize and manage disks that are greater than 2 terabytes. > > If I have a single block device, /dev/da0, on my system that is represented > by a zfs disk vdev, and the size of this block device grows (because the > underlying hardware raid expands), will zfs correctly expand? And will it > correctly expand in place? > I see no benefit to using hardware RAID for a vdev. If there is any concern over ZFS stability then you're using a filesystem you suspect on an - at best - really reliable disk: not a step forward! I think best practice is to configure the disk controller to present the disks as JBOD and let ZFS handle things: avoid fancy hardware RAID controllers altogether and use the fastest JBOD controller configuration available. Using a hardware RAID seems likely to hurt performance since the hardware RAID must issue extra reads for partial parity-stripe updates: ZFS never does in-place disk writes and rarely if ever does partial parity-stripe updates. 
Block allocation will suffer since the filesystem allocator can't know the geometry of the underlying storage array when laying out a file. Parity rebuilds ("resilvering") can be much faster in ZFS since only things that are different need to be recomputed when a disk is reattached to a redundant vdev (and if a disk is replaced free space need not have parity computed). And hardware RAID just adds another layer of processing to slow things down. I'm not sure how ZFS reacts to an existing disk drive suddenly becoming larger. Real disk drives don't do that and ZFS is intended to use real disks. There are some uberblocks (pool superblocks) at the end of the disk and ZFS probably won't be able to find them if the uberblocks at the front of the disk are clobbered and the "end of the disk" has moved out away from the remaining uberblocks. You can replace all of the members of a redundant vdev one-by-one with larger disks and increase the storage capacity of that vdev and hence the pool. I routinely run zpools of 4TB and 5TB, which isn't even warming up for some people. Sun has had customers with ZFS pools in the petabytes. "disks that are greater than 2 terabytes" are pocket change. From ivoras at freebsd.org Tue Dec 9 01:58:39 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Tue Dec 9 01:58:45 2008 Subject: ZFS resize disk vdev In-Reply-To: <493E12EC.4050801@modulus.org> References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> <493E12EC.4050801@modulus.org> Message-ID: Andrew Snow wrote: > Bryan Alves wrote: >> I'm thinking about using a hardware raid array with ZFS, using a >> single disk >> vdev zpool. I want the ability to add/remove disks to an array, and I'm >> still unsure of the stability of zfs as a whole. I'm looking for an easy >> way to resize and manage disks that are greater than 2 terabytes. 
>> If I have a single block device, /dev/da0, on my system that is >> represented >> by a zfs disk vdev, and the size of this block device grows (because the >> underlying hardware raid expands), will zfs correctly expand? And >> will it >> correctly expand in place? > > In theory, this works fine - I have never tried it myself. The only > other way to expand a zpool is by adding more vdevs: You cannot change > a vdev once it is created other than to take it from a single disk to a > mirror. > > Sun's ZFS best practice guide states that you should avoid a single disk > vdev because performance on the whole suffers and is worse than UFS. > > I am going to publish some benchmark figures soon to back this up, based > on testing I did with a 16 disk hardware RAID6. ZFS was *alot* faster > when I gave it the disks in a RAIDZ2 vdev. You mean it was a lot faster *per disk*, right? (otherwise, it's completely expected that a large RAID array will be faster than its components). From andrew at modulus.org Tue Dec 9 04:10:35 2008 From: andrew at modulus.org (Andrew Snow) Date: Tue Dec 9 04:10:42 2008 Subject: ZFS resize disk vdev In-Reply-To: References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> <493E12EC.4050801@modulus.org> Message-ID: <493E6032.1060300@modulus.org> Ivan Voras wrote: > You mean it was a lot faster *per disk*, right? (otherwise, it's > completely expected that a large RAID array will be faster than its > components). Yeah - Software RAIDZ2 was faster than both ZFS and UFS on hardware RAID6 at sequential writing. From onemda at gmail.com Tue Dec 9 05:53:50 2008 From: onemda at gmail.com (Paul B.
Mahol) Date: Tue Dec 9 05:53:56 2008 Subject: ext2fuse: user-space ext2 implementation In-Reply-To: <493CEE90.7050104@FreeBSD.org> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <200812041747.09040.gnemmi@gmail.com> <4938FE44.9090608@FreeBSD.org> <4939133E.2000701@FreeBSD.org> <493CEE90.7050104@FreeBSD.org> Message-ID: <3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com> On 12/8/08, Bruce M. Simpson wrote: > I have rolled a port for ext2fuse: > http://people.freebsd.org/~bms/dump/fusefs-ext2fs.tar Ignoring fact that is buggy, slooow and port doesnt have any cache implemented and port leaves files behind in share/doc/ext2fuse when package deleted it looks fine. -- Paul From chris at arnold.se Tue Dec 9 06:27:44 2008 From: chris at arnold.se (Christopher Arnold) Date: Tue Dec 9 06:27:52 2008 Subject: ZFS and other filesystem semantics. Message-ID: Hi, i have been thinking a bit about filesytem semantics lately. Mainly about open files. Classicly if a file is open the filedescriptor continues accessing the same file regardless if it is deleted or someone did a mv and replaced it. But what happens in ZFS? * delete file in ZFS I guess this is a no brainer, standard unix way of accessing the old file. * The fs get snapshotted and file deleted Same as above i guess. * The fs gets snapshotted and later the snapshot get deleted... What happens here? Or maybe even: * The fs gets snapshotted, file deleted, then snapshot deleted. (These questions are actually just a sidestep from the issue im trying to figure out right no. But i guess they are nevertheless interesting.) The reason i have been thinking about this is that i'm implementing a remote RO filesystem with local caching. And to reduce latency i download chunks of the files and cache these chunks. 
I'm trying to keep the filesystem stateless, but my issue is that if the
file gets changed under our feet the resulting chunks would be from
different files.

Has anyone seen a nice solution to this issue?

Does anyone have any ideas of how to implement unix-like semantics over a
stateless protocol without too much magic?

/Chris

--
http://www.arnold.se/chris/

From bryanalves at gmail.com Tue Dec 9 08:04:05 2008
From: bryanalves at gmail.com (Bryan Alves)
Date: Tue Dec 9 08:04:12 2008
Subject: ZFS resize disk vdev
In-Reply-To: <493E2AD2.8070704@jrv.org>
References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com>
	<493E2AD2.8070704@jrv.org>
Message-ID: <92f477740812090804k102dcb62qcd893b3263da56a9@mail.gmail.com>

On Tue, Dec 9, 2008 at 3:22 AM, James R. Van Artsdalen <james-freebsd-fs2@jrv.org> wrote:

> Bryan Alves wrote:
> > I'm thinking about using a hardware raid array with ZFS, using a single disk
> > vdev zpool. I want the ability to add/remove disks to an array, and I'm
> > still unsure of the stability of zfs as a whole. I'm looking for an easy
> > way to resize and manage disks that are greater than 2 terabytes.
> >
> > If I have a single block device, /dev/da0, on my system that is represented
> > by a zfs disk vdev, and the size of this block device grows (because the
> > underlying hardware raid expands), will zfs correctly expand? And will it
> > correctly expand in place?
> >
> I see no benefit to using hardware RAID for a vdev. If there is any
> concern over ZFS stability then you're using a filesystem you suspect on
> an - at best - really reliable disk: not a step forward! I think best
> practice is to configure the disk controller to present the disks as
> JBOD and let ZFS handle things: avoid fancy hardware RAID controllers
> altogether and use the fastest JBOD controller configuration available.
>
> Using a hardware RAID seems likely to hurt performance since the
> hardware RAID must issue extra reads for partial parity-stripe updates:
> ZFS never does in-place disk writes and rarely if ever does partial
> parity-stripe updates. Block allocation will suffer since the
> filesystem allocator can't know the geometry of the underlying storage
> array when laying out a file. Parity rebuilds ("resilvering") can be
> much faster in ZFS since only things that are different need to be
> recomputed when a disk is reattached to a redundant vdev (and if a disk
> is replaced free space need not have parity computed). And hardware
> RAID just adds another layer of processing to slow things down.
>
> I'm not sure how ZFS reacts to an existing disk drive suddenly becoming
> larger. Real disk drives don't do that and ZFS is intended to use real
> disks. There are some uberblocks (pool superblocks) at the end of the
> disk and ZFS probably won't be able to find them if the uberblocks at
> the front of the disk are clobbered and the "end of the disk" has moved
> out away from the remaining uberblocks.
>
> You can replace all of the members of a redundant vdev one-by-one with
> larger disks and increase the storage capacity of that vdev and hence
> the pool.
>
> I routinely run zpools of 4TB and 5TB, which isn't even warming up for
> some people. Sun has had customers with ZFS pools in the petabytes.
> "disks that are greater than 2 terabytes" are pocket change.

My reason for wanting to use my hardware controller isn't for speed, it's
for the ability to migrate in place. I'm currently using 5 750GB drives,
and I would like the flexibility to be able to purchase a 6th and grow my
array by 750GB in place. If I could achieve something, anything, similar in
ZFS (namely, buy an amount of disks smaller than the number of total disks
in the array and see a gain in storage capacity), I would use ZFS.
If I could do something like take a zpool that consists of a raidz vdev of
my 5 750GB drives, and then I go off and purchase 3 new 1.5TB drives and
create a second raidz vdev and stripe them, and have the ability to remove
the vdev from the pool without data loss assuming I have enough free space,
then I would be happy.

Maybe I am insanely overthinking this, and the use case of wanting to tack
on 1 or 2 new drives isn't worth stressing out about. I'm looking for a
middle ground between keeping an array as is and replacing every drive in
the array at once to see a tangible gain.

Also I'm using the hardware raid controller because it reports which drive
failed and the fail LED lights up. If I can get a physical sign of which
drive died (assuming I have per-drive lights) with ZFS, then that is a
non-issue.

From freebsd-fs at tychl.net Tue Dec 9 08:41:55 2008
From: freebsd-fs at tychl.net (Nick Gustas)
Date: Tue Dec 9 08:42:02 2008
Subject: ZFS resize disk vdev
In-Reply-To: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com>
References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com>
Message-ID: <493E9FB8.2090808@tychl.net>

Bryan Alves wrote:
> I'm thinking about using a hardware raid array with ZFS, using a single disk
> vdev zpool. I want the ability to add/remove disks to an array, and I'm
> still unsure of the stability of zfs as a whole. I'm looking for an easy
> way to resize and manage disks that are greater than 2 terabytes.
>
> If I have a single block device, /dev/da0, on my system that is represented
> by a zfs disk vdev, and the size of this block device grows (because the
> underlying hardware raid expands), will zfs correctly expand? And will it
> correctly expand in place?
>

If you have ZFS on the raw da0 and not on a partition, it will expand with a
zpool export/import or a reboot after the hardware is done expanding the
array. If you put ZFS on a partition, you'll first need to extend the
partition after the expansion finishes.
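
The order of operations described above can be written down as a short
sketch. This is only an illustration of the sequence from this message, not
a tested procedure: the function name is made up, the pool name is just an
example, and the partition step is left as a manual instruction because the
thread does not say which partitioning tool is in use.

```python
def expansion_steps(pool, vdev_on_partition):
    """Return the ordered steps for picking up a grown hardware array.

    `pool` is the zpool name; `vdev_on_partition` says whether ZFS sits on
    a partition rather than on the raw device. Both names are hypothetical.
    """
    steps = ["wait for the RAID controller to finish expanding the array"]
    if vdev_on_partition:
        # The thread does not name a partitioning tool, so this stays a
        # manual step instead of a guessed command line.
        steps.append("extend the partition to cover the new space")
    # Export followed by import makes ZFS reopen the device; the message
    # says a reboot has the same effect.
    steps.append(f"zpool export {pool}")
    steps.append(f"zpool import {pool}")
    return steps


for step in expansion_steps("threeware", vdev_on_partition=False):
    print(step)
```

Nothing here runs the commands; it only lists them so the order is explicit.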
I have a friend running ZFS on a 24 port 3ware controller that has expanded
his system from a 4 disk raid-5 to a 17 disk raid-6, 1 to 2 disks at a
time. Obviously he's well past 2TB at this point. Performance isn't as good
as it would be natively, but it's faster than needed and the only option at
the moment. No troubles yet other than a disk failure or two; the system
has been in use since Sep 2007.

> zpool status
  pool: threeware
 state: ONLINE
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	threeware   ONLINE       0     0     0
	  da0       ONLINE       0     0     0

errors: No known data errors

From des at des.no Wed Dec 10 06:46:11 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Wed Dec 10 06:46:17 2008
Subject: ZFS and other filesystem semantics.
In-Reply-To: (Christopher Arnold's message of "Tue, 9 Dec 2008 15:01:00 +0100 (CET)")
References:
Message-ID: <86k5a8qea6.fsf@ds4.des.no>

Christopher Arnold writes:
> * The fs gets snapshotted and later the snapshot get deleted...
> What happens here?

Snapshots are mounted like ordinary file systems, you have to unmount
them before you destroy them. Of course, you can also forcibly unmount
them like you would any other file system, with the usual consequences.

DES
--
Dag-Erling Smørgrav - des@des.no

From uspoerlein at gmail.com Thu Dec 11 09:53:54 2008
From: uspoerlein at gmail.com (Ulrich Spoerlein)
Date: Thu Dec 11 09:54:05 2008
Subject: ZFS backup advice
Message-ID: <20081211175349.GA2735@roadrunner.spoerlein.net>

Servus,

I'm looking for advice on setting up a ZFS based hard disk backup
solution. Given a large set of data, with perhaps 500MB data changes per
day and two 1 TB disks, which option would you prefer and why:

A) Separate ZFS pools on both disks. Using zfs send|recv to transfer
snapshots every 2-3 days, taking the "backup" pool offline in the time
in between (to keep the disk safe from surges, etc).

or

B) One ZFS mirror pool across both disks, resilvering the second half
every 2-3 days and then detaching it again.
Right now I'm favouring option A, as I can selectively "backup" part of
the pool (excluding /usr/obj for example, though it is <10% of total
capacity, so not a strong point), can use compression on the backup-pool
and can potentially keep more snapshots on it than on the live pool. It
should also be faster than resilvering the mirror every other day.

I'd use B, iff ZFS is able to "self-heal" defective sectors on one mirror
half, even if it is not fully resilvered. Does anyone know if this is
possible?

Please keep me CC'ed. Thanks!

Cheers,
Ulrich Spoerlein
--
It is better to remain silent and be thought a fool, than to speak, and
remove all doubt.

From cowens at greatbaysoftware.com Thu Dec 11 13:53:05 2008
From: cowens at greatbaysoftware.com (Charles Owens)
Date: Thu Dec 11 13:53:12 2008
Subject: gjournal root FS okay (with kern/128529 patch)?
Message-ID: <49418925.2030101@greatbaysoftware.com>

Folks,

The recently discovered kern/128529 had me briefly worried, as we're
planning to leverage gjournal pretty heavily with a RELENG_7-based
custom build. I've manually applied the fix (at present only in -CURRENT)
to the 7.x sources and can report that it appears to work well.

I wanted to report that fact (as a vote for MFC) and also ask: with this
fix in place does anyone have any lingering misgivings about use of
gjournal for root filesystem?

Our thanks to all responsible on kern/128529 and geom_journal in general.

Charles

--
Charles Owens
Great Bay Software | v: 603.766.6105 | m: 603.866.0860 | f: 603.430.0713 | e: cowens@GreatBaySoftware.com

From rick-freebsd2008 at kiwi-computer.com Sat Dec 13 09:39:03 2008
From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty)
Date: Sat Dec 13 09:39:09 2008
Subject: UFS label limitations
Message-ID: <20081213173902.GA96883@keira.kiwi-computer.com>

I always found it strange that when creating or changing a UFS label, you
were restricted to alphanumeric characters.
I really wanted some sort of separator, so I took a look at the code.
Since the checks use isalnum(3) it is possible to create labels that are
not 7-bit clean, depending upon your locale.

After further investigation, I can't see any reason (in geom_label or
otherwise) that the characters should be restricted in such a way.

I applied the attached (inline) patch and have had no troubles creating,
editing, or mounting via UFS labels. The patch allows you to create
labels with any characters except '/' (for obvious reasons) and should
work with most locales (with the tiny exception that multibyte characters
which use 0x2F in subsequent bytes should be rejected, since geom_label
is locale-agnostic).

Would someone mind reviewing and committing this patch?

Thank you,

-- Rick C. Petty

--- src/sbin/newfs/newfs.c.orig	2007-03-02 14:07:59.000000000 -0600
+++ src/sbin/newfs/newfs.c	2008-12-12 19:13:19.000000000 -0600
@@ -168,11 +168,10 @@
 		case 'L':
 			volumelabel = optarg;
 			i = -1;
-			while (isalnum(volumelabel[++i]));
-			if (volumelabel[i] != '\0') {
-				errx(1, "bad volume label. Valid characters are alphanumerics.");
-			}
-			if (strlen(volumelabel) >= MAXVOLLEN) {
+			while ((ch = volumelabel[++i]) != '\0')
+				if (ch == '/')
+					errx(1, "bad volume label. \"/\" not allowed.");
+			if (i >= MAXVOLLEN) {
 				errx(1, "bad volume label. Length is longer than %d.",
 				    MAXVOLLEN);
 			}
--- src/sbin/tunefs/tunefs.c.orig	2008-02-26 14:25:35.000000000 -0600
+++ src/sbin/tunefs/tunefs.c	2008-12-12 19:19:33.000000000 -0600
@@ -153,13 +153,12 @@
 			name = "volume label";
 			Lvalue = optarg;
 			i = -1;
-			while (isalnum(Lvalue[++i]));
-			if (Lvalue[i] != '\0') {
+			while ((ch = Lvalue[++i]) != '\0')
+				if (ch == '/')
 				errx(10,
-				"bad %s. Valid characters are alphanumerics.",
+				"bad %s. \"/\" not allowed.",
 				    name);
-			}
-			if (strlen(Lvalue) >= MAXVOLLEN) {
+			if (i >= MAXVOLLEN) {
 				errx(10, "bad %s. Length is longer than %d.",
 				    name, MAXVOLLEN - 1);
 			}

From bms at incunabulum.net Sat Dec 13 09:43:30 2008
From: bms at incunabulum.net (Bruce Simpson)
Date: Sat Dec 13 09:43:42 2008
Subject: ext2fuse: user-space ext2 implementation
In-Reply-To: <3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
	<20081125150342.GL2042@deviant.kiev.zoral.com.ua>
	<8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>
	<200812041747.09040.gnemmi@gmail.com>
	<4938FE44.9090608@FreeBSD.org>
	<4939133E.2000701@FreeBSD.org>
	<493CEE90.7050104@FreeBSD.org>
	<3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com>
Message-ID: <4943F43B.4060105@incunabulum.net>

Paul B. Mahol wrote:
> On 12/8/08, Bruce M. Simpson wrote:
>
>> I have rolled a port for ext2fuse:
>> http://people.freebsd.org/~bms/dump/fusefs-ext2fs.tar
>>
>
> Ignoring the fact that it is buggy and slooow, that the port doesn't have
> any cache implemented, and that the port leaves files behind in
> share/doc/ext2fuse when the package is deleted, it looks fine.
>

Can you please relay this feedback to the authors of ext2fuse?

As mentioned earlier in the thread, the ext2fuse code could benefit from
UBLIO-ization. Are you or any other volunteers happy to help out here?

Can you elaborate further on the files being left behind by the port? I
didn't see this issue in my own testing.

thank you
BMS

From jh at saunalahti.fi Sat Dec 13 10:46:11 2008
From: jh at saunalahti.fi (Jaakko Heinonen)
Date: Sat Dec 13 10:46:17 2008
Subject: UFS label limitations
In-Reply-To: <20081213173902.GA96883@keira.kiwi-computer.com>
References: <20081213173902.GA96883@keira.kiwi-computer.com>
Message-ID: <20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi>

Hi,

On 2008-12-13, Rick C. Petty wrote:
> I applied the attached (inline) patch and have had no troubles creating,
> editing, or mounting via UFS labels.
> The patch allows you to create
> labels with any characters except '/' (for obvious reasons) and should
> work with most locales (with the tiny exception that multibyte characters
> which use 0x2F in subsequent bytes should be rejected, since geom_label
> is locale-agnostic).

geom_label has problems with other characters too. The problem is that
it doesn't encode characters for XML output properly. See these PRs:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/104389
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120044

This already causes problems with file systems which allow non-ASCII
label names. IMO the problem should be addressed before extending
allowed characters in UFS labels.

--
Jaakko

From rick-freebsd2008 at kiwi-computer.com Sat Dec 13 11:23:21 2008
From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty)
Date: Sat Dec 13 11:23:28 2008
Subject: UFS label limitations
In-Reply-To: <20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi>
References: <20081213173902.GA96883@keira.kiwi-computer.com>
	<20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi>
Message-ID: <20081213192320.GA97766@keira.kiwi-computer.com>

On Sat, Dec 13, 2008 at 08:30:59PM +0200, Jaakko Heinonen wrote:
>
> geom_label has problems with other characters too. The problem is that
> it doesn't encode characters for XML output properly. See these PRs:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/104389
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120044
>
> This already causes problems with file systems which allow non-ASCII
> label names. IMO the problem should be addressed before extending
> allowed characters in UFS labels.

Well at the very least can we allow all characters between 0x20 and 0x7e
except for: "&/<>\

Why hasn't the patch from kern/104389 made it in to geom? It's been
almost 20 months since any activity on it. The last patch seems pretty
good, except for NUL and SOH (0x00 and 0x01), but can't you just use
"&#xx;" encoding for those also?
XML has its own limitations. I don't think it's worth limiting
everybody else just because of this XML bug.

-- Rick C. Petty

From onemda at gmail.com Sat Dec 13 14:03:54 2008
From: onemda at gmail.com (Paul B. Mahol)
Date: Sat Dec 13 14:04:07 2008
Subject: ext2fuse: user-space ext2 implementation
In-Reply-To: <4943F43B.4060105@incunabulum.net>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
	<20081125150342.GL2042@deviant.kiev.zoral.com.ua>
	<8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>
	<200812041747.09040.gnemmi@gmail.com>
	<4938FE44.9090608@FreeBSD.org>
	<4939133E.2000701@FreeBSD.org>
	<493CEE90.7050104@FreeBSD.org>
	<3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com>
	<4943F43B.4060105@incunabulum.net>
Message-ID: <3a142e750812131403p31841403ub9d5693278c74111@mail.gmail.com>

On 12/13/08, Bruce Simpson wrote:
> Paul B. Mahol wrote:
>> On 12/8/08, Bruce M. Simpson wrote:
>>
>>> I have rolled a port for ext2fuse:
>>> http://people.freebsd.org/~bms/dump/fusefs-ext2fs.tar
>>>
>>
>> Ignoring the fact that it is buggy and slooow, that the port doesn't have
>> any cache implemented, and that the port leaves files behind in
>> share/doc/ext2fuse when the package is deleted, it looks fine.
>>
>
> Can you please relay this feedback to the authors of ext2fuse?
>
> As mentioned earlier in the thread, the ext2fuse code could benefit from
> UBLIO-ization. Are you or any other volunteers happy to help out here?

Well, the higher priority would be to fix existing bugs first. There would
be very little gain from a user cache, because it is already too slow IMHO
and adding a user cache will not make it faster, but that is not a port
problem.

> Can you elaborate further on the files being left behind by the port? I
> didn't see this issue in my own testing.
It installs files this way:

test -z "/usr/local/share/doc/ext2fuse" || ./install-sh -c -d "/usr/local/share/doc/ext2fuse"

make deinstall and pkg_delete do not remove those files or the directory.

--
Paul

From bms at incunabulum.net Sat Dec 13 16:15:30 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Sat Dec 13 16:15:43 2008
Subject: ext2fuse: user-space ext2 implementation
In-Reply-To: <3a142e750812131403p31841403ub9d5693278c74111@mail.gmail.com>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
	<20081125150342.GL2042@deviant.kiev.zoral.com.ua>
	<8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>
	<200812041747.09040.gnemmi@gmail.com>
	<4938FE44.9090608@FreeBSD.org>
	<4939133E.2000701@FreeBSD.org>
	<493CEE90.7050104@FreeBSD.org>
	<3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com>
	<4943F43B.4060105@incunabulum.net>
	<3a142e750812131403p31841403ub9d5693278c74111@mail.gmail.com>
Message-ID: <4944501E.40900@incunabulum.net>

Paul B. Mahol wrote:
>> Can you please relay this feedback to the authors of ext2fuse?
>>
>> As mentioned earlier in the thread, the ext2fuse code could benefit from
>> UBLIO-ization. Are you or any other volunteers happy to help out here?
>>
>
> Well, the higher priority would be to fix existing bugs first. There would
> be very little gain from a user cache, because it is already too slow IMHO
> and adding a user cache will not make it faster, but that is not a port
> problem.
>

I'm not aware of bugs with ext2fuse itself; my work on the port was
merely to try to raise awareness that a user-space project for ext2
filesystem access existed.

Can you elaborate further on your experience with ext2fuse which seems
to you to be buggy, i.e. symptoms, root cause analysis etc.? Have you
reported these to the author(s)?

Have you measured the performance? Is the performance sufficient for the
needs of an occasional desktop user?
I realise we are largely involved in content-free argument here, however
the trade-off of ext2fuse vs ext2fs in the FreeBSD kernel source tree
is that of a hopefully more actively maintained implementation vs one
which is not maintained at all, and any alternatives for FreeBSD users
would be welcome.

thanks
BMS

From morganw at chemikals.org Sun Dec 14 07:29:09 2008
From: morganw at chemikals.org (Wes Morgan)
Date: Sun Dec 14 07:29:16 2008
Subject: SFF-8087 to fanout cable question
Message-ID:

Like some of you who have built big ZFS arrays on multi-port controllers,
I've got a mess of cables in my chassis... Now I have some SAS backplanes
with mini-sas (SFF-8087) plugs. I know they work because I've used them
with an Areca 1680 controller and some standard mini-sas cables. I decided
that I wanted to go a different direction, and purchased an ASUS P5BV/SAS
board that has a builtin 8-port LSI 1068-based SAS controller (and I
highly recommend it). Now I have the 4x SAS to SFF-8087, and it doesn't
want to work... But it worked going from the Areca controller to a regular
8-port SAS/SATA backplane.

Are these cables only usable in one direction?

In summary:

mini-sas (controller) to mini-sas (backplane) - works
mini-sas (controller) to 8-port SAS/SATA (backplane) - works
8-port SAS (controller) to mini-sas (backplane) - nada
8-port SAS (controller) to 8-port SAS/SATA (backplane) - works

Any ideas? Sorry if this is just a "duh" question. I don't do this for a
living, I'm just obsessive about my media server :)

From onemda at gmail.com Sun Dec 14 07:47:29 2008
From: onemda at gmail.com (Paul B.
Mahol)
Date: Sun Dec 14 07:47:37 2008
Subject: ext2fuse: user-space ext2 implementation
In-Reply-To: <4944501E.40900@incunabulum.net>
References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com>
	<8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com>
	<200812041747.09040.gnemmi@gmail.com>
	<4938FE44.9090608@FreeBSD.org>
	<4939133E.2000701@FreeBSD.org>
	<493CEE90.7050104@FreeBSD.org>
	<3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com>
	<4943F43B.4060105@incunabulum.net>
	<3a142e750812131403p31841403ub9d5693278c74111@mail.gmail.com>
	<4944501E.40900@incunabulum.net>
Message-ID: <3a142e750812140747r2eb5ebadp7ac2b2c8ae357bae@mail.gmail.com>

On 12/14/08, Bruce M Simpson wrote:
> Paul B. Mahol wrote:
>>> Can you please relay this feedback to the authors of ext2fuse?
>>>
>>> As mentioned earlier in the thread, the ext2fuse code could benefit from
>>> UBLIO-ization. Are you or any other volunteers happy to help out here?
>>>
>>
>> Well, the higher priority would be to fix existing bugs first. There would
>> be very little gain from a user cache, because it is already too slow IMHO
>> and adding a user cache will not make it faster, but that is not a port
>> problem.
>>
>
> I'm not aware of bugs with ext2fuse itself; my work on the port was
> merely to try to raise awareness that a user-space project for ext2
> filesystem access existed.
>
> Can you elaborate further on your experience with ext2fuse which seems
> to you to be buggy, i.e. symptoms, root cause analysis etc.? Have you
> reported these to the author(s)?

I have read the TODO.

> Have you measured the performance? Is the performance sufficient for the
> needs of an occasional desktop user?

Performance was not sufficient, and adding a user cache will not improve
access speed on first read. After mounting an ext2fs volume (via md(4))
created with the e2fsprogs port and copying data from ufs to ext2, reading
was quite slow.
Also, ext2fuse doesn't exit after mounting; it keeps running and displaying
debug data - explaining why the project itself is in alpha state.

> I realise we are largely involved in content-free argument here, however
> the trade-off of ext2fuse vs ext2fs in the FreeBSD kernel source tree
> is that of a hopefully more actively maintained implementation vs one
> which is not maintained at all, and any alternatives for FreeBSD users
> would be welcome.

The project itself doesn't look very active, but I may be wrong. It is in
alpha state as reported on SF. IMHO it is better to maintain our own
because it is in better shape, but I'm not interested in ext* as a
developer.

--
Paul

From michael at fuckner.net Sun Dec 14 08:33:55 2008
From: michael at fuckner.net (Michael Fuckner)
Date: Sun Dec 14 08:34:02 2008
Subject: SFF-8087 to fanout cable question
In-Reply-To:
References:
Message-ID: <34a9ac6411eca56bca7766a5c745c694.squirrel@dedihh.fuckner.net>

> Like some of you who have built big ZFS arrays on multi-port controllers,
> I've got a mess of cables in my chassis... Now I have some SAS backplanes
> with mini-sas (SFF-8087) plugs. I know they work because I've used them
> with an Areca 1680 controller and some standard mini-sas cables. I decided
> that I wanted to go a different direction, and purchased an ASUS P5BV/SAS
> board that has a builtin 8-port LSI 1068-based SAS controller (and I
> highly recommend it). Now I have the 4x SAS to SFF-8087, and it doesn't
> want to work... But it worked going from the Areca controller to a regular
> 8-port SAS/SATA backplane.

> Are these cables only usable in one direction?

Yes!

Multilane-multilane cables are all the same (even if some vendors have
different part numbers for different directions).

If you connect discrete ports to a multilane backplane you need a reverse
break-out cable. See:
http://3ware.com/products/pdf/3ware_Cable_Brochure.pdf

In contrast you need forward breakout for multilane controllers and
discrete backplanes.
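
The pairing rules above fit in a small lookup table. The sketch below is
only a restatement of this message: the function name and the
"multilane"/"discrete" labels are made up for illustration, and the
discrete-to-discrete entry is an assumption based on the original poster's
report that 8-port to 8-port "works".

```python
def cable_for(controller_port, backplane_port):
    """Pick a cable type from the connector style on each end.

    Each argument is "multilane" (SFF-8087 mini-SAS) or "discrete"
    (individual SAS/SATA ports). Names are hypothetical.
    """
    table = {
        ("multilane", "multilane"): "multilane-to-multilane cable",
        ("multilane", "discrete"): "forward breakout cable",
        ("discrete", "multilane"): "reverse breakout cable",
        # Assumption: plain per-port cables, not stated in the reply.
        ("discrete", "discrete"): "individual SAS/SATA cables",
    }
    return table[(controller_port, backplane_port)]


# The failing case from the question: onboard 8-port SAS controller
# connected to a mini-SAS backplane.
print(cable_for("discrete", "multilane"))  # reverse breakout cable
```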
> In summary:
>
> mini-sas (controller) to mini-sas (backplane) - works
> mini-sas (controller) to 8-port SAS/SATA (backplane) - works
> 8-port SAS (controller) to mini-sas (backplane) - nada
> 8-port SAS (controller) to 8-port SAS/SATA (backplane) - works
>
> Any ideas? Sorry if this is just a "duh" question. I don't do this for a
> living, I'm just obsessive about my media server :)

Trust me, it took me quite a while to figure this out...

Regards,
Michael!

From bugmaster at FreeBSD.org Mon Dec 15 03:06:52 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Dec 15 03:07:56 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200812151106.mBFB6pE4004316@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129174  fs         [nfs][zfs][panic] NFS v3 Panic when under high load ex
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129084  fs         [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
o kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs][panic] changing into .zfs dir from nfs client ca
o kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D

28 problems total.

From des at des.no Mon Dec 15 09:05:09 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Mon Dec 15 09:05:16 2008
Subject: UFS label limitations
In-Reply-To: <20081213192320.GA97766@keira.kiwi-computer.com> (Rick C. Petty's message of "Sat, 13 Dec 2008 13:23:20 -0600")
References: <20081213173902.GA96883@keira.kiwi-computer.com>
	<20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi>
	<20081213192320.GA97766@keira.kiwi-computer.com>
Message-ID: <86y6yh5pz0.fsf@ds4.des.no>

"Rick C. Petty" writes:
> Well at the very least can we allow all characters between 0x20 and 0x7e
> except for: "&/<>\

Stick to the POSIX portable file name character set: [A-Za-z0-9._-]

DES
--
Dag-Erling Smørgrav - des@des.no

From zbeeble at gmail.com Mon Dec 15 11:55:42 2008
From: zbeeble at gmail.com (Zaphod Beeblebrox)
Date: Mon Dec 15 11:55:49 2008
Subject: ZFS and other filesystem semantics.
In-Reply-To:
References:
Message-ID: <5f67a8c40812151155o166b96b1meef07e685307c9ba@mail.gmail.com>

On Tue, Dec 9, 2008 at 9:01 AM, Christopher Arnold wrote:

> I have been thinking a bit about filesystem semantics lately. Mainly about
> open files.
>
> Classically, if a file is open the file descriptor continues accessing the
> same file regardless of whether it is deleted or someone did a mv and
> replaced it.
>
> But what happens in ZFS?
>
> * delete file in ZFS
> I guess this is a no-brainer, standard unix way of accessing the old file.

When all references to data are freed, the data is freed. Directory entries
and open files are both references.

> * The fs gets snapshotted and file deleted
> Same as above I guess.

A snapshot counts as a reference.

> * The fs gets snapshotted and later the snapshot gets deleted...
> What happens here?

A snapshot is a reference. When the file is "deleted" the snapshot still
references the data. When the snapshot is deleted, if the data has no other
references, it is freed.
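
The rule as stated above (data goes away only when the last reference does)
can be shown with a toy model. This is a sketch of the stated rule only: it
is not how ZFS tracks blocks internally, and the class and holder names are
invented for the example.

```python
class Block:
    """Toy model: data stays allocated while anything still refers to it."""

    def __init__(self, name):
        self.name = name
        self.holders = set()   # who currently references the data
        self.freed = False

    def hold(self, holder):
        self.holders.add(holder)

    def release(self, holder):
        self.holders.discard(holder)
        # Only the last release actually frees the data.
        if not self.holders:
            self.freed = True


data = Block("file data")
data.hold("directory entry")
data.hold("open file descriptor")
data.hold("snapshot")

data.release("directory entry")        # the file is deleted
print(data.freed)                      # False
data.release("open file descriptor")   # the reader closes the file
print(data.freed)                      # False, the snapshot still holds it
data.release("snapshot")               # the snapshot is destroyed
print(data.freed)                      # True
```

The order of the three releases does not matter; whichever comes last is
the one that frees the data.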
> Or maybe even:
> * The fs gets snapshotted, file deleted, then snapshot deleted.
>
> (These questions are actually just a sidestep from the issue I'm trying to
> figure out right now. But I guess they are nevertheless interesting.)
>
> The reason I have been thinking about this is that I'm implementing a
> remote RO filesystem with local caching. And to reduce latency I download
> chunks of the files and cache these chunks. I'm trying to keep the
> filesystem stateless, but my issue is that if the file gets changed under our
> feet the resulting chunks would be from different files.
>
> Has anyone seen a nice solution to this issue?
>
> Does anyone have any ideas of how to implement unix-like semantics over a
> stateless protocol without too much magic?

The semantics you desire are basically reference counting.

From zbeeble at gmail.com Mon Dec 15 12:39:42 2008
From: zbeeble at gmail.com (Zaphod Beeblebrox)
Date: Mon Dec 15 12:39:48 2008
Subject: ZFS resize disk vdev
In-Reply-To: <92f477740812090804k102dcb62qcd893b3263da56a9@mail.gmail.com>
References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com>
	<493E2AD2.8070704@jrv.org>
	<92f477740812090804k102dcb62qcd893b3263da56a9@mail.gmail.com>
Message-ID: <5f67a8c40812151239o2b1f1f4cje7170cb1221133cd@mail.gmail.com>

On Tue, Dec 9, 2008 at 11:04 AM, Bryan Alves wrote:

> On Tue, Dec 9, 2008 at 3:22 AM, James R. Van Artsdalen <
> james-freebsd-fs2@jrv.org> wrote:
> >
> > I'm not sure how ZFS reacts to an existing disk drive suddenly becoming
> > larger. Real disk drives don't do that and ZFS is intended to use real
> > disks. There are some uberblocks (pool superblocks) at the end of the
> > disk and ZFS probably won't be able to find them if the uberblocks at
> > the front of the disk are clobbered and the "end of the disk" has moved
> > out away from the remaining uberblocks.
>

Very well, in fact. One way to "grow" a RAID Z1 or Z2 pool is to
replace each disk with a larger one.
When the last one is finished resilvering, you will have more space.

> My reason for wanting to use my hardware controller isn't for speed, it's
> for the ability to migrate in place. I'm currently using 5 750GB drives,
> and I would like the flexibility to be able to purchase a 6th and grow my
> array by 750GB in place. If I could achieve something, anything, similar in
> ZFS (namely, buy an amount of disks smaller than the number of total disks
> in the array and see a gain in storage capacity), I would use ZFS.

You can't add one disk... but you can add several (easily). There are two
ways ZFS grows and both are well documented.

The first is to add another set of disks (at least 2 for mirroring, 3 for Z1
and 4 for Z2). ZFS recommends not more than 9 disks per RAID group anyway.
In my case, I have 6 750G drives in my array. They're pretty much full...
so I'm looking at adding another 6 1T drives shortly. This is transparent
and the "industry" would call this RAID50... that is two raid 5 (Z1) groups
striped together.

The second way to add space is to replace disks with larger ones
(one-by-one). Let's say, down the road, that my disks are full again and 4T
disks are common and cheap. I replace each 750G disk with a 4T disk and let
things resilver. My array would have been 8.75T (3.75T from the 750's
and 5T from the 1T drives) and it would suddenly be 25T (20T from the 4T
drives and 5T from the 1T drives). This increase in space occurs when the
last drive is resilvered.

This last step is good because at some point drives are not worth the power
to run. I turned off my array of 18G SCSI drives a couple of years ago ---
it wasn't worth the power. In the ZFS realm... instead of transferring the
data and turning off the system, you upgrade.
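
The capacity figures in this message can be checked with a few lines of
arithmetic. This is a rough sketch that assumes usable raidz space is simply
the number of data disks times the disk size; it ignores metadata overhead
and the difference between decimal and binary units, and the function name
is made up.

```python
def raidz_usable_tb(disks, disk_tb, parity=1):
    """Rough usable space of one raidz group: data disks times disk size."""
    return (disks - parity) * disk_tb


old_group = raidz_usable_tb(6, 0.75)   # six 750G drives in one Z1 group
new_group = raidz_usable_tb(6, 1.0)    # six 1T drives in a second Z1 group
print(old_group + new_group)           # 8.75

# After every 750G drive has been replaced by a 4T drive and resilvered:
upgraded = raidz_usable_tb(6, 4.0)
print(upgraded + new_group)            # 25.0
```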
From bryanalves at gmail.com Mon Dec 15 13:12:09 2008 From: bryanalves at gmail.com (Bryan Alves) Date: Mon Dec 15 13:12:16 2008 Subject: ZFS resize disk vdev In-Reply-To: <5f67a8c40812151239o2b1f1f4cje7170cb1221133cd@mail.gmail.com> References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> <493E2AD2.8070704@jrv.org> <92f477740812090804k102dcb62qcd893b3263da56a9@mail.gmail.com> <5f67a8c40812151239o2b1f1f4cje7170cb1221133cd@mail.gmail.com> Message-ID: <92f477740812151312vccef91eu171062a50eb46ca1@mail.gmail.com> On Mon, Dec 15, 2008 at 3:39 PM, Zaphod Beeblebrox wrote: > On Tue, Dec 9, 2008 at 11:04 AM, Bryan Alves wrote: > >> On Tue, Dec 9, 2008 at 3:22 AM, James R. Van Artsdalen < >> james-freebsd-fs2@jrv.org> wrote: >> > > >> >> > I'm not sure how ZFS reacts to an existing disk drive suddenly becoming >> > larger. Real disk drives don't do that and ZFS is intended to use real >> > disks. There are some uberblocks (pool superblocks) at the end of the >> > disk and ZFS probably won't be able to find them if the uberblocks at >> > the front of the disk are clobbered and the "end of the disk" has moved >> > out away from the remaining uberblocks. >> > > Very well, in fact. In fact, one way to "grow" a RAID Z1 or Z2 pool is to > replace each disk with a larger one. When the last one is finished > resilvering, you will have more space. > > My reason for wanting to use my hardware controller isn't for speed, it's >> for the ability to migrate in place. I'm currently using 5 750GB drives, >> and I would like the flexibility to be able to purchase a 6th and grow my >> array by 750GB in place. If I could achieve something, anything, similar >> in >> ZFS (namely, buy an amount of disks smaller than the number of total disks >> in the array and see a gain in storage capacity), I would use ZFS. > > > You can't add one disk... but you can add several (easily). There are two > ways ZFS grows and both are well documented. 
> > The first is add another set of disks (at least 2 for mirroring, 3 for Z1 > and 4 for z2). ZFS recomends not more than 9 disks per RAID group anyways. > In my case, I have 6 750G drives in my array. They're pretty much full... > so I'm looking at adding another 6 1T drives shortly. This is transparent > and the "industry" would call this RAID50... that is two raid 5 (Z1) groups > striped together. > > The second way to add space is to replace disks with larger ones > (one-by-one). Lets say, down the road, that my disks are full again and 4T > disks are common and cheap. I replace each 750G disk with a 4T disk and let > things resilver. My array would have been 8.75 gig (3.75T from the 750's > and 5T from the 1T drives) and it would suddenly be 25T (20 from the 4T > drives and 5T from the 1T drives). This increase in space occurs when the > last drive is resilvered. > > This last step is good because at some point drives are not worth the power > to run. I turned off my array of 18G SCSI drives a couple of years ago --- > it wasn't worth the power. In the ZFS realm... instead of transferring the > data and turning off the system, you upgrade. > In the case of option one, after this stripe of 2 raidz's is created though, those old drives can't be pulled from the array can they? More specifically, after we "upgrade" to what would be termed Raid50, we can't "downgrade" back to Raid5, right? 
From zbeeble at gmail.com Mon Dec 15 14:09:23 2008 From: zbeeble at gmail.com (Zaphod Beeblebrox) Date: Mon Dec 15 14:10:06 2008 Subject: ZFS resize disk vdev In-Reply-To: <92f477740812151312vccef91eu171062a50eb46ca1@mail.gmail.com> References: <92f477740812082155y3365bec7v5574206dd1a98e26@mail.gmail.com> <493E2AD2.8070704@jrv.org> <92f477740812090804k102dcb62qcd893b3263da56a9@mail.gmail.com> <5f67a8c40812151239o2b1f1f4cje7170cb1221133cd@mail.gmail.com> <92f477740812151312vccef91eu171062a50eb46ca1@mail.gmail.com> Message-ID: <5f67a8c40812151409g665b81f2v261a8aa035db679b@mail.gmail.com> On Mon, Dec 15, 2008 at 4:12 PM, Bryan Alves wrote: > In the case of option one, after this stripe of 2 raidz's is created > though, those old drives can't be pulled from the array can they? More > specifically, after we "upgrade" to what would be termed Raid50, we can't > "downgrade" back to Raid5, right? > According to the ZFS website, the ability to remove vdevs is planned but not yet implemented. You also cannot shrink a vdev... so you're not losing any functionality there. Even if your hardware raid supported shrinking the array, ZFS would not. From rick-freebsd2008 at kiwi-computer.com Mon Dec 15 15:48:11 2008 From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty) Date: Mon Dec 15 15:48:17 2008 Subject: UFS label limitations In-Reply-To: <86y6yh5pz0.fsf@ds4.des.no> References: <20081213173902.GA96883@keira.kiwi-computer.com> <20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi> <20081213192320.GA97766@keira.kiwi-computer.com> <86y6yh5pz0.fsf@ds4.des.no> Message-ID: <20081215234809.GA24403@keira.kiwi-computer.com> On Mon, Dec 15, 2008 at 06:05:07PM +0100, Dag-Erling Smørgrav wrote: > "Rick C. Petty" writes: > > Well at the very least can we allow all characters between 0x20 and 0x7e > > except for: "&/<>\ > > Stick to the POSIX portable file name character set: [A-Za-z0-9._-] Good idea. It gives me the separators I need.
Would a committer be willing to review and commit the attached (inline) patch? -- Rick C. Petty
--- src/sbin/newfs/newfs.c.orig 2007-03-02 14:07:59.000000000 -0600
+++ src/sbin/newfs/newfs.c 2008-12-15 17:29:26.000000000 -0600
@@ -168,11 +168,15 @@
 		case 'L':
 			volumelabel = optarg;
 			i = -1;
-			while (isalnum(volumelabel[++i]));
-			if (volumelabel[i] != '\0') {
-				errx(1, "bad volume label. Valid characters are alphanumerics.");
-			}
-			if (strlen(volumelabel) >= MAXVOLLEN) {
+			while ((ch = volumelabel[++i]) != '\0')
+				if (ch != '-' && ch != '.' && ch != '_' &&
+				    (ch < '0' || ch > '9') &&
+				    (ch < 'A' || ch > 'Z') &&
+				    (ch < 'a' || ch > 'z'))
+					errx(1,
+					    "bad volume label. Valid characters are "
+					    "[0-9A-Za-z._-].");
+			if (i >= MAXVOLLEN) {
 				errx(1, "bad volume label. Length is longer than %d.",
 				    MAXVOLLEN);
 			}
--- src/sbin/tunefs/tunefs.c.orig 2008-02-26 14:25:35.000000000 -0600
+++ src/sbin/tunefs/tunefs.c 2008-12-15 17:27:58.000000000 -0600
@@ -153,13 +153,16 @@
 			name = "volume label";
 			Lvalue = optarg;
 			i = -1;
-			while (isalnum(Lvalue[++i]));
-			if (Lvalue[i] != '\0') {
+			while ((ch = Lvalue[++i]) != '\0')
+				if (ch != '-' && ch != '.' && ch != '_' &&
+				    (ch < '0' || ch > '9') &&
+				    (ch < 'A' || ch > 'Z') &&
+				    (ch < 'a' || ch > 'z'))
 				errx(10,
-				    "bad %s. Valid characters are alphanumerics.",
+				    "bad %s. Valid characters are "
+				    "[0-9A-Za-z._-].",
 				    name);
-			}
-			if (strlen(Lvalue) >= MAXVOLLEN) {
+			if (i >= MAXVOLLEN) {
 				errx(10, "bad %s. Length is longer than %d.",
 				    name, MAXVOLLEN - 1);
 			}
From des at des.no Tue Dec 16 05:29:53 2008 From: des at des.no (Dag-Erling Smørgrav) Date: Tue Dec 16 05:30:02 2008 Subject: UFS label limitations In-Reply-To: <20081215234809.GA24403@keira.kiwi-computer.com> (Rick C.
Petty's message of "Mon, 15 Dec 2008 17:48:09 -0600") References: <20081213173902.GA96883@keira.kiwi-computer.com> <20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi> <20081213192320.GA97766@keira.kiwi-computer.com> <86y6yh5pz0.fsf@ds4.des.no> <20081215234809.GA24403@keira.kiwi-computer.com> Message-ID: <8663lk5ju7.fsf@ds4.des.no> "Rick C. Petty" writes: > Dag-Erling Smørgrav writes: > > Stick to the POSIX portable file name character set: [A-Za-z0-9._-] > Good idea. It gives me the separators I need. Would a committer be > willing to review and commit the attached (inline) patch? Take a look at strspn(3). DES -- Dag-Erling Smørgrav - des@des.no From rick-freebsd2008 at kiwi-computer.com Tue Dec 16 12:10:48 2008 From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty) Date: Tue Dec 16 12:10:55 2008 Subject: UFS label limitations In-Reply-To: <8663lk5ju7.fsf@ds4.des.no> References: <20081213173902.GA96883@keira.kiwi-computer.com> <20081213183058.GA20992@a91-153-125-115.elisa-laajakaista.fi> <20081213192320.GA97766@keira.kiwi-computer.com> <86y6yh5pz0.fsf@ds4.des.no> <20081215234809.GA24403@keira.kiwi-computer.com> <8663lk5ju7.fsf@ds4.des.no> Message-ID: <20081216201046.GA34809@keira.kiwi-computer.com> On Tue, Dec 16, 2008 at 02:29:52PM +0100, Dag-Erling Smørgrav wrote: > "Rick C. Petty" writes: > > Dag-Erling Smørgrav writes: > > > Stick to the POSIX portable file name character set: [A-Za-z0-9._-] > > Good idea. It gives me the separators I need. Would a committer be > > willing to review and commit the attached (inline) patch? > > Take a look at strspn(3). You think it's better to use strspn than a couple range checks? I would have thought the character lookup with strspn would be slower and more ghastly to look at. -- Rick C.
Petty From osharoiko at gmail.com Tue Dec 16 23:18:18 2008 From: osharoiko at gmail.com (Oleg Sharoyko) Date: Tue Dec 16 23:18:24 2008 Subject: Strange behaviour with unionfs Message-ID: <1229497075.1182.9.camel@brain.cc.rsu.ru> Hi! Could someone please check the following sequence of commands in recent -CURRENT:
cd /tmp
mkdir sandbox
cd sandbox/
mkdir -p 1
mkdir -p 2/2
mkdir -p 3/3
echo Test > 3/3/test.txt
mount -t unionfs 2 1
mount -t nullfs 3 1/2
cat 1/2/3/test.txt
test -d 1/2
cat 1/2/3/test.txt
I'm running -STABLE with a patch for unix sockets (which I converted from -CURRENT) and it gives me really strange results:
hetzner-srv1, /tmp # cd /tmp
hetzner-srv1, /tmp # mkdir sandbox
hetzner-srv1, /tmp # cd sandbox/
hetzner-srv1, /tmp/sandbox # mkdir -p 1
hetzner-srv1, /tmp/sandbox # mkdir -p 2/2
hetzner-srv1, /tmp/sandbox # mkdir -p 3/3
hetzner-srv1, /tmp/sandbox # echo Test > 3/3/test.txt
hetzner-srv1, /tmp/sandbox # mount -t unionfs 2 1
hetzner-srv1, /tmp/sandbox # mount -t nullfs 3 1/2
hetzner-srv1, /tmp/sandbox # cat 1/2/3/test.txt
cat: 1/2/3/test.txt: No such file or directory
hetzner-srv1, /tmp/sandbox # test -d 1/2
hetzner-srv1, /tmp/sandbox # cat 1/2/3/test.txt
Test
hetzner-srv1, /tmp/sandbox #
It looks like files in subdirectories of filesystems mounted on top of unionfs are not visible until I somehow test the mountpoint. -- Oleg Sharoyko. Software and Network Engineer Computer Center of Rostov State University. From dfr at rabson.org Wed Dec 17 10:25:54 2008 From: dfr at rabson.org (Doug Rabson) Date: Wed Dec 17 10:26:00 2008 Subject: Booting from ZFS raidz Message-ID: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> I've been working on adding raidz and raidz2 support to the boot code and I have a patch which could use some testing if anyone here is interested. This http://people.freebsd.org/~dfr/raidzboot-17122008.diff adds support for raidz and raidz2.
The easiest way to prepare a bootable pool is to put a GPT boot partition on each disk that will make up the raidz pool and install gptzfsboot on the boot partition of every drive. You can boot from any of the drives, as long as the BIOS can see enough of them. The boot code supports booting from degraded pools and pools where some of the data is corrupt (as long as it has enough data available to repair the problem). Currently the ZFS kernel code refuses to allow you to set the bootfs pool property on raidz pools (because Solaris can't boot from them). This means that you are limited to booting from the root filesystem of the pool for now (it shouldn't be hard to relax this restriction). The root filesystem of the pool should contain a directory /boot with the usual contents, which must include a /boot/loader built with the 'LOADER_ZFS_SUPPORT' make option. From zbeeble at gmail.com Wed Dec 17 13:51:02 2008 From: zbeeble at gmail.com (Zaphod Beeblebrox) Date: Wed Dec 17 13:51:08 2008 Subject: More on ZFS filesystem sizes. Message-ID: <5f67a8c40812171351j66dc5484pee631198030a5739@mail.gmail.com> So... I posted before about the wildly different sizes reported by zfs list and du -h for my ports repository. Nobody explained this to any satisfactory degree. I now have another quandary. I have ZFS on my laptop (two drives, mirrored) and I "zfs send" backups to my big array (6 drives, raid-Z1).
The problem is that they don't match up:
On the 6 drive array:
vr2/backup/canoe/64/usr@20080307-1541   746M   -  4.82G  -
vr2/backup/canoe/64/usr@20080309-1443   221M   -  4.79G  -
vr2/backup/canoe/64/usr@20080319-1722   334M   -  4.97G  -
vr2/backup/canoe/64/usr@20080329-0041  27.8M   -  5.24G  -
vr2/backup/canoe/64/usr@20080402-2300  21.9M   -  5.27G  -
vr2/backup/canoe/64/usr@20080416-0223  18.5M   -  5.29G  -
vr2/backup/canoe/64/usr@20080417-0117  18.6M   -  5.29G  -
On the 2 drive laptop:
canoe/64/usr@20080307-1541   738M   -  4.76G  -
canoe/64/usr@20080309-1443   217M   -  4.73G  -
canoe/64/usr@20080319-1722   330M   -  4.90G  -
canoe/64/usr@20080329-0041  26.7M   -  5.17G  -
canoe/64/usr@20080402-2300  20.6M   -  5.20G  -
canoe/64/usr@20080416-0223  17.5M   -  5.22G  -
canoe/64/usr@20080417-0117  17.5M   -  5.22G  -
... note that the snapshot sizes differ by many megabytes ... and not seemingly any fixed amount, either. From brooks at freebsd.org Wed Dec 17 15:17:13 2008 From: brooks at freebsd.org (Brooks Davis) Date: Wed Dec 17 15:17:19 2008 Subject: More on ZFS filesystem sizes. In-Reply-To: <5f67a8c40812171351j66dc5484pee631198030a5739@mail.gmail.com> References: <5f67a8c40812171351j66dc5484pee631198030a5739@mail.gmail.com> Message-ID: <20081217231757.GE27041@lor.one-eyed-alien.net> On Wed, Dec 17, 2008 at 04:51:00PM -0500, Zaphod Beeblebrox wrote: > So... I posted before about the wildly different sizes reported by zfs list > and du -h for my ports repository. Nobody explained this to any satisfactory > degree. > > I now have another quandary. I have ZFS on my laptop (two drives, mirrored) > and I "zfs send" backups to my big array (6 drives, raid-Z1).
The problem > is that they don't match up: > > On the 6 drive array: > > vr2/backup/canoe/64/usr@20080307-1541 746M - 4.82G - > vr2/backup/canoe/64/usr@20080309-1443 221M - 4.79G - > vr2/backup/canoe/64/usr@20080319-1722 334M - 4.97G - > vr2/backup/canoe/64/usr@20080329-0041 27.8M - 5.24G - > vr2/backup/canoe/64/usr@20080402-2300 21.9M - 5.27G - > vr2/backup/canoe/64/usr@20080416-0223 18.5M - 5.29G - > vr2/backup/canoe/64/usr@20080417-0117 18.6M - 5.29G - > > On the 2 drive laptop: > > canoe/64/usr@20080307-1541 738M - 4.76G - > canoe/64/usr@20080309-1443 217M - 4.73G - > canoe/64/usr@20080319-1722 330M - 4.90G - > canoe/64/usr@20080329-0041 26.7M - 5.17G - > canoe/64/usr@20080402-2300 20.6M - 5.20G - > canoe/64/usr@20080416-0223 17.5M - 5.22G - > canoe/64/usr@20080417-0117 17.5M - 5.22G - > > ... note that the snapshot sizes differ by many megabytes ... and not > seemingly any fixed amount, either. Have you tried asking the zfs developers? I'd tend to assume zfs is reporting the amount of space it thinks it's using and that as long as the numbers are close to expected it's not likely to be a FreeBSD issue. It might well be the case that a given bit of data takes different amounts of space when stored on different pool types due to needing different meta data. -- Brooks -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081217/39feedb1/attachment.pgp From andrew at modulus.org Wed Dec 17 16:10:57 2008 From: andrew at modulus.org (Andrew Snow) Date: Wed Dec 17 16:11:04 2008 Subject: More on ZFS filesystem sizes. In-Reply-To: <20081217231757.GE27041@lor.one-eyed-alien.net> References: <5f67a8c40812171351j66dc5484pee631198030a5739@mail.gmail.com> <20081217231757.GE27041@lor.one-eyed-alien.net> Message-ID: <494994F9.4010105@modulus.org> > I now have another quandry. 
I have ZFS on my laptop (two drives, mirrored) > and I "zfs send" backups to my big array (6 drives, raid-Z1). The problem > is that they don't match up As you know, ZFS has variable block sizes from 512 bytes to 128 KB with every power of 2 in between. Each block has a fair chunk of metadata to go with it (those 128-bit pointers aren't very space efficient!) I suppose what you're seeing is due to fragmentation, since with copy-on-write for snapshots, big blocks can be replaced with smaller ones when a file is partially updated, but these can be written more efficiently during the send/receive process, as only the actually referenced data needs to be stored. Given all of that, your numbers are only out by 1 to 1.5%, so is it really that surprising? Regarding du on ZFS, it calculates the result based on the number of blocks consumed by the file, excluding metadata and parity and checksums, and after compression. /usr/ports will be full of tiny, compressible files resulting in a large ratio of metadata to actual file data. "zfs list" returns the space consumed including metadata, parity, and checksums. (Also, filesystem metadata is stored twice by default, or three times optionally, in addition to whatever RAID you are using.) So it is weird, but I believe what you're seeing is normal.
Maybe you need special ZFS sunglasses which black out whenever you start trying to look at what ZFS is doing to your files :-) - Andrew From bms at incunabulum.net Thu Dec 18 07:44:34 2008 From: bms at incunabulum.net (Bruce Simpson) Date: Thu Dec 18 07:44:40 2008 Subject: ext2fuse: user-space ext2 implementation In-Reply-To: <3a142e750812140747r2eb5ebadp7ac2b2c8ae357bae@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <8cb6106e0812031453j6dc2f2f4i374145823c084eaa@mail.gmail.com> <200812041747.09040.gnemmi@gmail.com> <4938FE44.9090608@FreeBSD.org> <4939133E.2000701@FreeBSD.org> <493CEE90.7050104@FreeBSD.org> <3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com> <4943F43B.4060105@incunabulum.net> <3a142e750812131403p31841403ub9d5693278c74111@mail.gmail.com> <4944501E.40900@incunabulum.net> <3a142e750812140747r2eb5ebadp7ac2b2c8ae357bae@mail.gmail.com> Message-ID: <494A6FDF.8030103@incunabulum.net> Paul B. Mahol wrote: > Project itself doesnt look very active, but I may be wrong. It is in alpha state > as reported on SF. > IMHO it is better to maintain our own because it is in better shape, but I'm not > intersted in ext* as developer. > Shelved due to lack of interest, then... others can feel free to pick up. thanks BMS From matt at corp.spry.com Thu Dec 18 10:19:37 2008 From: matt at corp.spry.com (Matt Simerson) Date: Thu Dec 18 10:19:43 2008 Subject: ZFS performance gains real or imaginary? Message-ID: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> Did I miss some major ZFS performance enhancements? I upgraded the disks in my home file server to 1.5TB disks. Rather than using gmirror as I did last time, I decided to use ZFS to mirror them. The file server was running 7.0 and booted off a CF card so it was simply a matter of adding in the extra disks, configuring them with ZFS, and copying all the data over. 
[root@storage] ~ # zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad11    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
ZFS under FreeBSD 7 is horrendously slow. It took almost two days to copy 600GB of data (a bunch of MP3s, movies, and UFS backups of my servers in data centers) to the ZFS volume. Once completed, I removed the old disks. The file system performance after switching to ZFS is quite underwhelming. I notice it when doing any sort of writes to it. This echoes my experience with ZFS on my production backup servers at work (all systems are multi-core Intel with 4GB+ RAM).
$ ssh back01 uname -a
FreeBSD back01.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri Aug 15 16:42:36 PDT 2008 root@back01.int.spry.com:/usr/obj/usr/src/sys/BACK01 amd64
$ ssh back02 uname -a
FreeBSD back02.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #1: Wed Aug 13 13:57:19 PDT 2008 root@back02.int.spry.com:/usr/obj/usr/src/sys/BACK02-HEAD amd64
On the two systems above (amd64 with 16GB of RAM and 24 1TB disks) I get about 30 days of uptime before the system hangs with a ZFS error. They write backups to disk 24x7 and never stop. I could not get anything near that level of stability with back03 (below), which was much older hardware maxed out at 4GB of RAM. I finally resolved the stability issues on back03 by ditching ZFS and using geom_stripe across the two hardware RAID arrays.
$ ssh back03 uname -a
FreeBSD back03.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Tue Oct 28 16:54:22 PDT 2008 root@back03.int.spry.com:/usr/obj/usr/src/sys/GENERIC amd64
Yesterday I did a cvsup to 8-HEAD and built a new kernel and world. I installed the new kernel, and then panicked slightly when I booted off the new kernel and the ZFS utilities proved completely worthless in attempts to get /usr and /var mounted (which are both on ZFS).
It took a quick Google search to remember the solution:
mount -t zfs tank/usr /usr
mount -t zfs tank/var /var
After installing world and rebooting, the system is positively snappy. File system interaction, which is lethargic on every ZFS system I've installed, seems to be much faster. I haven't benchmarked the IO performance but something definitely changed. It's almost like the latency has decreased. Would changes committed between mid-August (when I built my last ZFS servers from -HEAD + the patch) and now explain this? If so, then I really should be upgrading my production ZFS servers to the latest -HEAD. Matt PS: I am using compression and getting the following results:
[root@storage] ~ # zfs get compressratio
NAME                 PROPERTY       VALUE  SOURCE
tank                 compressratio  1.12x  -
tank/usr             compressratio  1.12x  -
tank/usr/.snapshots  compressratio  2.09x  -
tank/var             compressratio  2.13x  -
In retrospect, I wouldn't bother with compression on /usr. But /usr/.snapshots holds my rsnapshot-based backups of my servers sitting in remote data centers. Since the majority of changes between snapshots is log files, the data is quite compressible and ZFS compression is quite effective. It's also quite effective on /var, as is shown. ZFS compression is effectively getting me 1/3 more disk space off my 1.5TB disks. From andrew at modulus.org Thu Dec 18 16:13:03 2008 From: andrew at modulus.org (Andrew Snow) Date: Thu Dec 18 16:13:10 2008 Subject: ZFS performance gains real or imaginary? In-Reply-To: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> Message-ID: <494AE6F4.30506@modulus.org> > Did I miss some major ZFS performance enhancements? ZFS under 7 is almost completely useless. Since I can make it crash reliably by running "rsync", there's not a lot of point talking about its speed! > Would changes committed between mid-August (when I > built my last ZFS servers from -HEAD + the patch) and now explain this? Yes.
> If so, then I really should be upgrading my production ZFS servers to > the latest -HEAD. Thats correct, that is the only way to get the best working version of ZFS. Of course, then everything is unstable and broken - eg. SMBFS became unusable for me and would crash the server. . ZFS > compression is effectively getting me 1/3 more disk space off my 1.5TB > disks You should try gzip-9 compression mode, it saves almost that much space again all over :-) - Andrew From morganw at chemikals.org Thu Dec 18 16:48:21 2008 From: morganw at chemikals.org (Wes Morgan) Date: Thu Dec 18 16:48:28 2008 Subject: ZFS performance gains real or imaginary? In-Reply-To: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> Message-ID: On Thu, 18 Dec 2008, Matt Simerson wrote: > ZFS under FreeBSD 7 is horrendously slow. It took almost two days to copy > 600GB of data (a bunch of MP3s, movies, and UFS backups of my servers in data > centers) to the ZFS volume. Once completed, I removed the old disks. The file > system performance after switching to ZFS is quite underwhelming. I notice it > when doing any sort of writes to it. This echoes my experience with ZFS on > my production backup servers at work. (all systems are multi-core Intel with > 4GB+ RAM). That sounds completely contrary to my experience. I was able to migrate a 1.3 TB 6-disk raidz to a 8-disk raidz2, so the data had to come off and go back on. Took about 12-14 hours in total. My original setup included an SiS 2-port PCI SATA controller, which was a dog. Upgrading to a better setup improved the write performance drastically. But I don't think I load my systems down quite as much. I did have to upgrade to -current once I went to a board with higher throughput, as -stable would eventually deadlock each pool. > > On the two systems above (amd64 with 16GB of RAM and 24 1TB disks) I get > about 30 days of uptime before the system hangs with a ZFS error. 
They write > backups to disk 24x7 and never stop. I could not anything near that level of > stability with back03 (below) which was much older hardware maxed out at 4GB > of RAM. I finally resolved the stability issues on back03 by ditching ZFS > and using geom_stripe across the two hardware RAID arrays. Were you doing a zfs mirror across two hardware raid arrays? The performance of that type of setup would probably be sub-optimal versus a zpool with two raidz volumes. > Yesterday I did a cvsup to 8-HEAD and built a new kernel and world. I > installed the new kernel, and then paniced slightly when I booted off the new > kernel and the ZFS utilities proved completely worthless in attempts to get > /usr and /var mounted (which are both on ZFS). It took a quick Google search > to remember the solution: *cough* ABI compatibility isn't always preserved across releases. The best way to go from 7 to 8 is usually to perform the buildworld and buildkernel, drop into single user mode and install them both, then reboot. However, you're likely to run into problems that would require to to export/import your pools. > After installing world and rebooting, the system is positively snappy. File > system interaction, which is lethargic on every ZFS system I've installed > seems to be much faster. I haven't benchmarked the IO performance but > something definitely changed. It's almost like the latency has decreased. > Would changes committed since mid-August (when I built my last ZFS servers > from -HEAD + the patch) and now explain this? > > If so, then I really should be upgrading my production ZFS servers to the > latest -HEAD. > > Matt > > PS: I am using compression and getting the following results: > > [root@storage] ~ # zfs get compressratio > NAME PROPERTY VALUE SOURCE > tank compressratio 1.12x - > tank/usr compressratio 1.12x - > tank/usr/.snapshots compressratio 2.09x - > tank/var compressratio 2.13x - > > In retrospect, I wouldn't bother with compression on /usr. 
But, > /usr/.snapshots is my rsnapshot based backups of my servers sitting in remote > data centers. Since the majority of changes between snapshots is log files, > the data is quite compressible and ZFS compressions is quite effective. It's > also quite effective on /var, as is shown. ZFS compression is effectively > getting me 1/3 more disk space off my 1.5TB > disks._______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From matt at corp.spry.com Thu Dec 18 21:37:42 2008 From: matt at corp.spry.com (Matt Simerson) Date: Thu Dec 18 21:37:48 2008 Subject: ZFS performance gains real or imaginary? In-Reply-To: References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> Message-ID: On Dec 18, 2008, at 4:48 PM, Wes Morgan wrote: >> On the two systems above (amd64 with 16GB of RAM and 24 1TB disks) >> I get about 30 days of uptime before the system hangs with a ZFS >> error. They write backups to disk 24x7 and never stop. I could not >> anything near that level of stability with back03 (below) which was >> much older hardware maxed out at 4GB of RAM. I finally resolved >> the stability issues on back03 by ditching ZFS and using >> geom_stripe across the two hardware RAID arrays. > > Were you doing a zfs mirror across two hardware raid arrays? The > performance of that type of setup would probably be sub-optimal > versus a zpool with two raidz volumes. I haven't benchmarked it with -HEAD but with FreeBSD 7, using a ZFS mirror across two 12-disk hardware RAID arrays (Areca 1231ML) was significantly (not quite double) faster than using JBOD and raidz. I tested a few variations (four disk pools, six disk zpools, 8 disk zpools, etc). I'll be getting another 24 disk system to add to my backup pool in a month or two. When it arrives, I'll run some additional benchmarks with -HEAD and see where the numbers fall. 
I'll be quite surprised if raidz can outrun a hardware RAID controller with 512MB of BBWC. Matt From james-freebsd-fs2 at jrv.org Fri Dec 19 01:09:11 2008 From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen) Date: Fri Dec 19 01:09:18 2008 Subject: ZFS performance gains real or imaginary? In-Reply-To: References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> Message-ID: <494B61F7.3030904@jrv.org> Matt Simerson wrote: > I haven't benchmarked it with -HEAD but with FreeBSD 7, using a ZFS > mirror across two 12-disk hardware RAID arrays (Areca 1231ML) was > significantly (not quite double) faster than using JBOD and raidz. I > tested a few variations (four disk pools, six disk zpools, 8 disk > zpools, etc). A backup server is a *highly* specialized type of server. It's likely that data is only rarely updated, meaning that there are very few partial parity-stripe writes for the Areca to deal with. A database server receiving many updates would have an entirely different pattern of write I/O, possibly forcing many partial stripe updates. Since ZFS (almost?) never does partial stripe writes in a RAIDZ the performance comparison between ZFS with JBOD and your hardware setup might change considerably with a database workload. Not to mention the dominance of sequential I/O in a backup server, etc. For a backup server ZFS has other advantages. A client's backup server recently ran low on space so I took over another 4x1GB enclosure and added it to the pool with no downtime: there were a couple of large file writes to that pool running when I arrived that were still going when I left. There's also the issue of cost: once SATA port multiplier support works in FreeBSD it will be very practical to build cheap ~15TB servers for a small business using ZFS. 
From linimon at FreeBSD.org Fri Dec 19 04:44:52 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Fri Dec 19 04:45:04 2008 Subject: kern/129760: [nfs] after 'umount -f' of a stale NFS share FreeBSD locks up Message-ID: <200812191244.mBJCiqe4075367@freefall.freebsd.org> Old Synopsis: after 'umount -f' of a stale NFS share FreeBSD locks up New Synopsis: [nfs] after 'umount -f' of a stale NFS share FreeBSD locks up Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Dec 19 12:44:33 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=129760 From zbeeble at gmail.com Fri Dec 19 09:30:29 2008 From: zbeeble at gmail.com (Zaphod Beeblebrox) Date: Fri Dec 19 09:30:41 2008 Subject: ZFS performance gains real or imaginary? In-Reply-To: <494B61F7.3030904@jrv.org> References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> <494B61F7.3030904@jrv.org> Message-ID: <5f67a8c40812190930s51353898w2c8479b6afc25c8b@mail.gmail.com> On Fri, Dec 19, 2008 at 3:57 AM, James R. Van Artsdalen < james-freebsd-fs2@jrv.org> wrote: > There's also the issue of cost: once SATA port multiplier support works > in FreeBSD it will be very practical to build cheap ~15TB servers for a > small business using ZFS. It's certainly not bad already. There are consumer cases that will take 15 to 18 hard drives internally. There are motherboards with 6 or 8 SATA ports. And there are simple SATA cards that are cheap enough these days. I think I got a 4 port for $40 for my machine. I see "buy it now"'s for 8 port cards around $100 on eBay. 16 ports * 1T drives is ~15TB. Make it 1.5T drives and RAID-Z2 and you have more protection and a bit more space. 
From bahamasfranks at gmail.com Fri Dec 19 14:18:07 2008 From: bahamasfranks at gmail.com (Steve Franks) Date: Fri Dec 19 14:18:13 2008 Subject: ext2fuse: user-space ext2 implementation In-Reply-To: <3a142e750812140747r2eb5ebadp7ac2b2c8ae357bae@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <200812041747.09040.gnemmi@gmail.com> <4938FE44.9090608@FreeBSD.org> <4939133E.2000701@FreeBSD.org> <493CEE90.7050104@FreeBSD.org> <3a142e750812090553l564bff84pe1f02cd1b03090ff@mail.gmail.com> <4943F43B.4060105@incunabulum.net> <3a142e750812131403p31841403ub9d5693278c74111@mail.gmail.com> <4944501E.40900@incunabulum.net> <3a142e750812140747r2eb5ebadp7ac2b2c8ae357bae@mail.gmail.com> Message-ID: <539c60b90812191351i6090f24ejb9006471f74f01b9@mail.gmail.com> On Sun, Dec 14, 2008 at 8:47 AM, Paul B. Mahol wrote: > On 12/14/08, Bruce M Simpson wrote: >> Paul B. Mahol wrote: >>>> Can you please relay this feedback to the authors of ext2fuse? >>>> >>>> As mentioned earlier in the thread, the ext2fuse code could benefit from >>>> UBLIO-ization. Are you or any other volunteers happy to help out here? >>>> >>> >>> Well, first higher priority would be to fix existing bugs. It would be >>> very little >>> gain with user cache, because it is already too much IMHO slow and >>> adding user cache >>> will not make it faster, but that is not port problem. >>> >> >> I'm not aware of bugs with ext2fuse itself; my work on the port was >> merely to try to raise awareness that a user-space project for ext2 >> filesystem access existed. >> >> Can you elaborate further on your experience with ext2fuse which seems >> to you to be buggy, i.e. symptoms, root cause analysis etc. ? Have you >> reported these to the author(s)? > > I have read TODO. > >> Have you measured the performance? Is the performance sufficient for the >> needs of an occasional desktop user? > > Performance was not sufficient, and adding user cache will not improve access > speed on first read. 
> After mounting ext2fs volume (via md(4)) created with e2fsprogs port > and copying data > from ufs to ext2, reading was quite slow. Also ext2fuse after mount > doesnt exits it > is still running displaying debug data - explaining why project > itselfs is in alpha > state. > >> I realise we are largely involved in content-free argument here, however >> the trade-off of ext2fuse vs ext2fs in the FreeBSD kernel source tree, >> is that of a hopefully more actively maintained implementation vs one >> which is not maintained at all, and any alternatives for FreeBSD users >> would be welcome. > > Project itself doesnt look very active, but I may be wrong. It is in alpha state > as reported on SF. > IMHO it is better to maintain our own because it is in better shape, but I'm not > intersted in ext* as developer. AFAIK our ext* either barfs or corrupts ext3, and since linux is pretty much all using ext3 these days, we're stuck in read-only for ext3, which is rather undesirable, methinks (seems everyone's using fuse's ntfs for this same reason [which is stable, however]). Which is not to say stealing the ext3 (journal?) implementation and putting it in our code isn't a better choice, I'm just pointing out there is no good choice right now... Steve From stb at lassitu.de Fri Dec 19 14:23:27 2008 From: stb at lassitu.de (Stefan Bethke) Date: Fri Dec 19 14:23:34 2008 Subject: Booting from ZFS raidz In-Reply-To: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: Am 17.12.2008 um 19:25 schrieb Doug Rabson: > I've been working on adding raidz and raidz2 support to the boot > code and I have a patch which could use some testing if anyone here > is interested. This http://people.freebsd.org/~dfr/raidzboot-17122008.diff > adds support for raidz and raidz2. 
The easiest way to prepare a
> bootable pool is to put a GPT boot partition on each disk that will
> make up the raidz pool and install gptzfsboot on the boot partition
> of every drive.

Not sure I did things the right way, and it doesn't appear to be
working correctly. I'm trying this in VMware Fusion, with three SCSI
disks, which I configured like this:

Updated sources yesterday, then applied the patch and added
LOADER_ZFS_SUPPORT?=YES to make.conf, then make buildworld buildkernel.

Created a GPT label and one partition on each of the three drives:

gpart create -s gpt $1
gpart add -b 34 -s 128 -t freebsd-boot $1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1
gpart add -b 512 -s 41900000 -t freebsd-zfs $1
gpart list $1

(The disks are 20GB each)

root@freebsd-current:~# gpart list da3
Geom name: da3
fwheads: 255
fwsectors: 63
last: 41943006
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da3p1
   Mediasize: 65536 (64K)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
   label: (null)
   length: 65536
   offset: 17408
   type: freebsd-boot
   index: 1
2. Name: da3p2
   Mediasize: 21452800000 (20G)
   Sectorsize: 512
   Mode: r1w1e1
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 21452800000
   offset: 262144
   type: freebsd-zfs
   index: 2
Consumers:
1. Name: da3
   Mediasize: 21474836480 (20G)
   Sectorsize: 512
   Mode: r1w1e2

Created a raidz pool:
# zpool create tank raidz da1p2 da2p2 da3p2

Populated the filesystem with
# cd /usr/src && make installworld installkernel distribution DESTDIR=/tank

Added zfs_load="YES" and vfs.root.mountfrom="zfs:tank" to loader.conf

When trying to boot, I get a number of "error 4 lba xxx", then "ZFS:
i/o error - all block copies are unavailable". The loader starts up,
but cannot load /boot/loader.conf or /boot/device.hints. The LBA
blocks are all towards the end of the disks, in the 4294626000 and up
range.

Booted again from a different disk and ran zpool scrub; waited for
that to complete without errors.
Next boot try now gives me (transcribed by hand):

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS
ZFS: unexpected object set type lld
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank:/boot/kernel/kernel
boot:
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank:/boot/kernel/kernel
boot:

Booting again from a different disk, running zpool status reveals no
errors. Running scrub again, then next boot try.

root@freebsd-current:~# zpool scrub tank
root@freebsd-current:~# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub in progress for 0h0m, 11.18% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1p2   ONLINE       0     0     0
            da2p2   ONLINE       0     0     0
            da3p2   ONLINE       0     0     0

errors: No known data errors
root@freebsd-current:~# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Fri Dec 19 22:40:18 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1p2   ONLINE       0     0     0
            da2p2   ONLINE       0     0     0
            da3p2   ONLINE       0     0     0

errors: No known data errors

On the third boot try, same errors as on the second one.

Stefan

--
Stefan Bethke Fon +49 170 346 0140

From yanefbsd at gmail.com Fri Dec 19 19:40:04 2008
From: yanefbsd at gmail.com (Garrett Cooper)
Date: Fri Dec 19 19:40:11 2008
Subject: bin/129760: after 'umount -f' of a stale NFS share FreeBSD locks up
Message-ID: <200812200340.mBK3e3Fm039991@freefall.freebsd.org>

The following reply was made to PR kern/129760; it has been noted by GNATS.

From: "Garrett Cooper"
To: "Eugene M. Zheganin"
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/129760: after 'umount -f' of a stale NFS share FreeBSD locks up
Date: Fri, 19 Dec 2008 19:31:38 -0800

This has been an outstanding issue with FreeBSD that's only been fixed
recently in OSX. Maybe it deserves a backport?
-Garrett From dfr at rabson.org Sat Dec 20 06:23:10 2008 From: dfr at rabson.org (Doug Rabson) Date: Sat Dec 20 06:23:18 2008 Subject: Booting from ZFS raidz In-Reply-To: References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: <2F0DF92C-4240-48D4-9A5F-8B826D6D6E95@rabson.org> On 19 Dec 2008, at 21:46, Stefan Bethke wrote: > Am 17.12.2008 um 19:25 schrieb Doug Rabson: > >> I've been working on adding raidz and raidz2 support to the boot >> code and I have a patch which could use some testing if anyone here >> is interested. This http://people.freebsd.org/~dfr/raidzboot-17122008.diff >> adds support for raidz and raidz2. The easiest way to prepare a >> bootable pool is to put a GPT boot partition on each disk that will >> make up the raidz pool and install gptzfsboot on the boot partition >> of every drive. > > Not sure I did things the right way, and it doesn't appear to be > working correctly. I'm trying this in VMware Fusion, with three SCSI > disks, which I configured like this: > > Updated sources yesterday, then applied the patch and added > LOADER_ZFS_SUPPORT?=YES to make.conf, then make buildworld > buildkernel. > > Created a GPT label and one partition on each of the three drives: > > gpart create -s gpt $1 > gpart add -b 34 -s 128 -t freebsd-boot $1 > gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1 > gpart add -b 512 -s 41900000 -t freebsd-zfs $1 > gpart list $1 > > (The disks are 20GB each) > > root@freebsd-current:~# gpart list da3 > ... > > Created a raidz pool: > # zpool create tank raidz da1p2 da2p2 da3p2 > > Populated the filesystem with > # cd /usr/src && make installworld installkernel distribution > DESTDIR=/tank > > Added zfs_load="YES" and vfs.root.mountfrom="zfs:tank" to loader.conf > > > When trying to boot, I get a number of "error 4 lba xxx", then "ZFS: > i/o error - all block copies are unavailable". The loader starts up, > but cannot load /boot/loader.conf or /boot/device.hints. 
The LBA
> blocks are all towards the end of the disks, in the 4294626000 and
> up range.

I did my testing in vmware too with a slightly different configuration
(4x2G virtual disks in various arrangements). I just tried to
reproduce your exact sequence of steps and it worked fine up to the
mountroot prompt. I don't think ZFS likes having the root filesystem
at the root of the pool.

A few things to check:

1. Are you absolutely sure you are using gptzfsboot built with the
   patch - the steps you list above show you building it but not
   installing it on the system which is initialising the pool.

2. Do you have the changes from r186243? This might cause something
   like your problem - there was an overflow in the code which looked
   up a ZFS object from an inode number.

3. I'm a little confused as to how you are getting LBA numbers above
   4G - a 20G virtual disk should only have 40 million 512 byte
   blocks.

From stb at lassitu.de Sat Dec 20 07:01:28 2008
From: stb at lassitu.de (Stefan Bethke)
Date: Sat Dec 20 07:01:35 2008
Subject: Booting from ZFS raidz
In-Reply-To: <2F0DF92C-4240-48D4-9A5F-8B826D6D6E95@rabson.org>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <2F0DF92C-4240-48D4-9A5F-8B826D6D6E95@rabson.org>
Message-ID: <87E89284-D3BF-4A5A-B6F7-C30709A3F2D9@lassitu.de>

On 20.12.2008 at 15:23, Doug Rabson wrote:

> On 19 Dec 2008, at 21:46, Stefan Bethke wrote:
>
>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1

> 1. Are you absolutely sure you are using gptzfsboot built with the
> patch - the steps you list above show you building it but not
> installing it on the system which is initialising the pool.

Ugh, sorry. That is in fact the old version from before the patch. I
will try again tonight, with updated sources and the right gptzfsboot.
Thanks, Stefan -- Stefan Bethke Fon +49 170 346 0140 From dfr at rabson.org Sat Dec 20 07:06:30 2008 From: dfr at rabson.org (Doug Rabson) Date: Sat Dec 20 07:06:36 2008 Subject: Booting from ZFS raidz In-Reply-To: <87E89284-D3BF-4A5A-B6F7-C30709A3F2D9@lassitu.de> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <2F0DF92C-4240-48D4-9A5F-8B826D6D6E95@rabson.org> <87E89284-D3BF-4A5A-B6F7-C30709A3F2D9@lassitu.de> Message-ID: <4AC3BEB2-B47E-4280-85E1-C72891412D09@rabson.org> On 20 Dec 2008, at 15:01, Stefan Bethke wrote: > Am 20.12.2008 um 15:23 schrieb Doug Rabson: > >> On 19 Dec 2008, at 21:46, Stefan Bethke wrote: >> >>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1 > >> 1. Are you absolutely sure you are using gptzfsboot built with the >> patch - the steps you list above show you building it but not >> installing it on the system which is initialising the pool. > > Ugh, sorry. That is in fact the old version from before the patch. I > will try again tonight, with updated sources and the right gptzfsboot. You should be able to re-install gptzfsboot without changing anything else using something like: # dd if=/boot/gptzfsboot of=dap1 conv=osync From gerryw at compvia.com Sun Dec 21 23:29:57 2008 From: gerryw at compvia.com (Gerry Weaver) Date: Sun Dec 21 23:30:04 2008 Subject: Headers files included by vnode.h Message-ID: <20081222065954.8a0760ac@mail01.compvia.com> Hello All, I hope this is the right place to post this. I've noticed that there are several header files included by /usr/include/sys/vnode.h that are not present in the directory. Are these files supposed to be there? If not, what is the proper include path to use when including vnode.h? They only appear in the source tree on my system. 
FreeBSD 7.0-RELEASE

vnode_if.h
vnode_if_newproto.h
vnode_if_typedef.h

Thanks,
Gerry

From arnaud.houdelette at tzim.net Mon Dec 22 02:30:01 2008
From: arnaud.houdelette at tzim.net (Arnaud Houdelette)
Date: Mon Dec 22 02:30:08 2008
Subject: Booting from ZFS raidz
In-Reply-To: <4AC3BEB2-B47E-4280-85E1-C72891412D09@rabson.org>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <2F0DF92C-4240-48D4-9A5F-8B826D6D6E95@rabson.org> <87E89284-D3BF-4A5A-B6F7-C30709A3F2D9@lassitu.de> <4AC3BEB2-B47E-4280-85E1-C72891412D09@rabson.org>
Message-ID: <494F6C21.2000801@tzim.net>

As I'm fairly interested in this kind of setup, I set up a virtual
machine (VirtualBox) with 3 hard disks.
Sources are from a fresh -CURRENT (csup yesterday).
I applied your patch successfully, did a make installworld /
installkernel to the ZFS root, and applied the bootcode as Stefan did.

It seems the loader gets loaded, but it can't proceed further.
I get errors like these:

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS object directory
Can't find root filesystem - giving up

Then I get to the loader prompt.
ls gives the same errors.
show lists (partial):
currdev=zfs0
loaddev=disk1a

lsdev lists:
cd devices:
disk devices:
    disk0: BIOS drive A:
    disk1: BIOS drive C:
        disk1p1: FreeBSD boot
        disk1p2: FreeBSD swap
        disk1p3: FreeBSD ZFS
pxe devices:
zfs devices:
    zfs0: ztboot

ztboot is the name of my pool, but the other physical disks aren't
shown...

It seems the loader can't read the ZFS raidz pool.
The loader has been built with the LOADER_ZFS_SUPPORT option in
/etc/make.conf.

Is there anything I could have gotten wrong?

Arnaud Houdelette

Doug Rabson wrote:
>
> On 20 Dec 2008, at 15:01, Stefan Bethke wrote: > >> Am 20.12.2008 um 15:23 schrieb Doug Rabson: >> >>> On 19 Dec 2008, at 21:46, Stefan Bethke wrote: >>> >>>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1 >> >>> 1. Are you absolutely sure you are using gptzfsboot built with the >>> patch - the steps you list above show you building it but not >>> installing it on the system which is initialising the pool. >> >> Ugh, sorry. That is in fact the old version from before the patch. I >> will try again tonight, with updated sources and the right gptzfsboot. > > You should be able to re-install gptzfsboot without changing anything > else using something like: > > # dd if=/boot/gptzfsboot of=dap1 conv=osync > > > >
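The partition numbers quoted earlier in this "Booting from ZFS raidz" thread can be cross-checked with simple sector arithmetic. This is only an illustrative sketch, not part of any posted message: the constants are copied from the `gpart` commands and `gpart list da3` output in the thread, and the helper name `sectors_to_bytes` is invented here. It also shows why the reported LBA values looked suspicious, without claiming anything about the cause.

```python
# Cross-check of the gpart figures quoted in the thread.
# All constants are copied from the posted commands and output;
# the helper name is hypothetical.

SECTOR = 512  # bytes, per "Sectorsize: 512" in the gpart output


def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector count or start sector into bytes."""
    return sectors * SECTOR


disk_bytes = 21474836480                  # da3 "Mediasize"
total_sectors = disk_bytes // SECTOR      # sectors on a 20G disk

boot_offset = sectors_to_bytes(34)        # gpart add -b 34
boot_length = sectors_to_bytes(128)       # gpart add -s 128
zfs_offset = sectors_to_bytes(512)        # gpart add -b 512
zfs_length = sectors_to_bytes(41900000)   # gpart add -s 41900000

reported_lba = 4294626000                 # from the boot error messages
lba_in_range = reported_lba < total_sectors

if __name__ == "__main__":
    print("total sectors:", total_sectors)
    print("boot partition offset/length:", boot_offset, boot_length)
    print("zfs partition offset/length: ", zfs_offset, zfs_length)
    print("reported LBA on disk?", lba_in_range)
```

The computed offsets and lengths agree with the `offset:` and `length:` fields in the listing, while the reported LBA is roughly a hundred times larger than the number of sectors on the disk, which matches the "40 million 512 byte blocks" remark.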
From bugmaster at FreeBSD.org Mon Dec 22 03:06:51 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Dec 22 03:07:55 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200812221106.mBMB6oMr060557@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129174  fs         [nfs][zfs][panic] NFS v3 Panic when under high load ex
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129084  fs         [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
o kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs][panic] changing into .zfs dir from nfs client ca
o kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D

29 problems total.

From gary.jennejohn at freenet.de Mon Dec 22 05:53:59 2008
From: gary.jennejohn at freenet.de (Gary Jennejohn)
Date: Mon Dec 22 05:54:06 2008
Subject: Headers files included by vnode.h
In-Reply-To: <20081222065954.8a0760ac@mail01.compvia.com>
References: <20081222065954.8a0760ac@mail01.compvia.com>
Message-ID: <20081222145356.56c91d0b@ernst.jennejohn.org>

On Mon, 22 Dec 2008 00:59:54 -0600
"Gerry Weaver" wrote:

> Hello All,
>
> I hope this is the right place to post this.
>
> I've noticed that there are several header files included by /usr/include/sys/vnode.h that are not present in the directory. Are these files supposed to be there? If not, what is the proper include path to use when including vnode.h? They only appear in the source tree on my system.
>
> FreeBSD 7.0-RELEASE
>
> vnode_if.h
> vnode_if_newproto.h
> vnode_if_typedef.h
>

These files (among others) are dynamically generated when you make a
kernel. See /sys/kern/vnode_if.src and /sys/tools/vnode_if.awk.
--- Gary Jennejohn From gerryw at compvia.com Mon Dec 22 08:59:03 2008 From: gerryw at compvia.com (Gerry Weaver) Date: Mon Dec 22 08:59:10 2008 Subject: Headers files included by vnode.h In-Reply-To: 20081222145356.56c91d0b@ernst.jennejohn.org Message-ID: <20081222165858.be1f1fea@mail01.compvia.com> Hi, Shouldn't these headers be installed/linked as part of the kernel make install process then? It seems odd to use an include path to the kernel source tree. Thanks, Gerry _____ From: Gary Jennejohn [mailto:gary.jennejohn@freenet.de] To: Gerry Weaver [mailto:gerryw@compvia.com] Cc: freebsd-fs@freebsd.org Sent: Mon, 22 Dec 2008 07:53:56 -0600 Subject: Re: Headers files included by vnode.h On Mon, 22 Dec 2008 00:59:54 -0600 "Gerry Weaver" wrote: > Hello All, > > I hope this is the right place to post this. > > I've noticed that there are several header files included by /usr/include/sys/vnode.h that are not present in the directory. Are these files supposed to be there? If not, what is the proper include path to use when including vnode.h? They only appear in the source tree on my system. > > FreeBSD 7.0-RELEASE > > vnode_if.h > vnode_if_newproto.h > vnode_if_typedef.h > These files (among others) are dynamically generated when you make a kernel. See /sys/kern/vnode_if.src and /sys/tools/vnode_if.awk. --- Gary Jennejohn From hselasky at freebsd.org Mon Dec 22 09:36:03 2008 From: hselasky at freebsd.org (Hans Petter Selasky) Date: Mon Dec 22 09:36:34 2008 Subject: Small patch for vfs_mount.c needs review In-Reply-To: <20081222121725.GN18389@elvis.mu.org> References: <20081221112157.GU18389@elvis.mu.org> <200812221125.58857.hselasky@c2i.net> <20081222121725.GN18389@elvis.mu.org> Message-ID: <200812221738.16396.hselasky@freebsd.org> Hi, Here is a small patch for vfs_mount.c which irons out some problems with USB2 and booting from a memory stick. This patch is just a workaround. 
I think that an event driven model would be better where the OS is allowed to boot at the moment the root partition disk is plugged in. Typically the USB memory stick will appear within 2 seconds of waiting. The patch has been tested with FreeSBIE. I hope that the mailing list software did not strip away the attached patch. --HPS -------------- next part -------------- A non-text attachment was scrubbed... Name: vfs_mount.c.diff Type: text/x-diff Size: 1458 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081222/309d0193/vfs_mount.c.bin From gary.jennejohn at freenet.de Mon Dec 22 10:16:55 2008 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Mon Dec 22 10:17:01 2008 Subject: Headers files included by vnode.h In-Reply-To: <20081222165858.be1f1fea@mail01.compvia.com> References: <20081222165858.be1f1fea@mail01.compvia.com> Message-ID: <20081222191651.051cb2b6@ernst.jennejohn.org> On Mon, 22 Dec 2008 10:58:58 -0600 "Gerry Weaver" wrote: > From: Gary Jennejohn [mailto:gary.jennejohn@freenet.de] > > On Mon, 22 Dec 2008 00:59:54 -0600 > "Gerry Weaver" wrote: > > > Hello All, > > > > I hope this is the right place to post this. > > > > I've noticed that there are several header files included by /usr/include/sys/vnode.h that are not present in the directory. Are these files supposed to be there? If not, what is the proper include path to use when including vnode.h? They only appear in the source tree on my system. > > > > FreeBSD 7.0-RELEASE > > > > vnode_if.h > > vnode_if_newproto.h > > vnode_if_typedef.h > > > > These files (among others) are dynamically generated when you make a > kernel. See /sys/kern/vnode_if.src and /sys/tools/vnode_if.awk. > > Shouldn't these headers be installed/linked as part of the kernel make > install process then? It seems odd to use an include path to the kernel > source tree. > Please don't top post and try to wrap your lines. 
Because these files are dynamically generated it makes no sense to install them. There are quite a few files like these which are used during the kernel generation process to dynamically create include files. This allows greater flexibility. --- Gary Jennejohn From gerryw at compvia.com Mon Dec 22 11:44:25 2008 From: gerryw at compvia.com (Gerry Weaver) Date: Mon Dec 22 11:44:31 2008 Subject: Headers files included by vnode.h In-Reply-To: 20081222191651.051cb2b6@ernst.jennejohn.org Message-ID: <20081222194420.98abb8bb@mail01.compvia.com> _____ From: Gary Jennejohn [mailto:gary.jennejohn@freenet.de] To: Gerry Weaver [mailto:gerryw@compvia.com] Cc: freebsd-fs@freebsd.org Sent: Mon, 22 Dec 2008 12:16:51 -0600 Subject: Re: Headers files included by vnode.h On Mon, 22 Dec 2008 10:58:58 -0600 "Gerry Weaver" wrote: > From: Gary Jennejohn [mailto:gary.jennejohn@freenet.de] > > On Mon, 22 Dec 2008 00:59:54 -0600 > "Gerry Weaver" wrote: > > > Hello All, > > > > I hope this is the right place to post this. > > > > I've noticed that there are several header files included by /usr/include/sys/vnode.h that are not present in the directory. Are these files supposed to be there? If not, what is the proper include path to use when including vnode.h? They only appear in the source tree on my system. > > > > FreeBSD 7.0-RELEASE > > > > vnode_if.h > > vnode_if_newproto.h > > vnode_if_typedef.h > > > > These files (among others) are dynamically generated when you make a > kernel. See /sys/kern/vnode_if.src and /sys/tools/vnode_if.awk. > > Shouldn't these headers be installed/linked as part of the kernel make > install process then? It seems odd to use an include path to the kernel > source tree. > Please don't top post and try to wrap your lines. Because these files are dynamically generated it makes no sense to install them. There are quite a few files like these which are used during the kernel generation process to dynamically create include files. This allows greater flexibility. 
---
Gary Jennejohn

Hi,

Thanks Gary. I appreciate your help.

Thanks,
Gerry

From rwatson at FreeBSD.org Tue Dec 23 03:41:14 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Tue Dec 23 03:41:20 2008
Subject: Headers files included by vnode.h
In-Reply-To: <20081222194420.98abb8bb@mail01.compvia.com>
References: <20081222194420.98abb8bb@mail01.compvia.com>
Message-ID:

On Mon, 22 Dec 2008, Gerry Weaver wrote:

> Because these files are dynamically generated it makes no sense to
> install them. There are quite a few files like these which are used
> during the kernel generation process to dynamically create include
> files. This allows greater flexibility.
>
> Thanks Gary. I appreciate your help.

The usual question here is "what are you trying to do?". Normally we
dynamically generate the implementations of the VOP interfaces when the
kernel (or a module built independently from the kernel) depends on
them. If you're creating a kernel module, you can add a dependency on
vnode_if.h, which will cause the kernel module build framework to
generate local copies of the file in your build directory during the
build, as seen in the Coda module:

  # $FreeBSD: src/sys/modules/coda/Makefile,v 1.17.2.1 2008/03/14 17:12:40 rwatson Exp $

  .PATH: ${.CURDIR}/../../fs/coda

  KMOD=   coda
  SRCS=   vnode_if.h \
          coda_fbsd.c coda_psdev.c coda_subr.c coda_venus.c coda_vfsops.c \
          coda_vnops.c opt_coda.h

  .include <bsd.kmod.mk>

If it's for the purposes of debugging a kernel, you should be able to
find the generated copies of the files in the build directory for the
kernel.
Robert N M Watson
Computer Laboratory
University of Cambridge

From gerryw at compvia.com Tue Dec 23 12:16:28 2008
From: gerryw at compvia.com (Gerry Weaver)
Date: Tue Dec 23 12:16:35 2008
Subject: Headers files included by vnode.h
In-Reply-To: alpine.BSF.1.10.0812231138272.90302@fledge.watson.org
Message-ID: <20081223201626.caa483ad@mail01.compvia.com>

_____

From: Robert Watson [mailto:rwatson@FreeBSD.org]
To: Gerry Weaver [mailto:gerryw@compvia.com]
Cc: gary.jennejohn@freenet.de, freebsd-fs@freebsd.org
Sent: Tue, 23 Dec 2008 05:41:13 -0600
Subject: Re: Headers files included by vnode.h

On Mon, 22 Dec 2008, Gerry Weaver wrote:

> Because these files are dynamically generated it makes no sense to
> install them. There are quite a few files like these which are used
> during the kernel generation process to dynamically create include
> files. This allows greater flexibility.
>
> Thanks Gary. I appreciate your help.

The usual question here is "what are you trying to do?". Normally we
dynamically generate the implementations of the VOP interfaces when the
kernel (or a module built independently from the kernel) depends on
them. If you're creating a kernel module, you can add a dependency on
vnode_if.h, which will cause the kernel module build framework to
generate local copies of the file in your build directory during the
build, as seen in the Coda module:

  # $FreeBSD: src/sys/modules/coda/Makefile,v 1.17.2.1 2008/03/14 17:12:40 rwatson Exp $

  .PATH: ${.CURDIR}/../../fs/coda

  KMOD=   coda
  SRCS=   vnode_if.h \
          coda_fbsd.c coda_psdev.c coda_subr.c coda_venus.c coda_vfsops.c \
          coda_vnops.c opt_coda.h

  .include <bsd.kmod.mk>

If it's for the purposes of debugging a kernel, you should be able to
find the generated copies of the files in the build directory for the
kernel.

Robert N M Watson
Computer Laboratory
University of Cambridge

Hi Robert,

Perfect. Yes, I am building a kernel module. This is exactly what I was
looking for. I really appreciate your help.
I am also trying to figure out a zero copy approach to kernel memory
access from user space. Would you happen to know which list I should
post the question to?

Have a great holiday all.

Thanks,
Gerry

From matt at corp.spry.com Tue Dec 23 12:43:50 2008
From: matt at corp.spry.com (Matt Simerson)
Date: Tue Dec 23 12:44:00 2008
Subject: ZFS performance gains real or imaginary?
In-Reply-To: <494AE6F4.30506@modulus.org>
References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> <494AE6F4.30506@modulus.org>
Message-ID: <1424BEB3-69FE-4BA2-884F-4862B3D7BCFD@corp.spry.com>

On Dec 18, 2008, at 4:12 PM, Andrew Snow wrote:

>> If so, then I really should be upgrading my production ZFS servers
>> to the latest -HEAD.
>
> Thats correct, that is the only way to get the best working version
> of ZFS. Of course, then everything is unstable and broken - eg.
> SMBFS became unusable for me and would crash the server.

I got bit by the ARP bug (kern/129730) which really annoyed our network
manager. After applying a patch for that, I've got a working kernel and
upgraded ZFS.

Unfortunately, the newer kernel hangs much more frequently. Previously
I was getting nearly a month between reboots. Now I don't even get 1
day between hangs. Worse yet, I upgraded the ZFS version of the pools,
to see if that would make any difference. It did not, and now I can't
revert. :-(

I have these settings in /boot/loader.conf

vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_max="100M"

I have also experimented with vfs.zfs.prefetch_disable,
vfs.zfs.arc_min in the past, and I'm open to suggestions on what might
help under this workload (multiple concurrent rsync processes from
remote systems to this one).

Matt

Specs: SuperMicro 24 disk, 8-core Xeon, 16GB RAM, 2 x 12 disk HW RAID
striped (RAID 0).
back01# uname -a
FreeBSD back01.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #2: Fri Dec 19 15:37:12 PST 2008 root@back01.int.spry.com:/usr/obj/usr/src/sys/BACK01 amd64

back01# zpool list
NAME     SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
back01  18.1T  16.5T  1.60T   91%  ONLINE  -

back01# zpool status
  pool: back01
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        back01      ONLINE       0     0     0
          da0       ONLINE       0     0     0
          da1       ONLINE       0     0     0

errors: No known data errors

back01# zpool upgrade
This system is currently running ZFS pool version 13.

All pools are formatted using this version.

From mcdouga9 at egr.msu.edu Tue Dec 23 20:07:42 2008
From: mcdouga9 at egr.msu.edu (Adam McDougall)
Date: Tue Dec 23 20:07:53 2008
Subject: ZFS performance gains real or imaginary?
In-Reply-To: <1424BEB3-69FE-4BA2-884F-4862B3D7BCFD@corp.spry.com>
References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> <494AE6F4.30506@modulus.org> <1424BEB3-69FE-4BA2-884F-4862B3D7BCFD@corp.spry.com>
Message-ID: <20081224034812.GR87625@egr.msu.edu>

On Tue, Dec 23, 2008 at 12:43:47PM -0800, Matt Simerson wrote:

On Dec 18, 2008, at 4:12 PM, Andrew Snow wrote:

>> If so, then I really should be upgrading my production ZFS servers
>> to the latest -HEAD.
>
> Thats correct, that is the only way to get the best working version
> of ZFS. Of course, then everything is unstable and broken - eg.
> SMBFS became unusable for me and would crash the server.

I got bit by the ARP bug (kern/129730) which really annoyed our network
manager. After applying a patch for that, I've got a working kernel and
upgraded ZFS.

Unfortunately, the newer kernel hangs much more frequently. Previously
I was getting nearly a month between reboots. Now I don't even get 1
day between hangs. Worse yet, I upgraded the ZFS version of the pools,
to see if that would make any difference. It did not, and now I can't
revert.
:-( I have these settings in /boot/loader.conf vm.kmem_size="1536M" vm.kmem_size_max="1536M" vfs.zfs.arc_max="100M" I have also experimented with vfs.zfs.prefetch_disable, vfs.zfs.arc_min in the past, and I'm open to suggestions on what might help under this workload (multiple concurrent rsync processes from remote systems to this one). Matt Specs: SuperMicro 24 disk, 8-core Xeon, 16GB RAM, 2 x 12 disk HW RAID striped (RAID 0). back01# uname -a FreeBSD back01.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #2: Fri Dec 19 15:37:12 PST 2008 root@back01.int.spry.com:/usr/obj/usr/src/ sys/BACK01 amd64 Can you try: vm.kmem_size=2G vm.kmem_size_max=2G vfs.zfs.arc_max=512M This has been working for me on one amd64 system that only has 2G of ram but had similar problem frequency to yours. I don't know if its coincidence with the data that I am rsyncing lately, but: 10:47PM up 22 days, 7:12 From matt at corp.spry.com Wed Dec 24 22:35:01 2008 From: matt at corp.spry.com (Matt Simerson) Date: Wed Dec 24 22:35:07 2008 Subject: ZFS performance gains real or imaginary? In-Reply-To: <20081225052903.GC87625@egr.msu.edu> References: <22C8092E-210F-4E91-AA09-CFD38966975C@spry.com> <494AE6F4.30506@modulus.org> <1424BEB3-69FE-4BA2-884F-4862B3D7BCFD@corp.spry.com> <20081224034812.GR87625@egr.msu.edu> <20081225052903.GC87625@egr.msu.edu> Message-ID: <5C120CEB-6CEB-4722-BB23-7E4B83F779C2@corp.spry.com> On Dec 24, 2008, at 9:29 PM, Adam McDougall wrote: >> On Wed, Dec 24, 2008 at 01:00:14PM -0800, Matt Simerson wrote: >> >> On Dec 23, 2008, at 7:48 PM, Adam McDougall wrote: >> >>>> On Tue, Dec 23, 2008 at 12:43:47PM -0800, Matt Simerson wrote: >>>> >>>>> On Dec 18, 2008, at 4:12 PM, Andrew Snow wrote: >>>>> >>>>>> If so, then I really should be upgrading my production ZFS >>>>>> servers >>>>>> to the latest -HEAD. >>>>> >>>>> Thats correct, that is the only way to get the best working >>>>> version >>>>> of ZFS. Of course, then everything is unstable and broken - eg. 
>>>>> SMBFS became unusable for me and would crash the server. >>>> >>>> Unfortunately, the newer kernel hangs much more frequently. >>>> >>>> I have these settings in /boot/loader.conf >>>> >>>> vm.kmem_size="1536M" >>>> vm.kmem_size_max="1536M" >>>> vfs.zfs.arc_max="100M" >>>> >>>> I have also experimented with vfs.zfs.prefetch_disable, >>>> vfs.zfs.arc_min in the past, and I'm open to suggestions on what >>>> might help under this workload (multiple concurrent rsync >>>> processes from remote systems to this one). >>> >>> Can you try: >>> >>> vm.kmem_size=2G >>> vm.kmem_size_max=2G >>> vfs.zfs.arc_max=512M >>> >>> This has been working for me on one amd64 system that only >>> has 2G of ram but had similar problem frequency to yours. I >>> don't know if its coincidence with the data that I am rsyncing >>> lately, but: 10:47PM up 22 days, 7:12 >> >> I made it 23 minutes. I've reduced my rsync concurrency to 1, so I'm >> not hitting the system nearly as hard but it seems not to matter. >> >> Other workloads, like a 'make buildworld' will complete with no >> problems. For whatever reason, rsync sessions of entire unix systems >> to my backup servers are very troublesome. >> >> Matt > > Ok. Since you have 16G of ram, I suppose you could try setting both > kmem > sizes to something like 8G to see if it makes a difference? I'm > getting > a feeling that even if we don't see an outright failure, it might be > deadlocking due to a kmem shortage. back01# w 10:17PM up 40 mins, 2 users, load averages: 4.20, 3.07, 1.74 This is with: vm.kmem_size="4G" vm.kmem_size_max="4G" vfs.zfs.arc_max="512M" I'll let it trundle along with that setting and see how long it lasts. Matt PS: These settings earlier today resulted in 12+ hours of uptime, until I rebooted to test raising kmem_size to 4G. 
vm.kmem_size="2G" vm.kmem_size_max="2G" vfs.zfs.arc_max="512M" vfs.zfs.zil_disable=1 vfs.zfs.prefetch_disable=1 PPS: If/when it hangs with 4G, I'll raise it again to 6 or 8 GB and see how long it lasts. Whatever pattern emerges might be useful for Pawel. From morganw at chemikals.org Fri Dec 26 19:22:15 2008 From: morganw at chemikals.org (Wes Morgan) Date: Fri Dec 26 19:22:21 2008 Subject: zpool resilver restarting In-Reply-To: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: I just did a zpool replace on a new drive, and now it's resilvering. Only, when it gets about 20mb resilvered it restarts. I can see all the drive activity simply halting for a period then resuming in gstat. I see some bugs in the opensolaris tracker about this, but no resolutions. It doesn't seem to be related to calling "zpool status" because I can watch gstat and see it restarting... Anyone seen this before, and hopefully have a workaround...? The pool lost a drive on Wednesday and was running with a device missing, however due to the device numbering changing on the scsi bus, I had to export/import the pool to get it to come up, the same for after replacing it. [morganw@volatile:~$]: zpool status media (hangs a bit) pool: media state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. 
scrub: resilver in progress for 0h1m, 0.01% done, 368h10m to go config: NAME STATE READ WRITE CKSUM media DEGRADED 0 0 0 raidz2 DEGRADED 0 0 0 da2 ONLINE 0 0 0 20.3M resilvered da6 ONLINE 0 0 0 20.3M resilvered da0 ONLINE 0 0 0 20.3M resilvered da7 ONLINE 0 0 0 17.0M resilvered da1 ONLINE 0 0 0 20.2M resilvered da3 ONLINE 0 0 0 20.2M resilvered da5 ONLINE 0 0 0 19.2M resilvered replacing DEGRADED 0 0 0 17628927049345412941 UNAVAIL 3 3.91K 0 was /dev/da4 da4 ONLINE 0 0 0 22.6M resilvered errors: No known data errors [morganw@volatile:~$]: zpool status media pool: media state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress for 0h0m, 0.00% done, 766h21m to go config: NAME STATE READ WRITE CKSUM media DEGRADED 0 0 0 raidz2 DEGRADED 0 0 0 da2 ONLINE 0 0 0 738K resilvered da6 ONLINE 0 0 0 742K resilvered da0 ONLINE 0 0 0 739K resilvered da7 ONLINE 0 0 0 563K resilvered da1 ONLINE 0 0 0 732K resilvered da3 ONLINE 0 0 0 734K resilvered da5 ONLINE 0 0 0 738K resilvered replacing DEGRADED 0 0 0 17628927049345412941 UNAVAIL 3 4.67K 0 was /dev/da4 da4 ONLINE 0 0 0 848K resilvered errors: No known data errors From morganw at chemikals.org Fri Dec 26 19:57:08 2008 From: morganw at chemikals.org (Wes Morgan) Date: Fri Dec 26 19:57:14 2008 Subject: zpool resilver restarting In-Reply-To: References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: On Fri, 26 Dec 2008, Wes Morgan wrote: > I just did a zpool replace on a new drive, and now it's resilvering. > Only, when it gets about 20mb resilvered it restarts. I can see all the drive > activity simply halting for a period then resuming in gstat. I see some bugs > in the opensolaris tracker about this, but no resolutions. It doesn't seem to > be related to calling "zpool status" because I can watch gstat and see it > restarting... 
Anyone seen this before, and hopefully have a workaround...? > > The pool lost a drive on Wednesday and was running with a device missing, > however due to the device numbering changing on the scsi bus, I had to > export/import the pool to get it to come up, the same for after replacing it. Replying to myself with some more information. zpool history -l -i shows the scrub loop happening: 2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile] 2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 maxtxg=6463720 [user root on volatile] 2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user root on volatile] 2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 maxtxg=6463720 [user root on volatile] 2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user root on volatile] 2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 maxtxg=6463720 [user root on volatile] 2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user root on volatile] 2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 maxtxg=6463720 [user root on volatile] From morganw at chemikals.org Sat Dec 27 11:59:59 2008 From: morganw at chemikals.org (Wes Morgan) Date: Sat Dec 27 12:00:06 2008 Subject: zpool devices "stuck" (was zpool resilver restarting) In-Reply-To: References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: On Fri, 26 Dec 2008, Wes Morgan wrote: > On Fri, 26 Dec 2008, Wes Morgan wrote: > >> I just did a zpool replace on a new drive, and now it's resilvering. >> Only, when it gets about 20mb resilvered it restarts. I can see all the >> drive activity simply halting for a period then resuming in gstat. I see >> some bugs in the opensolaris tracker about this, but no resolutions. It >> doesn't seem to be related to calling "zpool status" because I can watch >> gstat and see it restarting... 
Anyone seen this before, and hopefully have >> a workaround...? >> >> The pool lost a drive on Wednesday and was running with a device missing, >> however due to the device numbering changing on the scsi bus, I had to >> export/import the pool to get it to come up, the same for after replacing >> it. > > Replying to myself with some more information. zpool history -l -i shows the > scrub loop happening: > > 2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user > root on volatile] > 2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 > maxtxg=6463720 [user root on volatile] > 2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user > root on volatile] > 2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 > maxtxg=6463720 [user root on volatile] > 2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user > root on volatile] > 2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 > maxtxg=6463720 [user root on volatile] > 2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user > root on volatile] > 2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 > maxtxg=6463720 [user root on volatile] It seems that the resilver and drive replacement were "fighting" each other somehow. Detaching the new drive allowed the resilver to complete, but now I'm stuck with two nonexistent devices trying to replace each other, and I can't replace a device that is being replaced: replacing UNAVAIL 0 36.4K 0 insufficient replicas 17628927049345412941 FAULTED 0 0 0 was /dev/da4 5474360425105728553 FAULTED 0 0 0 was /dev/da4 errors: No known data errors So, how the heck do I cancel that replacement and restart it using /dev/da4?
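The restart loop in the history records quoted above can also be spotted mechanically. What follows is only a minimal sketch, assuming the "zpool history -l -i" record format exactly as pasted in this thread; the names DONE_RE, find_restarts and restart_intervals are hypothetical helpers made up for illustration and are not part of any ZFS tooling. It collects every "scrub done" record that ended with complete=0 and reports the time between consecutive restarts.

```python
import re
from datetime import datetime

# Matches records such as:
#   2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile]
# (format assumed from the output pasted in this thread)
DONE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) "
    r"\[internal pool scrub done txg:(\d+)\] complete=(\d)"
)


def find_restarts(history_text):
    """Return (timestamp, txg) for every scrub/resilver pass that ended with complete=0."""
    restarts = []
    for stamp, txg, complete in DONE_RE.findall(history_text):
        if complete == "0":
            when = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S")
            restarts.append((when, int(txg)))
    return restarts


def restart_intervals(restarts):
    """Seconds between consecutive incomplete passes."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(restarts, restarts[1:])]


# The eight records from the message above, unwrapped to one record per line.
SAMPLE = """\
2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile]
2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user root on volatile]
2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user root on volatile]
2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user root on volatile]
2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
"""

if __name__ == "__main__":
    found = find_restarts(SAMPLE)
    print(len(found), "incomplete passes; seconds between restarts:", restart_intervals(found))
```

On the records above this finds four incomplete passes spaced roughly 97 seconds apart, with the txg advancing by 4 each time, which matches the "restarts after about 20mb" behaviour described earlier in the thread.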
From morganw at chemikals.org Sun Dec 28 22:38:54 2008 From: morganw at chemikals.org (Wes Morgan) Date: Sun Dec 28 22:39:00 2008 Subject: zpool devices "stuck" (was zpool resilver restarting) In-Reply-To: References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: On Sat, 27 Dec 2008, Wes Morgan wrote: > On Fri, 26 Dec 2008, Wes Morgan wrote: > >> On Fri, 26 Dec 2008, Wes Morgan wrote: >> >>> I just did a zpool replace on a new drive, and now it's resilvering. >>> Only, when it gets about 20mb resilvered it restarts. I can see all the >>> drive activity simply halting for a period then resuming in gstat. I see >>> some bugs in the opensolaris tracker about this, but no resolutions. It >>> doesn't seem to be related to calling "zpool status" because I can watch >>> gstat and see it restarting... Anyone seen this before, and hopefully have >>> a workaround...? >>> >>> The pool lost a drive on Wednesday and was running with a device missing, >>> however due to the device numbering changing on the scsi bus, I had to >>> export/import the pool to get it to come up, the same for after replacing >>> it. >> >> Replying to myself with some more information. 
zpool history -l -i shows >> the scrub loop happening: >> >> 2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user >> root on volatile] >> 2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 >> maxtxg=6463720 [user root on volatile] >> 2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user >> root on volatile] >> 2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 >> maxtxg=6463720 [user root on volatile] >> 2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user >> root on volatile] >> 2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 >> maxtxg=6463720 [user root on volatile] >> 2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user >> root on volatile] >> 2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 >> maxtxg=6463720 [user root on volatile] > > > It seems that the resilver and drive replacement were "fighting" each other > somehow. Detaching the new drive allowed the resilver to complete, but now > I'm stuck with two nonexistent devices trying to replace each other, and I > can't replace a device that is being replaced: > > replacing UNAVAIL 0 36.4K 0 insufficient > replicas > 17628927049345412941 FAULTED 0 0 0 was /dev/da4 > 5474360425105728553 FAULTED 0 0 0 was /dev/da4 > > errors: No known data errors > > So, how the heck do I cancel that replacement and restart it using /dev/da4? Ok, dear sweet mercy, I think I've dug myself out of the huge hole. I found a bug in the opensolaris tracker that is basically the same as my issue: http://bugs.opensolaris.org/view_bug.do?bug_id=6782540 So, I spent most of the weekend trying to figure out how to repair the damage. I ended up re-creating the actual zfs disk label for the 547xxx device and dumping that onto the drive.
After some trouble with checksums, the system came back to life a few hours ago and I thought I was out of the woods when the resilver started up. However, I was not... I had simply got myself back into the resilver loop that I could not stop. Back to the drawing board... Using gvirstor, I created a 500gb volume (with only 100gb available to back it), dumped the label of the 176xxxx device onto it, did an export/import, and then the resilver started back up. Checking gstat showed that the true device was not being written to at all, so I realized that it was going to try to resilver the 176 device first before doing the replacement. Not good... After some more floundering, I discovered that I could "zpool detach" the virstor volume, leaving me with only real devices in the pool. Except now it did not want to do a complete and true resilver, only resilvering a tiny bit of data, about 20mb or something. My wild guess is that it might have something to do with txg ids and how the resilver tries to only do the data that is "new". Since there is no way (that I know of) to force a resilver with zpool, I simply started scrubbing the array. This would probably have worked, but it was going to take far too long, and was simply throwing up millions of checksum errors on the new drive. So I cancelled the scrub and figured I could just offline the drive and replace it with itself... Nope, no dice, it was reported as "busy". However, after mucking around with the label some more, I was able to finally get the drive to replace itself and start resilvering. Hopefully it will finish successfully. I'm still not sure what went wrong. Part of what happened seems to be related to scsi devices not being wired down like atapi devices, so successive reboots replaced "offline" devices with "faulted", and the pool kept trying to write to them, just generating more errors. Do the folks on the opensolaris zfs-discuss take reports from FreeBSD users, or do they just toss it back at you?
I did actually boot an opensolaris live cd at one point, but it couldn't match the vdevs with devices well enough to import the pool. I don't think it would have handled it properly anyway, given the bug I found in their database. Hope no one ever has to deal with this themselves! Whew... From gary.jennejohn at freenet.de Mon Dec 29 03:02:01 2008 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Mon Dec 29 03:02:20 2008 Subject: zpool devices "stuck" (was zpool resilver restarting) In-Reply-To: References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: <20081229120155.224a34b6@ernst.jennejohn.org> On Mon, 29 Dec 2008 00:38:49 -0600 (CST) Wes Morgan wrote: > I'm still not sure what went wrong. Part of what happened seems to be > related to scsi devices not being wired down like atapi devices, so > successive reboots replaced "offline" devices with "faulted", and the pool > kept trying to write to them, just generating more errors. > This is probably irrelevant now, but it is possible to wire down SCSI devices in /boot/device.hints. I had this in there when I was still using SCSI: hint.scbus.0.at="ahc0" hint.scbus.0.bus="0" hint.da.0.at="scbus0" hint.da.0.target="8" hint.da.0.unit="0" hint.da.1.at="scbus0" hint.da.1.target="10" hint.da.1.unit="0" hint.da.2.at="scbus0" hint.da.2.target="12" hint.da.2.unit="0" hint.da.3.at="scbus0" hint.da.3.target="14" hint.da.3.unit="0" hint.da.4.at="scbus0" hint.da.4.target="1" hint.da.4.unit="0" --- Gary Jennejohn From bugmaster at FreeBSD.org Mon Dec 29 03:06:54 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Dec 29 03:07:51 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200812291106.mBTB6rib024427@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. 
These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129174 fs [nfs][zfs][panic] NFS v3 Panic when under high load ex o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129084 fs [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad o kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs][panic] changing into .zfs dir from nfs client ca o kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support 
for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D 29 problems total. From bryanalves at gmail.com Tue Dec 30 01:39:04 2008 From: bryanalves at gmail.com (Bryan Alves) Date: Tue Dec 30 01:39:10 2008 Subject: NFS locking problems with 7.0-RELEASE Message-ID: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> I'm running a FreeBSD Server (7.0-RELEASE, latest patchlevel, problem has existed on previous patchlevels). Running an NFS server, with statd and lockd. Client is an Ubuntu 8.10 machine. Of note is that the FreeBSD server (in a home environment) is also running PF and doing the packet filtering for the house. When I export my home directory and mount it on my linux client, I run into all sorts of problems with file locking. The biggest problem is the inability to run firefox. When stracing an execution of firefox, execution hangs when opening it's .parentlock file for F_GETLK. I also notice messages in /var/log/messages on the client on occasion: Dec 29 20:08:01 balves-ubuntu-desktop kernel: [ 5430.560020] lockd: server 192.168.10.1 not responding, still trying Dec 29 20:08:28 balves-ubuntu-desktop kernel: [ 5457.560725] lockd: server 192.168.10.1 OK 192.168.10.1 is the internal address for the FreeBSD server. Nothing related to NFS appears in /var/log/messages on the FreeBSD server. I've made sure to turn off scrubbing for PF on internal interfaces, because of it's problems with NFS. Of note is that things that don't need locks (for example, video playback with some players, music playback, etc), works fine on the nfs mount. I have a device in my living room (a popcorn hour), that connects to the FreeBSD server and streams via NFS without issue. The only problems I've come across occur with file locks. 
Restarting various services (rpc, statd, lockd, nfsd) on the server doesn't help, neither does remounting. Rebooting doesn't help either. The only thing that makes the mount usable is using samba instead of nfs. This is unfortunate because samba is much slower on my network (> 20 MB/s drop in throughput using samba instead of NFS). Here is my pf.conf for those of you who want to verify that i've turned off scrubbing correctly: ===BEGIN pf.conf=== ext_if = "em1" int_if = "em0" localnet = $int_if:network torrent_ports = "57100:57199" web_ports = "81" vpn_ports = "1723" gateway = "192.168.10.1" httpd_jail = "192.168.10.200" samba_jail = "192.168.10.201" slimserver_jail = "192.168.10.202" torrent_jail = "192.168.10.203" set skip on { lo0 } set loginterface $ext_if #scrub in all scrub in on $ext_if altq on $ext_if bandwidth 4500Kb hfsc queue { q_high, q_med, q_low } queue q_high bandwidth 25% priority 6 qlimit 250 hfsc queue q_med bandwidth 45% priority 4 qlimit 250 hfsc (default) queue q_low bandwidth 30% priority 3 qlimit 250 hfsc nat on $ext_if from $localnet to any -> ($ext_if) #Port Forwards rdr on $ext_if proto tcp from any to any port ssh -> $gateway port ssh rdr on $ext_if proto tcp from any to any port $web_ports -> $httpd_jail port $web_ports rdr on $ext_if proto tcp from any to any port $torrent_ports -> $torrent_jail port $torrent_ports #Nat Reflection rdr on $int_if proto tcp from $localnet to $ext_if port ssh -> $gateway rdr on $int_if proto tcp from $localnet to $ext_if port $web_ports -> $httpd_jail no nat on $int_if proto tcp from $int_if to $localnet nat on $int_if proto tcp from $localnet to $gateway port ssh -> $int_if nat on $int_if proto tcp from $localnet to $httpd_jail port $web_ports -> $int_if antispoof for $ext_if block all #In on ext_if pass in on $ext_if proto tcp from any to any port $web_ports keep state queue (q_high) pass in on $ext_if proto { tcp, udp } from any to $torrent_jail keep state queue (q_low) pass in on $ext_if proto tcp from 
any to port ssh modulate state queue (q_high) pass in on $ext_if proto gre from any to any keep state queue (q_high) pass in on $ext_if proto tcp from any to any port $vpn_ports keep state queue (q_high) #Out on ext_if pass out on $ext_if proto tcp all modulate state queue (q_med) pass out on $ext_if proto { udp, icmp } all keep state queue (q_med) pass out on $ext_if proto gre all keep state queue (q_high) pass out on $ext_if proto tcp from $torrent_jail to any keep state queue (q_low) #Allow all LAN traffic pass in on $int_if from $localnet to any keep state pass out on $int_if from any to $localnet keep state ===END pf.conf=== I realize that the linux NFS client implementation isn't spectacular, but the same ubuntu setup works when connected to a netapp, which leads me to believe that the problem is with the freebsd nfs server implementation. If anyone can suggest some additional troubleshooting steps to provide some more information, or propose some suggested solutions, it would be appreciated. --Bryan From rmacklem at uoguelph.ca Tue Dec 30 19:36:04 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Dec 30 19:36:35 2008 Subject: NFS locking problems with 7.0-RELEASE In-Reply-To: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> References: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> Message-ID: On Mon, 29 Dec 2008, Bryan Alves wrote: [stuff snipped] > When I export my home directory and mount it on my linux client, I run into > all sorts of problems with file locking. The biggest problem is the > inability to run firefox. When stracing an execution of firefox, execution > hangs when opening it's .parentlock file for F_GETLK. I also notice > messages in /var/log/messages on the client on occasion: > I can't help w.r.t. getting the NLM to work (I've always thought that the NLM protocol was a crock and avoided using it). But, here's a couple of things you could try to avoid using the NLM.
- Do the Linux mount with the "nolock" option. (If Ubuntu has a "locallock" option, that would be even better, but I'm not sure what options recent Linux's have for nfs mounts.) - Download my server patches (ftp.cis.uoguelph.ca/pub/nfsv4/FreeBSD7) and switch to using nfsv4, which has integral locking in the protocol. Have a good holiday, rick From bryanalves at gmail.com Tue Dec 30 20:46:06 2008 From: bryanalves at gmail.com (Bryan Alves) Date: Tue Dec 30 20:46:13 2008 Subject: NFS locking problems with 7.0-RELEASE In-Reply-To: References: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> Message-ID: <92f477740812301246k7ed77511oc969c22a3b5aad4d@mail.gmail.com> On Tue, Dec 30, 2008 at 2:00 PM, Rick Macklem wrote: > > > On Mon, 29 Dec 2008, Bryan Alves wrote: > > [stuff snipped] > >> When I export my home directory and mount it on my linux client, I run >> into >> all sorts of problems with file locking. The biggest problem is the >> inability to run firefox. When stracing an execution of firefox, >> execution >> hangs when opening it's .parentlock file for F_GETLK. I also notice >> messages in /var/log/messages on the client on occasion: >> >> I can't help w.r.t. getting the NLM to work (I've always thought that the > NLM protocol was a crock and avoided using it). But, here's a couple of > things you could try to avoid using the NLM. > > - Do the Linux mount with the "nolock" option. (If Ubuntu has a > "locallock" option, that would be even better, but I'm not sure what > options recent Linux's have for nfs mounts.) > > - Download my server patches (ftp.cis.uoguelph.ca/pub/nfsv4/FreeBSD7) and > switch to using nfsv4, which has integral locking in the protocol. > > Have a good holiday, rick > > Is there another location where I can get the nfs4 patches? That FTP seems to be down.
Also, outside the scope of this list, but since the discussion is opened, I might as well ask: If this NFS is the only remote mount that involves writing (it's opened read-only in other locations), and it's read/write locally, is it safe to use local locking? --Bryan From grafan at gmail.com Tue Dec 30 21:13:20 2008 From: grafan at gmail.com (Rong-en Fan) Date: Tue Dec 30 21:13:27 2008 Subject: NFS locking problems with 7.0-RELEASE In-Reply-To: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> References: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> Message-ID: <6eb82e0812301247uaf5eb45v529765e29220fd80@mail.gmail.com> On Tue, Dec 30, 2008 at 9:39 AM, Bryan Alves wrote: > I'm running a FreeBSD Server (7.0-RELEASE, latest patchlevel, problem has > existed on previous patchlevels). Running an NFS server, with statd and > lockd. Client is an Ubuntu 8.10 machine. Of note is that the FreeBSD > server (in a home environment) is also running PF and doing the packet > filtering for the house. > > When I export my home directory and mount it on my linux client, I run into > all sorts of problems with file locking. The biggest problem is the > inability to run firefox. When stracing an execution of firefox, execution > hangs when opening it's .parentlock file for F_GETLK. I also notice > messages in /var/log/messages on the client on occasion: > [...] > > I realize that the linux NFS client implementation isn't spectacular, but > the same ubuntu setup works when connected to a netapp, which leads me to > believe that the problem is with the freebsd nfs server implementation. > > If anyone can suggest some additional troubleshooting steps to provide some > more information, or propose some suggested solutions, it would be > appreciated. You may want to upgrade to latest RELENG_7 and use the rewrote lockd in kernel space (w/ NFS_LOCKD in your kernel configuration). 
Regards, Rong-En Fan From bryanalves at gmail.com Wed Dec 31 05:09:06 2008 From: bryanalves at gmail.com (Bryan Alves) Date: Wed Dec 31 05:09:12 2008 Subject: NFS locking problems with 7.0-RELEASE In-Reply-To: <6eb82e0812301247uaf5eb45v529765e29220fd80@mail.gmail.com> References: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> <6eb82e0812301247uaf5eb45v529765e29220fd80@mail.gmail.com> Message-ID: <92f477740812302109n78d9f303y5c49b8ca6ab082c5@mail.gmail.com> On Tue, Dec 30, 2008 at 3:47 PM, Rong-en Fan wrote: > On Tue, Dec 30, 2008 at 9:39 AM, Bryan Alves wrote: > > I'm running a FreeBSD Server (7.0-RELEASE, latest patchlevel, problem has > > existed on previous patchlevels). Running an NFS server, with statd and > > lockd. Client is an Ubuntu 8.10 machine. Of note is that the FreeBSD > > server (in a home environment) is also running PF and doing the packet > > filtering for the house. > > > > When I export my home directory and mount it on my linux client, I run > into > > all sorts of problems with file locking. The biggest problem is the > > inability to run firefox. When stracing an execution of firefox, > execution > > hangs when opening it's .parentlock file for F_GETLK. I also notice > > messages in /var/log/messages on the client on occasion: > > > [...] > > > > I realize that the linux NFS client implementation isn't spectacular, but > > the same ubuntu setup works when connected to a netapp, which leads me to > > believe that the problem is with the freebsd nfs server implementation. > > > > If anyone can suggest some additional troubleshooting steps to provide > some > > more information, or propose some suggested solutions, it would be > > appreciated. > > You may want to upgrade to latest RELENG_7 and use > the rewrote lockd in kernel space (w/ NFS_LOCKD in your > kernel configuration). > > Regards, > Rong-En Fan Where can I find more information about in kernel NFS_LOCKD? 
It doesn't seem to exist on Google at all, and I'm hesitant to upgrade from RELEASE without doing due diligence in terms of research. Is rpc_lockd_enable still required in rc.conf when using this? Do I need to do anything else besides update to RELENG_7 and installworld/installkernel with this new option? From rwatson at FreeBSD.org Wed Dec 31 10:26:37 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Wed Dec 31 10:26:43 2008 Subject: NFS locking problems with 7.0-RELEASE In-Reply-To: <92f477740812302109n78d9f303y5c49b8ca6ab082c5@mail.gmail.com> References: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> <6eb82e0812301247uaf5eb45v529765e29220fd80@mail.gmail.com> <92f477740812302109n78d9f303y5c49b8ca6ab082c5@mail.gmail.com> Message-ID: On Wed, 31 Dec 2008, Bryan Alves wrote: > Where can I find more information about in kernel NFS_LOCKD? It doesn't > seem to exist on Google at all, and I'm hesitant to upgrade from RELEASE > without doing due diligence in terms of research. I'm not sure there's a specific web page/etc on it, but you can find the initial patch announcement here: http://lists.freebsd.org/pipermail/freebsd-current/2008-March/084446.html I believe the "-k" described in the original post is no longer required. > Is rpc_lockd_enable still required in rc.conf when using this? Hmm. I believe so. > Do I need to do anything else besides update to RELENG_7 and > installworld/installkernel with this new option? You can do the normal upgrade -- build world, kernel, install kernel, reboot, mergemaster -p, installworld, full mergemaster, reboot. Or you can wait another week and install FreeBSD 7.1-RELEASE, if you want to be running on a release rather than doing incremental updates along the branch.
Robert N M Watson Computer Laboratory University of Cambridge From rmacklem at uoguelph.ca Wed Dec 31 19:13:54 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Wed Dec 31 19:14:03 2008 Subject: NFS locking problems with 7.0-RELEASE In-Reply-To: <92f477740812301246k7ed77511oc969c22a3b5aad4d@mail.gmail.com> References: <92f477740812291739o7c0b840bsd1cce4375577c41f@mail.gmail.com> <92f477740812301246k7ed77511oc969c22a3b5aad4d@mail.gmail.com> Message-ID: On Tue, 30 Dec 2008, Bryan Alves wrote: >> - Download my server patches (ftp.cis.uoguelph.ca/pub/nfsv4/FreeBSD7) and >> switch to using nfsv4, which has integral locking in the protocol. >> >> Have a good holiday, rick >> >> > Is there another location where I can get the nfs4 patches? That FTP seems > to be down. > Seems to be working here. Just "ftp ftp.cis.uoguelph.ca", login "anonymous", then "cd pub/nfsv4/FreeBSD7". (Is it that you can't find the machine? Its IP# is 131.104.48.112.) > Also, outside the scope of this list, but since the discussion is opened, I > might as well ask: > > If this NFS is the only remote mount that involves writing (it's opened > read-only in other locations), and it's read/write locally, is it safe to > use local locking? > Yes, I believe so. Even if there are multiple clients rw mounting a file system, local locking should be fine unless there are multiple clients writing the same file in the file system. With a single writer and multiple readers, an application might run into coherency problems if that application was written to use byte range locking to maintain coherency (i.e. most recently written data visible to the readers), but that seems unlikely to matter for most applications/environments. (And I'm not sure if the NLM is wired into NFS in such a way as to maintain full coherency for the locked byte ranges anyhow, since normally NFS does not maintain full coherency.) Have a happy new year, rick
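For anyone wanting to check byte range locking on a mount without firing up firefox, here is a minimal sketch of a lock probe, offered as an illustration only. It assumes a Unix client with Python 3; the function name probe_lock and the default file name lockprobe.tmp are made up for this example. It asks for a non-blocking exclusive POSIX lock through fcntl.lockf, which is the same fcntl() lock family as the F_GETLK call that was reported hanging earlier in this thread.

```python
import errno
import fcntl
import os
import sys
import tempfile


def probe_lock(path):
    """Try to take and release a non-blocking exclusive POSIX lock on path.

    Returns True if the lock was granted and False if another process
    already holds a conflicting lock. Any other OSError (for example
    ENOLCK) is re-raised so the caller can see it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # lockf() is a wrapper around the fcntl() record-locking calls.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # EACCES or EAGAIN means the lock is held by someone else.
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Pass a path on the NFS mount; the default is a local temp file.
    default = os.path.join(tempfile.gettempdir(), "lockprobe.tmp")
    target = sys.argv[1] if len(sys.argv) > 1 else default
    print(target, "lock granted" if probe_lock(target) else "lock held elsewhere")
```

Run it once against a file on local disk and once against a file on the NFS mount. If the NFS run never returns, the request is most likely stuck waiting on the server's lock daemon, which would be consistent with the "lockd: server not responding" messages on the client. On a mount made with "nolock" the probe should return right away, since locks are then handled locally on the client.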