From mike at sentex.net Tue Jul 1 00:26:19 2008
From: mike at sentex.net (mike@sentex.net)
Date: Tue Jul 1 00:26:23 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080628132632.R1807@kozubik.com>
References: <20080628132632.R1807@kozubik.com>
Message-ID: <6kti64tf86fshn7s38dfe41jb3oucslsrf@4ax.com>

On Sat, 28 Jun 2008 14:56:03 -0700 (PDT), in sentex.lists.freebsd.fs
>
>So I'll try this instead:
>
>I will paypal $1000 to whoever can deliver fully clean 64-bit quotas
>and userland tools in FreeBSD by July 20, 2008.
>

We don't need this feature now, but will one day. We will add another
$250 to the mix either via paypal or to the FreeBSD Foundation...

---Mike

From andrew-freebsd at areilly.bpc-users.org Tue Jul 1 13:22:40 2008
From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly)
Date: Tue Jul 1 13:22:50 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080630085612.G1807@kozubik.com>
References: <20080628132632.R1807@kozubik.com> <864p7bw387.fsf@ds4.des.no> <20080630073539.U1807@kozubik.com> <4868FB2F.7010204@FreeBSD.org> <20080630085612.G1807@kozubik.com>
Message-ID: <20080701035755.GA23685@duncan.reilly.home>

On Mon, Jun 30, 2008 at 09:05:48AM -0700, John Kozubik wrote:
> That point is well taken. However, regardless of the adoption rate, I
> _do_ believe that there is still a qualitative difference between quotas
> and, for instance, ZFS - in terms of "coreness".

One qualitative difference is that lots of people seem to be
interested in ZFS. I haven't seen any mention of quotas for
many years. In fact, I was under a vague impression that they
hadn't worked since UFS2, and that that was still the case
because no-one cared.

> I believe this because of the historical presence of this functionality
> and the reasonable expectation that it represents a basic function of a
> unix-based OS (not just FreeBSD).

There are lots of historical functionalities that are no longer
part of the OS.
Things change.

Now it may be that quotas are indeed useful enough to
be salvaged in a generic fashion (applicable to arbitrary
filesystems, as has been mentioned). Not my call: I'm certainly
not going to do the work. But with the level of use in recent
years, maybe the right answer is to consign them to the bin
(or an optional GEOM layer or whatever), along with tty line
disciplines, uucp, isdn and X10?

Cheers,

Andrew

From rebehn at ant.uni-bremen.de Tue Jul 1 13:59:06 2008
From: rebehn at ant.uni-bremen.de (Heinrich Rebehn)
Date: Tue Jul 1 13:59:10 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080701035755.GA23685@duncan.reilly.home>
References: <20080628132632.R1807@kozubik.com> <864p7bw387.fsf@ds4.des.no> <20080630073539.U1807@kozubik.com> <4868FB2F.7010204@FreeBSD.org> <20080630085612.G1807@kozubik.com> <20080701035755.GA23685@duncan.reilly.home>
Message-ID: <486A3365.7020500@ant.uni-bremen.de>

Andrew Reilly wrote:
> On Mon, Jun 30, 2008 at 09:05:48AM -0700, John Kozubik wrote:
>> That point is well taken. However, regardless of the adoption rate, I
>> _do_ believe that there is still a qualitative difference between quotas
>> and, for instance, ZFS - in terms of "coreness".
>
> One qualitative difference is that lots of people seem to be
> interested in ZFS. I haven't seen any mention of quotas for
> many years. In fact, I was under a vague impression that they
> hadn't worked since UFS2, and that that was still the case
> because no-one cared.

They *do* work and we do use them. You need them if lots of users share a
common disk. The fact that they are not mentioned only means that they
"simply work".

>
>> I believe this because of the historical presence of this functionality
>> and the reasonable expectation that it represents a basic function of a
>> unix-based OS (not just FreeBSD).
>
> There are lots of historical functionalities that are no longer
> part of the OS. Things change.
>
> Now it may be that quotas are indeed useful enough to
> be salvaged in a generic fashion (applicable to arbitrary
> filesystems, as has been mentioned). Not my call: I'm certainly
> not going to do the work. But with the level of use in recent
> years, maybe the right answer is to consign them to the bin
> (or an optional GEOM layer or whatever), along with tty line
> disciplines, uucp, isdn and X10?

With this reasoning you could also drop the shell and tell people to use
kde. BTW, X10 has been replaced by X11 ;-)

Cheers,

Heinrich

From des at des.no Tue Jul 1 14:05:09 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Tue Jul 1 14:05:16 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <486A3365.7020500@ant.uni-bremen.de> (Heinrich Rebehn's message of "Tue\, 01 Jul 2008 15\:38\:45 +0200")
References: <20080628132632.R1807@kozubik.com> <864p7bw387.fsf@ds4.des.no> <20080630073539.U1807@kozubik.com> <4868FB2F.7010204@FreeBSD.org> <20080630085612.G1807@kozubik.com> <20080701035755.GA23685@duncan.reilly.home> <486A3365.7020500@ant.uni-bremen.de>
Message-ID: <86skutzoml.fsf@ds4.des.no>

Heinrich Rebehn writes:
> Andrew Reilly writes:
> > But with the level of use in recent years, maybe the right answer is
> > to consign them to the bin (or an optional GEOM layer or whatever),
> > along with tty line disciplines, uucp, isdn and X10?
> With this reasoning you could also drop the shell and tell people to
> use kde. BTW, X10 has been replaced by X11 ;-)

No, the X10 Andrew refers to has been replaced by better standards such
as LonWorks. JFGI.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From bakul at bitblocks.com Tue Jul 1 18:08:30 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Tue Jul 1 18:08:34 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: Your message of "Tue, 01 Jul 2008 16:05:06 +0200."
    <86skutzoml.fsf@ds4.des.no>
Message-ID: <20080701175932.0B76F5B4B@mail.bitblocks.com>

On Tue, 01 Jul 2008 16:05:06 +0200 Dag-Erling Smørgrav wrote:
> Heinrich Rebehn writes:
> > Andrew Reilly writes:
> > > But with the level of use in recent years, maybe the right answer is
> > > to consign them to the bin (or an optional GEOM layer or whatever),
> > > along with tty line disciplines, uucp, isdn and X10?
> > With this reasoning you could also drop the shell and tell people to
> > use kde. BTW, X10 has been replaced by X11 ;-)
>
> No, the X10 Andrew refers to has been replaced by better standards such
> as LonWorks. JFGI.

Hey, X10 works well enough and you can still get X10 modules
and controllers. But you don't need any kernel support; just
need one little program. I still do this:

    x10 switch printer on

    x10 switch printer off

Besides, LonWorks is already passé -- from what I hear the
latest hot thing is zigbee (but that was last month; surely a
new standard is afoot by now).

To bring this back on topic, perhaps John Kozubik can just
use the zfs since it already has quota support? For example,

# zfs create z/foo
# zfs set quota=10M z/foo
dd < /dev/zero bs=1M count=20 > /z/foo/xx
dd: stdout: Disc quota exceeded
11+0 records in
10+0 records out
10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
# zfs set quota=10T z/foo
# zfs get quota z/foo
NAME   PROPERTY  VALUE  SOURCE
z/foo  quota     10T    local

From ticso at cicely7.cicely.de Tue Jul 1 20:30:45 2008
From: ticso at cicely7.cicely.de (Bernd Walter)
Date: Tue Jul 1 20:30:48 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080701175932.0B76F5B4B@mail.bitblocks.com>
References: <86skutzoml.fsf@ds4.des.no> <20080701175932.0B76F5B4B@mail.bitblocks.com>
Message-ID: <20080701200254.GB17364@cicely7.cicely.de>

On Tue, Jul 01, 2008 at 10:59:31AM -0700, Bakul Shah wrote:
> On Tue, 01 Jul 2008 16:05:06 +0200 Dag-Erling Smørgrav wrote:
> > Heinrich Rebehn writes:
> > > Andrew Reilly writes:
> > > > But with the level of use in recent years, maybe the right answer is
> > > > to consign them to the bin (or an optional GEOM layer or whatever),
> > > > along with tty line disciplines, uucp, isdn and X10?
> > > With this reasoning you could also drop the shell and tell people to
> > > use kde. BTW, X10 has been replaced by X11 ;-)
> >
> > No, the X10 Andrew refers to has been replaced by better standards such
> > as LonWorks. JFGI.
>
> Hey, X10 works well enough and you can still get X10 modules
> and controllers. But you don't need any kernel support; just
> need one little program. I still do this:
>
> x10 switch printer on
>
> x10 switch printer off
>
> Besides, LonWorks is already passé -- from what I hear the
> latest hot thing is zigbee (but that was last month; surely a
> new standard is afoot by now).
>
> To bring this back on topic, perhaps John Kozubik can just
> use the zfs since it already has quota support? For example,
>
> # zfs create z/foo
> # zfs set quota=10M z/foo
> dd < /dev/zero bs=1M count=20 > /z/foo/xx
> dd: stdout: Disc quota exceeded
> 11+0 records in
> 10+0 records out
> 10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
> # zfs set quota=10T z/foo
> # zfs get quota z/foo
> NAME PROPERTY VALUE SOURCE
> z/foo quota 10T local

This is basically what the partition size is for normal filesystems,
with the great ability of course to change it cheaply at any time.
But this is in no way a per-user quota of the kind ufs provides.

-- 
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and much more.
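
To make the distinction above concrete: a ZFS quota is a property of a dataset, not of a user, so approximating per-user limits means creating one dataset per user. The following is only a minimal sketch of that idea. It assumes the pool `z` from the quoted example and a parent dataset `z/home` that already exists; the parent dataset, user names and sizes are made up for illustration. It prints the commands rather than running them.

```sh
#!/bin/sh
# Sketch only: "z/home", the user names and the quota sizes are
# placeholders, not taken from the thread. Nothing is executed;
# the zfs commands are printed so they can be reviewed first.
POOL_HOME="z/home"

# Print the commands that would give one user a dataset with a quota.
# Assumes the parent dataset ($POOL_HOME) already exists.
user_quota_cmds() {
    user="$1"
    quota="$2"
    echo "zfs create ${POOL_HOME}/${user}"
    echo "zfs set quota=${quota} ${POOL_HOME}/${user}"
}

for u in alice bob; do
    user_quota_cmds "$u" 10G
done
```

This limits what each user's own dataset can hold. It does not limit what a user writes elsewhere, for example in a shared /tmp, which is where it differs from a ufs user quota.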

From bakul at bitblocks.com Tue Jul 1 21:30:07 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Tue Jul 1 21:30:09 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: Your message of "Tue, 01 Jul 2008 22:02:54 +0200." <20080701200254.GB17364@cicely7.cicely.de>
Message-ID: <20080701213006.37D675B4B@mail.bitblocks.com>

On Tue, 01 Jul 2008 22:02:54 +0200 Bernd Walter wrote:
> On Tue, Jul 01, 2008 at 10:59:31AM -0700, Bakul Shah wrote:
> > To bring this back on topic, perhaps John Kozubik can just
> > use the zfs since it already has quota support? For example,
> >
> > # zfs create z/foo
> > # zfs set quota=10M z/foo
> > dd < /dev/zero bs=1M count=20 > /z/foo/xx
> > dd: stdout: Disc quota exceeded
> > 11+0 records in
> > 10+0 records out
> > 10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
> > # zfs set quota=10T z/foo
> > # zfs get quota z/foo
> > NAME PROPERTY VALUE SOURCE
> > z/foo quota 10T local
>
> This is basically what the partition size is for normal filesystems,
> with the great ability of course to change it cheaply at any time.
> But this is in no way a per-user quota of the kind ufs provides.

It is not the same but can serve a similar purpose if each
user gets his own filesystem (and yes, I am aware of the
rebooting issue with zfs with thousands of filesystems). He
wanted support for 2TB+ quota on ufs by July 20. If that
doesn't happen at least he can limp along with this.

From ticso at cicely7.cicely.de Tue Jul 1 22:13:38 2008
From: ticso at cicely7.cicely.de (Bernd Walter)
Date: Tue Jul 1 22:13:41 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080701213006.37D675B4B@mail.bitblocks.com>
References: <20080701200254.GB17364@cicely7.cicely.de> <20080701213006.37D675B4B@mail.bitblocks.com>
Message-ID: <20080701221323.GE17364@cicely7.cicely.de>

On Tue, Jul 01, 2008 at 02:30:06PM -0700, Bakul Shah wrote:
> On Tue, 01 Jul 2008 22:02:54 +0200 Bernd Walter wrote:
> > On Tue, Jul 01, 2008 at 10:59:31AM -0700, Bakul Shah wrote:
> > > To bring this back on topic, perhaps John Kozubik can just
> > > use the zfs since it already has quota support? For example,
> > >
> > > # zfs create z/foo
> > > # zfs set quota=10M z/foo
> > > dd < /dev/zero bs=1M count=20 > /z/foo/xx
> > > dd: stdout: Disc quota exceeded
> > > 11+0 records in
> > > 10+0 records out
> > > 10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
> > > # zfs set quota=10T z/foo
> > > # zfs get quota z/foo
> > > NAME PROPERTY VALUE SOURCE
> > > z/foo quota 10T local
> >
> > This is basically what the partition size is for normal filesystems,
> > with the great ability of course to change it cheaply at any time.
> > But this is in no way a per-user quota of the kind ufs provides.
>
> It is not the same but can serve a similar purpose if each
> user gets his own filesystem (and yes, I am aware of the
> rebooting issue with zfs with thousands of filesystems). He
> wanted support for 2TB+ quota on ufs by July 20. If that
> doesn't happen at least he can limp along with this.

This works for home, but not for /tmp, where you may need support for
user quota as well.

-- 
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and much more.

From morganw at chemikals.org Wed Jul 2 00:42:52 2008
From: morganw at chemikals.org (Wes Morgan)
Date: Wed Jul 2 00:42:56 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080701213006.37D675B4B@mail.bitblocks.com>
References: <20080701213006.37D675B4B@mail.bitblocks.com>
Message-ID:

On Tue, 1 Jul 2008, Bakul Shah wrote:
> On Tue, 01 Jul 2008 22:02:54 +0200 Bernd Walter wrote:
>> On Tue, Jul 01, 2008 at 10:59:31AM -0700, Bakul Shah wrote:
>>> To bring this back on topic, perhaps John Kozubik can just
>>> use the zfs since it already has quota support? For example,
>>>
>>> # zfs create z/foo
>>> # zfs set quota=10M z/foo
>>> dd < /dev/zero bs=1M count=20 > /z/foo/xx
>>> dd: stdout: Disc quota exceeded
>>> 11+0 records in
>>> 10+0 records out
>>> 10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
>>> # zfs set quota=10T z/foo
>>> # zfs get quota z/foo
>>> NAME PROPERTY VALUE SOURCE
>>> z/foo quota 10T local
>>
>> This is basically what the partition size is for normal filesystems,
>> with the great ability of course to change it cheaply at any time.
>> But this is in no way a per-user quota of the kind ufs provides.
>
> It is not the same but can serve a similar purpose if each
> user gets his own filesystem (and yes, I am aware of the
> rebooting issue with zfs with thousands of filesystems). He
> wanted support for 2TB+ quota on ufs by July 20. If that
> doesn't happen at least he can limp along with this.

On a totally spurious note, I'd love to know the storage environment
where a 1 TB quota on a multi-user system is meaningful. If I truly need
that much space as a user, and I hit your quota limit, I'll probably be a
very, very unhappy user!

From brooks at freebsd.org Wed Jul 2 14:59:11 2008
From: brooks at freebsd.org (Brooks Davis)
Date: Wed Jul 2 14:59:13 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To:
References: <20080701213006.37D675B4B@mail.bitblocks.com>
Message-ID: <20080702145929.GA33529@lor.one-eyed-alien.net>

On Tue, Jul 01, 2008 at 07:25:49PM -0500, Wes Morgan wrote:
> On Tue, 1 Jul 2008, Bakul Shah wrote:
>
>> On Tue, 01 Jul 2008 22:02:54 +0200 Bernd Walter wrote:
>>> On Tue, Jul 01, 2008 at 10:59:31AM -0700, Bakul Shah wrote:
>>>> To bring this back on topic, perhaps John Kozubik can just
>>>> use the zfs since it already has quota support? For example,
>>>>
>>>> # zfs create z/foo
>>>> # zfs set quota=10M z/foo
>>>> dd < /dev/zero bs=1M count=20 > /z/foo/xx
>>>> dd: stdout: Disc quota exceeded
>>>> 11+0 records in
>>>> 10+0 records out
>>>> 10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
>>>> # zfs set quota=10T z/foo
>>>> # zfs get quota z/foo
>>>> NAME PROPERTY VALUE SOURCE
>>>> z/foo quota 10T local
>>>
>>> This is basically what the partition size is for normal filesystems,
>>> with the great ability of course to change it cheaply at any time.
>>> But this is in no way a per-user quota of the kind ufs provides.
>>
>> It is not the same but can serve a similar purpose if each
>> user gets his own filesystem (and yes, I am aware of the
>> rebooting issue with zfs with thousands of filesystems). He
>> wanted support for 2TB+ quota on ufs by July 20. If that
>> doesn't happen at least he can limp along with this.
>
> On a totally spurious note, I'd love to know the storage environment where
> a 1 TB quota on a multi-user system is meaningful. If I truly need that
> much space as a user, and I hit your quota limit, I'll probably be a
> very, very unhappy user!

That's probably about where we'll set the default quotas (probably more
like 5-10TB) on a new system we're deploying at work. It's more than most
users will need, but will ensure that a few users can't run us out of
space (40TB available).

-- Brooks
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080702/53016ad8/attachment.pgp

From rwatson at FreeBSD.org Wed Jul 2 22:56:22 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Wed Jul 2 22:56:26 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080701035755.GA23685@duncan.reilly.home>
References: <20080628132632.R1807@kozubik.com> <864p7bw387.fsf@ds4.des.no> <20080630073539.U1807@kozubik.com> <4868FB2F.7010204@FreeBSD.org> <20080630085612.G1807@kozubik.com> <20080701035755.GA23685@duncan.reilly.home>
Message-ID: <20080702235138.W47773@fledge.watson.org>

On Tue, 1 Jul 2008, Andrew Reilly wrote:
> On Mon, Jun 30, 2008 at 09:05:48AM -0700, John Kozubik wrote:
>> That point is well taken. However, regardless of the adoption rate, I _do_
>> believe that there is still a qualitative difference between quotas and,
>> for instance, ZFS - in terms of "coreness".
>
> One qualitative difference is that lots of people seem to be interested in
> ZFS. I haven't seen any mention of quotas for many years. In fact, I was
> under a vague impression that they hadn't worked since UFS2, and that that
> was still the case because no-one cared.

You may be thinking of the lag in support for MPSAFE UFS with quotas, which I
think we didn't ship until 7.0 (and will also appear in the forthcoming 6.4).
Prior to that, UFS was forced to run with the Giant lock if quotas were
enabled in the kernel. Other than the recently reported 64-bit quota problem,
I believe they have worked fine since UFS2 was introduced.

Robert N M Watson
Computer Laboratory
University of Cambridge

From avg at icyb.net.ua Fri Jul 4 20:36:56 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Fri Jul 4 20:37:03 2008
Subject: newfs_msdos and dvd-ram (fwsectors, fwheads)
In-Reply-To: <47CC55B0.4020607@icyb.net.ua>
References: <889.1203600472@critter.freebsd.dk> <47CC55B0.4020607@icyb.net.ua>
Message-ID: <486E89E0.8010301@icyb.net.ua>

on 03/03/2008 21:46 Andriy Gapon said the following:
> on 21/02/2008 15:27 Poul-Henning Kamp said the following:
>> In message <47BD6F39.7080105@icyb.net.ua>, Andriy Gapon writes:
>>> 2) fake those properties in newfs_msdos;
>>> benefit: this would help with other physical devices that can host
>>> FAT;
>> This is the way to do it, but it might make sense to make a library
>> routine do it, to get consistent behaviour.

BTW, the same issue applies to md devices too.
I.e. if you would like to create a FAT image in a file, newfs_msdos
won't let you.

> I opened a PR for this approach in a simple form.
> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/121182
>
> What could be a good place to put things for re-use/sharing? libutil?

-- 
Andriy Gapon

From xcllnt at mac.com Fri Jul 4 21:26:10 2008
From: xcllnt at mac.com (Marcel Moolenaar)
Date: Fri Jul 4 21:26:17 2008
Subject: newfs_msdos and dvd-ram (fwsectors, fwheads)
In-Reply-To: <486E89E0.8010301@icyb.net.ua>
References: <889.1203600472@critter.freebsd.dk> <47CC55B0.4020607@icyb.net.ua> <486E89E0.8010301@icyb.net.ua>
Message-ID: <80837A4B-A49A-44A4-AFBA-819DD1ED9DD4@mac.com>

On Jul 4, 2008, at 1:36 PM, Andriy Gapon wrote:
> on 03/03/2008 21:46 Andriy Gapon said the following:
>> on 21/02/2008 15:27 Poul-Henning Kamp said the following:
>>> In message <47BD6F39.7080105@icyb.net.ua>, Andriy Gapon writes:
>>>> 2) fake those properties in newfs_msdos;
>>>> benefit: this would help with other physical devices that can host
>>>> FAT;
>>> This is the way to do it, but it might make sense to make a library
>>> routine do it, to get consistent behaviour.
>
> BTW, the same issue applies to md devices too.
> I.e. if you would like to create a FAT image in a file, newfs_msdos
> won't let you.
>
>> I opened a PR for this approach in a simple form.
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/121182
>> What could be a good place to put things for re-use/sharing? libutil?

Note that this problem is already solved with GPart. As long as
you have some partitioning scheme on the media, GPart will
synthesize CHS parameters.

FYI,

-- 
Marcel Moolenaar
xcllnt@mac.com

From kaluna at gmail.com Sat Jul 5 09:27:13 2008
From: kaluna at gmail.com (Carlos Luna)
Date: Sat Jul 5 09:27:19 2008
Subject: Filesystem is not clean - run fsck
Message-ID: <11c17ec30807050158t24c88491pe4407e01f6687d72@mail.gmail.com>

Hi, I've used FreeNAS for about 5 years without any problem. Now I can't
mount my RAID volume, and it seems the FreeNAS SourceForge forums can't
help me. I hope this is the right list for my issue.

When I try to fsck, I get:

casa:/dev# fsck -t ufs -y /dev/pst0s1
** /dev/pst0s1
** Last Mounted on /mnt/raid
** Phase 1 - Check Blocks and Sizes
-4439300862985009506 BAD I=86
3443570138036206556 BAD I=86
-7476842757969057647 BAD I=86
-8078484667502176485 BAD I=86
2249916482063805839 BAD I=86
-3291681609520367063 BAD I=86
7780434385339928353 BAD I=86
-4372486048108189431 BAD I=86
8774078035736727371 BAD I=86
-2035310265760485777 BAD I=86
6848295312539782814 BAD I=86
EXCESSIVE BAD BLKS I=86
CONTINUE? yes

...
....

UNKNOWN FILE TYPE I=7254140
CLEAR? yes

UNKNOWN FILE TYPE I=7254141
CLEAR? yes

UNKNOWN FILE TYPE I=7254142
CLEAR? yes

UNKNOWN FILE TYPE I=7254143
CLEAR? yes

fsck_ufs: cannot alloc 3037795832 bytes for inoinfo

I have a lot of info there, 1 TB. I will appreciate any help.

Regards
Mike

From bugmaster at FreeBSD.org Mon Jul 7 11:06:58 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Jul 7 11:08:00 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200807071106.m67B6wMk062024@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t

7 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/124621  fs         [ext3] Cannot mount ext2fs partition

7 problems total.

From brooks at freebsd.org Mon Jul 7 15:47:38 2008
From: brooks at freebsd.org (Brooks Davis)
Date: Mon Jul 7 15:47:44 2008
Subject: Filesystem is not clean - run fsck
In-Reply-To: <11c17ec30807050158t24c88491pe4407e01f6687d72@mail.gmail.com>
References: <11c17ec30807050158t24c88491pe4407e01f6687d72@mail.gmail.com>
Message-ID: <20080707154805.GA57420@lor.one-eyed-alien.net>

On Sat, Jul 05, 2008 at 10:58:33AM +0200, Carlos Luna wrote:
> Hi, I've used FreeNAS for about 5 years without any problem. Now I can't
> mount my RAID volume, and it seems the FreeNAS SourceForge forums can't
> help me. I hope this is the right list for my issue.
>
> When I try to fsck, I get:
> casa:/dev# fsck -t ufs -y /dev/pst0s1
> ** /dev/pst0s1
> ** Last Mounted on /mnt/raid
> ** Phase 1 - Check Blocks and Sizes
> -4439300862985009506 BAD I=86
> 3443570138036206556 BAD I=86
> -7476842757969057647 BAD I=86
> -8078484667502176485 BAD I=86
> 2249916482063805839 BAD I=86
> -3291681609520367063 BAD I=86
> 7780434385339928353 BAD I=86
> -4372486048108189431 BAD I=86
> 8774078035736727371 BAD I=86
> -2035310265760485777 BAD I=86
> 6848295312539782814 BAD I=86
> EXCESSIVE BAD BLKS I=86
> CONTINUE? yes
>
> ...
> ....
>
> UNKNOWN FILE TYPE I=7254140
> CLEAR? yes
>
> UNKNOWN FILE TYPE I=7254141
> CLEAR? yes
>
> UNKNOWN FILE TYPE I=7254142
> CLEAR? yes
>
> UNKNOWN FILE TYPE I=7254143
> CLEAR? yes
>
> fsck_ufs: cannot alloc 3037795832 bytes for inoinfo
> I have a lot of info there, 1 TB. I will appreciate any help.

It looks like you have a somewhat large file system, apparently with a lot
of small files on it. The message indicates that you need to be able to
allocate over 3GB of address space to handle this. As such you will
need a 64-bit machine, ideally with 4GB or more RAM and probably with
a large swap partition. In theory it should be possible to write a
constrained memory use version of fsck, but to my knowledge no one has
done so and I suspect it would be a time-consuming development effort.

-- Brooks
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080707/5dab7796/attachment.pgp

From hartzell at alerce.com Mon Jul 7 17:40:26 2008
From: hartzell at alerce.com (George Hartzell)
Date: Mon Jul 7 17:40:32 2008
Subject: using zfs and unionfs together, does zfs need to be extended?
Message-ID: <18546.20476.590665.29995@almost.alerce.com>

I'd like to be able to set up a large-ish number of very similar
jails, with a minimum of fuss and take advantage of zfs' cool
features. I'd like to use unionfs to do this, but zfs' lack of
whiteout support seems to make it impossible. [jump to the bottom if
you want to skip the setup and get to the questions]

It seems like the most popular way to set up jails these days uses
read-only nullfs mounts of a base system and symbolic links into a
read-write nullfs mount for each jail's specific stuff (etc,
/usr/local, etc...). These approaches are well described in:

http://erdgeist.org/arts/software/ezjail
http://www.freebsd.org/doc/en/books/handbook/jails-application.html

and they work fine with zfs based storage.

It's also possible to use unionfs to layer jail-specific storage over
a base system.
While this approach gives more per-jail flexibility and
avoids having to relocate various directories in the base system,
various unionfs problems seem to have pushed it out of favor.

The ongoing work of daichi@freebsd.org et al. that fixes various
problems with unionfs,

http://people.freebsd.org/~daichi/unionfs/

makes it look as if this approach might now be safe, using something
like:

mount -t unionfs -o below,noatime /usr/jails/base /usr/jails/www

The obvious zfs analog to this:

mount -t unionfs -o below,noatime /tank/jails/base /tank/jails/www

fails with:

mount_unionfs: /tank/jails/www: Operation not supported

A bit of digging suggests that the mount fails when the unionfs code
checks to see if /tank/jails/www supports whiteouts. The fact that
this check doesn't occur if the uniondir is read-only provides a way
to superficially check if whiteouts are the only problem, this:

mount -t unionfs -o ro,below,noatime /tank/jails/base /tank/jails/www

does indeed seem to lead to a working [albeit read-only] union mount.

One can work around the problem by creating a ZFS volume, building a
UFS filesystem on it, and then using that as the uniondir, e.g.:

zfs create -V 5G tank/jail/vol1
newfs /dev/zvol/tank/jail/vol1
mkdir /usr/jail/zvol-www
mount /dev/zvol/tank/jail/vol1 /usr/jail/zvol-www/
mount -t unionfs -o below,noatime /tank/jail/base/ /usr/jail/zvol-www

The upper layer is still [presumably, I haven't tested these yet]
snapshot-able, send-able, etc.... but this approach leaves me with a
bunch of UFS filesystems that need care and feeding (fsck, etc...).

So finally, the question:

How hard would it be to add whiteout support to our ZFS?
Is it "just" a matter of understanding the places in the UFS code that do
whiteout things, locating the analogous places in the ZFS tree and
doing similar things (it seems to be a "simple" matter of
creating/destroying a whiteout vnode when necessary and checking for
it when appropriate) or is there something fundamentally harder about
it?

Has anyone already done it?

If it were doable/done cleanly, might it get committed?

Thanks,

g.

From kris at FreeBSD.org Mon Jul 7 17:46:20 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Mon Jul 7 17:47:47 2008
Subject: using zfs and unionfs together, does zfs need to be extended?
In-Reply-To: <18546.20476.590665.29995@almost.alerce.com>
References: <18546.20476.590665.29995@almost.alerce.com>
Message-ID: <4872566C.6000206@FreeBSD.org>

George Hartzell wrote:
> I'd like to be able to set up a large-ish number of very similar
> jails, with a minimum of fuss and take advantage of zfs' cool
> features. I'd like to use unionfs to do this, but zfs' lack of
> whiteout support seems to make it impossible. [jump to the bottom if
> you want to skip the setup and get to the questions]
>
> It seems like the most popular way to set up jails these days uses
> read-only nullfs mounts of a base system and symbolic links into a
> read-write nullfs mount for each jail's specific stuff (etc,
> /usr/local, etc...).

The "ZFS way" is to just clone your jail filesystem into each jail instance.

Kris

From hartzell at alerce.com Mon Jul 7 17:59:25 2008
From: hartzell at alerce.com (George Hartzell)
Date: Mon Jul 7 17:59:32 2008
Subject: using zfs and unionfs together, does zfs need to be extended?
In-Reply-To: <4872566C.6000206@FreeBSD.org>
References: <18546.20476.590665.29995@almost.alerce.com> <4872566C.6000206@FreeBSD.org>
Message-ID: <18546.22908.193997.709865@almost.alerce.com>

Kris Kennaway writes:
> George Hartzell wrote:
> > I'd like to be able to set up a large-ish number of very similar
> > jails, with a minimum of fuss and take advantage of zfs' cool
> > features. I'd like to use unionfs to do this, but zfs' lack of
> > whiteout support seems to make it impossible. [jump to the bottom if
> > you want to skip the setup and get to the questions]
> >
> > It seems like the most popular way to set up jails these days uses
> > read-only nullfs mounts of a base system and symbolic links into a
> > read-write nullfs mount for each jail's specific stuff (etc,
> > /usr/local, etc...).
>
> The "ZFS way" is to just clone your jail filesystem into each jail instance.

Both the nullfs approach used by ezjail and described in the handbook
and the unionfs approach make updates *much* easier. A change/update
to the jail base is automatically visible in all of the jails.

As I understand zfs clones (and a quick test backs this up), they're
copies of the original filesystem, based on a snapshot. Once they're
cloned they no longer "see" updates to the base system.

I'm not even sure that you get the space savings: I just did a zfs
snapshot and then a zfs clone, and du -sH on the two filesystems
reports the same size. That seems odd though (with all the copy on
write stuff available), but....

g.

From kris at FreeBSD.org Mon Jul 7 18:10:34 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Mon Jul 7 18:10:40 2008
Subject: using zfs and unionfs together, does zfs need to be extended?
In-Reply-To: <18546.22908.193997.709865@almost.alerce.com>
References: <18546.20476.590665.29995@almost.alerce.com> <4872566C.6000206@FreeBSD.org> <18546.22908.193997.709865@almost.alerce.com>
Message-ID: <48725C19.5040408@FreeBSD.org>

George Hartzell wrote:
> Kris Kennaway writes:
> > George Hartzell wrote:
> > > I'd like to be able to set up a large-ish number of very similar
> > > jails, with a minimum of fuss and take advantage of zfs' cool
> > > features. I'd like to use unionfs to do this, but zfs' lack of
> > > whiteout support seems to make it impossible. [jump to the bottom if
> > > you want to skip the setup and get to the questions]
> > >
> > > It seems like the most popular way to set up jails these days uses
> > > read-only nullfs mounts of a base system and symbolic links into a
> > > read-write nullfs mount for each jail's specific stuff (etc,
> > > /usr/local, etc...).
> >
> > The "ZFS way" is to just clone your jail filesystem into each jail instance.
>
> Both the nullfs approach used by ezjail and described in the handbook
> and the unionfs approach make updates *much* easier. A change/update
> to the jail base is automatically visible in all of the jails.
>
> As I understand zfs clones (and a quick test backs this up), they're
> copies of the original filesystem, based on a snapshot. Once they're
> cloned they no longer "see" updates to the base system.

That's right. Keep in mind that depending on what you are changing, it
can be dangerous to modify files that are in use. Anyway if you require
this model then nullfs or unionfs is indeed required.

> I'm not even sure that you get the space savings: I just did a zfs
> snapshot and then a zfs clone, and du -sH on the two filesystems
> reports the same size. That seems odd though (with all the copy on
> write stuff available), but....
They are copy-on-write, so if you interpret the data correctly you will see that they aren't using additional space until you write to them :)

Kris

From john at kozubik.com Tue Jul 8 03:25:13 2008
From: john at kozubik.com (John Kozubik)
Date: Tue Jul 8 03:25:19 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To:
References: <20080701213006.37D675B4B@mail.bitblocks.com>
Message-ID: <20080707203314.T1807@kozubik.com>

On Tue, 1 Jul 2008, Wes Morgan wrote:

> > It is not the same but can serve a similar purpose if each
> > user gets his own filesystem (and yes, I am aware of the
> > rebooting issue with zfs with thousands of filesystems). He
> > wanted support for 2TB+ quota on ufs by July 20. If that
> > doesn't happen at least he can limp along with this.
>
> On a totally spurious note, I'd love to know the storage environment
> where a 1 TB quota on a multi-user system is meaningful. If I truly need
> that much space as a user, and I hit your quota limit, I'll probably be a
> very, very unhappy user!

No, you'd be a paying customer.

The environment is rsync.net. Users pay a monthly fee for X GB of storage. Some users require more than 2200 GB.

It makes me very happy to run a modern enterprise with basic unix tools and methodologies. It's nice to imagine that all manner of normal folks out in the world, in 2008, are being served by the same logic and philosophies that evolved in the days of true multi-user shared unix systems.

That is, if those core functions still worked.

-----
John Kozubik - john@kozubik.com - http://www.kozubik.com

From john at kozubik.com Tue Jul 8 03:28:39 2008
From: john at kozubik.com (John Kozubik)
Date: Tue Jul 8 03:28:47 2008
Subject: It's 2008. 1 TB disk drives cost $160. Quotas are 32-bit.
In-Reply-To: <20080701175932.0B76F5B4B@mail.bitblocks.com>
References: <20080701175932.0B76F5B4B@mail.bitblocks.com>
Message-ID: <20080707204943.D1807@kozubik.com>

On Tue, 1 Jul 2008, Bakul Shah wrote:

> To bring this back on topic, perhaps John Kozubik can just
> use the zfs since it already has quota support? For example,
>
> # zfs create z/foo
> # zfs set quota=10M z/foo
> # dd < /dev/zero bs=1M count=20 > /z/foo/xx
> dd: stdout: Disc quota exceeded
> 11+0 records in
> 10+0 records out
> 10485760 bytes transferred in 4.718700 secs (2222171 bytes/sec)
> # zfs set quota=10T z/foo
> # zfs get quota z/foo
> NAME   PROPERTY  VALUE  SOURCE
> z/foo  quota     10T    local

Thanks - I appreciate this, and am continually impressed by the zfs work being done on FreeBSD.

However, ZFS on FreeBSD is still experimental, and given the environment that I am deploying in (see previous post) it is impossible to consider it.

-----
John Kozubik - john@kozubik.com - http://www.kozubik.com

From mike503 at gmail.com Tue Jul 8 06:42:06 2008
From: mike503 at gmail.com (mike)
Date: Tue Jul 8 06:42:13 2008
Subject: Thinking of using ZFS/FBSD for a backup system
Message-ID:

I administer a handful of servers - both for work and for my side business. Right now I am rsyncing /home on each server back to my own machine at home each night.

I'd like to add in snapshots, so I want to sanity check here - I wouldn't be doing much more than:

- creating a separate zfs filesystem for each server
- creating a nightly snapshot after the rsync finishes

Since I cannot change the filesystems on the remote machines now (and all run Linux anyway), this essentially gives me the ability to have daily snapshots of each machine at my fingertips should I need it - correct?

Just wanted to sanity check here before investing some money and time into this solution...

Also if anyone wants to reply to me off list with hardware that works well for FBSD 7 + ZFS I'd be grateful :)

Thanks in advance.
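
The scheme described above (one ZFS filesystem per server, one snapshot after each nightly rsync) can be sketched roughly as follows. This is only a sketch under stated assumptions: the pool name `tank`, the `tank/backup/<server>` layout, the default mountpoints, and the function names are hypothetical and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical layout: one dataset per backed-up server under this root,
# mounted at the default location (/tank/backup/<server>).
POOL_ROOT="${POOL_ROOT:-tank/backup}"

# Build the snapshot name for a server and a date stamp.
# $1 = server name, $2 = date stamp (YYYY-MM-DD)
snapshot_name() {
    printf '%s/%s@%s\n' "$POOL_ROOT" "$1" "$2"
}

# Pull /home from one server, then snapshot its dataset.
# The snapshot is only taken if rsync exits successfully, so a failed
# transfer never gets recorded as a good nightly state.
backup_one() {
    server="$1"
    stamp="$(date +%Y-%m-%d)"
    rsync -a --delete "${server}:/home/" "/${POOL_ROOT}/${server}/" || return 1
    zfs snapshot "$(snapshot_name "$server" "$stamp")"
}
```

Each dataset would be created once with `zfs create` (for example `zfs create tank/backup/web1`, again a made-up name), and a nightly cron job could then call `backup_one web1` for each server. Older states remain readable under the `.zfs/snapshot/` directory of each filesystem.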
From phoemix at harmless.hu Tue Jul 8 08:23:48 2008 From: phoemix at harmless.hu (CZUCZY Gergely) Date: Tue Jul 8 08:23:54 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: References: Message-ID: <20080708100701.57031cda@twoflower.in.publishing.hu> I've already made a backup system exactly in the scheme you've just described, in order to replace dirvish. It worked quite well, but the ZFS port was so experimental that we couldn't go on. No matter how much i've tried to finetune ZFS it kept in randomly rebooting in every 1-2-3 weeks, and a few backup cycles were lost. Regardless of this, the system worked quite well. If ZFS were stable, this easily could be our backup system. ZFS is great, awesome, but a bit unreliable on FreeBSD, still needs some work. On Mon, 7 Jul 2008 23:15:54 -0700 mike wrote: > I administer a handful of servers - both for work and for my side > business. Right now I am rsyncing /home each server back each night > from the server to my own machine at home. > > I'd like to add in snapshots, so I want to sanity check here - I > wouldn't be doing much more than: > > - creating a separate zfs filesystem for each server > - creating a nightly snapshot after the rsync finishes > > Since I cannot change the filesystems on the remote machines now (and > all run Linux anyway), this essentially gives me the ability to have > daily snapshots of each machine at my fingertips should I need it - > correct? > > Just wanted to sanity check here before investing some money and time > into this solution... > > Also if anyone wants to reply to me off list with hardware that works > well for FBSD 7 + ZFS I'd be grateful :) > > Thanks in advance. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- ?dv?lettel, Czuczy Gergely Harmless Digital Bt mailto: gergely.czuczy@harmless.hu Tel: +36-30-9702963 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080708/5cb67664/signature.pgp From mike503 at gmail.com Tue Jul 8 08:31:29 2008 From: mike503 at gmail.com (mike) Date: Tue Jul 8 08:31:36 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080708100701.57031cda@twoflower.in.publishing.hu> References: <20080708100701.57031cda@twoflower.in.publishing.hu> Message-ID: On 7/8/08, CZUCZY Gergely wrote: > Regardless of this, the system worked quite well. If ZFS were stable, this > easily could be our backup system. ZFS is great, awesome, but a bit unreliable > on FreeBSD, still needs some work. Really? I thought ZFS for basic things was not too bad in FBSD now. By basic I mean simple filesystem creation, snapshots and normal devices. Not some crazy SAN LUNs and weird volume management stuff. I would really love to use FBSD as opposed to a Solaris derivative, since I know nothing about them and I'd have to dedicate a machine for it at home. Hrm. I wonder if I could just get by running a Solaris derivative inside of a VM in VMware or something. From phoemix at harmless.hu Tue Jul 8 08:34:41 2008 From: phoemix at harmless.hu (CZUCZY Gergely) Date: Tue Jul 8 08:34:48 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: References: <20080708100701.57031cda@twoflower.in.publishing.hu> Message-ID: <20080708103437.553af3fb@twoflower.in.publishing.hu> This is the kmem_size overstep issue I'm mostly talking about. 
No matter how much you tune your system, the chance of a kernel panic caused by kmem_size being too small remains. And the time will come, and you will have a random reboot during the backup procedure. It of course happens when ZFS is in use :)

So, in my humble opinion you're better off with (open)solaris for now. There were some posts on a @freebsd mailing list about making zfs more stable on amd64 by some VM patching, I don't quite remember the details...

On Tue, 8 Jul 2008 01:31:25 -0700 mike wrote:

> On 7/8/08, CZUCZY Gergely wrote:
>
> > Regardless of this, the system worked quite well. If ZFS were stable, this
> > easily could be our backup system. ZFS is great, awesome, but a bit
> > unreliable on FreeBSD, still needs some work.
>
> Really? I thought ZFS for basic things was not too bad in FBSD now.
>
> By basic I mean simple filesystem creation, snapshots and normal
> devices. Not some crazy SAN LUNs and weird volume management stuff.
>
> I would really love to use FBSD as opposed to a Solaris derivative,
> since I know nothing about them and I'd have to dedicate a machine for
> it at home. Hrm. I wonder if I could just get by running a Solaris
> derivative inside of a VM in VMware or something.

--
Regards,
Czuczy Gergely
Harmless Digital Bt
mailto: gergely.czuczy@harmless.hu
Tel: +36-30-9702963

From mike503 at gmail.com Tue Jul 8 16:22:20 2008
From: mike503 at gmail.com (mike)
Date: Tue Jul 8 16:22:27 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <20080708100701.57031cda@twoflower.in.publishing.hu>
References: <20080708100701.57031cda@twoflower.in.publishing.hu>
Message-ID:

On 7/8/08, CZUCZY Gergely wrote:

> Regardless of this, the system worked quite well.
If ZFS were stable, this > easily could be our backup system. ZFS is great, awesome, but a bit unreliable > on FreeBSD, still needs some work. I forgot to ask - what are you doing now instead of FBSD+ZFS? From phoemix at harmless.hu Tue Jul 8 19:13:57 2008 From: phoemix at harmless.hu (CZUCZY Gergely) Date: Tue Jul 8 19:14:27 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: References: <20080708100701.57031cda@twoflower.in.publishing.hu> Message-ID: <20080708211351.069a2bc5@mort.in.publishing.hu> We're still doing what we did. FBSD+ZFS _would_ have been the replacement. We're sucking with linux+dirvish. it's slow, dirvish is kinda retarded, it has many flaws, but it's stable. Well, in a way, you don't lose data like you would do with ZFS :) On Tue, 8 Jul 2008 09:22:20 -0700 mike wrote: > On 7/8/08, CZUCZY Gergely wrote: > > > Regardless of this, the system worked quite well. If ZFS were > > stable, this easily could be our backup system. ZFS is great, > > awesome, but a bit unreliable on FreeBSD, still needs some work. > > I forgot to ask - what are you doing now instead of FBSD+ZFS? -- Sincerely, Gergely CZUCZY, Harmless Digital mailto: gergely.czuczy@harmless.hu Legacy software is software that works. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080708/2df0e7da/signature.pgp From kris at FreeBSD.org Tue Jul 8 19:50:15 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 19:50:21 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: References: <20080708100701.57031cda@twoflower.in.publishing.hu> Message-ID: <4873C4FA.2020004@FreeBSD.org> mike wrote: > On 7/8/08, CZUCZY Gergely wrote: > >> Regardless of this, the system worked quite well. If ZFS were stable, this >> easily could be our backup system. 
ZFS is great, awesome, but a bit unreliable
>> on FreeBSD, still needs some work.
>
> Really? I thought ZFS for basic things was not too bad in FBSD now.
>
> By basic I mean simple filesystem creation, snapshots and normal
> devices. Not some crazy SAN LUNs and weird volume management stuff.
>
> I would really love to use FBSD as opposed to a Solaris derivative,
> since I know nothing about them and I'd have to dedicate a machine for
> it at home. Hrm. I wonder if I could just get by running a Solaris
> derivative inside of a VM in VMware or something.

ZFS needs careful memory tuning, but really, it's otherwise stable and it can be done.

(ports-i386:~)> sysctl hw.ncpu
hw.ncpu: 4
(ports-i386:~)> sysctl hw.physmem
hw.physmem: 4275478528
(ports-i386:~)> uname -a
FreeBSD pointyhat.freebsd.org 8.0-CURRENT FreeBSD 8.0-CURRENT #31: Wed Jun 25 19:40:40 UTC 2008 kris@pointyhat.freebsd.org:/usr/src/sys.cvs/amd64/compile/POINTYHAT amd64
(ports-i386:~)> cat /boot/loader.conf
vfs.zfs.prefetch_disable=1
vm.kmem_size=1572864000

This machine is highly disk loaded, with 1.08TB of disk, a load average usually between 8-30, currently hosting 94 ZFS filesystems, 898 snapshots, and making heavy use of ZFS features like cloning, incremental snapshot send/receive, etc. The disk workload is highly vnode-intensive, involving concurrent rsyncs over trees containing hundreds of thousands of files, busy NFS exports to about 40 clients, cvs updates, etc, constantly cycling through millions of vnodes.

It works just fine.

Kris

From juri_mian at yahoo.com Tue Jul 8 19:41:41 2008
From: juri_mian at yahoo.com (Juri Mianovich)
Date: Tue Jul 8 19:57:01 2008
Subject: 24 TB UFS2 reality check ?
Message-ID: <400492.46414.qm@web45615.mail.sp1.yahoo.com>

I am about to attach 24 1 TB drives to a 3ware 9650SE-24 raid card and attach it to a FreeBSD 6.3-RELEASE system.
I am going to newfs that raw disk and turn it into one giant 24 TB UFS2 filesystem:

newfs -i 65536 -U /dev/da1

I intend to enable quotas on this system BUT I do not intend to set any >2TB quotas for any one particular user.

Questions:

- anything else I should know ? Any danger ? Other than decreasing inode density, like I am with '-i 65536', are there any other settings I should be considering ? I will set kern.maxdsiz="2572000000" ... which I hope will be enough for fsck.

- I have been (sort of) following the recent thread about >2TB quotas - let's say I have a user with _no quota set_ but they amass more than 2 TB of files - is that still a problem ? Or is it only a problem if I actually set a quota for them of >2TB ? Will repquota report correctly, even though they don't have a quota set ?

Thanks.

From julian at elischer.org Tue Jul 8 20:01:52 2008
From: julian at elischer.org (Julian Elischer)
Date: Tue Jul 8 20:01:58 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: <400492.46414.qm@web45615.mail.sp1.yahoo.com>
References: <400492.46414.qm@web45615.mail.sp1.yahoo.com>
Message-ID: <4873C7AE.50809@elischer.org>

Juri Mianovich wrote:
> I am about to attach 24 1 TB drives to a 3ware 9650SE-24 raid card and attach it to a FreeBSD 6.3-RELEASE system.
>
> I am going to newfs that raw disk and turn it into one giant 24 TB UFS2 filesystem:
>
> newfs -i 65536 -U /dev/da1
>
> I intend to enable quotas on this system BUT I do not intend to set
> any >2TB quotas for any one particular user.
>
> Questions:
>
> - anything else I should know ? Any danger ? Other than decreasing inode density, like I am with '-i 65536', are there any other settings I should be considering ? I will set kern.maxdsiz="2572000000" ... which I hope will be enough for fsck.
>
> - I have been (sort of) following the recent thread about >2TB quotas - let's say I have a user with _no quota set_ but they amass more than 2 TB of files - is that still a problem ?
Or is it only a problem if I actually set a quota for them of >2TB ? Will repquota report correctly, even though they don't have a quota set ?
> Thanks.
>

You had better have a lot of memory available to your processes to be able to fsck this baby.. (it'd better be an amd64)..
I don't remember the exact numbers but for 16k blocksize, it was something like 200MB ram for each 100GB of filesystem when populated with 60KB files..
(don't trust those numbers, do some testing (and let us know :-) )

From phoemix at harmless.hu Tue Jul 8 20:13:35 2008
From: phoemix at harmless.hu (CZUCZY Gergely)
Date: Tue Jul 8 20:13:42 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <4873C4FA.2020004@FreeBSD.org>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org>
Message-ID: <20080708221327.5c1d0e92@mort.in.publishing.hu>

Yes Kris, but you've forgotten something quite important. What you've just shown is -CURRENT, and what does the rule of thumb say about branches and (semi-)production systems? My faint memories say something like "don't never ever even think of running -CURRENT on a production box", in a polite way. ZFS can be stable on -CURRENT but it's still -CURRENT, with its issues as a production system. So, the furthest we can go for a backup box is -STABLE, but I also wouldn't prefer that one, if I can avoid it. -RELEASE and patches for production, to be safe.

Give us a stable ZFS in -RELEASE and -STABLE and we will be satisfied and happy. -CURRENT is still not a way for production boxes, that's asking for trouble.

I've fine-tuned ZFS as much as I could, I've read every little tiny bit of hint/information/whatever that was available and I couldn't get rid of those kmem_size panics in -RELEASE and -STABLE.

On Tue, 08 Jul 2008 21:50:18 +0200 Kris Kennaway wrote:

> mike wrote:
> > On 7/8/08, CZUCZY Gergely wrote:
> >
> >> Regardless of this, the system worked quite well.
If ZFS were > >> stable, this easily could be our backup system. ZFS is great, > >> awesome, but a bit unreliable on FreeBSD, still needs some work. > > > > Really? I thought ZFS for basic things was not too bad in FBSD now. > > > > By basic I mean simple filesystem creation, snapshots and normal > > devices. Not some crazy SAN LUNs and weird volume management stuff. > > > > I would really love to use FBSD as opposed to a Solaris derivative, > > since I know nothing about them and I'd have to dedicate a machine > > for it at home. Hrm. I wonder if I could just get by running a > > Solaris derivative inside of a VM in VMware or something. > > ZFS needs careful memory tuning, but really, it's otherwise stable > and it can be done. > > (ports-i386:~>sysctl hw.ncpu > hw.ncpu: 4 > (ports-i386:~)> sysctl hw.physmem > hw.physmem: 4275478528 > (ports-i386:~)> uname -a > FreeBSD pointyhat.freebsd.org 8.0-CURRENT FreeBSD 8.0-CURRENT #31: > Wed Jun 25 19:40:40 UTC 2008 > kris@pointyhat.freebsd.org:/usr/src/sys.cvs/amd64/compile/POINTYHAT > amd64 (ports-i386:~)> cat /boot/loader.conf > vfs.zfs.prefetch_disable=1 > vm.kmem_size=1572864000 > > This machine is highly disk loaded, with 1.08TB of disk, a load > average usually between 8-30, currently hosting 94 ZFS filesystems, > 898 snapshots, and making heavy use of ZFS features like cloning, > incremental snapshot send/receive, etc. The disk workload is highly > vnode-intensive, involving concurrent rsyncs over trees containing > hundreds of thousands of files, busy NFS exports to about 40 clients, > cvs updates, etc, constantly cycling through millions of vnodes. > > It works just fine. > > Kris -- Sincerely, Gergely CZUCZY, Harmless Digital mailto: gergely.czuczy@harmless.hu Legacy software is software that works. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080708/fdf9b61c/signature.pgp From kris at FreeBSD.org Tue Jul 8 20:34:49 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 20:34:54 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080708221327.5c1d0e92@mort.in.publishing.hu> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> Message-ID: <4873CF6C.7000205@FreeBSD.org> CZUCZY Gergely wrote: > Yes Kris, but you've forgot something quite important. > What you've just showed is -CURRENT, and how does that thumb-rule is > about branches and (semi-)production systems? > My faint memories say something like "don't never ever even think of > running -CURRENT on a production box", in a polite way. > ZFS can be stable on -CURRENT but it's till -CURRENT, with its issues > as a production system. So, the last we can go about a backup box is > -STABLE, but i also wouldn't prefer that one, if I can. -RELEASE and > patches for production, to be safe. > > Give us a stable ZFS in -RELEASE and -STABLE and we will be statisfied > and happy. -CURRENT is still not a way for production boxes, that's > asking for trouble. It's not relevant that I am running -CURRENT, there have been no changes in ZFS that are not also in -STABLE (and only one bug fix since 7.0-RELEASE, I think -- that was important, but it fixes mmap corruption, not a panic). I run -CURRENT to help debug it, but I am neither making use of ZFS fixes, nor encountering ZFS bugs. > I've finetuned ZFS as much as I could, I've read every little tiny bit > of hint/information/whatever that was available and I couldn't get rid > of those kmem_size panics in -RELEASE and -STABLE. Well, it's still almost certainly because you aren't setting kmem_size high enough. 
As you saw, that is the only thing I tuned (disabling prefetch is just for performance in my environment).

If you can't set it high enough because you don't have enough RAM, that means your system doesn't have enough RAM to run ZFS, not that ZFS is unstable.

Kris

From roberto at keltia.freenix.fr Tue Jul 8 20:40:24 2008
From: roberto at keltia.freenix.fr (Ollivier Robert)
Date: Tue Jul 8 20:40:31 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <20080708221327.5c1d0e92@mort.in.publishing.hu>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu>
Message-ID: <20080708204021.GA97977@keltia.freenix.fr>

According to CZUCZY Gergely:
> Yes Kris, but you've forgotten something quite important.
> What you've just shown is -CURRENT, and what does the rule of thumb say
> about branches and (semi-)production systems?

There have been no significant changes in CURRENT WRT ZFS compared to 7.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
Darwin sidhe.keltia.net Version 9.2.0: Tue Feb 5 16:13:22 PST 2008; i386

From roberto at keltia.freenix.fr Tue Jul 8 20:42:28 2008
From: roberto at keltia.freenix.fr (Ollivier Robert)
Date: Tue Jul 8 20:42:35 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <4873C4FA.2020004@FreeBSD.org>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org>
Message-ID: <20080708204226.GB97977@keltia.freenix.fr>

According to Kris Kennaway:
> (ports-i386:~)> cat /boot/loader.conf
> vfs.zfs.prefetch_disable=1
> vm.kmem_size=1572864000

Have you tried w/o the prefetch_disable tunable?

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve!
-=- roberto@keltia.freenix.fr Darwin sidhe.keltia.net Version 9.2.0: Tue Feb 5 16:13:22 PST 2008; i386 From phoemix at harmless.hu Tue Jul 8 20:54:54 2008 From: phoemix at harmless.hu (CZUCZY Gergely) Date: Tue Jul 8 20:55:01 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <4873CF6C.7000205@FreeBSD.org> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> Message-ID: <20080708225449.1070252d@mort.in.publishing.hu> On Tue, 08 Jul 2008 22:34:52 +0200 Kris Kennaway wrote: > > I've finetuned ZFS as much as I could, I've read every little tiny > > bit of hint/information/whatever that was available and I couldn't > > get rid of those kmem_size panics in -RELEASE and -STABLE. > > Well, it's still almost certainly because you aren't setting > kmem_size high enough. As you saw, that is the only thing I tuned > (disabling prefetch is just for performance in my environment). > > If you can't set it high enough because you don't have enough RAM, > that means your system does't have enough RAM to run ZFS, not that > ZFS is unstable. I've had a box with 2GB of memory for it, and around 5-6 filesystems. I've set kmem_size as large as it was allowed, not a bit smaller. Where's the guide showing how much memory should I have for a setup? How can "enough memory" be determined for a setup, without having panics? > > Kris > -- Sincerely, Gergely CZUCZY, Harmless Digital mailto: gergely.czuczy@harmless.hu Legacy software is software that works. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080708/949203c9/signature.pgp From juri_mian at yahoo.com Tue Jul 8 20:47:03 2008 From: juri_mian at yahoo.com (Juri Mianovich) Date: Tue Jul 8 20:59:42 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: <4873C7AE.50809@elischer.org> Message-ID: <958164.4787.qm@web45611.mail.sp1.yahoo.com> --- On Tue, 7/8/08, Julian Elischer wrote: > You had better have a lot of memory available ot your > processes to be > able to fsck this baby.. (it'd better be an amd64).. > I don't remember the exact numbers but for 16k > blocksize, > it was something like 200MB ram for each 100GB of > filesystem when > populated with 60KB files.. > (don't trust those numbers, do some testing (and let us > know :-) ) Thank you very much. I currently have a similar system with: /dev/da1 8.0T 1.3T 6.1T 16% /users which was created with 'newfs -i 32768 -U /dev/da1' ... and I can successfully fsck it with my: kern.maxdsiz="2572000000" setting. So perhaps a filesystem 3x that size should be '-i 131072' to maintain the same ability to fsck ? None of these systems are 64-bit - they are all running i386 w/4 GB of ram. If I stuck with '-i 65536' (instead of going all the way to 131072) and things got sticky, I could always temporarily reboot with a maxdsiz closer to 3 GB, right ? 
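
As a rough check on the `-i` arithmetic in the message above: the `newfs -i` value is the number of bytes of data space per inode, so the inode count is approximately the filesystem size divided by that value. The sketch below assumes binary terabytes, ignores cylinder-group rounding, and uses a made-up function name; fsck memory also depends on block counts and on how full the filesystem is, so this is only a first-order comparison.

```shell
#!/bin/sh
# Approximate number of inodes newfs would create.
# $1 = filesystem size in TiB, $2 = newfs -i value (bytes per inode)
# Needs 64-bit shell arithmetic, which /bin/sh normally provides.
approx_inodes() {
    echo $(( $1 * 1024 * 1024 * 1024 * 1024 / $2 ))
}

approx_inodes 8 32768     # existing 8 TB system: 268435456
approx_inodes 24 98304    # 24 TB, same inode count: 268435456
approx_inodes 24 65536    # 24 TB, -i 65536: 402653184
approx_inodes 24 131072   # 24 TB, -i 131072: 201326592
```

Under these assumptions, keeping the inode count of the 8 TB system on a filesystem three times the size would take `-i 98304`; `-i 131072` ends up with about three quarters as many inodes, while `-i 65536` ends up with about one and a half times as many.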
From kris at FreeBSD.org Tue Jul 8 21:10:11 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 21:10:17 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080708204226.GB97977@keltia.freenix.fr> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708204226.GB97977@keltia.freenix.fr> Message-ID: <4873D7B5.2060901@FreeBSD.org> Ollivier Robert wrote: > According to Kris Kennaway: >> (ports-i386:~)> cat /boot/loader.conf >> vfs.zfs.prefetch_disable=1 >> vm.kmem_size=1572864000 > > Hvae you tried w/o the prefetch_disable tunable? I have not done careful measurements, but casual observation suggests that on my workloads I get better performance without prefetch. I have a pair of mirrored disks on an amr, and my workload is quite random-access so prefetching just introduces latencies and wastes already-saturated disk bandwidth. With more disks or a different workload I would expect different performance characteristics. Kris From julian at elischer.org Tue Jul 8 21:12:28 2008 From: julian at elischer.org (Julian Elischer) Date: Tue Jul 8 21:12:34 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: <958164.4787.qm@web45611.mail.sp1.yahoo.com> References: <958164.4787.qm@web45611.mail.sp1.yahoo.com> Message-ID: <4873D83A.2080803@elischer.org> Juri Mianovich wrote: > > > --- On Tue, 7/8/08, Julian Elischer wrote: > > >> You had better have a lot of memory available ot your >> processes to be >> able to fsck this baby.. (it'd better be an amd64).. >> I don't remember the exact numbers but for 16k >> blocksize, >> it was something like 200MB ram for each 100GB of >> filesystem when >> populated with 60KB files.. >> (don't trust those numbers, do some testing (and let us >> know :-) ) > > > Thank you very much. > > I currently have a similar system with: > > /dev/da1 8.0T 1.3T 6.1T 16% /users > > which was created with 'newfs -i 32768 -U /dev/da1' ... 
and I can
> successfully fsck it with my:
>
> kern.maxdsiz="2572000000"
>
> setting.
>
> So perhaps a filesystem 3x that size should be '-i 131072' to
> maintain the same ability to fsck ?
>
> None of these systems are 64-bit - they are all running i386 w/4 GB
> of ram.
>
> If I stuck with '-i 65536' (instead of going all the way to 131072)
> and things got sticky, I could always temporarily reboot with a
> maxdsiz closer to 3 GB, right ?

You will have to do tests to see how big the virtual size of the fsck process gets per TB of disk. You should be able to see it with top.

Notice that your 8TB filesystem is only 16% full. The memory will increase to some extent when you have more files.. try filling up your 8TB system with files and doing it again.

>
>

From bakul at bitblocks.com Tue Jul 8 21:26:18 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Tue Jul 8 21:26:24 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: Your message of "Tue, 08 Jul 2008 14:17:57 PDT." <336596.22193.qm@web45608.mail.sp1.yahoo.com>
Message-ID: <20080708212617.C606C5B75@mail.bitblocks.com>

On Tue, 08 Jul 2008 14:17:57 PDT Juri Mianovich wrote:
> > I vaguely recall it was more like 700MB of memory per
> > Terabyte on a 50% filled UFS2. Things may have improved
> > in the three years since I did that. I don't recall the time
> > to fsck but it was pretty bad! That was the main reason I
> > switched from UFS2.
>
> Why does fsck need to reserve all that memory in advance and hold it the
> entire fsck ? Is it necessary by definition, or could it be written to not
> require that ?

Maybe it can but why bother. It just feels wrong to have to check the entire FS state after a crash -- it doesn't scale.

From bakul at bitblocks.com Tue Jul 8 21:32:05 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Tue Jul 8 21:32:11 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: Your message of "Tue, 08 Jul 2008 13:01:50 PDT."
<4873C7AE.50809@elischer.org>
Message-ID: <20080708211344.39FC75B46@mail.bitblocks.com>

On Tue, 08 Jul 2008 13:01:50 PDT Julian Elischer wrote:
> Juri Mianovich wrote:
> > I am about to attach 24 1 TB drives to a 3ware 9650SE-24 raid card
> > and attach it to a FreeBSD 6.3-RELEASE system.
> >
> > I am going to newfs that raw disk and turn it into one giant 24 TB
> > UFS2 filesystem:

I think Juri is asking for trouble.... At the very least he should consider mirroring or RAID5ing.

> You had better have a lot of memory available to your processes to be
> able to fsck this baby.. (it'd better be an amd64)..
> I don't remember the exact numbers but for 16k blocksize,
> it was something like 200MB ram for each 100GB of filesystem when
> populated with 60KB files..
> (don't trust those numbers, do some testing (and let us know :-) )

I vaguely recall it was more like 700MB of memory per Terabyte on a 50% filled UFS2. Things may have improved in the three years since I did that. I don't recall the time to fsck but it was pretty bad! That was the main reason I switched from UFS2.

From julian at elischer.org Tue Jul 8 21:44:42 2008
From: julian at elischer.org (Julian Elischer)
Date: Tue Jul 8 21:44:49 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: <336596.22193.qm@web45608.mail.sp1.yahoo.com>
References: <336596.22193.qm@web45608.mail.sp1.yahoo.com>
Message-ID: <4873DFC9.6000006@elischer.org>

Juri Mianovich wrote:
>
> --- On Tue, 7/8/08, Bakul Shah wrote:
>
>> I vaguely recall it was more like 700MB of memory per
>> Terabyte on a 50% filled UFS2. Things may have improved
>> in the three years since I did that. I don't recall the time
>> to fsck but it was pretty bad! That was the main reason I
>> switched from UFS2.
>
> Why does fsck need to reserve all that memory in advance and hold it the entire fsck ? Is it necessary by definition, or could it be written to not require that ?
>

it doesn't reserve it ..
that's how much data it builds up

From juri_mian at yahoo.com Tue Jul 8 21:17:58 2008
From: juri_mian at yahoo.com (Juri Mianovich)
Date: Tue Jul 8 21:45:09 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: <20080708211344.39FC75B46@mail.bitblocks.com>
Message-ID: <336596.22193.qm@web45608.mail.sp1.yahoo.com>

--- On Tue, 7/8/08, Bakul Shah wrote:

> I vaguely recall it was more like 700MB of memory per
> Terabyte on a 50% filled UFS2. Things may have improved
> in the three years since I did that. I don't recall the time
> to fsck but it was pretty bad! That was the main reason I
> switched from UFS2.

Why does fsck need to reserve all that memory in advance and hold it the entire fsck ? Is it necessary by definition, or could it be written to not require that ?

From randy at psg.com Tue Jul 8 21:47:22 2008
From: randy at psg.com (Randy Bush)
Date: Tue Jul 8 21:47:28 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <4873C4FA.2020004@FreeBSD.org>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org>
Message-ID: <4873E068.8060305@psg.com>

light to medium load amd64 w 8g

hw.ncpu: 2
vm.kmem_size=600M
vm.kmem_size_max=600M
vfs.zfs.prefetch_disable=1

been trouble free since boot.

wondering if it is time to try on i386. anyone with serious experience with i386 zfs?

randy

From randy at psg.com Tue Jul 8 21:48:32 2008
From: randy at psg.com (Randy Bush)
Date: Tue Jul 8 21:48:38 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <4873E068.8060305@psg.com>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <4873E068.8060305@psg.com>
Message-ID: <4873E0AE.6090305@psg.com>

> light to medium load amd64 w 8g

. first cuppa. it's 4g

randy

From speedtoys.racing at gmail.com Tue Jul 8 22:03:53 2008
From: speedtoys.racing at gmail.com (Jeff Mohler)
Date: Tue Jul 8 22:03:59 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: <400492.46414.qm@web45615.mail.sp1.yahoo.com>
References: <400492.46414.qm@web45615.mail.sp1.yahoo.com>
Message-ID:

Wow..the odds of hitting an uncorrectable bit error are actually pretty HIGH in that configuration.

Good luck.

You may get to find out why Enterprise arrays are expensive.

On Tue, Jul 8, 2008 at 12:28 PM, Juri Mianovich wrote:
> I am about to attach 24 1 TB drives to a 3ware 9650SE-24 raid card and attach it to a FreeBSD 6.3-RELEASE system.
>
> I am going to newfs that raw disk and turn it into one giant 24 TB UFS2 filesystem:
>
> newfs -i 65536 -U /dev/da1
>
> I intend to enable quotas on this system BUT I do not intend to set any >2TB quotas for any one particular user.
>
> Questions:
>
> - anything else I should know ? Any danger ? Other than decreasing inode density, like I am with '-i 65536', are there any other settings I should be considering ? I will set kern.maxdsiz="2572000000" ... which I hope will be enough for fsck.
>
> - I have been (sort of) following the recent thread about >2TB quotas - let's say I have a user with _no quota set_ but they amass more than 2 TB of files - is that still a problem ? Or is it only a problem if I actually set a quota for them of >2TB ? Will repquota report correctly, even though they don't have a quota set ?
>
> Thanks.
>

From mike503 at gmail.com Tue Jul 8 22:31:20 2008
From: mike503 at gmail.com (mike)
Date: Tue Jul 8 22:31:31 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <4873E0AE.6090305@psg.com>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <4873E068.8060305@psg.com> <4873E0AE.6090305@psg.com>
Message-ID:

On 7/8/08, Randy Bush wrote:
> . first cuppa.
it's 4g okay, so it sounds like from multiple people this could be doable and stable enough: dual or quad-core current gen processor (amd64) 4g ram 4 or 6x 1tb disks freebsd 7.0-release these tweaks to /boot/loader.conf: vm.kmem_size=600M vm.kmem_size_max=600M vfs.zfs.prefetch_disable=1 right? I am fine with sacrificing a little bit of speed for stability. This will only run a few rsyncs per day, and then a single snapshot per day per filesystem (with maybe 15 filesystems maximum right now) From kris at FreeBSD.org Tue Jul 8 22:40:28 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 22:40:34 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <4873E068.8060305@psg.com> <4873E0AE.6090305@psg.com> Message-ID: <4873ECE0.5090706@FreeBSD.org> mike wrote: > On 7/8/08, Randy Bush wrote: > >> . first cuppa. it's 4g > > okay, so it sounds like from multiple people this could be doable and > stable enough: > > dual or quad-core current gen processor (amd64) > 4g ram > 4 or 6x 1tb disks > freebsd 7.0-release > > these tweaks to /boot/loader.conf: > > vm.kmem_size=600M > vm.kmem_size_max=600M > vfs.zfs.prefetch_disable=1 > > right? Might be enough kmem, if not, or if you want better performance from increased caching, increase as high as 1500M Kris From kris at FreeBSD.org Tue Jul 8 22:41:07 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 22:41:15 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <4873ECE0.5090706@FreeBSD.org> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <4873E068.8060305@psg.com> <4873E0AE.6090305@psg.com> <4873ECE0.5090706@FreeBSD.org> Message-ID: <4873ED07.8060105@FreeBSD.org> Kris Kennaway wrote: > mike wrote: >> On 7/8/08, Randy Bush wrote: >> >>> . first cuppa. 
it's 4g >> >> okay, so it sounds like from multiple people this could be doable and >> stable enough: >> >> dual or quad-core current gen processor (amd64) >> 4g ram >> 4 or 6x 1tb disks >> freebsd 7.0-release >> >> these tweaks to /boot/loader.conf: >> >> vm.kmem_size=600M >> vm.kmem_size_max=600M >> vfs.zfs.prefetch_disable=1 >> >> right? > > Might be enough kmem, if not, or if you want better performance from > increased caching, increase as high as 1500M > > Kris > > Also test with/without prefetch to see which is faster on your workload. Kris From randy at psg.com Tue Jul 8 22:47:26 2008 From: randy at psg.com (Randy Bush) Date: Tue Jul 8 22:47:32 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <4873ECE0.5090706@FreeBSD.org> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <4873E068.8060305@psg.com> <4873E0AE.6090305@psg.com> <4873ECE0.5090706@FreeBSD.org> Message-ID: <4873EE7C.6030207@psg.com> >> vm.kmem_size=600M >> vm.kmem_size_max=600M >> vfs.zfs.prefetch_disable=1 > Might be enough kmem and where is the 'nuf-a-mometer? randy From kris at FreeBSD.org Tue Jul 8 23:14:45 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 23:14:51 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080708225449.1070252d@mort.in.publishing.hu> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> Message-ID: <4873F4E9.3040203@FreeBSD.org> CZUCZY Gergely wrote: > On Tue, 08 Jul 2008 22:34:52 +0200 > Kris Kennaway wrote: >>> I've finetuned ZFS as much as I could, I've read every little tiny >>> bit of hint/information/whatever that was available and I couldn't >>> get rid of those kmem_size panics in -RELEASE and -STABLE. 
>> Well, it's still almost certainly because you aren't setting >> kmem_size high enough. As you saw, that is the only thing I tuned >> (disabling prefetch is just for performance in my environment). >> >> If you can't set it high enough because you don't have enough RAM, >> that means your system does't have enough RAM to run ZFS, not that >> ZFS is unstable. > I've had a box with 2GB of memory for it, and around 5-6 filesystems. > I've set kmem_size as large as it was allowed, not a bit smaller. > > Where's the guide showing how much memory should I have for a setup? > How can "enough memory" be determined for a setup, without having > panics? I don't know; empirically my setup is an upper bound. How large was "as large as it was allowed" for you? Kris From kris at FreeBSD.org Tue Jul 8 23:21:31 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 8 23:21:37 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <4873EE7C.6030207@psg.com> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <4873E068.8060305@psg.com> <4873E0AE.6090305@psg.com> <4873ECE0.5090706@FreeBSD.org> <4873EE7C.6030207@psg.com> Message-ID: <4873F67F.6050202@FreeBSD.org> Randy Bush wrote: >>> vm.kmem_size=600M >>> vm.kmem_size_max=600M >>> vfs.zfs.prefetch_disable=1 >> Might be enough kmem > > and where is the 'nuf-a-mometer? The point at which it no longer panics from memory exhaustion ;) I've put no work into finding out exactly where this is, because my servers have gigabytes of memory and it is a performance optimization for me to set kmem as high as possible, which was previously just over 1500MB on amd64 but has now been increased in 8.0. Kris From speedtoys.racing at gmail.com Wed Jul 9 02:25:54 2008 From: speedtoys.racing at gmail.com (Jeff Mohler) Date: Wed Jul 9 02:26:05 2008 Subject: 24 TB UFS2 reality check ? 
In-Reply-To: <96359.64292.qm@web45601.mail.sp1.yahoo.com>
References: <96359.64292.qm@web45601.mail.sp1.yahoo.com>
Message-ID:

One drive has a uBER (uncorrectable bit error rate) of what..maybe 1 per 1.0E15 bits transferred, and you have 24x the exposure of one drive, as each drive is its own statistical crap shoot. Each drive may NEVER hit a uBER for you, but one may do it tomorrow.

Plus, you have commodity firmware levels on those drives and commodity BER mechanisms, so you COULD argue you have another 2x liability WRT losing it all without HEFTY raid, at least 5+1.

Cuz..if you have RAID, and you lose 1 drive, you have to touch a lot of bits to recover that drive, which drives you quickly into the mathematical region of yet another BER, and then you have a double drive failure.

Just saying, good luck.

And even back in 2002, uBER was only a point shorter, and that math was a LOT harder to hit on the much smaller drives/arrays.

Let's say you go bling for high end SATA drives: you only get about E16 at best, which still isn't good math with a 24TB FS...theoretically.

And that's not to mention the cubic inches of RAM required to manage it once you start to really fill it, much less fsck it.

On Tue, Jul 8, 2008 at 7:04 PM, Juri Mianovich wrote:
>
> --- On Tue, 7/8/08, Jeff Mohler wrote:
>
>> Wow..the odds of hitting an uncorrectable bit error are
>> actually pretty HIGH in that configuration,.
>>
>> Good luck.
>>
>> You may get to find out why Enterprise arrays are
>> expensive.
>
> Can you elaborate ? Why do you say that ?
>
> Did people say the same thing circa 2002 when the first 1-2TB configurations were being put together with off the shelf parts, or is there something special in particular about >20 TB that I don't understand?
>
> Thanks.
>

From juri_mian at yahoo.com Wed Jul 9 02:04:47 2008
From: juri_mian at yahoo.com (Juri Mianovich)
Date: Wed Jul 9 02:28:12 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To: Message-ID: <96359.64292.qm@web45601.mail.sp1.yahoo.com> --- On Tue, 7/8/08, Jeff Mohler wrote: > Wow..the odds of hitting an uncorrectable bit error are > actually > pretty HIGH in that configuration,. > > Good luck. > > You may get to find out why Enterprise arrays are > expensive. Can you elaborate ? Why do you say that ? Did people say the same thing circa 2002 when the first 1-2TB configurations were being put together with off the shelf parts, or is there something special in particular about >20 TB that I don't understand? Thanks. From marck at rinet.ru Wed Jul 9 02:37:02 2008 From: marck at rinet.ru (Dmitry Morozovsky) Date: Wed Jul 9 02:37:10 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <4873C4FA.2020004@FreeBSD.org> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> Message-ID: <20080709062533.J58331@woozle.rinet.ru> On Tue, 8 Jul 2008, Kris Kennaway wrote: KK> (ports-i386:~)> uname -a KK> FreeBSD pointyhat.freebsd.org 8.0-CURRENT FreeBSD 8.0-CURRENT #31: Wed Jun KK> 25 19:40:40 UTC 2008 Wow! I did't realize you switched package building infrastructure to ZFS. Nice and promising! 
Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From phoemix at harmless.hu Wed Jul 9 05:44:26 2008 From: phoemix at harmless.hu (CZUCZY Gergely) Date: Wed Jul 9 05:44:33 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <4873F4E9.3040203@FreeBSD.org> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> <4873F4E9.3040203@FreeBSD.org> Message-ID: <20080709074420.24df3be4@mort.in.publishing.hu> On Wed, 09 Jul 2008 01:14:49 +0200 Kris Kennaway wrote: > CZUCZY Gergely wrote: > I don't know; empirically my setup is an upper bound. How large was > "as large as it was allowed" for you? Well, we cannot buy "upper bounds" all over, just because some developer is unable to figure out things. I think you can't expect FreeBSD users to spend as much money as possible, just because the devs can't tell how much is enough... It seems more like a twilight zone then a stable feature now ;) It was exactly as much as an amd64 installation would allow with 2GB of physical memory. We've dismissed the setup around february, and I don't have the configs anymore. It was an amd64 setup with 2GB of physical memory. > > Kris -- Sincerely, Gergely CZUCZY, Harmless Digital mailto: gergely.czuczy@harmless.hu Legacy software is software that works. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080709/453822cc/signature.pgp From koitsu at FreeBSD.org Wed Jul 9 05:56:46 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Wed Jul 9 05:56:52 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080709074420.24df3be4@mort.in.publishing.hu> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> <4873F4E9.3040203@FreeBSD.org> <20080709074420.24df3be4@mort.in.publishing.hu> Message-ID: <20080709055645.GA40076@eos.sc1.parodius.com> On Wed, Jul 09, 2008 at 07:44:20AM +0200, CZUCZY Gergely wrote: > On Wed, 09 Jul 2008 01:14:49 +0200 > Kris Kennaway wrote: > > > CZUCZY Gergely wrote: > > I don't know; empirically my setup is an upper bound. How large was > > "as large as it was allowed" for you? > Well, we cannot buy "upper bounds" all over, just because some > developer is unable to figure out things. I think you can't expect > FreeBSD users to spend as much money as possible, just because the devs > can't tell how much is enough... > It seems more like a twilight zone then a stable feature now ;) > > It was exactly as much as an amd64 installation would allow with 2GB of > physical memory. We've dismissed the setup around february, and I don't > have the configs anymore. It was an amd64 setup with 2GB of physical > memory. The bottom line here is that i386 and amd64 both have a kmem_size limit of 2GB. You can throw 32GB of RAM into an amd64 box, but FreeBSD will only utilise up to 2GB of that for kmem. That is purely a FreeBSD limitation, and is being dealt with in HEAD by Alan Cox. I believe he has a patch, or it may have been committed -- I don't follow HEAD. I can point people to a mailing list URL, if needed. 
This is one of the limitations Gergely is referring to.

Since ZFS is incredibly memory-hungry, you're forced to tune ZFS to try and get it to "play nice" with that 2GB limit on STABLE/RELEASE systems. You also need to keep in mind that you can't just set kmem_size and kmem_size_max to 2048M, because the kernel needs memory for other things.

The tuning parameters I use on my 2GB amd64 and 4GB amd64 boxes are:

vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_min="16M"
vfs.zfs.arc_max="64M"

If you set kmem_size and kmem_size_max any higher than that, the machine will panic on boot, stating (indirectly) that there isn't enough memory available for the kernel to allocate for other things.

Until I added the arc_min and arc_max setting, I could occasionally panic the machines under very heavy load (heavy zpool I/O), caused by kmem exhaustion. Since adding the arc_* tunings, I've tried very hard to crash the machines, and I cannot.

But there's absolutely no guarantee those tuning parameters above will ensure FreeBSD won't panic due to kmem exhaustion. I believe this is the point Gergely is making about the "stability" of the whole thing.

Now, with regards to prefetch_disable, folks can disable that if they want. I disable it on my above systems because for what they do, the overall performance appears better with prefetching disabled.

I hope this helps shed some light here...

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From kris at FreeBSD.org Wed Jul 9 10:18:50 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Wed Jul 9 10:19:00 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080709062533.J58331@woozle.rinet.ru> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080709062533.J58331@woozle.rinet.ru> Message-ID: <48749087.4070802@FreeBSD.org> Dmitry Morozovsky wrote: > On Tue, 8 Jul 2008, Kris Kennaway wrote: > > KK> (ports-i386:~)> uname -a > KK> FreeBSD pointyhat.freebsd.org 8.0-CURRENT FreeBSD 8.0-CURRENT #31: Wed Jun > KK> 25 19:40:40 UTC 2008 > > Wow! I did't realize you switched package building infrastructure to ZFS. > > Nice and promising! Yeah, the server has been using ZFS since some time last year, but recently I went much further and made it make use of (i.e. require) ZFS features like snapshots and clones. Kris From kris at FreeBSD.org Wed Jul 9 10:26:10 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Wed Jul 9 10:26:16 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080709055645.GA40076@eos.sc1.parodius.com> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> <4873F4E9.3040203@FreeBSD.org> <20080709074420.24df3be4@mort.in.publishing.hu> <20080709055645.GA40076@eos.sc1.parodius.com> Message-ID: <4874923F.8080303@FreeBSD.org> Jeremy Chadwick wrote: > On Wed, Jul 09, 2008 at 07:44:20AM +0200, CZUCZY Gergely wrote: >> On Wed, 09 Jul 2008 01:14:49 +0200 >> Kris Kennaway wrote: >> >>> CZUCZY Gergely wrote: >>> I don't know; empirically my setup is an upper bound. How large was >>> "as large as it was allowed" for you? >> Well, we cannot buy "upper bounds" all over, just because some >> developer is unable to figure out things. 
I think you can't expect >> FreeBSD users to spend as much money as possible, just because the devs >> can't tell how much is enough... >> It seems more like a twilight zone then a stable feature now ;) >> >> It was exactly as much as an amd64 installation would allow with 2GB of >> physical memory. We've dismissed the setup around february, and I don't >> have the configs anymore. It was an amd64 setup with 2GB of physical >> memory. > > The bottom line here is that i386 and amd64 both have a kmem_size limit > of 2GB. No, it's the limit on KVA (address space), not kmem_size. On i386 there is only 1GB of KVA by default, so it's even harder to fit ZFS in. I thought you could tune it higher than 2GB if you liked, although this comes out of address space available to user programs (4GB total for user + kernel). > You can throw 32GB of RAM into an amd64 box, but FreeBSD will > only utilise up to 2GB of that for kmem. That is purely a FreeBSD > limitation, and is being dealt with in HEAD by Alan Cox. I believe he > has a patch, or it may have been committed -- I don't follow HEAD. I > can point people to a mailing list URL, if needed. > > This is one of the limitations Gergely is referring to. No it's not, since he has only 2GB of physical memory. > Since ZFS is incredibly memory-hungry, you're forced to tune ZFS to try > and get it to "play nice" with that 2GB limit on STABLE/RELEASE systems. > You also need to keep in mind that you can't just set kmem_size and > kmem_size_max to 2048M, because the kernel needs memory for other > things. > > The tuning parameters I use on my 2GB amd64 and 4GB amd64 boxes are: > > vm.kmem_size="1536M" > vm.kmem_size_max="1536M" > vfs.zfs.arc_min="16M" > vfs.zfs.arc_max="64M" > > If you set kmem_size and kmem_size_max any higher than that, the machine > will panic on boot, stating (indirectly) that there isn't enough memory > available for the kernel to allocate for other things. 
Yes, I said this earlier :)

> Until I added the arc_min and arc_max setting, I could occasionally
> panic the machines under very heavy load (heavy zpool I/O), caused by
> kmem exhaustion. Since adding the arc_* tunings, I've tried very hard
> to crash the machines, and I cannot.

Good to hear.

> But there's absolutely no guarantee those tuning parameters above will
> ensure FreeBSD won't panic due to kmem exhaustion. I believe this is
> the point Gergely is making about the "stability" of the whole thing.

Not having the resources to run a very memory-intensive filesystem does not make it "unstable", it makes it "too memory intensive".

Kris

From gergely.czuczy at harmless.hu Wed Jul 9 10:37:37 2008
From: gergely.czuczy at harmless.hu (CZUCZY Gergely)
Date: Wed Jul 9 10:37:45 2008
Subject: Thinking of using ZFS/FBSD for a backup system
In-Reply-To: <20080709055645.GA40076@eos.sc1.parodius.com>
References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> <4873F4E9.3040203@FreeBSD.org> <20080709074420.24df3be4@mort.in.publishing.hu> <20080709055645.GA40076@eos.sc1.parodius.com>
Message-ID: <20080709123729.60d2431a@twoflower.in.publishing.hu>

Thank you for your thoughts, Jeremy.

Yes, I was trying to refer to these things. Putting more memory into the system than ZFS needs doesn't prove anything. There's no _proven_ _guarantee_ that it won't panic; it just makes it more difficult (lowers the possibility) to panic the system.

As you said, tuning ARC to a lower value and kmem_size{,_max} to a higher value makes it less likely to panic, but this won't guarantee anything; it just makes a panic harder to trigger.

"Stable ZFS" would mean that these circumstances are cleared up, and that there's a proven guarantee (mathematical or otherwise) that it's _unable_ to panic due to this memory allocation issue.
It's still there, but with a bigger amount of memory it's less likely to happen. And I haven't tried prefetch_disable back then. So i've got no experiences on the effects of prefetch_disable. On Tue, 8 Jul 2008 22:56:45 -0700 Jeremy Chadwick wrote: > On Wed, Jul 09, 2008 at 07:44:20AM +0200, CZUCZY Gergely wrote: > > On Wed, 09 Jul 2008 01:14:49 +0200 > > Kris Kennaway wrote: > > > > > CZUCZY Gergely wrote: > > > I don't know; empirically my setup is an upper bound. How large was > > > "as large as it was allowed" for you? > > Well, we cannot buy "upper bounds" all over, just because some > > developer is unable to figure out things. I think you can't expect > > FreeBSD users to spend as much money as possible, just because the devs > > can't tell how much is enough... > > It seems more like a twilight zone then a stable feature now ;) > > > > It was exactly as much as an amd64 installation would allow with 2GB of > > physical memory. We've dismissed the setup around february, and I don't > > have the configs anymore. It was an amd64 setup with 2GB of physical > > memory. > > The bottom line here is that i386 and amd64 both have a kmem_size limit > of 2GB. You can throw 32GB of RAM into an amd64 box, but FreeBSD will > only utilise up to 2GB of that for kmem. That is purely a FreeBSD > limitation, and is being dealt with in HEAD by Alan Cox. I believe he > has a patch, or it may have been committed -- I don't follow HEAD. I > can point people to a mailing list URL, if needed. > > This is one of the limitations Gergely is referring to. > > Since ZFS is incredibly memory-hungry, you're forced to tune ZFS to try > and get it to "play nice" with that 2GB limit on STABLE/RELEASE systems. > You also need to keep in mind that you can't just set kmem_size and > kmem_size_max to 2048M, because the kernel needs memory for other > things. 
> > The tuning parameters I use on my 2GB amd64 and 4GB amd64 boxes are: > > vm.kmem_size="1536M" > vm.kmem_size_max="1536M" > vfs.zfs.arc_min="16M" > vfs.zfs.arc_max="64M" > > If you set kmem_size and kmem_size_max any higher than that, the machine > will panic on boot, stating (indirectly) that there isn't enough memory > available for the kernel to allocate for other things. > > Until I added the arc_min and arc_max setting, I could occasionally > panic the machines under very heavy load (heavy zpool I/O), caused by > kmem exhaustion. Since adding the arc_* tunings, I've tried very hard > to crash the machines, and I cannot. > > But there's absolutely no guarantee those tuning parameters above will > ensure FreeBSD won't panic due to kmem exhaustion. I believe this is > the point Gergely is making about the "stability" of the whole thing. > > Now, with regards to prefetch_disable, folks can disable that if they > want. I disable it on my above systems because for what they do, the > overall performance appears better with prefetching disabled. > > I hope this helps shed some light here... > -- ?dv?lettel, Czuczy Gergely Harmless Digital Bt mailto: gergely.czuczy@harmless.hu Tel: +36-30-9702963 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080709/469167cb/signature.pgp From marck at rinet.ru Wed Jul 9 10:53:04 2008 From: marck at rinet.ru (Dmitry Morozovsky) Date: Wed Jul 9 10:53:10 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <48749087.4070802@FreeBSD.org> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080709062533.J58331@woozle.rinet.ru> <48749087.4070802@FreeBSD.org> Message-ID: <20080709145010.Q58331@woozle.rinet.ru> On Wed, 9 Jul 2008, Kris Kennaway wrote: KK> > KK> (ports-i386:~)> uname -a KK> > KK> FreeBSD pointyhat.freebsd.org 8.0-CURRENT FreeBSD 8.0-CURRENT #31: Wed KK> > Jun KK> > KK> 25 19:40:40 UTC 2008 KK> > KK> > Wow! I did't realize you switched package building infrastructure to ZFS. KK> > KK> > Nice and promising! KK> KK> Yeah, the server has been using ZFS since some time last year, but recently KK> I went much further and made it make use of (i.e. require) ZFS features like KK> snapshots and clones. Is it documented somewhere? Hmm, well. marck@wizzle:/usr/ports> grep -Ril zfs Tools/ Tools/portbuild/scripts/claim-chroot Tools/portbuild/scripts/clean-chroot Tools/portbuild/scripts/cleanup-chroots ;-) Any pitfalls while using this? Thanks! 
Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From kris at FreeBSD.org Wed Jul 9 11:17:10 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Wed Jul 9 11:17:16 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080709123729.60d2431a@twoflower.in.publishing.hu> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> <4873F4E9.3040203@FreeBSD.org> <20080709074420.24df3be4@mort.in.publishing.hu> <20080709055645.GA40076@eos.sc1.parodius.com> <20080709123729.60d2431a@twoflower.in.publishing.hu> Message-ID: <48749E2E.40308@FreeBSD.org> CZUCZY Gergely wrote: > "Stable ZFS" would mean, that these circumstances are cleared, and there's a > proven garantee (either mathematically) that it's _unable_ to panic due to this > memory allocation issue. I suppose you can choose to use this definition if you like, but it must be kind of terrifying to live in a world where all but the most trivial of programs are "unstable" and MIGHT CRASH AT ANY MOMENT OH GOD NO. While technically true, I don't think it's a functionally useful definition to equate "stable" with "proven to be perfect", so I won't continue to debate the point. ZFS is what it is, several of us have shown that it is possible to tune memory parameters to make it fit into a FreeBSD kernel, and users can either take that for what it's worth, or decide that ZFS is not for them. 
Kris From randy at psg.com Wed Jul 9 11:50:54 2008 From: randy at psg.com (Randy Bush) Date: Wed Jul 9 11:51:01 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: <20080709123729.60d2431a@twoflower.in.publishing.hu> References: <20080708100701.57031cda@twoflower.in.publishing.hu> <4873C4FA.2020004@FreeBSD.org> <20080708221327.5c1d0e92@mort.in.publishing.hu> <4873CF6C.7000205@FreeBSD.org> <20080708225449.1070252d@mort.in.publishing.hu> <4873F4E9.3040203@FreeBSD.org> <20080709074420.24df3be4@mort.in.publishing.hu> <20080709055645.GA40076@eos.sc1.parodius.com> <20080709123729.60d2431a@twoflower.in.publishing.hu> Message-ID: <4874A61C.5060805@psg.com> > There's no _proven_ _garantee_ that it won't panic go back to computability 301. the complexity of the systems we are all talking about are many many orders of magnitude beyond anything for which we can [dis]prove simple termination. so wind the rhetoric down a notch, please if you want the only guaranteed state people here can give you, unplug your machine. otherwise, you'll just have to accept experience, empirical testing, vetted advice, and wisdom. randy From lopez.on.the.lists at yellowspace.net Wed Jul 9 13:34:17 2008 From: lopez.on.the.lists at yellowspace.net (Lorenzo Perone) Date: Wed Jul 9 13:34:23 2008 Subject: Thinking of using ZFS/FBSD for a backup system In-Reply-To: References: Message-ID: <743AA903-03CB-4F43-A30B-9D06C58A4EAC@yellowspace.net> On 08.07.2008, at 08:15, mike wrote: > Just wanted to sanity check here before investing some money and time > into this solution... > > Also if anyone wants to reply to me off list with hardware that works > well for FBSD 7 + ZFS I'd be grateful :) On 09.07.2008, at 13:17, Kris Kennaway wrote: > ZFS is what it is, several of us have shown that it is possible to > tune memory parameters to make it fit into a FreeBSD kernel, and > users can either take that for what it's worth, or decide that ZFS > is not for them. 
>
> Kris

I know I'm definitely NOT going to make any friends here this way ;) BUT:

At the present time, my impression is that if you need/want to put anything business-critical on a ZFS pool: go opensolaris. I got to know it on a SUN box, at least enough to run and compile a few things, and it was definitely a pain compared to the good old FreeBSD hier(7). But as far as zfs goes, I can sleep at night.

Not that I wouldn't prefer the other way around. I'm one of those whom others have to stop before he starts installing FreeBSD on everything that has a CPU and runs as a server ;)

I'm sure zfs on FreeBSD will be at least as stable (and, easily, even better performing) as soon as pjd has the time to share his newest patches. I'll be one of the first csup'ping on RELENG_7 as soon as it's done. It's way too sexy a filesystem not to do so, and that's the reason why these kinds of threads pop up regularly...

As for my current experience, I have a box that backs up two offices every night and snapshots several filesystems, which has now been running a zfs pool stably for about 2 months (which is almost a record for that box). It's a 7.0-STABLE #8 Sat Apr 26 10:10:53 CEST 2008, amd64 with 2 GB of RAM and the following in /boot/loader.conf:

vm.kmem_size=900M
vm.kmem_size_max=900M
vfs.zfs.arc_max=300M
vfs.zfs.prefetch_disable=1

However I'm happy that it is down the corridor, so if the whole thing gets stuck again, or panics, I can still watch it while it reboots on the console...

BTW I can't recommend relying on ufs snapshots while waiting for zfs to become stable - I tried and it almost destroyed that filesystem after a few runs (besides, it takes ages to finish a snapshot, compared to zfs).
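
For anyone comparing the loader.conf settings posted in this thread, here is a minimal Python sketch (not part of any FreeBSD tool) that parses such lines and checks that vfs.zfs.arc_max is smaller than vm.kmem_size. The helper names parse_size, parse_loader_conf and arc_fits_in_kmem are made up for illustration, the K/M/G suffixes are assumed to be powers of 1024, and the check is only a rule of thumb assumed here, not a guarantee against kmem exhaustion.

```python
import re

# Assumption: suffixes are binary multiples (K = 1024 bytes, and so on).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a size such as 900M or "1536M" (quotes optional) to bytes."""
    m = re.fullmatch(r'"?(\d+)([KMG]?)"?', value.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).upper()]


def parse_loader_conf(text):
    """Return a dict of name -> raw value for every 'name=value' line."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip()
    return settings


def arc_fits_in_kmem(settings):
    """True if vfs.zfs.arc_max is strictly smaller than vm.kmem_size."""
    kmem = parse_size(settings["vm.kmem_size"])
    arc_max = parse_size(settings["vfs.zfs.arc_max"])
    return arc_max < kmem


# The settings quoted in the message above.
example = """
vm.kmem_size=900M
vm.kmem_size_max=900M
vfs.zfs.arc_max=300M
vfs.zfs.prefetch_disable=1
"""

conf = parse_loader_conf(example)
print("arc_max fits inside kmem_size:", arc_fits_in_kmem(conf))
```

Passing this check says nothing about whether the kernel has enough kmem left for everything else; as the thread shows, that still has to be found by testing under load.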

Regards, (and long live FreeBSD, it's half my life atm)

Lorenzo

From olli at lurza.secnetix.de Wed Jul 9 14:24:02 2008
From: olli at lurza.secnetix.de (Oliver Fromme)
Date: Wed Jul 9 14:24:07 2008
Subject: Filesystem is not clean - run fsck
In-Reply-To: <20080707154805.GA57420@lor.one-eyed-alien.net>
Message-ID: <200807091423.m69ENtJM075767@lurza.secnetix.de>

Brooks Davis wrote:
> On Sat, Jul 05, 2008 at 10:58:33AM +0200, Carlos Luna wrote:
> > Hi, I've used freenas for about 5 years without any problem. Now I can't mount my
> > raid volume, and it seems the people on its sourceforge forums can't help me. Hope this
> > list is the right list for my issue.
> >
> > When I try to fsck, I get:
> > casa:/dev# fsck -t ufs -y /dev/pst0s1
> > ** /dev/pst0s1
> > ** Last Mounted on /mnt/raid
> > ** Phase 1 - Check Blocks and Sizes
> > -4439300862985009506 BAD I=86
> > 3443570138036206556 BAD I=86
> > -7476842757969057647 BAD I=86
> > -8078484667502176485 BAD I=86
> > 2249916482063805839 BAD I=86
> > -3291681609520367063 BAD I=86
> > 7780434385339928353 BAD I=86
> > -4372486048108189431 BAD I=86
> > 8774078035736727371 BAD I=86
> > -2035310265760485777 BAD I=86
> > 6848295312539782814 BAD I=86
> > EXCESSIVE BAD BLKS I=86
> > CONTINUE? yes
> >
> > ...
> > ....
> >
> > UNKNOWN FILE TYPE I=7254140
> > CLEAR? yes
> >
> > UNKNOWN FILE TYPE I=7254141
> > CLEAR? yes
> >
> > UNKNOWN FILE TYPE I=7254142
> > CLEAR? yes
> >
> > UNKNOWN FILE TYPE I=7254143
> > CLEAR? yes
> >
> > fsck_ufs: cannot alloc 3037795832 bytes for inoinfo
> > I have a lot of info there, 1 TB. I will appreciate any help.
>
> It looks like you have a somewhat large file system, apparently with a
> lot of small files on it. The message indicates that you need to be
> able to allocate over 3GB of address space to handle this. As such you
> will need a 64-bit machine, ideally with 4GB or more RAM and probably
> with a large swap partition.
> > In theory it should be possible to write a constrained memory use version of > fsck, but to my knowledge no one has done so and I suspect it would be a > time consuming development effort. There's another possibility. I remember cases where the FS structures were damaged in a way that fsck picked up wrong size information, and then tried to allocate ridiculously large amounts of memory, even for a small file system. That could be the case here, too. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Gesch?ftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht M?n- chen, HRB 125758, Gesch?ftsf?hrer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "And believe me, as a C++ programmer, I don't hesitate to question the decisions of language designers. After a decent amount of C++ exposure, Python's flaws seem ridiculously small." -- Ville Vainio From juri_mian at yahoo.com Wed Jul 9 19:13:05 2008 From: juri_mian at yahoo.com (Juri Mianovich) Date: Wed Jul 9 19:41:37 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: Message-ID: <947384.22013.qm@web45601.mail.sp1.yahoo.com> Hello Jeff, --- On Tue, 7/8/08, Jeff Mohler wrote: > One drive has a what..maybe a 1 per 1.0 E15 bits transferred > uBER, and > you have 24x that of one drive, as each drive it it's > statistical crap > shoot. Each drive may NEVER hit uBER for you, but one may > do it > tomorrow. > > Plus, you have commodity firmware levels on those drives > and commodity > BER mechanisms, so you COULD argue you have another 2x > liability WRT > losing it all without HEFTY raid, at least 5+1. Thank you - I understand. You are worried because of the lack of redundancy. 
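
A rough sketch of the arithmetic behind the error-rate concern quoted above. It assumes the 1 per 1.0E15 bits figure Jeff mentions, 1 TB taken as 10^12 bytes, independent errors, and one full read of every drive. The function name is made up for illustration, and real drives will not follow this model exactly.

```python
import math

def p_unrecoverable_read(total_bytes, uber_bits=1e15):
    """Chance of at least one unrecoverable read error while reading
    total_bytes once, assuming independent errors at a rate of one
    per uber_bits bits read (Poisson approximation)."""
    bits_read = total_bytes * 8
    expected_errors = bits_read / uber_bits
    return 1.0 - math.exp(-expected_errors)

one_drive = p_unrecoverable_read(1e12)      # a single 1 TB drive
whole_array = p_unrecoverable_read(24e12)   # all 24 drives read end to end

print(f"one 1 TB drive: {one_drive:.2%}")
print(f"24 x 1 TB:      {whole_array:.2%}")
```

Under these assumptions a single drive is well under 1%, while a full pass over all 24 drives is in the high teens of percent, which is the "24x that of one drive" point in rough numbers.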
I didn't want to make my questions any more complicated than they were, but since we are on the topic, I will tell you that _in reality_ I will not make a 24 TB array, I will in fact use the raid-6 functionality (two parity drives) of my card and make a ~22 TB array. Does that address the concerns you were raising ? Does 22 data and 2 parity (raid 6) still make you very nervous, or does that completely change the scenario you were worried about ? Thanks. From speedtoys.racing at gmail.com Thu Jul 10 01:02:19 2008 From: speedtoys.racing at gmail.com (Jeff Mohler) Date: Thu Jul 10 01:02:26 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: <8e10486b0807091759m7cf4a04dsa4538594bc4a9304@mail.gmail.com> References: <947384.22013.qm@web45601.mail.sp1.yahoo.com> <8e10486b0807091759m7cf4a04dsa4538594bc4a9304@mail.gmail.com> Message-ID: Lets see..a peak of maybe 25-30 random drive IOPS/sec at 15ms MINIMAL latency per IO (likely more like 35-40)..gonna be ugly. Complicated by normal load IOPS..you could expect it all to simply "dissapear" for a day while it reconstructs. On Wed, Jul 9, 2008 at 5:59 PM, Alexandre Biancalana wrote: > On 7/9/08, Juri Mianovich wrote: >> >> Hello Jeff, >> >> >> --- On Tue, 7/8/08, Jeff Mohler wrote: >> >> >> >> > One drive has a what..maybe a 1 per 1.0 E15 bits transferred >> > uBER, and >> > you have 24x that of one drive, as each drive it it's >> > statistical crap >> > shoot. Each drive may NEVER hit uBER for you, but one may >> > do it >> > tomorrow. >> > >> > Plus, you have commodity firmware levels on those drives >> > and commodity >> > BER mechanisms, so you COULD argue you have another 2x >> > liability WRT >> > losing it all without HEFTY raid, at least 5+1. >> >> >> >> Thank you - I understand. You are worried because of the lack of redundancy. 
>> >> I didn't want to make my questions any more complicated than they were, but since we are on the topic, I will tell you that _in reality_ I will not make a 24 TB array, I will in fact use the raid-6 functionality (two parity drives) of my card and make a ~22 TB array. >> >> Does that address the concerns you were raising ? Does 22 data and 2 parity (raid 6) still make you very nervous, or does that completely change the scenario you were worried about ? > > > I did be fewer nervous if you do 2 arrays of 11 disks... what`s the > time that it will take to do a rebuild of a failed drive in your > normal load ?? > From biancalana at gmail.com Thu Jul 10 01:28:56 2008 From: biancalana at gmail.com (Alexandre Biancalana) Date: Thu Jul 10 01:29:03 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: <947384.22013.qm@web45601.mail.sp1.yahoo.com> References: <947384.22013.qm@web45601.mail.sp1.yahoo.com> Message-ID: <8e10486b0807091759m7cf4a04dsa4538594bc4a9304@mail.gmail.com> On 7/9/08, Juri Mianovich wrote: > > Hello Jeff, > > > --- On Tue, 7/8/08, Jeff Mohler wrote: > > > > > One drive has a what..maybe a 1 per 1.0 E15 bits transferred > > uBER, and > > you have 24x that of one drive, as each drive it it's > > statistical crap > > shoot. Each drive may NEVER hit uBER for you, but one may > > do it > > tomorrow. > > > > Plus, you have commodity firmware levels on those drives > > and commodity > > BER mechanisms, so you COULD argue you have another 2x > > liability WRT > > losing it all without HEFTY raid, at least 5+1. > > > > Thank you - I understand. You are worried because of the lack of redundancy. > > I didn't want to make my questions any more complicated than they were, but since we are on the topic, I will tell you that _in reality_ I will not make a 24 TB array, I will in fact use the raid-6 functionality (two parity drives) of my card and make a ~22 TB array. > > Does that address the concerns you were raising ? 
> Does 22 data and 2 parity (raid 6) still make you very nervous, or does that completely change the scenario you were worried about ?

I'd be less nervous if you did 2 arrays of 11 disks... how long will it take to rebuild a failed drive under your normal load ??

From juri_mian at yahoo.com Thu Jul 10 15:55:19 2008
From: juri_mian at yahoo.com (Juri Mianovich)
Date: Thu Jul 10 16:12:44 2008
Subject: 24 TB UFS2 reality check ?
In-Reply-To:
Message-ID: <223496.96060.qm@web45607.mail.sp1.yahoo.com>

Jeff,

--- On Wed, 7/9/08, Jeff Mohler wrote:

> Let's see..a peak of maybe 25-30 random drive IOPS/sec at 15ms MINIMAL
> latency per IO (likely more like 35-40)..gonna be ugly.
>
> Complicated by normal load IOPS..you could expect it all to simply
> "disappear" for a day while it reconstructs.

Once again, thank you very much - your comments are very helpful.

So we've moved from "dangerous" (24 TB with no raid) to "inconvenient" (24 TB with raid 6).

Two final questions:

1. What would _you_ do with 24 1 TB disks and a 24 port 3ware card ? Assume an i386, 4 GB machine, and that fsck is workable because of "newfs -i 131072"

2. What number should I ask my vendor (3ware) to do the rebuild calculations ? You are talking about IOPS/s - I think I should ask them how many IOPS/s the card does when rebuilding a 24 disk raid-6 array, and then combine that with the IOPS/s I see in my normal workload.

How do you measure IOPS/s in FreeBSD on a running machine ?

And, of course, any other comments appreciated.

Thanks.

From juri_mian at yahoo.com Thu Jul 10 16:00:40 2008
From: juri_mian at yahoo.com (Juri Mianovich)
Date: Thu Jul 10 16:12:56 2008
Subject: the quota question ... one user with >2 TB owned files (but no quota set)
Message-ID: <806386.22750.qm@web45604.mail.sp1.yahoo.com>

I am going to be running a large array.

I will have quotas in the kernel and enabled BUT all users I set quotas on will be nowhere near the 2TB barrier I see people talking about recently.
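
For reference, a 2 TB figure is what falls out if usage is tracked in a 32-bit counter of 512-byte blocks. Both numbers are assumptions in this sketch (they fit the "32-bit quotas" complaint in the earlier thread, but check ufs/ufs/quota.h on your release), and the wraparound function only shows what a plain unsigned counter would report; it does not model what the kernel actually does when the limit is crossed.

```python
BLOCK_SIZE = 512      # bytes per quota block (assumption)
COUNTER_BITS = 32     # width of the usage counter (assumption)

def counter_range_bytes(counter_bits=COUNTER_BITS, block_size=BLOCK_SIZE):
    """Number of bytes spanned by the counter's full range (2**bits blocks)."""
    return (2 ** counter_bits) * block_size

def wrapped_usage_bytes(real_bytes, counter_bits=COUNTER_BITS,
                        block_size=BLOCK_SIZE):
    """Usage a plain unsigned counter would show if it simply wrapped.
    Illustration only, not a model of the quota code."""
    blocks = real_bytes // block_size
    return (blocks % (2 ** counter_bits)) * block_size

TIB = 1024 ** 4
limit_tib = counter_range_bytes() / TIB
# A user owning 2.5 TiB, as a wrapped 32-bit block counter would report it.
reported_tib = wrapped_usage_bytes(5 * TIB // 2) / TIB

print(f"counter range: {limit_tib} TiB")
print(f"2.5 TiB owned would read as: {reported_tib} TiB")
```

Whether a misreported count of this kind is merely cosmetic (repquota output) or has side effects is exactly the open question in this message; the sketch does not answer it.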
HOWEVER, at some point in the future, root or www (or both) users will _own more than_ 2 TB of files. They will not have a quota set on them, but they will in fact own >2 TB of files. Is this also a problem ? Or is the only problem actually _setting_ a quota larger than 2TB ? I assume the output in "repquota /my/fs" will be broken, and that is fine with me - I just don't want to corrupt or damage my filesystem (or existing quotas) the day that my www user goes over 2TB of owned files. Also, I am distrustful of merely testing this - just because things run fine for a day with quotas turned on and some user owning more than 2 TB of files doesn't mean it won't blow up at some future date in some interesting scenario - and that is why I am asking for opinions here rather than just creating >2 TB of files and turning on quotas. Does anyone out there already do this and can reassure me ? Thanks a lot. From gpalmer at freebsd.org Thu Jul 10 17:25:25 2008 From: gpalmer at freebsd.org (Gary Palmer) Date: Thu Jul 10 17:25:31 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: <223496.96060.qm@web45607.mail.sp1.yahoo.com> References: <223496.96060.qm@web45607.mail.sp1.yahoo.com> Message-ID: <20080710172522.GA92945@in-addr.com> On Thu, Jul 10, 2008 at 08:55:17AM -0700, Juri Mianovich wrote: > > Jeff, > > > --- On Wed, 7/9/08, Jeff Mohler wrote: > > > Lets see..a peak of maybe 25-30 random drive IOPS/sec at > > 15ms MINIMAL > > latency per IO (likely more like 35-40)..gonna be ugly. > > > > Complicated by normal load IOPS..you could expect it all to > > simply > > "dissapear" for a day while it reconstructs. > > > Once again, thank you very much - your comments are very helpful. > > So we've moved from "dangerous" (24 TB with no raid) to "inconvenient" (24 TB with raid 6). > > Two final questions: > > 1. What would _you_ do with 24 1 TB disks and a 24 port 3ware card ? > Assume an i386, 4 GB machine, and that fsck is workable because of > "newfs -i 131072" > > 2. 
What number should I ask my vendor (3ware) to do the rebuild > calculations ? You are talking about IOPS/s - I think I should ask > them how many IOPS/s the card does when rebuilding a 24 disk raid-6 > array, and then combine that with the IOPS/s I see in my normal workload. > At least on older 3Ware cards, and I suspect the one you're talking about will do it also, you can control the rebuild rate. There are 5 different options provided on my card (8506-4LP) for "background task rate" which either increases or decreases the IOPS used for the rebuild. > How do you measure IOPS/s in FreeBSD on a running machine ? iostat and/or systat are the two I use. Regards, Gary From speedtoys.racing at gmail.com Thu Jul 10 17:38:56 2008 From: speedtoys.racing at gmail.com (Jeff Mohler) Date: Thu Jul 10 17:39:02 2008 Subject: 24 TB UFS2 reality check ? In-Reply-To: <223496.96060.qm@web45607.mail.sp1.yahoo.com> References: <223496.96060.qm@web45607.mail.sp1.yahoo.com> Message-ID: > Two final questions: > > 1. What would _you_ do with 24 1 TB disks and a 24 port 3ware card ? Assume an i386, 4 GB machine, and that fsck is workable because of "newfs -i 131072" --- Im biased. If I had to have 24TB online, id get a netapp. 24TB of data has gotta be worth some real money, so I'd spend it. No..youre not wrong for NOT doing that, but, I wouldnt consider the 24TB nightmare myself, ive been there before. I never full healed from that mess. But if I had to steer you in a specific direction, plan for total failure. HDD's are built to do one thing. Fail. They occasionally hold data, but in reality, theyre just built to fail. Some are designed to fail sooner than others. You have 24 of them racing for failure. Sounds pessimistic, but, good backup & DR strategies are built around this. > 2. What number should I ask my vendor (3ware) to do the rebuild calculations ? 
You are talking about IOPS/s - I think I should ask them how many IOPS/s the card does when rebuilding a 24 disk raid-6 array, and then combine that with the IOPS/s I see in my normal workload. --- Rebuild calculations are based around how fast the drives are, cheap SATA is about 14ms track to track (longer full seek) and its highly random, and it has to compete with user/system workload. Thats just not possible to state, too many variables. IOPS are not a card issue, its a physical drive issue. 24TB of FCAL would rebuild faster than SATA, for example. The intelligence of the card itself could come into play, in case it is able to use command queuing to the drives/etc...and if the SATA drives fully support it as well. Depending what you want the system to do for users during the rebuild, prioritize the card appropriately. > How do you measure IOPS/s in FreeBSD on a running machine ? --- iostat -x is a pretty good way to measure that, for the most part. Im prepared to hear about different/better ways. From dan at audilis.com Thu Jul 10 19:18:54 2008 From: dan at audilis.com (Daniel E. Lynn) Date: Thu Jul 10 19:19:01 2008 Subject: Storing UFS snapshots externally? Message-ID: <48765802.9060502@audilis.com> Greetings all, First, let me apologize if this has been asked before. If it has then my search skills must be lacking because I couldn't seem to find it in the lists anywhere. I'm wondering if it is feasible to store snapshots for UFS on a separate drive, and if so if there is an advisable way of doing it. The basic idea is that I'd like to be able to mitigate any write overhead of using a lot of snapshots by using a separate disk for them entirely. Here's the proposed setup: FreeBSD (/) is on ad0 Homedirs and userdata (/data) is on gm0 (ad2+ad3 mirrored) I've been successfully using snapshots for /data, and the overhead on this system doesn't seem too bad (yet) but if I have more than a dozen snapshots, I get the feeling it could get messy. 
It'd be great if I could store the snapshots for /data on / someplace. I just don't know if there are any "gotchas" that might keep me from doing this. Obviously I could symlink the /data/.snap dir to someplace on /, but I'm a little wary it might not be that easy.

Ideas?

From kris at FreeBSD.org Thu Jul 10 19:23:11 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Thu Jul 10 19:23:17 2008
Subject: Storing UFS snapshots externally?
In-Reply-To: <48765802.9060502@audilis.com>
References: <48765802.9060502@audilis.com>
Message-ID: <4876619C.4050108@FreeBSD.org>

Daniel E. Lynn wrote:
> Greetings all,
>
> First, let me apologize if this has been asked before. If it has then my
> search skills must be lacking because I couldn't seem to find it in the
> lists anywhere.
>
> I'm wondering if it is feasible to store snapshots for UFS on a separate
> drive, and if so if there is an advisable way of doing it. The basic
> idea is that I'd like to be able to mitigate any write overhead of using
> a lot of snapshots by using a separate disk for them entirely. Here's
> the proposed setup:

No, the point of snapshots is that they are copy-on-write, so you don't copy the data until it changes. If you want to store them externally, you have to copy the whole thing, and you should use a tool like dump, rsync, etc.

Kris

From ivoras at freebsd.org Thu Jul 10 20:05:03 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Thu Jul 10 20:05:09 2008
Subject: Storing UFS snapshots externally?
In-Reply-To: <48765802.9060502@audilis.com>
References: <48765802.9060502@audilis.com>
Message-ID:

Daniel E. Lynn wrote:
> Greetings all,
>
> First, let me apologize if this has been asked before. If it has then my
> search skills must be lacking because I couldn't seem to find it in the
> lists anywhere.
>
> I'm wondering if it is feasible to store snapshots for UFS on a separate
> drive, and if so if there is an advisable way of doing it.
> The basic idea is that I'd like to be able to mitigate any write overhead of
> using a lot of snapshots by using a separate disk for them entirely. Here's
> the proposed setup:
>
> FreeBSD (/) is on ad0
> Homedirs and userdata (/data) is on gm0 (ad2+ad3 mirrored)
>
> I've been successfully using snapshots for /data, and the overhead on
> this system doesn't seem too bad (yet) but if I have more than a dozen
> snapshots, I get the feeling it could get messy. It'd be great if I
> could store the snapshots for /data on / someplace.

UFS snapshots don't copy the data into the "snapshot" file - they just adjust internal references in the file system. The big file you get when you create the snapshot isn't really a file in the traditional sense - it consists of file system internal pointers to real data. Simplified, when data gets changed on the "real" file system, *then* the old data gets a separate copy in the snapshot.

In short, there's no way other than manually copying (dd, tar) the data from the snapshot to wherever.

From kaluna at gmail.com Fri Jul 11 08:40:22 2008
From: kaluna at gmail.com (Carlos Luna)
Date: Fri Jul 11 08:40:29 2008
Subject: Filesystem is not clean - run fsck
In-Reply-To: <200807091423.m69ENtJM075767@lurza.secnetix.de>
References: <20080707154805.GA57420@lor.one-eyed-alien.net>
 <200807091423.m69ENtJM075767@lurza.secnetix.de>
Message-ID: <11c17ec30807110140xbeec510ke54eaf1829fc4894@mail.gmail.com>

Yes, that would be the case. There is a patch made by someone to fsck to "solve" this, but whoever made the patch also said to "expect a lot of data loss", so I'm just looking for a harmless solution. By the way, I'm just trying to mount it read-only.
Regards Mike 2008/7/9 Oliver Fromme : > Brooks Davis wrote: > > On Sat, Jul 05, 2008 at 10:58:33AM +0200, Carlos Luna wrote: > > > Hi I'd used freenas about 5 years without any problem. Now I can?t > mount my > > > raid volume and in his sourceforge forums seems they cant help me. > Hope this > > > list is the right list for my issue. > > > > > > When I try to fsck,I get: > > > casa:/dev# fsck -t ufs -y /dev/pst0s1 > > > ** /dev/pst0s1 > > > ** Last Mounted on /mnt/raid > > > ** Phase 1 - Check Blocks and Sizes > > > -4439300862985009506 BAD I=86 > > > 3443570138036206556 BAD I=86 > > > -7476842757969057647 BAD I=86 > > > -8078484667502176485 BAD I=86 > > > 2249916482063805839 BAD I=86 > > > -3291681609520367063 BAD I=86 > > > 7780434385339928353 BAD I=86 > > > -4372486048108189431 BAD I=86 > > > 8774078035736727371 BAD I=86 > > > -2035310265760485777 BAD I=86 > > > 6848295312539782814 BAD I=86 > > > EXCESSIVE BAD BLKS I=86 > > > CONTINUE? yes > > > > > > ... > > > .... > > > > > > UNKNOWN FILE TYPE I=7254140 > > > CLEAR? yes > > > > > > UNKNOWN FILE TYPE I=7254141 > > > CLEAR? yes > > > > > > UNKNOWN FILE TYPE I=7254142 > > > CLEAR? yes > > > > > > UNKNOWN FILE TYPE I=7254143 > > > CLEAR? yes > > > > > > fsck_ufs: cannot alloc 3037795832 bytes for inoinfo > > > I have a lot of info there, 1 TB. I will appreciate any help. > > > > It looks like you have a somewhat large file system, apparently with a > > lot of small files on it. The message indicates that you need to be > > able to allocate over 3GB of address space to handle this. As such you > > will need a 64-bit machine, ideally with 4GB or more RAM and probably > > with a large swap partition. > > > > In theory it should be possible to write a constrained memory use > version of > > fsck, but to my knowledge no one has done so and I suspect it would be a > > time consuming development effort. > > There's another possibility. 
I remember cases where the FS > structures were damaged in a way that fsck picked up wrong > size information, and then tried to allocate ridiculously > large amounts of memory, even for a small file system. > > That could be the case here, too. > > Best regards > Oliver > > -- > Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. > Handelsregister: Registergericht Muenchen, HRA 74606, Gesch?ftsfuehrung: > secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht M?n- > chen, HRB 125758, Gesch?ftsf?hrer: Maik Bachmann, Olaf Erb, Ralf Gebhart > > FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd > > "And believe me, as a C++ programmer, I don't hesitate to question > the decisions of language designers. After a decent amount of C++ > exposure, Python's flaws seem ridiculously small." -- Ville Vainio > From olli at lurza.secnetix.de Fri Jul 11 10:25:47 2008 From: olli at lurza.secnetix.de (Oliver Fromme) Date: Fri Jul 11 10:25:54 2008 Subject: Filesystem is not clean - run fsck In-Reply-To: <11c17ec30807110140xbeec510ke54eaf1829fc4894@mail.gmail.com> Message-ID: <200807111025.m6BAPiIY010156@lurza.secnetix.de> Carlos Luna wrote: > Yes, that would be the case. There is a patch made by someone to the fsck to > "solve" this, but who made the pacth said too "expect lot loose of date", > son I'm just looking for a harmless solution. By the way I'm just triyng to > mount it read-only. You used fsck -y, so you should already expect to have lost data. Quoting from the manual page's description of the -y option: "this should be used with great caution as this is a free license to continue after essentially unlimited trouble has been encountered." Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Gesch?ftsfuehrung: secnetix Verwaltungsgesellsch. 
mbH, Handelsregister: Registergericht M?n- chen, HRB 125758, Gesch?ftsf?hrer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "People still program in C. People keep writing shell scripts. *Most* people don't realize the shortcomings of the tools they are using because they a) don't reflect on their workflows and they are b) too lazy to check out alternatives to realize there is help." -- Simon 'corecode' Schubert From juri_mian at yahoo.com Fri Jul 11 12:35:51 2008 From: juri_mian at yahoo.com (Juri Mianovich) Date: Fri Jul 11 13:06:13 2008 Subject: the quota question ... one user with >2 TB owned files (but no quota set) Message-ID: <964824.75951.qm@web45615.mail.sp1.yahoo.com> > I am going to be running a large array. > > I will have quotas in the kernel and enabled BUT all users I set quotas > on will be nowhere near the 2TB barrier I see people talking about > recently. > > HOWEVER, at some point in the future, root or www (or both) users will > _own more than_ 2 TB of files. They will not have a quota set on them, > but they will in fact own >2 TB of files. > > Is this also a problem ? Or is the only problem actually _setting_ a > quota larger than 2TB ? > > I assume the output in "repquota /my/fs" will be broken, and that is > fine with me - I just don't want to corrupt or damage my filesystem (or > existing quotas) the day that my www user goes over 2TB of owned files. > > Also, I am distrustful of merely testing this - just because things run > fine for a day with quotas turned on and some user owning more than 2 TB > of files doesn't mean it won't blow up at some future date in some > interesting scenario - and that is why I am asking for opinions here > rather than just creating >2 TB of files and turning on quotas. > > Does anyone out there already do this and can reassure me ? 
I haven't heard anything - which is not surprising, since it doesn't sound like many people are using quotas these days. Does anyone have any general thoughts as to whether this will be dangerous or not ? I know (I assume) that repquota output for my root and www users will be broken, and that's fine - I just want to make sure that as soon as one user goes over 2TB of owned files the filesystem doesn't trash itself. Can the quota subsystem failing in some way cause data loss / filesystem inconsistencies ? From 000.fbsd at quip.cz Fri Jul 11 14:53:23 2008 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Fri Jul 11 14:53:43 2008 Subject: the quota question ... one user with >2 TB owned files (but no quota set) In-Reply-To: <964824.75951.qm@web45615.mail.sp1.yahoo.com> References: <964824.75951.qm@web45615.mail.sp1.yahoo.com> Message-ID: <48776FE6.3060307@quip.cz> Juri Mianovich wrote: [...] > I haven't heard anything - which is not surprising, since it doesn't sound like many people are using quotas these days. > > Does anyone have any general thoughts as to whether this will be dangerous or not ? I know (I assume) that repquota output for my root and www users will be broken, and that's fine - I just want to make sure that as soon as one user goes over 2TB of owned files the filesystem doesn't trash itself. > > Can the quota subsystem failing in some way cause data loss / filesystem inconsistencies ? I think there are not so many users running similar configuration, it means not much experiences. But you can try your own test easily. Just install some test machine and try it with large sparse files (something like dd if=/dev/zero of=sparse-file bs=1 count=1 seek=1024k for 1M sparse file) Miroslav Lachman From olli at lurza.secnetix.de Fri Jul 11 17:52:14 2008 From: olli at lurza.secnetix.de (Oliver Fromme) Date: Fri Jul 11 17:52:20 2008 Subject: the quota question ... 
one user with >2 TB owned files (but no quota set)
In-Reply-To: <48776FE6.3060307@quip.cz>
Message-ID: <200807111752.m6BHq8n2031070@lurza.secnetix.de>

Miroslav Lachman wrote:
> Juri Mianovich wrote:
> [...]
> > I haven't heard anything - which is not surprising, since it
> > doesn't sound like many people are using quotas these days.
> >
> > Does anyone have any general thoughts as to whether this will be
> > dangerous or not ? I know (I assume) that repquota output for my
> > root and www users will be broken, and that's fine - I just want to
> > make sure that as soon as one user goes over 2TB of owned files the
> > filesystem doesn't trash itself.
> >
> > Can the quota subsystem failing in some way cause data loss /
> > filesystem inconsistencies ?
>
> I think there are not so many users running similar configuration, it
> means not much experiences. But you can try your own test easily. Just
> install some test machine and try it with large sparse files

That doesn't help, because only physically allocated space counts towards quotas. A sparse file of 1 GB that consists entirely of zeros uses only 48 KB of physical disk space (with default UFS2 newfs parameters), so it uses only 48 KB from your quota, not 1 GB.

> (something like dd if=/dev/zero of=sparse-file bs=1 count=1 seek=1024k
> for 1M sparse file)

There's an easier way to do that:

truncate -s 1m sparse-file

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch.
mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"Clear perl code is better than unclear awk code; but NOTHING comes close to unclear perl code" (taken from comp.lang.awk FAQ)

From kaluna at gmail.com Sat Jul 12 10:17:48 2008
From: kaluna at gmail.com (Carlos Luna)
Date: Sat Jul 12 10:17:54 2008
Subject: Filesystem is not clean - run fsck
In-Reply-To: <200807111025.m6BAPiIY010156@lurza.secnetix.de>
References: <11c17ec30807110140xbeec510ke54eaf1829fc4894@mail.gmail.com>
 <200807111025.m6BAPiIY010156@lurza.secnetix.de>
Message-ID: <11c17ec30807120317t1f344396rcd3cfd7ca6abaedd@mail.gmail.com>

Is there any other option?

Regards
Mike

2008/7/11 Oliver Fromme :

> Carlos Luna wrote:
> > Yes, that would be the case.
There is a patch made by someone to the > fsck to > > "solve" this, but who made the pacth said too "expect lot loose of > date", > > son I'm just looking for a harmless solution. By the way I'm just triyng > to > > mount it read-only. > > You used fsck -y, so you should already expect to have > lost data. Quoting from the manual page's description > of the -y option: "this should be used with great caution > as this is a free license to continue after essentially > unlimited trouble has been encountered." > > Best regards > Oliver > > -- > Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. > Handelsregister: Registergericht Muenchen, HRA 74606, Gesch?ftsfuehrung: > secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht M?n- > chen, HRB 125758, Gesch?ftsf?hrer: Maik Bachmann, Olaf Erb, Ralf Gebhart > > FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd > > "People still program in C. People keep writing shell scripts. *Most* > people don't realize the shortcomings of the tools they are using because > they a) don't reflect on their workflows and they are b) too lazy to check > out alternatives to realize there is help." -- Simon 'corecode' Schubert > From linimon at FreeBSD.org Sat Jul 12 22:04:36 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Sat Jul 12 22:04:41 2008 Subject: kern/125536: [ext2fs] ext 2 mounts cleanly but fails on commands like ls Message-ID: <200807122204.m6CM4aEA048110@freefall.freebsd.org> Old Synopsis: ext 2 mounts cleanly but fails on commands like ls New Synopsis: [ext2fs] ext 2 mounts cleanly but fails on commands like ls Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Jul 12 22:04:20 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=125536

From bugmaster at FreeBSD.org Mon Jul 14 11:06:58 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Jul 14 11:07:40 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200807141106.m6EB6v0f014402@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t

7 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/124621  fs         [ext3] Cannot mount ext2fs partition
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li

8 problems total.

From rmacklem at uoguelph.ca Tue Jul 15 19:50:03 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Tue Jul 15 19:50:09 2008
Subject: executable open until unmount
Message-ID:

I'm testing my nfsv4 client and I've run into this issue under FreeBSD7.0.
When I execute a file on the nfs mounted volume, the file remains open until the vnode gets cleared out, usually when I unmount. For NFSv4, this isn't a particularily good thing, since these Opens tie up resources on the NFS server, etc. Anyone know if there is something I'm doing incorrectly that causes this or a way to get the close to happen when the executable terminates? Thanks in advance for any help, rick From kostikbel at gmail.com Tue Jul 15 20:36:46 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Jul 15 20:36:53 2008 Subject: executable open until unmount In-Reply-To: References: Message-ID: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> On Tue, Jul 15, 2008 at 02:57:23PM -0400, Rick Macklem wrote: > I'm testing my nfsv4 client and I've run into this issue under FreeBSD7.0. > > When I execute a file on the nfs mounted volume, the file remains open > until the vnode gets cleared out, usually when I unmount. For NFSv4, this > isn't a particularily good thing, since these Opens tie up resources on > the NFS server, etc. > > Anyone know if there is something I'm doing incorrectly that causes this > or a way to get the close to happen when the executable terminates? > > Thanks in advance for any help, rick Try this: diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c index f4335a2..c3ef0e9 100644 --- a/sys/kern/kern_exec.c +++ b/sys/kern/kern_exec.c @@ -496,6 +496,7 @@ interpret: interplabel = mac_vnode_label_alloc(); mac_vnode_copy_label(binvp->v_label, interplabel); #endif + VOP_CLOSE(binvp, FREAD, td->td_ucred, td); vput(binvp); vm_object_deallocate(imgp->object); imgp->object = NULL; @@ -845,6 +846,7 @@ exec_fail_dealloc: if (imgp->vp != NULL) { if (args->fname) NDFREE(ndp, NDF_ONLY_PNBUF); + VOP_CLOSE(imgp->vp, FREAD, td->td_ucred, td); vput(imgp->vp); } -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080715/43ae86c6/attachment.pgp From rmacklem at uoguelph.ca Wed Jul 16 15:22:04 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Wed Jul 16 15:22:10 2008 Subject: executable open until unmount In-Reply-To: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> References: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> Message-ID: Patch looks good. It fixed my problem and hasn't crashed the system yet;-) Thanks, rick On Tue, 15 Jul 2008, Kostik Belousov wrote: > On Tue, Jul 15, 2008 at 02:57:23PM -0400, Rick Macklem wrote: >> I'm testing my nfsv4 client and I've run into this issue under FreeBSD7.0. >> >> When I execute a file on the nfs mounted volume, the file remains open >> until the vnode gets cleared out, usually when I unmount. For NFSv4, this >> isn't a particularily good thing, since these Opens tie up resources on >> the NFS server, etc. >> >> Anyone know if there is something I'm doing incorrectly that causes this >> or a way to get the close to happen when the executable terminates? 
>> >> Thanks in advance for any help, rick > > Try this: > > diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c > index f4335a2..c3ef0e9 100644 > --- a/sys/kern/kern_exec.c > +++ b/sys/kern/kern_exec.c > @@ -496,6 +496,7 @@ interpret: > interplabel = mac_vnode_label_alloc(); > mac_vnode_copy_label(binvp->v_label, interplabel); > #endif > + VOP_CLOSE(binvp, FREAD, td->td_ucred, td); > vput(binvp); > vm_object_deallocate(imgp->object); > imgp->object = NULL; > @@ -845,6 +846,7 @@ exec_fail_dealloc: > if (imgp->vp != NULL) { > if (args->fname) > NDFREE(ndp, NDF_ONLY_PNBUF); > + VOP_CLOSE(imgp->vp, FREAD, td->td_ucred, td); > vput(imgp->vp); > } > > From kostikbel at gmail.com Wed Jul 16 15:44:13 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Wed Jul 16 15:44:19 2008 Subject: executable open until unmount In-Reply-To: References: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> Message-ID: <20080716154407.GG17123@deviant.kiev.zoral.com.ua> On Wed, Jul 16, 2008 at 11:32:28AM -0400, Rick Macklem wrote: > Patch looks good. It fixed my problem and hasn't crashed the system yet;-) Did you tested both elf executables and #!-scripts ? > > Thanks, rick > > On Tue, 15 Jul 2008, Kostik Belousov wrote: > > >On Tue, Jul 15, 2008 at 02:57:23PM -0400, Rick Macklem wrote: > >>I'm testing my nfsv4 client and I've run into this issue under FreeBSD7.0. > >> > >>When I execute a file on the nfs mounted volume, the file remains open > >>until the vnode gets cleared out, usually when I unmount. For NFSv4, this > >>isn't a particularily good thing, since these Opens tie up resources on > >>the NFS server, etc. > >> > >>Anyone know if there is something I'm doing incorrectly that causes this > >>or a way to get the close to happen when the executable terminates? 
> >> > >>Thanks in advance for any help, rick > > > >Try this: > > > >diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c > >index f4335a2..c3ef0e9 100644 > >--- a/sys/kern/kern_exec.c > >+++ b/sys/kern/kern_exec.c > >@@ -496,6 +496,7 @@ interpret: > > interplabel = mac_vnode_label_alloc(); > > mac_vnode_copy_label(binvp->v_label, interplabel); > >#endif > >+ VOP_CLOSE(binvp, FREAD, td->td_ucred, td); > > vput(binvp); > > vm_object_deallocate(imgp->object); > > imgp->object = NULL; > >@@ -845,6 +846,7 @@ exec_fail_dealloc: > > if (imgp->vp != NULL) { > > if (args->fname) > > NDFREE(ndp, NDF_ONLY_PNBUF); > >+ VOP_CLOSE(imgp->vp, FREAD, td->td_ucred, td); > > vput(imgp->vp); > > } > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080716/b7eb2d4c/attachment.pgp From rmacklem at uoguelph.ca Wed Jul 16 22:24:41 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Wed Jul 16 22:24:47 2008 Subject: Which GSSAPI library does FreeBSD use? Message-ID: Hope this isn't too simplistic for this list, but I need to know which GSSAPI library sources are being used. They don't appear to be either vanilla MIT nor Heimdal. It appears the sources aren't in the generic source tree, either. Happen to know where they can be grabbed? 
Thanks in advance for any help, rick From kostikbel at gmail.com Thu Jul 17 11:02:53 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Thu Jul 17 11:03:00 2008 Subject: executable open until unmount In-Reply-To: <20080716154407.GG17123@deviant.kiev.zoral.com.ua> References: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> <20080716154407.GG17123@deviant.kiev.zoral.com.ua> Message-ID: <20080717110247.GI17123@deviant.kiev.zoral.com.ua> On Wed, Jul 16, 2008 at 06:44:07PM +0300, Kostik Belousov wrote: > On Wed, Jul 16, 2008 at 11:32:28AM -0400, Rick Macklem wrote: > > Patch looks good. It fixed my problem and hasn't crashed the system yet;-) > Did you tested both elf executables and #!-scripts ? > > > > > Thanks, rick And, in fact, the patch has a problem. Namely, it does not properly track the opened status of the text vnode, because exec_check_permission() could not opened it in case of error. Please, retest the change below. diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c index f4335a2..e31ca37 100644 --- a/sys/kern/kern_exec.c +++ b/sys/kern/kern_exec.c @@ -369,6 +369,7 @@ do_execve(td, args, mac_p) imgp->entry_addr = 0; imgp->vmspace_destroyed = 0; imgp->interpreted = 0; + imgp->opened = 0; imgp->interpreter_name = args->buf + PATH_MAX + ARG_MAX; imgp->auxargs = NULL; imgp->vp = NULL; @@ -496,6 +497,10 @@ interpret: interplabel = mac_vnode_label_alloc(); mac_vnode_copy_label(binvp->v_label, interplabel); #endif + if (imgp->opened) { + VOP_CLOSE(binvp, FREAD, td->td_ucred, td); + imgp->opened = 0; + } vput(binvp); vm_object_deallocate(imgp->object); imgp->object = NULL; @@ -845,6 +850,8 @@ exec_fail_dealloc: if (imgp->vp != NULL) { if (args->fname) NDFREE(ndp, NDF_ONLY_PNBUF); + if (imgp->opened) + VOP_CLOSE(imgp->vp, FREAD, td->td_ucred, td); vput(imgp->vp); } @@ -1326,6 +1333,8 @@ exec_check_permissions(imgp) * general case). 
*/ error = VOP_OPEN(vp, FREAD, td->td_ucred, td, NULL); + if (error == 0) + imgp->opened = 1; return (error); } diff --git a/sys/sys/imgact.h b/sys/sys/imgact.h index 85eaea8..011a7ae 100644 --- a/sys/sys/imgact.h +++ b/sys/sys/imgact.h @@ -58,6 +58,7 @@ struct image_params { unsigned long entry_addr; /* entry address of target executable */ char vmspace_destroyed; /* flag - we've blown away original vm space */ char interpreted; /* flag - this executable is interpreted */ + char opened; /* flag - we have opened executable vnode */ char *interpreter_name; /* name of the interpreter */ void *auxargs; /* ELF Auxinfo structure pointer */ struct sf_buf *firstpage; /* first page that we mapped */ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080717/b7161f5c/attachment.pgp From rmacklem at uoguelph.ca Thu Jul 17 14:56:53 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Thu Jul 17 14:57:00 2008 Subject: executable open until unmount In-Reply-To: <20080716154407.GG17123@deviant.kiev.zoral.com.ua> References: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> <20080716154407.GG17123@deviant.kiev.zoral.com.ua> Message-ID: On Wed, 16 Jul 2008, Kostik Belousov wrote: > On Wed, Jul 16, 2008 at 11:32:28AM -0400, Rick Macklem wrote: >> Patch looks good. It fixed my problem and hasn't crashed the system yet;-) > Did you tested both elf executables and #!-scripts ? > Yep. (At least I have now. I had only tested Elf when I posted:-) Working fine without crashes sofar, rick >> >> Thanks, rick >> >> On Tue, 15 Jul 2008, Kostik Belousov wrote: >> >>> On Tue, Jul 15, 2008 at 02:57:23PM -0400, Rick Macklem wrote: >>>> I'm testing my nfsv4 client and I've run into this issue under FreeBSD7.0. 
>>>> >>>> When I execute a file on the nfs mounted volume, the file remains open >>>> until the vnode gets cleared out, usually when I unmount. For NFSv4, this >>>> isn't a particularily good thing, since these Opens tie up resources on >>>> the NFS server, etc. >>>> >>>> Anyone know if there is something I'm doing incorrectly that causes this >>>> or a way to get the close to happen when the executable terminates? >>>> >>>> Thanks in advance for any help, rick >>> >>> Try this: >>> >>> diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c >>> index f4335a2..c3ef0e9 100644 >>> --- a/sys/kern/kern_exec.c >>> +++ b/sys/kern/kern_exec.c >>> @@ -496,6 +496,7 @@ interpret: >>> interplabel = mac_vnode_label_alloc(); >>> mac_vnode_copy_label(binvp->v_label, interplabel); >>> #endif >>> + VOP_CLOSE(binvp, FREAD, td->td_ucred, td); >>> vput(binvp); >>> vm_object_deallocate(imgp->object); >>> imgp->object = NULL; >>> @@ -845,6 +846,7 @@ exec_fail_dealloc: >>> if (imgp->vp != NULL) { >>> if (args->fname) >>> NDFREE(ndp, NDF_ONLY_PNBUF); >>> + VOP_CLOSE(imgp->vp, FREAD, td->td_ucred, td); >>> vput(imgp->vp); >>> } >>> >>> > From rmacklem at uoguelph.ca Thu Jul 17 15:20:34 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Thu Jul 17 15:20:41 2008 Subject: executable open until unmount In-Reply-To: <20080717110247.GI17123@deviant.kiev.zoral.com.ua> References: <20080715203641.GA17123@deviant.kiev.zoral.com.ua> <20080716154407.GG17123@deviant.kiev.zoral.com.ua> <20080717110247.GI17123@deviant.kiev.zoral.com.ua> Message-ID: Retested modified patch and seems fine both ways. I also tried one that couldn't be opened. Thanks again, rick On Thu, 17 Jul 2008, Kostik Belousov wrote: > On Wed, Jul 16, 2008 at 06:44:07PM +0300, Kostik Belousov wrote: >> On Wed, Jul 16, 2008 at 11:32:28AM -0400, Rick Macklem wrote: >>> Patch looks good. It fixed my problem and hasn't crashed the system yet;-) >> Did you tested both elf executables and #!-scripts ? 
>> >>> >>> Thanks, rick > > And, in fact, the patch has a problem. Namely, it does not properly > track the opened status of the text vnode, because exec_check_permission() > could not opened it in case of error. > > Please, retest the change below. > > diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c > index f4335a2..e31ca37 100644 > --- a/sys/kern/kern_exec.c > +++ b/sys/kern/kern_exec.c > @@ -369,6 +369,7 @@ do_execve(td, args, mac_p) > imgp->entry_addr = 0; > imgp->vmspace_destroyed = 0; > imgp->interpreted = 0; > + imgp->opened = 0; > imgp->interpreter_name = args->buf + PATH_MAX + ARG_MAX; > imgp->auxargs = NULL; > imgp->vp = NULL; > @@ -496,6 +497,10 @@ interpret: > interplabel = mac_vnode_label_alloc(); > mac_vnode_copy_label(binvp->v_label, interplabel); > #endif > + if (imgp->opened) { > + VOP_CLOSE(binvp, FREAD, td->td_ucred, td); > + imgp->opened = 0; > + } > vput(binvp); > vm_object_deallocate(imgp->object); > imgp->object = NULL; > @@ -845,6 +850,8 @@ exec_fail_dealloc: > if (imgp->vp != NULL) { > if (args->fname) > NDFREE(ndp, NDF_ONLY_PNBUF); > + if (imgp->opened) > + VOP_CLOSE(imgp->vp, FREAD, td->td_ucred, td); > vput(imgp->vp); > } > > @@ -1326,6 +1333,8 @@ exec_check_permissions(imgp) > * general case). 
> */ > error = VOP_OPEN(vp, FREAD, td->td_ucred, td, NULL); > + if (error == 0) > + imgp->opened = 1; > return (error); > } > > diff --git a/sys/sys/imgact.h b/sys/sys/imgact.h > index 85eaea8..011a7ae 100644 > --- a/sys/sys/imgact.h > +++ b/sys/sys/imgact.h > @@ -58,6 +58,7 @@ struct image_params { > unsigned long entry_addr; /* entry address of target executable */ > char vmspace_destroyed; /* flag - we've blown away original vm space */ > char interpreted; /* flag - this executable is interpreted */ > + char opened; /* flag - we have opened executable vnode */ > char *interpreter_name; /* name of the interpreter */ > void *auxargs; /* ELF Auxinfo structure pointer */ > struct sf_buf *firstpage; /* first page that we mapped */ > From mark at legios.org Fri Jul 18 06:03:41 2008 From: mark at legios.org (mark@legios.org) Date: Fri Jul 18 06:03:48 2008 Subject: ZFS, SHA-256 and crypto accelerators. Message-ID: <2f89f278568aef731caf1d9874d846f7.squirrel@www.legios.org> Hey all, I'm just wondering if I have a crypto device (in my case, it's on a VIA CPU), and I select sha256 for the checksum algorithm, is it being accelerated by the crypto device? Cheers! Mark From mark at legios.org Fri Jul 18 07:29:53 2008 From: mark at legios.org (mark@legios.org) Date: Fri Jul 18 07:29:59 2008 Subject: ZFS, SHA-256 and crypto accelerators. In-Reply-To: <20080718064948.GB1976@garage.freebsd.pl> References: <2f89f278568aef731caf1d9874d846f7.squirrel@www.legios.org> <20080718064948.GB1976@garage.freebsd.pl> Message-ID: > On Fri, Jul 18, 2008 at 04:02:55PM +1000, mark@legios.org wrote: >> Hey all, >> >> I'm just wondering if I have a crypto device (in my case, it's on a VIA >> CPU), and I select sha256 for the checksum algorithm, is it being >> accelerated by the crypto device? > > No, currently not, but this is a very good idea. Would you mind moving > this idea into PR, so I won't forget about it? > > -- Not a problem - done, raised as misc/125738 Cheers! 
Mark From pjd at FreeBSD.org Fri Jul 18 09:50:21 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Fri Jul 18 09:50:28 2008 Subject: ZFS, SHA-256 and crypto accelerators. In-Reply-To: <2f89f278568aef731caf1d9874d846f7.squirrel@www.legios.org> References: <2f89f278568aef731caf1d9874d846f7.squirrel@www.legios.org> Message-ID: <20080718064948.GB1976@garage.freebsd.pl> On Fri, Jul 18, 2008 at 04:02:55PM +1000, mark@legios.org wrote: > Hey all, > > I'm just wondering if I have a crypto device (in my case, it's on a VIA > CPU), and I select sha256 for the checksum algorithm, is it being > accelerated by the crypto device? No, currently not, but this is a very good idea. Would you mind moving this idea into PR, so I won't forget about it? -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080718/b8d4e077/attachment.pgp From pjd at FreeBSD.org Fri Jul 18 09:50:23 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Fri Jul 18 09:50:55 2008 Subject: ZFS, SHA-256 and crypto accelerators. In-Reply-To: References: <2f89f278568aef731caf1d9874d846f7.squirrel@www.legios.org> <20080718064948.GB1976@garage.freebsd.pl> Message-ID: <20080718073704.GE1976@garage.freebsd.pl> On Fri, Jul 18, 2008 at 05:29:07PM +1000, mark@legios.org wrote: > > On Fri, Jul 18, 2008 at 04:02:55PM +1000, mark@legios.org wrote: > >> Hey all, > >> > >> I'm just wondering if I have a crypto device (in my case, it's on a VIA > >> CPU), and I select sha256 for the checksum algorithm, is it being > >> accelerated by the crypto device? > > > > No, currently not, but this is a very good idea. Would you mind moving > > this idea into PR, so I won't forget about it? 
> > > > -- > > Not a problem - done, raised as misc/125738 Got it, thanks. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080718/c04af2d9/attachment.pgp From bugmaster at FreeBSD.org Mon Jul 21 11:06:55 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 21 11:07:34 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200807211106.m6LB6sea031856@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t 7 problems total. Non-critical problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o bin/118249 fs mv(1): moving a directory changes its mtime o kern/124621 fs [ext3] Cannot mount ext2fs partition o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li 8 problems total. From jh at saunalahti.fi Tue Jul 22 08:16:56 2008 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Tue Jul 22 08:17:03 2008 Subject: birthtime initialization In-Reply-To: <200806020800.m528038T072838@freefall.freebsd.org> References: <200806020800.m528038T072838@freefall.freebsd.org> Message-ID: <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi> On 2008-06-02, Bruce Evans wrote: [about patch for ext2fs in PR kern/122047] > % + vap->va_birthtime.tv_sec = 0; > % + vap->va_birthtime.tv_nsec = 0; > > This is unrelated and should be handled centrally. Almost all file > systems get this wrong. Most fail to set va_birthtime, so stat() > returns kernel stack garbage for st_birthtime. ffs1 does the same > as the above. msdosfs does the above correctly, by setting tv_sec to > (time_t)-1 in unsupported cases. How about this patch? %%% Index: sys/kern/vfs_vnops.c =================================================================== --- sys/kern/vfs_vnops.c (revision 180588) +++ sys/kern/vfs_vnops.c (working copy) @@ -703,6 +703,9 @@ vn_stat(vp, sb, active_cred, file_cred, #endif vap = &vattr; + /* Not all file systems initialize birthtime. 
*/ + VATTR_NULL(vap); + error = VOP_GETATTR(vp, vap, active_cred, td); if (error) return (error); Index: sys/ufs/ufs/ufs_vnops.c =================================================================== --- sys/ufs/ufs/ufs_vnops.c (revision 180588) +++ sys/ufs/ufs/ufs_vnops.c (working copy) @@ -410,8 +410,8 @@ ufs_getattr(ap) vap->va_mtime.tv_nsec = ip->i_din1->di_mtimensec; vap->va_ctime.tv_sec = ip->i_din1->di_ctime; vap->va_ctime.tv_nsec = ip->i_din1->di_ctimensec; - vap->va_birthtime.tv_sec = 0; - vap->va_birthtime.tv_nsec = 0; + vap->va_birthtime.tv_sec = (time_t)-1; + vap->va_birthtime.tv_nsec = -1; vap->va_bytes = dbtob((u_quad_t)ip->i_din1->di_blocks); } else { vap->va_rdev = ip->i_din2->di_rdev; Index: sys/fs/msdosfs/msdosfs_vnops.c =================================================================== --- sys/fs/msdosfs/msdosfs_vnops.c (revision 180588) +++ sys/fs/msdosfs/msdosfs_vnops.c (working copy) @@ -345,8 +345,8 @@ msdosfs_getattr(ap) 0, &vap->va_birthtime); } else { vap->va_atime = vap->va_mtime; - vap->va_birthtime.tv_sec = -1; - vap->va_birthtime.tv_nsec = 0; + vap->va_birthtime.tv_sec = (time_t)-1; + vap->va_birthtime.tv_nsec = -1; } vap->va_flags = 0; if ((dep->de_Attributes & ATTR_ARCHIVE) == 0) Index: sys/nfsclient/nfs_subs.c =================================================================== --- sys/nfsclient/nfs_subs.c (revision 180588) +++ sys/nfsclient/nfs_subs.c (working copy) @@ -628,6 +628,8 @@ nfs_loadattrcache(struct vnode **vpp, st vap->va_rdev = rdev; mtime_save = vap->va_mtime; vap->va_mtime = mtime; + vap->va_birthtime.tv_sec = (time_t)-1; + vap->va_birthtime.tv_nsec = -1; vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0]; if (v3) { vap->va_nlink = fxdr_unsigned(u_short, fp->fa_nlink); %%% The patch adds VATTR_NULL() call to vn_stat() to initialize the vattr structure before VOP_GETATTR() call. VATTR_NULL() initializes va_birthtime.tv_sec and va_birthtime.tv_nsec to -1 (VNOVAL). I also changed UFS1 and msdosfs to use consistent values. 
NFS needs explicit initialization because otherwise values would be set to 0 due to memory obtained with M_ZERO flag. I have tested the patch with UFS2, UFS1, cd9660, nfs, ext2fs and smbfs. (There's also more information about the problem in this message: http://lists.freebsd.org/pipermail/freebsd-bugs/2008-March/029682.html) -- Jaakko From phk at phk.freebsd.dk Tue Jul 22 11:41:17 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue Jul 22 11:41:30 2008 Subject: birthtime initialization In-Reply-To: Your message of "Tue, 22 Jul 2008 10:57:19 +0300." <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi> Message-ID: <37898.1216725417@critter.freebsd.dk> In message <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi>, Jaakko Heinonen writes: >On 2008-06-02, Bruce Evans wrote: >[about patch for ext2fs in PR kern/122047] >> % + vap->va_birthtime.tv_sec = 0; >> % + vap->va_birthtime.tv_nsec = 0; >> >> This is unrelated and should be handled centrally. Almost all file >> systems get this wrong. Most fail to set va_birthtime, so stat() >> returns kernel stack garbage for st_birthtime. ffs1 does the same >> as the above. msdosfs does the above correctly, by setting tv_sec to >> (time_t)-1 in unsupported cases. > >How about this patch? Looks like something Kirk forgot to me. We want to macroize the NOVAL for timespec instead of spreading -1 casts all over. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
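
[Editorial sketch, not part of the thread.] The idea behind the patch above, that the caller pre-sets birthtime before VOP_GETATTR() so a file system which never touches the field cannot hand back stack garbage, can be illustrated in user space. Everything named demo_* below is hypothetical and only stands in for struct vattr, a per-file-system getattr routine and vn_stat(); it is not kernel code. The sentinel used here is an assumption: it follows the existing msdosfs convention visible in the patch (tv_sec = -1, tv_nsec = 0), whereas the posted patch sets both fields to -1 via VATTR_NULL().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's struct vattr (two fields only). */
struct demo_vattr {
	struct timespec va_mtime;
	struct timespec va_birthtime;
};

/* Mark birthtime as "not supported" (assumed sentinel: tv_sec = -1). */
void
demo_birthtime_clear(struct demo_vattr *vap)
{
	vap->va_birthtime.tv_sec = (time_t)-1;
	vap->va_birthtime.tv_nsec = 0;
}

/* Nonzero if the file system supplied a real birthtime. */
int
demo_birthtime_valid(const struct demo_vattr *vap)
{
	return (vap->va_birthtime.tv_sec != (time_t)-1);
}

/* A file system without birthtime support: it fills in mtime only. */
int
demo_getattr_nobirth(struct demo_vattr *vap)
{
	vap->va_mtime.tv_sec = 1216725417;
	vap->va_mtime.tv_nsec = 0;
	return (0);
}

/* A file system that stores birthtime and reports it. */
int
demo_getattr_birth(struct demo_vattr *vap)
{
	vap->va_mtime.tv_sec = 1216725417;
	vap->va_mtime.tv_nsec = 0;
	vap->va_birthtime.tv_sec = 1200000000;
	vap->va_birthtime.tv_nsec = 0;
	return (0);
}

/*
 * Central caller, playing the role of vn_stat(): the default is set
 * once here, so individual file systems need not invent their own.
 */
int
demo_stat(int (*getattr)(struct demo_vattr *), struct demo_vattr *out)
{
	struct demo_vattr va;	/* on the stack, deliberately not zeroed */
	int error;

	assert(getattr != NULL && out != NULL);
	demo_birthtime_clear(&va);
	error = getattr(&va);
	if (error)
		return (error);
	*out = va;
	return (0);
}
```

With this arrangement a consumer of the result tests demo_birthtime_valid() instead of trusting whatever value comes back; which sentinel the real kernel should use is exactly what the thread is debating.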
From brde at optusnet.com.au Tue Jul 22 14:56:58 2008
From: brde at optusnet.com.au (Bruce Evans)
Date: Tue Jul 22 14:57:06 2008
Subject: birthtime initialization
In-Reply-To: <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi>
References: <200806020800.m528038T072838@freefall.freebsd.org> <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi>
Message-ID: <20080722215249.K17453@delplex.bde.org>

On Tue, 22 Jul 2008, Jaakko Heinonen wrote:

> On 2008-06-02, Bruce Evans wrote:
> [about patch for ext2fs in PR kern/122047]
>> % + vap->va_birthtime.tv_sec = 0;
>> % + vap->va_birthtime.tv_nsec = 0;
>>
>> This is unrelated and should be handled centrally. Almost all file
>> systems get this wrong. Most fail to set va_birthtime, so stat()
>> returns kernel stack garbage for st_birthtime. ffs1 does the same
>> as the above. msdosfs does the above correctly, by setting tv_sec to
>> (time_t)-1 in unsupported cases.
>
> How about this patch?
>
> %%%
> Index: sys/kern/vfs_vnops.c
> ===================================================================
> --- sys/kern/vfs_vnops.c (revision 180588)
> +++ sys/kern/vfs_vnops.c (working copy)
> @@ -703,6 +703,9 @@ vn_stat(vp, sb, active_cred, file_cred,
>  #endif
>
>          vap = &vattr;
> +        /* Not all file systems initialize birthtime. */
> +        VATTR_NULL(vap);
> +
>          error = VOP_GETATTR(vp, vap, active_cred, td);
>          if (error)
>                  return (error);

I want to initialize va_birthtime to { -1, 0 } here only. Don't initialize the whole vattr here. VOP_GETATTR() is supposed to initialize everything, but doesn't for va_birthtime. If there are any other fields that VOP_GETATTR() doesn't initialize, then these should be searched for and fixed instead of setting them to the garbage value given by vattr_null. Similarly, if there are any fields that aren't supported by most file systems, then they should be searched for and defaulted like va_birthtime instead of requiring individual file systems to invent a default value for them.

> Index: sys/ufs/ufs/ufs_vnops.c
> ...
> Index: sys/fs/msdosfs/msdosfs_vnops.c
> ...
> Index: sys/nfsclient/nfs_subs.c

There are probably more file systems that have missing or slightly incorrect (all zero) settings of va_birthtime.

> The patch adds VATTR_NULL() call to vn_stat() to initialize the vattr
> structure before VOP_GETATTR() call. VATTR_NULL() initializes
> va_birthtime.tv_sec and va_birthtime.tv_nsec to -1 (VNOVAL). I also
> changed UFS1 and msdosfs to use consistent values. NFS needs explicit
> initialization because otherwise values would be set to 0 due to memory
> obtained with M_ZERO flag.

VNOVAL = -1 only accidentally gives the correct value for va_birthtime.tv_sec. It gives a wrong value for va_birthtime.tv_nsec. It is better to set va_birthtime.tv_sec explicitly to -1. This -1 is only accidentally equal to VNOVAL. Fortunately, this accident doesn't prevent VOP_GETATTR() from setting va_birthtime, since VNOVAL is only magic for VOP_SETATTR().

phk replied (but didn't quote enough, so I merged this manually):

>> Looks like something Kirk forgot to me.
>> We want to macroize the NOVAL for timespec instead of spreading
>> -1 casts all over.

This isn't a problem for the "GET" interface since VNOVAL doesn't apply to it. Also, the casts of -1 aren't really needed. ufs_setattr() doesn't have them for time_t's, and vattr_null() doesn't have them for anything. The correctness of this depends on the type of time_t (and the other va field types). In userland we're supposed to cast -1 to time_t for error detection in mktime() etc. In userland, time_t can be any arithmetic type so it is possible for (time_t)-1 != -1. Even there, I think there is only a problem if time_t is an unsigned integral type shorter than int. Compilers may warn about other cases.

ufs_setattr() has the casts for va_bytes (bogus cast of va_bytes to int, which breaks its value), va_uid, va_gid and va_mode.
For va_mode, there is a problem -- the same one as in my example for time_t above -- va_mode is u_short so it cannot equal -1 (after the default promotions) except on exotic systems. For va_uid and va_gid, the casts were needed 15 years ago when uid_t and gid_t were 16 bits. I can't see any problem with omitting the cast for va_bytes -- va_bytes is u_quad_t, which is certainly at least as large as int, so it can equal VNOVAL = -1 after the default promotions though it cannot represent any negative value (now C's conversion rules require (u_quad_t)-1 == -1, and it would be a compiler bug to warn about expressions that depend on these rules).

In vattr_null(), the assignments go the other way and VNOVAL = -1 always gets converted to the intended value (which is not always -1). C's conversion rules are depended on even more here to do something reasonable with (foo_t)-1.

I wouldn't like VNOVAL being replaced by VNOTIMESPECVAL, VNOUIDVAL, ... etc. Recently I noticed a commit that replaced (struct foo *)0 by NULL together with less contentious replacements of plain 0 by NULL. Old code that tries to be careful uses (struct foo *)0 (or a macro NULLFOO for this) too much. Now that NULL is Standard we can just use plain NULL. Similarly for plain VNOVAL except in a few cases where -1 doesn't get converted right.

Bruce

From lists at jnielsen.net Tue Jul 22 15:45:17 2008
From: lists at jnielsen.net (John Nielsen)
Date: Tue Jul 22 15:45:23 2008
Subject: NFS writes and ZFS
Message-ID: <200807221128.27592.lists@jnielsen.net>

I have a FreeBSD server (which I use as a NAS device, among other things) and a FreeBSD desktop. The desktop is running 7-STABLE from a couple days ago and the server is running 8-CURRENT from yesterday. The server has several NFS-exported ZFS'es which I mount from the desktop. Since moving the shares to ZFS I've been having trouble writing to them from the desktop--the mount hangs after the first or second attempt.
This is similar if not identical to what's described in the thread (from -current) I partially copied below.

Today I discovered that the problem seems to go away if I change the NFS mount options on the desktop. The following is a summary/timeline of what I've tried:

7-STABLE client, no NFS options (defaults); 7-STABLE server, UFS; works
7-STABLE client, no NFS options (defaults); 7-STABLE server, ZFS; broken
7-STABLE client, no NFS options (defaults); 8-CURRENT server, ZFS; broken
7-STABLE client, tcp,nfsv3,-r32768,-w32768; 8-CURRENT server, ZFS; works

My litmus test is to run fetch in the NFS directory a couple times since in my typical usage the failure is most apparent when fetching distfiles to the shared ports tree.

I didn't do a thorough search but I don't see any open PRs about this issue (though I remember the thread below and other discussions about the same time). Should I submit one? Other than that I just wanted to report that 1) this is apparently (still) an issue and 2) the NFS flags above seem like a good workaround so far.

Thanks,

JN

> Newsgroups: muc.lists.freebsd.current
> From: d...@des.no (Dag-Erling Smørgrav)
> Date: Sun, 07 Oct 2007 10:48:49 +0200
> Local: Sun, Oct 7 2007 4:48 am
> Subject: Re: ZFS & NFS integration...
>
> Darren Reed writes:
> > Dag-Erling Smørgrav wrote:
> > > Darren Reed writes:
> > > > What's the planned status for ZFS+NFS with 7.0?
> > > Don't Do It, basically.
> > This sounds like a "shoot yourself in the foot" comment.
>
> > Why?
>
> I haven't figured out the exact details yet, but apparently when the
> client closes a file that was opened read / write, the server stops
> responding to that client.
>
> DES

From pfgshield-freebsd at yahoo.com Tue Jul 22 16:15:12 2008
From: pfgshield-freebsd at yahoo.com (Pedro Giffuni)
Date: Tue Jul 22 16:22:14 2008
Subject: birthtime initialization
Message-ID: <984489.39243.qm@web32706.mail.mud.yahoo.com>

Hi;

Tim has some patches I made to add support for birthtime in libarchive (only in extended pax format) as a LIBARCHIVE.creationtime attribute.

Since birthtime is set by modifying mtime twice with utimes(2), the only criterion I used to determine if birthtime should be stored is if it was less than mtime. I hope something can be done to make that behavior consistent with UFS2 in all other filesystems.

cheers,

Pedro.

From brde at optusnet.com.au Tue Jul 22 17:31:45 2008
From: brde at optusnet.com.au (Bruce Evans)
Date: Tue Jul 22 17:31:51 2008
Subject: birthtime initialization
In-Reply-To: <984489.39243.qm@web32706.mail.mud.yahoo.com>
References: <984489.39243.qm@web32706.mail.mud.yahoo.com>
Message-ID: <20080723032929.F18594@delplex.bde.org>

On Tue, 22 Jul 2008, Pedro Giffuni wrote:

> Tim has some patches I made to add support for birthtime in libarchive (only in extended pax format) as a LIBARCHIVE.creationtime attribute.
>
> Since birthtime is set by modifying mtime twice with utimes(2), the only criterion I used to determine if birthtime should be stored is if it was less than mtime. I hope something can be done to make that behavior consistent with UFS2 in all other filesystems.

Can't it check for st_birthtime.tv_sec being != 0 or -1? The erroneous default of 0 might interact badly with file systems written by buggy versions of tar that set times to 0.

Bruce From pfgshield-freebsd at yahoo.com Tue Jul 22 18:11:47 2008 From: pfgshield-freebsd at yahoo.com (Pedro Giffuni) Date: Tue Jul 22 18:11:54 2008 Subject: birthtime initialization In-Reply-To: <20080723032929.F18594@delplex.bde.org> Message-ID: <232373.60220.qm@web32706.mail.mud.yahoo.com> --- On Tue 22/7/08, Bruce Evans wrote: ... > > > Tim has some patches I made to add support for > birthtime in libarchive (only in extended pax format) as a > LIBARCHIVE.creationtime attribute. > > > > Since birthtime is set by modifying mtime twice with > utimes(2), the only criteria I used to determine if > birthtime should be stored is if it was less than mtime. I > hope something can be done to make that behavior consistent > with UFS2 in all other filesystems. > > Can't it check for st_birthtime.tv_sec being != 0 or > -1? OK, I can do that, in fact I had it like that originally but then strictly speaking those values are valid and I had to check for birthtime==mtime anyways. Admittedly no BSD system was available before Jan 1st 1970 so I will modify the check to avoid those times. Pedro. From matt at corp.spry.com Tue Jul 22 21:25:45 2008 From: matt at corp.spry.com (Matt Simerson) Date: Tue Jul 22 21:25:52 2008 Subject: ZFS hang issue and prefetch_disable Message-ID: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> Symptoms Deadlocks under heavy IO load on the ZFS file system with prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a stable system. Configuration Two machines. Identically built. Both exhibit identical behavior. 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks. 
FreeBSD 7.0 amd64 dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt
Boot disk is a read only 1GB compact flash
# cat /etc/fstab
/dev/ad0s1a / ufs ro,noatime 2 2
# df -h /
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 939M 555M 309M 64% /
RAM has been boosted as suggested in ZFS Tuning Guide
# cat /boot/loader.conf
vm.kmem_size= 1610612736
vm.kmem_size_max= 1610612736
vfs.zfs.prefetch_disable=1
I haven't mucked much with the other memory settings as I'm using amd64 and according to the FreeBSD ZFS wiki, that isn't necessary. I've tried higher settings for kmem but that resulted in a failed boot. I have ample RAM and would love to use as much as possible for network and disk I/O buffers as that's principally all this system does.
Disks & ZFS options
Sun's "Best Practices" suggests limiting the number of disks in a raidz pool to no more than 6-10, IIRC. ZFS is configured as shown: http://matt.simerson.net/computing/zfs/zpool.txt I'm using all of the ZFS default properties except: atime=off, compression=on.
Environment
I'm using these machines as backup servers. I wrote an application that generates a list of the thousands of VPS accounts we host. For each host, it generates a rsnapshot configuration file and backs up their VPS to these systems via rsync. The application manages concurrency and will spawn additional rsync processes if system i/o load is below a defined threshold. Which is to say, I can crank up or down the amount of network and disk IO the system sees. With vfs.zfs.prefetch_disable=1, a hang will occur within a few hours (no more than a day). If I keep the i/o load (measured via iostat) down to a low level (< 200 iops) then I still get hangs but less frequently (1-6 days). The only way I have found to prevent the hangs is by setting vfs.zfs.prefetch_disable=1. 
Matt Simerson From 000.fbsd at quip.cz Tue Jul 22 22:08:22 2008 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Tue Jul 22 22:08:29 2008 Subject: ZFS hang issue and prefetch_disable In-Reply-To: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> References: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> Message-ID: <48865A68.1010504@quip.cz> Matt Simerson wrote: > Symptoms > > Deadlocks under heavy IO load on the ZFS file system with > prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a > stable system. [...] > With vfs.zfs.prefetch_disable=1, a hang will occur within a few hours > (no more than a day). If I keep the i/o load (measured via iostat) down > to a low level (< 200 iops) then I still get hangs but less frequently > (1-6 days). The only way I have found to prevent the hangs is by > setting vfs.zfs.prefetch_disable=1. "With vfs.zfs.prefetch_disable=1, a hang will occur within...", did you realy mean prefetch_disable=1 in this sentence? Your whole e-mail seems that prefetch_disable=1 is good workaround, so I expect you have prefetch_disable=0 previously which causes hangs... Miroslav Lachman From matt at corp.spry.com Tue Jul 22 22:22:24 2008 From: matt at corp.spry.com (Matt Simerson) Date: Tue Jul 22 22:22:30 2008 Subject: ZFS hang issue and prefetch_disable In-Reply-To: <48865A68.1010504@quip.cz> References: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> <48865A68.1010504@quip.cz> Message-ID: On Jul 22, 2008, at 3:08 PM, Miroslav Lachman wrote: > Matt Simerson wrote: >> Symptoms >> Deadlocks under heavy IO load on the ZFS file system with >> prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in >> a stable system. > > [...] > >> With vfs.zfs.prefetch_disable=1, a hang will occur within a few >> hours (no more than a day). If I keep the i/o load (measured via >> iostat) down to a low level (< 200 iops) then I still get hangs >> but less frequently (1-6 days). 
The only way I have found to >> prevent the hangs is by setting vfs.zfs.prefetch_disable=1. > > "With vfs.zfs.prefetch_disable=1, a hang will occur within...", did > you realy mean prefetch_disable=1 in this sentence? Your whole e- > mail seems that prefetch_disable=1 is good workaround, so I expect > you have prefetch_disable=0 previously which causes hangs... Aye. That is exactly what I meant. With vfs.zfs.prefetch_disable=1, I get a stable system. With vfs.zfs.prefetch_disable=0 (the default) I have frequent deadlocks. Matt Rant: I really wish that variable wasn't named in the negative, creating a double negative when prefetch_disable=0. IE, it should be named vfs.zfs.prefetch_enable instead. It's much easier to express in English that prefetch_enable=1 means ON and prefetch_enable=0 means OFF. There's also the matter than in some languages, a double or triple negative still means the negative case. %-\. I'd rather not have to guess what prefetch_disable=1 means. From pjd at FreeBSD.org Wed Jul 23 07:50:39 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Wed Jul 23 07:50:46 2008 Subject: ZFS hang issue and prefetch_disable In-Reply-To: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> References: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> Message-ID: <20080723075030.GA3603@garage.freebsd.pl> On Tue, Jul 22, 2008 at 01:57:27PM -0700, Matt Simerson wrote: > Symptoms > > Deadlocks under heavy IO load on the ZFS file system with > prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a > stable system. > > Configuration > > Two machines. Identically built. Both exhibit identical behavior. > 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks. 
> FreeBSD 7.0 amd64 > dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt Very nice:) > Boot disk is a read only 1GB compact flash > # cat /etc/fstab > /dev/ad0s1a / ufs ro,noatime 2 2 > > # df -h / > Filesystem 1K-blocks Used Avail Capacity Mounted on > /dev/ad0s1a 939M 555M 309M 64% / > > RAM has been boosted as suggested in ZFS Tuning Guide > # cat /boot/loader.conf > vm.kmem_size= 1610612736 > vm.kmem_size_max= 1610612736 > vfs.zfs.prefetch_disable=1 > > I haven't mucked much with the other memory settings as I'm using > amd64 and according to the FreeBSD ZFS wiki, that isn't necessary. > I've tried higher settings for kmem but that resulted in a failed > boot. I have ample RAM And would love to use as much as possible for > network and disk I/O buffers as that's principally all this system does. > > Disks & ZFS options > > Sun's "Best Practices" suggests limiting the number of disks in a > raidz pool to no more than 6-10, IIRC. ZFS is configured as shown: > http://matt.simerson.net/computing/zfs/zpool.txt > > I'm using all of the ZFS default properties except: atime=off, > compression=on. > > Environment > > I'm using these machines as backup servers. I wrote an application > that generates a list of the thousands of VPS accounts we host. For > each host, it generates a rsnapshot configuration file and backs up up > their VPS to these systems via rsync. The application manages > concurrency and will span additional rsync processes if system i/o > load is below a defined thresh-hold. Which is to say, I can crank up > or down the amount of network and disk IO the system sees. > > With vfs.zfs.prefetch_disable=1, a hang will occur within a few hours I guess you wanted '0' here? > (no more than a day). If I keep the i/o load (measured via iostat) > down to a low level (< 200 iops) then I still get hangs but less > frequently (1-6 days). The only way I have found to prevent the hangs > is by setting vfs.zfs.prefetch_disable=1. 
This is more or less a known problem. It is related to low memory/kva conditions. Alan Cox is working on vm.kmem_size limitation. I saw Kris using ZFS with some very large vm.kmem_size. Not sure if all the code is already committed, but this would be something you should definitely try on your hardware. I've also the most recent ZFS version in perforce that is being tested by a few other guys and I'd like to commit it to HEAD soon (depends on test results of course). There are plenty of improvements and some may fix your problem too. BTW. Do you see prefetch helpful for your workloads? I always turn it off on my systems, because it has negative impact on performance, but maybe my hardware is too weak to take advantage of it. One more thing. There was a small bug in prefetch code, but I've no idea if it is related to hangs you are seeing. If that's not a problem for you, can you try this patch: http://people.freebsd.org/~pjd/patches/dmu_zfetch.c.patch If you want to play with tuning ZFS prefetch, you might find these patches useful (taken from perforce version): http://people.freebsd.org/~pjd/patches/dmu_zfetch.c.2.patch http://people.freebsd.org/~pjd/patches/quad.patch -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
From pjd at FreeBSD.org Wed Jul 23 08:19:33 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Wed Jul 23 08:19:41 2008 Subject: ZFS hang issue and prefetch_disable In-Reply-To: References: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com> <48865A68.1010504@quip.cz> Message-ID: <20080723081930.GB3603@garage.freebsd.pl> On Tue, Jul 22, 2008 at 03:22:17PM -0700, Matt Simerson wrote: > > On Jul 22, 2008, at 3:08 PM, Miroslav Lachman wrote: > > >Matt Simerson wrote: > >>Symptoms > >>Deadlocks under heavy IO load on the ZFS file system with > >>prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in > >>a stable system. > > > >[...] > > > >>With vfs.zfs.prefetch_disable=1, a hang will occur within a few > >>hours (no more than a day). If I keep the i/o load (measured via > >>iostat) down to a low level (< 200 iops) then I still get hangs > >>but less frequently (1-6 days). The only way I have found to > >>prevent the hangs is by setting vfs.zfs.prefetch_disable=1. > > > >"With vfs.zfs.prefetch_disable=1, a hang will occur within...", did > >you realy mean prefetch_disable=1 in this sentence? Your whole e- > >mail seems that prefetch_disable=1 is good workaround, so I expect > >you have prefetch_disable=0 previously which causes hangs... > > Aye. That is exactly what I meant. With vfs.zfs.prefetch_disable=1, > I get a stable system. With vfs.zfs.prefetch_disable=0 (the default) I > have frequent deadlocks. > > Matt > > Rant: I really wish that variable wasn't named in the negative, > creating a double negative when prefetch_disable=0. IE, it should be > named vfs.zfs.prefetch_enable instead. It's much easier to express in > English that prefetch_enable=1 means ON and prefetch_enable=0 means > OFF. 
There's also the matter than in some languages, a double or > triple negative still means the negative case. %-\. I'd rather not > have to guess what prefetch_disable=1 means. I agree. We even discussed sysctl naming in the past AFAIR to use exactly 'enable', not 'disable' variants. However, I want to track Solaris as closely as possible; that's why I'm using what they have. The intent is to make it easier for people to use ZFS on both Solaris and FreeBSD by not introducing small but annoying differences. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! From pjd at FreeBSD.org Wed Jul 23 08:43:43 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Wed Jul 23 08:43:49 2008 Subject: NFS writes and ZFS In-Reply-To: <200807221128.27592.lists@jnielsen.net> References: <200807221128.27592.lists@jnielsen.net> Message-ID: <20080723082401.GC3603@garage.freebsd.pl> On Tue, Jul 22, 2008 at 11:28:27AM -0400, John Nielsen wrote: > I have a FreeBSD server (which I use as a NAS device, among other things) > and a FreeBSD deskop. The desktop is running 7-STABLE from a couple days > ago and the server is running 8-CURRENT from yesterday. The server has > several NFS-exported ZFS'es which I mount from the desktop. Since moving > the shares to ZFS I've been having trouble writing to them from the > desktop--the mount hangs after the first or second attempt. This is > similar if not identical to what's described in the thread > (from -current) I partially copied below. > > Today I discovered that the problem seems to go away if I change the NFS > mount options on the desktop. 
The following is a summary/timeline of what > I've tried: > > 7-STABLE client, no NFS options (defaults); 7-STABLE server, UFS; works > 7-STABLE client, no NFS options (defaults); 7-STABLE server, ZFS; broken > 7-STABLE client, no NFS options (defaults); 8-CURRENT server, ZFS; broken > 7-STABLE client, tcp,nfsv3,-r32768,-w32768; 8-CURRENT server, ZFS, works Do you need all the options here? If not, could you try to find the smallest subset of options that are needed to make ZFS work? Maybe 'nfsv3' is all that is needed, or 'tcp' alone fixes it? At work we use many NFS exported ZFS file systems, mostly accessed from MacOS X and we see no problems. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! From ticso at cicely7.cicely.de Wed Jul 23 09:33:26 2008 From: ticso at cicely7.cicely.de (Bernd Walter) Date: Wed Jul 23 09:33:33 2008 Subject: NFS writes and ZFS In-Reply-To: <20080723082401.GC3603@garage.freebsd.pl> References: <200807221128.27592.lists@jnielsen.net> <20080723082401.GC3603@garage.freebsd.pl> Message-ID: <20080723090450.GV58113@cicely7.cicely.de> On Wed, Jul 23, 2008 at 10:24:01AM +0200, Pawel Jakub Dawidek wrote: > On Tue, Jul 22, 2008 at 11:28:27AM -0400, John Nielsen wrote: > > I have a FreeBSD server (which I use as a NAS device, among other things) > > and a FreeBSD deskop. The desktop is running 7-STABLE from a couple days > > ago and the server is running 8-CURRENT from yesterday. The server has > > several NFS-exported ZFS'es which I mount from the desktop. Since moving > > the shares to ZFS I've been having trouble writing to them from the > > desktop--the mount hangs after the first or second attempt. 
This is > > similar if not identical to what's described in the thread > > (from -current) I partially copied below. > > > > Today I discovered that the problem seems to go away if I change the NFS > > mount options on the desktop. The following is a summary/timeline of what > > I've tried: > > > > 7-STABLE client, no NFS options (defaults); 7-STABLE server, UFS; works > > 7-STABLE client, no NFS options (defaults); 7-STABLE server, ZFS; broken > > 7-STABLE client, no NFS options (defaults); 8-CURRENT server, ZFS; broken > > 7-STABLE client, tcp,nfsv3,-r32768,-w32768; 8-CURRENT server, ZFS, works > > Do you need all the options here? If not, could you try to find the > smallest subset of options that are needed to make ZFS work? Maybe > 'nfsv3' is all that is needed, or 'tcp' alone fixes it? At work we use > many NFS exported ZFS file systems, mostly accessed from MacOS X and > we see no problems. Whenever changing NFS transport options has an influence on reliability my first task is to verify the network. Especially there were often hardware problems with some NIC lately, of which some have worked around in the drivers and some not. Disabling TSO and checksum offloading typically helps. This kind of problem is typical on both the client and server, but also on routers. Of course network problems can also be on any cable, switch in between as well, but are less typical to produce complete NFS hangs. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. From kometen at gmail.com Wed Jul 23 10:27:45 2008 From: kometen at gmail.com (Claus Guttesen) Date: Wed Jul 23 10:27:55 2008 Subject: NFS writes and ZFS In-Reply-To: <20080723082401.GC3603@garage.freebsd.pl> References: <200807221128.27592.lists@jnielsen.net> <20080723082401.GC3603@garage.freebsd.pl> Message-ID: >> Today I discovered that the problem seems to go away if I change the NFS >> mount options on the desktop. 
The following is a summary/timeline of what >> I've tried: >> >> 7-STABLE client, no NFS options (defaults); 7-STABLE server, UFS; works >> 7-STABLE client, no NFS options (defaults); 7-STABLE server, ZFS; broken >> 7-STABLE client, no NFS options (defaults); 8-CURRENT server, ZFS; broken >> 7-STABLE client, tcp,nfsv3,-r32768,-w32768; 8-CURRENT server, ZFS, works > > Do you need all the options here? If not, could you try to find the > smallest subset of options that are needed to make ZFS work? Maybe > 'nfsv3' is all that is needed, or 'tcp' alone fixes it? At work we use > many NFS exported ZFS file systems, mostly accessed from MacOS X and > we see no problems. Good to hear. I've just started testing a setup with an areca arc-1680 sas card and an external sas-cabinet. It currently has a zpool with ten 1 TB-drives in raidz2. It may grow to a two-digit TB system if the testing goes fine. It will nfs-share the partitions to my FreeBSD webservers. I'm using nfs v.3 and tcp with a read- and write-size at 32768. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare From jh at saunalahti.fi Wed Jul 23 10:34:31 2008 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Wed Jul 23 10:34:38 2008 Subject: birthtime initialization In-Reply-To: <20080722215249.K17453@delplex.bde.org> References: <200806020800.m528038T072838@freefall.freebsd.org> <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi> <20080722215249.K17453@delplex.bde.org> Message-ID: <20080723103424.GA1856@a91-153-120-204.elisa-laajakaista.fi> On 2008-07-22, Bruce Evans wrote: > > + VATTR_NULL(vap); > > I want to initialize va_birthtime to { -1, 0 } here only. Don't > initialize the whole vattr here. VOP_GETTATR() is supposed to initalize > everything, but doesn't for va_birthtime. 
If there any other fields > that VOP_GETTATR() doesn't initialize, then these should be searched > for and fixed instead of setting them to the garbage value given by > vattr_null. At least xfs gets it wrong for several fields. /* * Fields with no direct equivalent in XFS * leave initialized by VATTR_NULL */ #if 0 vap->va_filerev = 0; vap->va_birthtime = va.va_ctime; vap->va_vaflags = 0; vap->va_flags = 0; vap->va_spare = 0; #endif > > Index: sys/ufs/ufs/ufs_vnops.c > > ... > > Index: sys/fs/msdosfs/msdosfs_vnops.c > > ... > > Index: sys/nfsclient/nfs_subs.c > > There are a probably more file systems that have missing or slightly > incorrect (all zero) settings of va_birthtime. Many file systems misses settings of va_birthtime. That's the reason why I initialized it in vn_stat(). I have seen four types of initializations: 1) Support and set birthtime. (UFS2, tmpfs, msdosfs (not all variants of msdosfs support birthtime), nfs4?) 2) Set birthtime to zero. (UFS1, nfs (nfs zeroes the vattr structure)) 3) Initialize vattr with VATTR_NULL() but not birthtime explicitly. Thus tv_sec and tv_nsec are set to -1 (VNOVAL). (devfs, xfs, portalfs, pseudofs) 4) Not initialize birthtime at all. Those would be fixed by initializing the birthtime in vn_stat(). (cd9660, hpfs, ntfs, smbfs, udf, ext2fs, reiserfs) I couldn't test but I suspect that also coda belongs to this group. So I see two ways to fix: - initialize birthtime in vn_stat() and add/fix explicit setting for group 2 and 3 file systems or - add explicit initialization to all file systems missing it (groups 3 and 4) and fix group 2 to initialize birthtime to correct value > I wouldn't like VNOVAL being replaced by VNOTIMESPECVAL, VNOUIDVAL, > ... etc. I agree with this. I have updated the patch per your comments and checked more file systems. 
I have verified that with this patch these file systems return correct birthtime values (real birthtime or {-1, 0} if not supported): UFS2, UFS1, cd9660, nfs, ext2fs, smbfs, reiserfs, xfs, ntfs, devfs, procfs, linprocfs, tmpfs, msdosfs, portalfs, udf. For pseudofs I set birthtime to current time.
%%%
Index: sys/kern/vfs_vnops.c
===================================================================
--- sys/kern/vfs_vnops.c (revision 180729)
+++ sys/kern/vfs_vnops.c (working copy)
@@ -703,6 +703,13 @@ vn_stat(vp, sb, active_cred, file_cred,
 #endif
 
         vap = &vattr;
+
+        /*
+         * Not all file systems initialize birthtime.
+         */
+        vap->va_birthtime.tv_sec = -1;
+        vap->va_birthtime.tv_nsec = 0;
+
         error = VOP_GETATTR(vp, vap, active_cred, td);
         if (error)
                 return (error);
Index: sys/ufs/ufs/ufs_vnops.c
===================================================================
--- sys/ufs/ufs/ufs_vnops.c (revision 180729)
+++ sys/ufs/ufs/ufs_vnops.c (working copy)
@@ -410,7 +410,7 @@ ufs_getattr(ap)
                 vap->va_mtime.tv_nsec = ip->i_din1->di_mtimensec;
                 vap->va_ctime.tv_sec = ip->i_din1->di_ctime;
                 vap->va_ctime.tv_nsec = ip->i_din1->di_ctimensec;
-                vap->va_birthtime.tv_sec = 0;
+                vap->va_birthtime.tv_sec = -1;
                 vap->va_birthtime.tv_nsec = 0;
                 vap->va_bytes = dbtob((u_quad_t)ip->i_din1->di_blocks);
         } else {
Index: sys/nfsclient/nfs_subs.c
===================================================================
--- sys/nfsclient/nfs_subs.c (revision 180729)
+++ sys/nfsclient/nfs_subs.c (working copy)
@@ -628,6 +628,8 @@ nfs_loadattrcache(struct vnode **vpp, st
         vap->va_rdev = rdev;
         mtime_save = vap->va_mtime;
         vap->va_mtime = mtime;
+        vap->va_birthtime.tv_sec = -1;
+        vap->va_birthtime.tv_nsec = 0;
         vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
         if (v3) {
                 vap->va_nlink = fxdr_unsigned(u_short, fp->fa_nlink);
Index: sys/fs/pseudofs/pseudofs_vnops.c
===================================================================
--- sys/fs/pseudofs/pseudofs_vnops.c (revision 180729)
+++ sys/fs/pseudofs/pseudofs_vnops.c (working copy)
@@ -200,7 +200,7 @@ pfs_getattr(struct vop_getattr_args *va)
         vap->va_fsid = vn->v_mount->mnt_stat.f_fsid.val[0];
         vap->va_nlink = 1;
         nanotime(&vap->va_ctime);
-        vap->va_atime = vap->va_mtime = vap->va_ctime;
+        vap->va_atime = vap->va_mtime = vap->va_birthtime = vap->va_ctime;
 
         switch (pn->pn_type) {
         case pfstype_procdir:
Index: sys/fs/portalfs/portal_vnops.c
===================================================================
--- sys/fs/portalfs/portal_vnops.c (revision 180729)
+++ sys/fs/portalfs/portal_vnops.c (working copy)
@@ -462,6 +462,8 @@ portal_getattr(ap)
         nanotime(&vap->va_atime);
         vap->va_mtime = vap->va_atime;
         vap->va_ctime = vap->va_mtime;
+        vap->va_birthtime.tv_sec = -1;
+        vap->va_birthtime.tv_nsec = 0;
         vap->va_gen = 0;
         vap->va_flags = 0;
         vap->va_rdev = 0;
Index: sys/fs/devfs/devfs_vnops.c
===================================================================
--- sys/fs/devfs/devfs_vnops.c (revision 180729)
+++ sys/fs/devfs/devfs_vnops.c (working copy)
@@ -543,6 +543,8 @@ devfs_getattr(struct vop_getattr_args *a
 
                 vap->va_rdev = cdev2priv(dev)->cdp_inode;
         }
+        vap->va_birthtime.tv_sec = -1;
+        vap->va_birthtime.tv_nsec = 0;
         vap->va_gen = 0;
         vap->va_flags = 0;
         vap->va_nlink = de->de_links;
Index: sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c
===================================================================
--- sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c (revision 180729)
+++ sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c (working copy)
@@ -263,6 +263,8 @@ _xfs_getattr(
         vap->va_atime = va.va_atime;
         vap->va_mtime = va.va_mtime;
         vap->va_ctime = va.va_ctime;
+        vap->va_birthtime.tv_sec = -1;
+        vap->va_birthtime.tv_nsec = 0;
         vap->va_gen = va.va_gen;
         vap->va_rdev = va.va_rdev;
         vap->va_bytes = (va.va_nblocks << BBSHIFT);
%%%
-- Jaakko From brde at optusnet.com.au Wed Jul 23 15:42:54 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Jul 23 15:43:20 2008 Subject: birthtime initialization In-Reply-To: <20080723103424.GA1856@a91-153-120-204.elisa-laajakaista.fi> References: 
<200806020800.m528038T072838@freefall.freebsd.org> <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi> <20080722215249.K17453@delplex.bde.org> <20080723103424.GA1856@a91-153-120-204.elisa-laajakaista.fi> Message-ID: <20080724000618.Q16961@besplex.bde.org> On Wed, 23 Jul 2008, Jaakko Heinonen wrote: > On 2008-07-22, Bruce Evans wrote: >>> + VATTR_NULL(vap); >> >> I want to initialize va_birthtime to { -1, 0 } here only. Don't >> initialize the whole vattr here. VOP_GETATTR() is supposed to initialize >> everything, but doesn't for va_birthtime. If there any other fields >> that VOP_GETATTR() doesn't initialize, then these should be searched >> for and fixed instead of setting them to the garbage value given by >> vattr_null. > > At least xfs gets it wrong for several fields. > > /* > * Fields with no direct equivalent in XFS > * leave initialized by VATTR_NULL > */ > #if 0 > vap->va_filerev = 0; > vap->va_birthtime = va.va_ctime; > vap->va_vaflags = 0; > vap->va_flags = 0; > vap->va_spare = 0; > #endif That's amazingly bad. First, the fields shouldn't be initialized using VATTR_NULL() in VOP_GETATTR(). Doing so breaks the preinitialization that we want to add (maybe also layering). This bug seems to be present in only the following file systems: fdescfs, mqfs, pseudofs, tmpfs, xfs The uninitialized fields should give stack garbage. Second, VNOVAL is an extremely bogus default value. For va_flags, it gives all flags set, so ls -lo output would be weird (and wrong since the flags aren't actually there). Third, va_vaflags and va_spare aren't part of the VOP_GETATTR() API. va_vaflags is an input parameter for VOP_SETATTR(). va_spare is just spare (unused). VATTR_NULL() initializes va_vaflags to 0, not VNOVAL (as is required for the usual case in VOP_SETATTR()), and it knows better than to initialize unused fields (it also doesn't initialize unnamed padding -- stack garbage in this is OK since vattr is never copied directly to userland). 
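On the second point above, a small userland sketch (not kernel code) of why VNOVAL makes a poor default for a flags word: VNOVAL is (-1), so storing it in an unsigned field sets every bit and every flag test then succeeds. The SKETCH_ names and flag values below are made up for illustration and are not the real file flag constants.

```c
#include <stdbool.h>

#define SKETCH_VNOVAL           (-1)    /* same value as the kernel's VNOVAL */

/* Made-up flag bits, for illustration only. */
#define SKETCH_FLAG_NODUMP      0x01UL
#define SKETCH_FLAG_IMMUTABLE   0x02UL

/* Return true if every bit of 'flag' is set in 'flags'. */
static bool
flag_is_set(unsigned long flags, unsigned long flag)
{
        return ((flags & flag) == flag);
}

/* What an unsigned flags field holds after being "nulled" with VNOVAL. */
static unsigned long
nulled_flags(void)
{
        return ((unsigned long)SKETCH_VNOVAL);
}
```

With a default of 0 instead, both flag tests would report false, which matches a file system that has no flags to report.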
After deleting the bogus initializations, we're left with va_filerev, va_birthtime and va_flags. Most file systems don't support these, so they could usefully all be handled by defaulting them as in the proposed changes for va_birthtime. >>> Index: sys/ufs/ufs/ufs_vnops.c >>> ... >>> Index: sys/fs/msdosfs/msdosfs_vnops.c >>> ... >>> Index: sys/nfsclient/nfs_subs.c >> >> There are a probably more file systems that have missing or slightly >> incorrect (all zero) settings of va_birthtime. > > Many file systems misses settings of va_birthtime. That's the reason why > I initialized it in vn_stat(). I have seen four types of > initializations: > > 1) Support and set birthtime. (UFS2, tmpfs, msdosfs (not all > variants of msdosfs support birthtime), nfs4?) > > 2) Set birthtime to zero. (UFS1, nfs (nfs zeroes the vattr structure)) Zeroing it is almost as bad as VATTR_NULL()ing it. > 3) Initialize vattr with VATTR_NULL() but not birthtime explicitly. Thus > tv_sec and tv_nsec are set to -1 (VNOVAL). (devfs, xfs, portalfs, > pseudofs) > > 4) Not initialize birthtime at all. Those would be fixed by initializing the > birthtime in vn_stat(). (cd9660, hpfs, ntfs, smbfs, udf, ext2fs, > reiserfs) > I couldn't test but I suspect that also coda belongs to this group. > > So I see two ways to fix: > > - initialize birthtime in vn_stat() and add/fix explicit setting for group 2 > and 3 file systems or > - add explicit initialization to all file systems missing it > (groups 3 and 4) and fix group 2 to initialize birthtime to correct value (3) and (4) are only different due to bugs. I want to initialize va_birthtime and maybe a couple of other fields in vn_stat(), and depend on this and not initialize to the same or a worse value in case (3). This requires removing VATTR_NULL() or zeroing in some cases and checking that everything is still initialized. All old fields should be handled by explicit initialization as in ffs1, and all new fields should have defaults. 
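The "default in the caller, override in the file system" idea can be sketched in userland as follows. This is a simplified model under stated assumptions, not the kernel code: struct sketch_vattr, sketch_stat(), oldfs_getattr() and newfs_getattr() are hypothetical names, and the timestamps are arbitrary. It also shows why the scheme only works if the per-file-system routine does not zero or null the whole structure itself.

```c
#include <time.h>

/* Simplified stand-in for the kernel's struct vattr. */
struct sketch_vattr {
        struct timespec va_mtime;
        struct timespec va_birthtime;
};

/* Signature of a per-file-system "getattr" routine in this model. */
typedef int (*sketch_getattr_t)(struct sketch_vattr *);

/* A file system that knows nothing about birthtime. */
static int
oldfs_getattr(struct sketch_vattr *vap)
{
        vap->va_mtime.tv_sec = 1216800000;
        vap->va_mtime.tv_nsec = 0;
        return (0);
}

/* A file system that records a real birthtime. */
static int
newfs_getattr(struct sketch_vattr *vap)
{
        vap->va_mtime.tv_sec = 1216800000;
        vap->va_mtime.tv_nsec = 0;
        vap->va_birthtime.tv_sec = 1216700000;
        vap->va_birthtime.tv_nsec = 0;
        return (0);
}

/*
 * Caller-side wrapper, playing the role vn_stat() has in this thread:
 * preset the default for the new field, then let the file system
 * overwrite it if it supports the field.
 */
static int
sketch_stat(sketch_getattr_t getattr, struct sketch_vattr *vap)
{
        vap->va_birthtime.tv_sec = -1;
        vap->va_birthtime.tv_nsec = 0;
        return (getattr(vap));
}
```

Calling sketch_stat() with oldfs_getattr leaves the birthtime at { -1, 0 }, while newfs_getattr replaces it with its own value. A getattr routine that cleared the whole structure first would silently discard the preset default.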
> I have updated the patch per your comments and checked more file > systems. I have verified that with this patch these file systems return > correct birthtime values (real birthtime or {-1, 0} if not supported): > > UFS2, UFS1, cd9660, nfs, ext2fs, smbfs, reiserfs, xfs, ntfs, devfs, > procfs, linprocfs, tmpfs, msdosfs, portalfs, udf. I don't want the case (3). Otherwise good. > > For pseudofs I set birthtime to current time. I don't like this. birthtime should be <= all other file times. If a file system doesn't support birthtime, then it could also set birthtime = mtime, but that isn't useful either. Better set it to -1 as in ffs1 (except ffs1 sets it to 0). > > %%% > Index: sys/kern/vfs_vnops.c > =================================================================== > --- sys/kern/vfs_vnops.c (revision 180729) > +++ sys/kern/vfs_vnops.c (working copy) > @@ -703,6 +703,13 @@ vn_stat(vp, sb, active_cred, file_cred, > #endif > > vap = &vattr; > + > + /* > + * Not all file systems initialize birthtime. > + */ Change to something like: /* * Initialize defaults for new and/or unusual fields, so that file * systems which don't support these fields don't need to know * about them. */ > + vap->va_birthtime.tv_sec = -1; > + vap->va_birthtime.tv_nsec = 0; > + > error = VOP_GETATTR(vp, vap, active_cred, td); > if (error) > return (error); > Index: sys/ufs/ufs/ufs_vnops.c > =================================================================== > --- sys/ufs/ufs/ufs_vnops.c (revision 180729) > +++ sys/ufs/ufs/ufs_vnops.c (working copy) > @@ -410,7 +410,7 @@ ufs_getattr(ap) > vap->va_mtime.tv_nsec = ip->i_din1->di_mtimensec; > vap->va_ctime.tv_sec = ip->i_din1->di_ctime; > vap->va_ctime.tv_nsec = ip->i_din1->di_ctimensec; > - vap->va_birthtime.tv_sec = 0; > + vap->va_birthtime.tv_sec = -1; > vap->va_birthtime.tv_nsec = 0; > vap->va_bytes = dbtob((u_quad_t)ip->i_din1->di_blocks); > } else { Can just delete birthtime references here. Unless I've missed a bzero. 
> Index: sys/nfsclient/nfs_subs.c > =================================================================== > --- sys/nfsclient/nfs_subs.c (revision 180729) > +++ sys/nfsclient/nfs_subs.c (working copy) > @@ -628,6 +628,8 @@ nfs_loadattrcache(struct vnode **vpp, st > vap->va_rdev = rdev; > mtime_save = vap->va_mtime; > vap->va_mtime = mtime; > + vap->va_birthtime.tv_sec = -1; > + vap->va_birthtime.tv_nsec = 0; > vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0]; > if (v3) { > vap->va_nlink = fxdr_unsigned(u_short, fp->fa_nlink); Need to remove the zeroing and check other fields before defaulting birthtime here. > Index: sys/fs/pseudofs/pseudofs_vnops.c > =================================================================== > --- sys/fs/pseudofs/pseudofs_vnops.c (revision 180729) > +++ sys/fs/pseudofs/pseudofs_vnops.c (working copy) > @@ -200,7 +200,7 @@ pfs_getattr(struct vop_getattr_args *va) > vap->va_fsid = vn->v_mount->mnt_stat.f_fsid.val[0]; > vap->va_nlink = 1; > nanotime(&vap->va_ctime); > - vap->va_atime = vap->va_mtime = vap->va_ctime; > + vap->va_atime = vap->va_mtime = vap->va_birthtime = vap->va_ctime; > > switch (pn->pn_type) { > case pfstype_procdir: I don't understand why it doesn't have _any_ persistent storage for times. > Index: sys/fs/portalfs/portal_vnops.c > =================================================================== > --- sys/fs/portalfs/portal_vnops.c (revision 180729) > +++ sys/fs/portalfs/portal_vnops.c (working copy) > @@ -462,6 +462,8 @@ portal_getattr(ap) > nanotime(&vap->va_atime); > vap->va_mtime = vap->va_atime; > vap->va_ctime = vap->va_mtime; > + vap->va_birthtime.tv_sec = -1; > + vap->va_birthtime.tv_nsec = 0; > vap->va_gen = 0; > vap->va_flags = 0; > vap->va_rdev = 0; This uses both bzero() and vattr_null(). Oops, I only grepped for use of VATTR_NULL() when I looked for bogus initializations above. VATTR_NULL() is the public interface and vattr_null() is an implementation detail. 
Add the following file systems to the list of file systems with bogusly initialized vattr's in VOP_GETATTR(): devfs, portalfs These both misuse bzero() and vattr_null(). There are no other misuses of vattr_null(). > Index: sys/fs/devfs/devfs_vnops.c > =================================================================== > --- sys/fs/devfs/devfs_vnops.c (revision 180729) > +++ sys/fs/devfs/devfs_vnops.c (working copy) > @@ -543,6 +543,8 @@ devfs_getattr(struct vop_getattr_args *a > > vap->va_rdev = cdev2priv(dev)->cdp_inode; > } > + vap->va_birthtime.tv_sec = -1; > + vap->va_birthtime.tv_nsec = 0; > vap->va_gen = 0; > vap->va_flags = 0; > vap->va_nlink = de->de_links; See above. > Index: sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c > =================================================================== > --- sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c (revision 180729) > +++ sys/gnu/fs/xfs/FreeBSD/xfs_vnops.c (working copy) > @@ -263,6 +263,8 @@ _xfs_getattr( > vap->va_atime = va.va_atime; > vap->va_mtime = va.va_mtime; > vap->va_ctime = va.va_ctime; > + vap->va_birthtime.tv_sec = -1; > + vap->va_birthtime.tv_nsec = 0; > vap->va_gen = va.va_gen; > vap->va_rdev = va.va_rdev; > vap->va_bytes = (va.va_nblocks << BBSHIFT); See above (need to do something about the VATTR_NULL()). > %%% Grepping for va_.*flags in only sys/fs/ shows the following problems in VOP_SETATTR(): - coda: sets va_vaflags in a macro but never uses va_vaflags (needed for layering?) - ntfs: sets va_flags to ip->i_flag -- nonsense -- i_flag is an internal flag that has nothing to do with va_flags - nwfs: sets va_vaflags in nwfs_attr_cacheenter(), but I think nothing uses this setting. - smbfs: sets va_vaflags in smbfs_attrcachelookup() ... - tmpfs: sets va_vaflags and also va_spare. and the following non-problems: - all except msdosfs set va_flags to 0, so defaulting va_flags to 0 and deleting most settings of it would work well.
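[Archive note, not part of the original message.] The defaulting idea discussed above can be sketched in userland C, assuming a much-reduced model of the kernel structures. Every name here (ts_model, vattr_model, oldfs_getattr, newfs_getattr, getattr_with_defaults) is hypothetical and exists only for illustration; this is not the FreeBSD implementation and not the patch under review.

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the kernel's timespec and vattr. */
struct ts_model {
	long tv_sec;
	long tv_nsec;
};

struct vattr_model {
	struct ts_model va_mtime;
	struct ts_model va_birthtime;
	unsigned long va_flags;
};

/* Models a per-file-system getattr routine (assumption, not a kernel type). */
typedef int (*getattr_fn)(struct vattr_model *vap);

/* A file system without birthtime or flags: it fills in only what it has. */
int
oldfs_getattr(struct vattr_model *vap)
{
	vap->va_mtime.tv_sec = 1216944000;
	vap->va_mtime.tv_nsec = 0;
	return (0);
}

/* A file system that stores a birthtime simply overwrites the default. */
int
newfs_getattr(struct vattr_model *vap)
{
	vap->va_mtime.tv_sec = 1216944000;
	vap->va_mtime.tv_nsec = 0;
	vap->va_birthtime.tv_sec = 1200000000;
	vap->va_birthtime.tv_nsec = 0;
	return (0);
}

/*
 * Set defaults for new and/or unusual fields first, so that file systems
 * which don't support them never need to know about them.
 */
int
getattr_with_defaults(getattr_fn fs_getattr, struct vattr_model *vap)
{
	memset(vap, 0, sizeof(*vap));	/* model only: no stack garbage */
	vap->va_birthtime.tv_sec = -1;	/* "not supported" marker */
	vap->va_birthtime.tv_nsec = 0;
	vap->va_flags = 0;
	return (fs_getattr(vap));
}
```

Calling getattr_with_defaults(oldfs_getattr, &va) leaves va.va_birthtime at {-1, 0}, while the newfs variant reports its stored value; neither file system has to mention va_flags at all.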
Bruce From lists at jnielsen.net Wed Jul 23 17:23:21 2008 From: lists at jnielsen.net (John Nielsen) Date: Wed Jul 23 17:23:38 2008 Subject: NFS writes and ZFS In-Reply-To: <20080723090450.GV58113@cicely7.cicely.de> References: <200807221128.27592.lists@jnielsen.net> <20080723082401.GC3603@garage.freebsd.pl> <20080723090450.GV58113@cicely7.cicely.de> Message-ID: <200807231323.37358.lists@jnielsen.net> On Wednesday 23 July 2008, Bernd Walter wrote: > On Wed, Jul 23, 2008 at 10:24:01AM +0200, Pawel Jakub Dawidek wrote: > > On Tue, Jul 22, 2008 at 11:28:27AM -0400, John Nielsen wrote: > > > I have a FreeBSD server (which I use as a NAS device, among other > > > things) and a FreeBSD deskop. The desktop is running 7-STABLE from a > > > couple days ago and the server is running 8-CURRENT from yesterday. > > > The server has several NFS-exported ZFS'es which I mount from the > > > desktop. Since moving the shares to ZFS I've been having trouble > > > writing to them from the desktop--the mount hangs after the first or > > > second attempt. This is similar if not identical to what's described > > > in the thread > > > (from -current) I partially copied below. > > > > > > Today I discovered that the problem seems to go away if I change the > > > NFS mount options on the desktop. The following is a summary/timeline > > > of what I've tried: > > > > > > 7-STABLE client, no NFS options (defaults); 7-STABLE server, UFS; > > > works 7-STABLE client, no NFS options (defaults); 7-STABLE server, > > > ZFS; broken 7-STABLE client, no NFS options (defaults); 8-CURRENT > > > server, ZFS; broken 7-STABLE client, tcp,nfsv3,-r32768,-w32768; > > > 8-CURRENT server, ZFS, works > > > > Do you need all the options here? If not, could you try to find the > > smallest subset of options that are needed to make ZFS work? Maybe > > 'nfsv3' is all that is needed, or 'tcp' alone fixes it? 
At work we use > > many NFS exported ZFS file systems, mostly accessed from MacOS X and > > we see no problems. > > Whenever changing NFS transport options has an influence on reliability > my first task is to verify the network. > Especially there were often hardware problems with some NIC lately, > of which some have worked around in the drivers and some not. > Disabling TSO and checksum offloading typically helps. > This kind of problem is typical on both the client and server, but also > on routers. > Of course network problems can also be on any cable, switch in between > as well, but are less typical to produce complete NFS hangs. A good strategy I'm sure. However in this case the whole network is within arm's reach, the switch and cables are brand new and I haven't had any other issues that would point to a network fault. Further, I saw the exact same behavior on a completely different set of hardware around the time of 7-BETA. In both cases the NFS shares worked fine prior to my moving the shared ports tree to ZFS. PJD- I'll try to narrow the options needed this afternoon or tomorrow and let you know what I find. JN From lists at jnielsen.net Wed Jul 23 18:02:41 2008 From: lists at jnielsen.net (John Nielsen) Date: Wed Jul 23 18:02:52 2008 Subject: NFS writes and ZFS In-Reply-To: <20080723082401.GC3603@garage.freebsd.pl> References: <200807221128.27592.lists@jnielsen.net> <20080723082401.GC3603@garage.freebsd.pl> Message-ID: <200807231402.59893.lists@jnielsen.net> On Wednesday 23 July 2008, Pawel Jakub Dawidek wrote: > On Tue, Jul 22, 2008 at 11:28:27AM -0400, John Nielsen wrote: > > I have a FreeBSD server (which I use as a NAS device, among other > > things) and a FreeBSD deskop. The desktop is running 7-STABLE from a > > couple days ago and the server is running 8-CURRENT from yesterday. The > > server has several NFS-exported ZFS'es which I mount from the desktop. 
> > Since moving the shares to ZFS I've been having trouble writing to them > > from the desktop--the mount hangs after the first or second attempt. > > This is similar if not identical to what's described in the thread > > (from -current) I partially copied below. > > > > Today I discovered that the problem seems to go away if I change the > > NFS mount options on the desktop. The following is a summary/timeline > > of what I've tried: > > > > 7-STABLE client, no NFS options (defaults); 7-STABLE server, UFS; works > > 7-STABLE client, no NFS options (defaults); 7-STABLE server, ZFS; > > broken 7-STABLE client, no NFS options (defaults); 8-CURRENT server, > > ZFS; broken 7-STABLE client, tcp,nfsv3,-r32768,-w32768; 8-CURRENT > > server, ZFS, works > > Do you need all the options here? If not, could you try to find the > smallest subset of options that are needed to make ZFS work? Maybe > 'nfsv3' is all that is needed, or 'tcp' alone fixes it? At work we use > many NFS exported ZFS file systems, mostly accessed from MacOS X and > we see no problems. No. "tcp" alone fixes it. That's not too surprising since nfsv3 should be a no-op. With everything _but_ "tcp" it took only slightly longer to hang the mount (not scientifically measured). With the default NFS mount mode changed to TCP in -CURRENT the workaround is already in place for FreeBSD clients, and the issue apparently never popped up on other clients--there are a few people (yourself included) who say they've never had a problem with Mac OS, for example. I haven't come across reports either way about Solaris or Linux. Are we the last ones to use UDP by default? Anyway, I hope this is helpful. Let me know if I should file a PR or anything.
Thanks, JN From jh at saunalahti.fi Fri Jul 25 07:23:19 2008 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Fri Jul 25 07:23:26 2008 Subject: birthtime initialization In-Reply-To: <20080724000618.Q16961@besplex.bde.org> References: <200806020800.m528038T072838@freefall.freebsd.org> <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi> <20080722215249.K17453@delplex.bde.org> <20080723103424.GA1856@a91-153-120-204.elisa-laajakaista.fi> <20080724000618.Q16961@besplex.bde.org> Message-ID: <20080725072314.GA807@a91-153-120-204.elisa-laajakaista.fi> On 2008-07-24, Bruce Evans wrote: > First, the fields shouldn't be initialized using VATTR_NULL() in > VOP_GETATTR(). > Second, VNOVAL is an extremly bogus default value. Except for va_fsid because there's this check in vn_stat(): if (vap->va_fsid != VNOVAL) sb->st_dev = vap->va_fsid; else sb->st_dev = vp->v_mount->mnt_stat.f_fsid.val[0]; What do you think is a proper default value for va_rdev? Some file systems set it to 0 and some to VNOVAL. > After deleting the bogus initializations, we're left with va_filerev, > va_birthtime and va_flags. Most file systems don't support these, so > they could usefully all be handled by defaulting them as in the proposed > changes for va_birthtime. Unfortunately moving initializations to vn_stat() breaks things. For example vm_mmap_vnode() uses VOP_GETATTR() to determine which file flags are set. Thus moving va_flags initialization to vn_stat() breaks mmap. In theory this could be a potential problem for birthtime too. > > 3) Initialize vattr with VATTR_NULL() but not birthtime explicitly. Thus > > tv_sec and tv_nsec are set to -1 (VNOVAL). (devfs, xfs, portalfs, > > pseudofs) > > I don't want the case (3). Otherwise good. Thank you for your valuable comments. I will try to update the patch.
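[Archive note, not part of the original message.] The va_fsid special case quoted above from vn_stat() can be modelled in a few lines of userland C. This is only a sketch under stated assumptions: vattr_model, stat_model, VNOVAL_MODEL and fill_st_dev are hypothetical names, and mount_fsid stands in for vp->v_mount->mnt_stat.f_fsid.val[0].

```c
/* Hypothetical stand-in for the kernel's VNOVAL, which is -1. */
#define VNOVAL_MODEL	(-1)

/* Hypothetical, reduced models of struct vattr and struct stat. */
struct vattr_model {
	long va_fsid;
};

struct stat_model {
	long st_dev;
};

/*
 * Mirrors the quoted check: a file system that left va_fsid at the
 * "no value" marker gets the mount's fsid; otherwise its own value wins.
 */
void
fill_st_dev(const struct vattr_model *vap, long mount_fsid,
    struct stat_model *sb)
{
	if (vap->va_fsid != VNOVAL_MODEL)
		sb->st_dev = vap->va_fsid;
	else
		sb->st_dev = mount_fsid;
}
```

The fallback only helps callers that go through this stat path, which seems to be the same limitation raised above for defaults set in vn_stat(): other users of the attributes never see the substituted value.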
-- Jaakko From brde at optusnet.com.au Fri Jul 25 10:00:19 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Fri Jul 25 10:00:25 2008 Subject: birthtime initialization In-Reply-To: <20080725072314.GA807@a91-153-120-204.elisa-laajakaista.fi> References: <200806020800.m528038T072838@freefall.freebsd.org> <20080722075718.GA1881@a91-153-120-204.elisa-laajakaista.fi> <20080722215249.K17453@delplex.bde.org> <20080723103424.GA1856@a91-153-120-204.elisa-laajakaista.fi> <20080724000618.Q16961@besplex.bde.org> <20080725072314.GA807@a91-153-120-204.elisa-laajakaista.fi> Message-ID: <20080725192315.D27178@delplex.bde.org> On Fri, 25 Jul 2008, Jaakko Heinonen wrote: > On 2008-07-24, Bruce Evans wrote: >> First, the fields shouldn't be initialized using VATTR_NULL() in >> VOP_GETATTR(). > >> Second, VNOVAL is an extremly bogus default value. > > Except for va_fsid because there's this check in vn_stat(): > > if (vap->va_fsid != VNOVAL) > sb->st_dev = vap->va_fsid; > else > sb->st_dev = vp->v_mount->mnt_stat.f_fsid.val[0]; Hmm, this is remarkably broken too. In VOP_GETATTR() for file systems under sys/fs: - the following file systems set va_fsid to dev2udev() (and thus defeat the better default above): cd9660, hpfs, msdosfs, ntfs, udf, unionfs - the following file systems don't seem to set va_fsid (and thus set st_dev to stack garbage) - the following file systems set va_fsid to VNOVAL via VATTR_NULL(): fdescfs - the following file systems set va_fsid to VNOVAL via vattr_null(): devfs, portalfs - the following file systems set va_fsid to VNOVAL via obscure means: coda (?) - the following file systems set va_fsid to mnt_stat.f_fsid.val[0] directly: nullfs, nwfs (?), pseudofs, smbfs (?), tmpfs > What do you think that is a proper default value for va_rdev? Some file > systems set it to 0 and some to VNOVAL. Either NODEV or VNOVAL explicitly translated late to NODEV. 
NODEV is (dev_t)(-1) (bug: this has parentheses in all the wrong places -- it should be ((dev_t)-1)), so VNOVAL = -1 assigned to va_rdev of type dev_t equals NODEV and the identity translation works. >> After deleting the bogus initializations, we're left with va_filerev, >> va_birthtime and va_flags. Most file systems don't support these, so >> they could usefully all be handled by defaulting them as in the proposed >> changes for va_birthtime. > > Unfortunately moving initializations to vn_stat() breaks things. For > example vm_mmap_vnode() uses VOP_GETATTR() to determine which file flags > are set. Thus moving va_flags initialization to vn_stat() breaks > mmap. Oops. > In theory this could be a potential problem for birthtime too. It's a bit dangerous, but most callers to VOP_GETATTR() except vn_stat() only want a couple of fields, and hopefully none want new fields. Maybe the public interface should be vop_getattr() which sets defaults and calls VOP_GETATTR(). Does this work with layering? There is negative point to inlining most VOPs, and for VOP_GETATTR(), no one cares about the much higher overhead of setting all fields in it when only a couple are wanted. Bruce From peter.schuller at infidyne.com Sat Jul 26 18:22:28 2008 From: peter.schuller at infidyne.com (Peter Schuller) Date: Sat Jul 26 18:22:36 2008 Subject: Asynchronous writing to zvols (ZFS) Message-ID: <200807262005.54235.peter.schuller@infidyne.com> Hello, I was finally playing around with iSCSI, having never used it before. For convenience, and also because it may be a future use case for real use, I used zvols for my targets. I could not get write speed above roughly 1 MB/second even in simple cases like dd:ing with an 8 MB block size, with the zvol:s on a 6-disk raidz2. The individual disk utilization of constituent drives remains small (< 3%). Switching to a memory disk target yielded expected performance characteristics.
I notice that there were confirmed issues with writes to zvol:s: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6496356 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6496344 The problem is that I'm not really sure how to translate "snv_59" into something that I can compare with the version of ZFS in FreeBSD. Do the above "bugs" still apply to the ZFS version in FreeBSD, or am I hitting something else? -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org From randy at psg.com Sat Jul 26 19:20:03 2008 From: randy at psg.com (Randy Bush) Date: Sat Jul 26 19:20:14 2008 Subject: work0 drive Message-ID: <488B78DE.108@psg.com> hardware problem caused looping page fault problems. scrub cored i zpool remove'd the offending drive for the moment and the system works. randy --- acd0: CDROM at ata0-slave UDMA33 ad4: 305245MB at ata2-master SATA150 ad6: 305245MB at ata3-master SATA150 ad8: 305245MB at ata4-master SATA150 ad10: 305245MB at ata5-master SATA150 SMP: AP CPU #1 Launched! GEOM_MIRROR: Device mirror/boot launched (1/2). GEOM_MIRROR: Device boot: rebuilding provider ad4s1. Trying to mount root from ufs:/dev/mirror/boota WARNING: / was not properly dismounted Loading configuration files. dumpon: /dev/ad4s1b: No such file or directory Entropy harvesting: interrupts ethernet point_to_point kickstart. swapon: adding /dev/mirror/bootb as swap device Starting file system checks: /dev/mirror/boota: 2690 files, 185913 used, 3875150 free (3686 frags, 483933 blocks, 0.1% fragmentation) Setting hostuuid: 7634a964-b127-0430-c299-00304891d708.
Setting hostid: 0xeebf67d9. Mounting local file systems:. ad6: FAILURE - READ_DMA status=51 error=40 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x0 fault code = supervisor read data, page not present instruction pointer = 0x8:0xffffffff801aaa06 stack pointer = 0x10:0xffffffff80a64bc0 frame pointer = 0x10:0x51 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 3 (g_up) trap number = 12 panic: page fault cpuid = 1 From pjd at FreeBSD.org Sat Jul 26 20:51:21 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Sat Jul 26 20:51:26 2008 Subject: Asynchronous writing to zvols (ZFS) In-Reply-To: <200807262005.54235.peter.schuller@infidyne.com> References: <200807262005.54235.peter.schuller@infidyne.com> Message-ID: <20080726205118.GB1345@garage.freebsd.pl> On Sat, Jul 26, 2008 at 08:05:46PM +0200, Peter Schuller wrote: > Hello, > > I was finally playing around with iSCSI, having never used it before. For > convenience, and also because it may be a future use case for real use, I > used zvols for my targets. > > I could not get write speed above roughly 1 MB/second even in simple cases > like dd:ing with an 8 MB block size, with the zvol:s on a 6-disk raidz2. The > individual disk utilization of constituent drives remains small (< 3%). > Switching to a memory disk target yielded expected performance > characteristics. > > I notice that there were confirmed issues with writes to zvol:s: > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6496356 > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6496344 > > The problem is that I'm not really sure how to translate "snv_59" into > something that I can compare with the version of ZFS in FreeBSD. Do the > above "bugs" still apply to the ZFS version in FreeBSD, or am I hitting > something else? 
The problem is that we don't distinguish between async and sync I/O requests at the GEOM level; that's why I decided to commit the ZIL log after each write, which wasn't very smart, it seems. This is handled differently in the version I have in Perforce. Could you try the below patch and see how it performs now? http://people.freebsd.org/~pjd/patches/zvol.c.patch -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! From tom.hurst at clara.net Sat Jul 26 21:27:03 2008 From: tom.hurst at clara.net (Thomas Hurst) Date: Sat Jul 26 21:27:10 2008 Subject: Asynchronous writing to zvols (ZFS) In-Reply-To: <200807262005.54235.peter.schuller@infidyne.com> References: <200807262005.54235.peter.schuller@infidyne.com> Message-ID: <20080726210319.GA57383@voi.aagh.net> * Peter Schuller (peter.schuller@infidyne.com) wrote: > I notice that there were confirmed issues with writes to zvol:s: > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6496356 > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6496344 > > The problem is that I'm not really sure how to translate "snv_59" into > something that I can compare with the version of ZFS in FreeBSD. Do > the above "bugs" still apply to the ZFS version in FreeBSD, or am I > hitting something else? WARNING: ZFS is considered to be an experimental feature in FreeBSD. ZFS filesystem version 6 http://opensolaris.org/os/community/zfs/version/6/ This feature is available in: Solaris Nevada Build 62 However, some of this looks familiar from recent Perforce activity: This means that zvol needs to support this special command (DKIOCFLUSHWRITECACHE), and that it should save zil_commit() for only the times it's necessary.
http://perforce.freebsd.org/changeList.cgi?CMD=changes&FSPC=//depot/user/pjd/zfs/... Change 145289 2008/07/15 by pjd@pjd_zoo Improve ZVOL performance by only committing ZIL on BIO_FLUSH request, not on every BIO_WRITE request. Which looks like a good candidate. You could see if ZIL is the problem by setting vfs.zfs.zil_disable=1 -- Thomas 'Freaky' Hurst http://hur.st/ From pjd at FreeBSD.org Sun Jul 27 12:54:15 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Sun Jul 27 12:54:22 2008 Subject: ZFS patches. Message-ID: <20080727125413.GG1345@garage.freebsd.pl> Hi. http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 The patch above contains the most recent ZFS version that could be found in OpenSolaris as of today. Apart from a large amount of new functionality, I believe there are many stability (and also performance) improvements compared to the version from the base system. Check out the OpenSolaris website to find out the differences between the base system version and the patch version. Please test, test, test. If I get enough positive feedback, I may be able to squeeze it into 7.1-RELEASE, but this might be hard. If you have any questions, please use mailing lists (freebsd-fs@FreeBSD.org would be the best). Thank you in advance! -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
From arne at rfc2549.org Sun Jul 27 14:05:15 2008 From: arne at rfc2549.org (Arne Schwabe) Date: Sun Jul 27 14:05:23 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488C7B30.3020503@rfc2549.org> Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > So this build could import zfs version 3/zpool version 10? Just asking because I have an OpenSolaris box where I could try FreeBSD in this case. Arne From freebsd-listen at fabiankeil.de Sun Jul 27 14:20:22 2008 From: freebsd-listen at fabiankeil.de (Fabian Keil) Date: Sun Jul 27 14:20:29 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <20080727161006.1f453d55@fabiankeil.de> Pawel Jakub Dawidek wrote: > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. Awesome, thanks.
The patch applied cleanly, but unfortunately buildkernel fails for me with: ===> zfs (depend) @ -> /usr/src/sys machine -> /usr/src/sys/amd64/include awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -p awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -q awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -h make: don't know how to make u8_textprep.c. Stop *** Error code 2 Stop in /usr/src/sys/modules. *** Error code 1 Stop in /usr/obj/usr/src/sys/GENERIC. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. buildworld fails with: ===> sys/boot/i386/cdboot (cleandir) rm -f cdboot cdboot.o rm -f .depend GPATH GRTAGS GSYMS GTAGS ===> sys/boot/i386/gptboot (cleandir) rm -f gptboot gptldr.bin gptldr.out gptldr.o gptboot.bin gptboot.out gptboot.o sio.o machine ===> sys/boot/i386/zfsboot (cleandir) cd: can't cd to /usr/src/sys/boot/i386/zfsboot *** Error code 2 Stop in /usr/src/sys/boot/i386. *** Error code 1 Stop in /usr/src/sys/boot. *** Error code 1 Stop in /usr/src/sys. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. 
Also, there are some warnings: [fk@kendra /usr/src]$ time make buildkernel KERNCONF=GENERIC "/usr/src/Makefile", line 134: warning: duplicate script for target "machine" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "clean" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "cleandepend" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "distribute" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "lint" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "obj" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "objlink" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "tags" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "files" ignored "/usr/src/Makefile", line 306: warning: duplicate script for target "includes" ignored -------------------------------------------------------------- >>> Kernel build for GENERIC started on Sun Jul 27 15:59:40 CEST 2008 -------------------------------------------------------------- I'm using 8.0-CURRENT; my sources are from today, and your vfs_subr.c commits were the last changes I got. I tried on both i386 and amd64. Fabian From morganw at chemikals.org Sun Jul 27 14:48:24 2008 From: morganw at chemikals.org (Wes Morgan) Date: Sun Jul 27 14:48:31 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: On Sun, 27 Jul 2008, Pawel Jakub Dawidek wrote: > Hi.
> > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). Is this patch against -stable or -current? From lulf at stud.ntnu.no Sun Jul 27 15:27:45 2008 From: lulf at stud.ntnu.no (Ulf Lilleengen) Date: Sun Jul 27 15:27:56 2008 Subject: ZFS patches. In-Reply-To: <488C7B30.3020503@rfc2549.org> References: <20080727125413.GG1345@garage.freebsd.pl> <488C7B30.3020503@rfc2549.org> Message-ID: <20080727152724.GA3336@nobby.studby.ntnu.no> On Sun, Jul 27, 2008 at 03:42:08PM +0200, Arne Schwabe wrote: > Pawel Jakub Dawidek schrieb: > > Hi. > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > The patch above contains the most recent ZFS version that could be found > > in OpenSolaris as of today. Apart for large amount of new functionality, > > I belive there are many stability (and also performance) improvements > > compared to the version from the base system. > > > > > So this build could import zfs version 3/zpool version 10? Just asking > because I have a opensolaris box where could try FreeBSD in this case. > Yes, it supports zpool version 11 and zfs version 3. -- Ulf Lilleengen From lulf at stud.ntnu.no Sun Jul 27 15:32:04 2008 From: lulf at stud.ntnu.no (Ulf Lilleengen) Date: Sun Jul 27 15:32:11 2008 Subject: ZFS patches. 
In-Reply-To: References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <20080727153144.GB3336@nobby.studby.ntnu.no> On Sun, Jul 27, 2008 at 09:48:19AM -0500, Wes Morgan wrote: > On Sun, 27 Jul 2008, Pawel Jakub Dawidek wrote: > > > Hi. > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > The patch above contains the most recent ZFS version that could be found > > in OpenSolaris as of today. Apart for large amount of new functionality, > > I belive there are many stability (and also performance) improvements > > compared to the version from the base system. > > > > Check out OpenSolaris website to find out the differences between base > > system version and patch version. > > > > Please test, test, test. If I get enough positive feedback, I may be > > able to squeeze it into 7.1-RELEASE, but this might be hard. > > > > If you have any questions, please use mailing lists > > (freebsd-fs@FreeBSD.org would be the best). > > Is this patch against -stable or -current? -current -- Ulf Lilleengen From denis at h3q.com Sun Jul 27 16:14:09 2008 From: denis at h3q.com (Denis Ahrens) Date: Sun Jul 27 16:14:16 2008 Subject: zfs patch Message-ID: <86929730-717E-4B13-9EE8-857F796D77DE@h3q.com> hi did you add -p0 to the patch command? I did not at first and had the errors you had. with -p0 as an option to the patch command everything works now Denis From freebsd-listen at fabiankeil.de Sun Jul 27 16:32:51 2008 From: freebsd-listen at fabiankeil.de (Fabian Keil) Date: Sun Jul 27 16:32:58 2008 Subject: zfs patch In-Reply-To: <86929730-717E-4B13-9EE8-857F796D77DE@h3q.com> References: <86929730-717E-4B13-9EE8-857F796D77DE@h3q.com> Message-ID: <20080727183246.5649e644@fabiankeil.de> Denis Ahrens wrote: > did you add -p0 to the patch command? I did not. Thanks. Fabian
From nork at FreeBSD.org Sun Jul 27 18:11:19 2008 From: nork at FreeBSD.org (Norikatsu Shigemura) Date: Sun Jul 27 18:11:25 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <20080728031115.b0ac0d07.nork@FreeBSD.org> On Sun, 27 Jul 2008 14:54:13 +0200 Pawel Jakub Dawidek wrote: > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. I have read your patch but have not tested it yet. However, I noticed a minor issue in your patch. * NO NEED FOLLOWING PATCH * - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --- sys/cddl/contrib/opensolaris/common/atomic/ia64/atomic.S.orig 2008-03-29 07:16:08.000000000 +0900 +++ sys/cddl/contrib/opensolaris/common/atomic/ia64/atomic.S 2008-07-28 01:54:52.314417185 +0900 @@ -23,7 +23,7 @@ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * - * $FreeBSD: src/sys/cddl/contrib/opensolaris/common/atomic/ia64/atomic.S,v 1.3 2008/03/28 22:16:08 jb Exp $ + * $FreeBSD: src/sys/contrib/opensolaris/common/atomic/ia64/atomic.S,v 1.2 2007/06/08 16:20:03 marcel Exp $ */ #include - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - I'm using the following patch: 1. To support zpool for lsdev. 2. To support case of "slice has only zpool and no bsd partition". I haven't tested it on the new ZFS yet. I'll try to test that, too. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --- sys/boot/i386/libi386/biosdisk.c~ 2008-02-29 02:49:23.000000000 +0900 +++ sys/boot/i386/libi386/biosdisk.c 2008-03-18 09:15:34.209096127 +0900 @@ -469,6 +469,7 @@ * unused.
*/ if ((lp->d_partitions[i].p_fstype == FS_BSDFFS) || + (lp->d_partitions[i].p_fstype == FS_ZFS) || (lp->d_partitions[i].p_fstype == FS_SWAP) || (lp->d_partitions[i].p_fstype == FS_VINUM) || ((lp->d_partitions[i].p_fstype == FS_UNUSED) && @@ -477,6 +478,7 @@ /* Only print out statistics in verbose mode */ if (verbose) sprintf(line, " %s%c: %s %s (%d - %d)\n", prefix, 'a' + i, + (lp->d_partitions[i].p_fstype == FS_ZFS) ? "ZFS " : (lp->d_partitions[i].p_fstype == FS_SWAP) ? "swap " : (lp->d_partitions[i].p_fstype == FS_VINUM) ? "vinum" : "FFS ", @@ -485,6 +487,7 @@ lp->d_partitions[i].p_offset + lp->d_partitions[i].p_size); else sprintf(line, " %s%c: %s\n", prefix, 'a' + i, + (lp->d_partitions[i].p_fstype == FS_ZFS) ? "ZFS" : (lp->d_partitions[i].p_fstype == FS_SWAP) ? "swap" : (lp->d_partitions[i].p_fstype == FS_VINUM) ? "vinum" : "FFS"); @@ -696,7 +699,12 @@ if (lp->d_magic != DISKMAGIC) { DEBUG("no disklabel"); +#if 0 return (ENOENT); +#else + od->od_flags &= ~BD_LABELOK; + od->od_boff = sector; /* no partition, must be after the slice */ +#endif } if (dev->d_kind.biosdisk.partition >= lp->d_npartitions) { DEBUG("partition '%c' exceeds partitions in table (a-'%c')", - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - From peter.schuller at infidyne.com Sun Jul 27 18:25:48 2008 From: peter.schuller at infidyne.com (Peter Schuller) Date: Sun Jul 27 18:25:55 2008 Subject: Asynchronous writing to zvols (ZFS) In-Reply-To: <20080726205118.GB1345@garage.freebsd.pl> References: <200807262005.54235.peter.schuller@infidyne.com> <20080726205118.GB1345@garage.freebsd.pl> Message-ID: <200807272026.54907.peter.schuller@infidyne.com> Hello, > The problem is that we don't between async and sync I/O request on GEOM > level, that's why I decided to commit a ZIL log after each write, which > wasn't very smart it seems. This is handled differently in version I've > in perforce. Could you try the below patch and see how it performs now? 
>
> http://people.freebsd.org/~pjd/patches/zvol.c.patch

The above (though the file has moved, for anyone else reading who wants to
apply it) does eliminate the synchronicity problem. I am now seeing 5-15
MB/second write speeds to the zvol, with 100% constituent disk utilization.

I am not sure why I don't see faster writes; I get more like 40-60 MB/second
when writing to a file in a ZFS file system on the same pool. But regardless,
the synchronicity issue is gone.

Does your comment above regarding distinguishing between sync and async apply
to the section of code affected by the above patch, or did you mean there is
some other place above the zvol handling where there is a lack of distinction?

That is, is the end effect of the above change that we *never* do synchronous
writes (because the fact that a write is supposed to be synchronous is somehow
lost before it reaches that point)?

I understand a zil_commit is only required on BIO_FLUSH requests, which is
what the patch fixes. But I get the impression from your phrasing above that
the reason a zil_commit was done on every I/O from the get-go was an effort to
honor actual synchronous writes by conservatively *always* doing synchronous
writes, because the synchronicity of synchronous writes would not be
propagated down to the zvol class.

I wouldn't want to sacrifice correctness just to get the speed ;)

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: This is a digitally signed message part.
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080727/6f356874/attachment.pgp From peter.schuller at infidyne.com Sun Jul 27 18:27:12 2008 From: peter.schuller at infidyne.com (Peter Schuller) Date: Sun Jul 27 18:27:19 2008 Subject: Asynchronous writing to zvols (ZFS) In-Reply-To: <20080726210319.GA57383@voi.aagh.net> References: <200807262005.54235.peter.schuller@infidyne.com> <20080726210319.GA57383@voi.aagh.net> Message-ID: <200807272028.22674.peter.schuller@infidyne.com> > WARNING: ZFS is considered to be an experimental feature in FreeBSD. > ZFS filesystem version 6 > > http://opensolaris.org/os/community/zfs/version/6/ I know of the 'version 6' bit, but that's just the on-disk format, not the version of the code base (unless I am misunderstanding something). -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: This is a digitally signed message part. Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080727/84791c83/attachment.pgp From delphij at delphij.net Sun Jul 27 18:32:53 2008 From: delphij at delphij.net (Xin LI) Date: Sun Jul 27 18:33:05 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488CBF49.10308@delphij.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Great work! For the record, this has fixed all of our known problem (multithread load causes panic, non-tuned loader.conf would be fragile on heavy load, etc) with ZFS as reported at the FreeBSD mailing lists or privately to Pawel, plus there is some performance improvements as compared with FreeBSD 7-STABLE (8-CURRENT with WITNESS, INVARIANTS off but userland malloc debugging options on). 
Cheers, - -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) iEYEARECAAYFAkiMv0kACgkQi+vbBBjt66AvhACfb00igV3cmes4i4b3jgksUEZg JVUAn0vgdyfsFooYL+xY6J9jOHQkwpag =qkbL -----END PGP SIGNATURE----- From delphij at delphij.net Sun Jul 27 18:37:36 2008 From: delphij at delphij.net (Xin LI) Date: Sun Jul 27 18:37:47 2008 Subject: ZFS patches. In-Reply-To: <488CBF49.10308@delphij.net> References: <20080727125413.GG1345@garage.freebsd.pl> <488CBF49.10308@delphij.net> Message-ID: <488CC066.2040800@delphij.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Xin LI wrote: | Great work! | | For the record, this has fixed all of our known problem (multithread | load causes panic, non-tuned loader.conf would be fragile on heavy load, | etc) with ZFS as reported at the FreeBSD mailing lists or privately to | Pawel, plus there is some performance improvements as compared with | FreeBSD 7-STABLE (8-CURRENT with WITNESS, INVARIANTS off but userland | malloc debugging options on). One note: our test environment is amd64 with 8GB of RAM; the pool is version 6. We have not done 'zpool upgrade' as we want to share it between 7.0-STABLE and 8-CURRENT for testing purposes. Cheers, - -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) iEYEARECAAYFAkiMwGYACgkQi+vbBBjt66A4NgCfesg95cVSx4lgrRdcCKL4VipA ns4AoJi5rSx5mhzKNj2ze2EzlHuuRc9o =WSpV -----END PGP SIGNATURE----- From max at love2party.net Sun Jul 27 18:46:38 2008 From: max at love2party.net (Max Laier) Date: Sun Jul 27 18:46:44 2008 Subject: ZFS patches. 
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <200807272034.01290.max@love2party.net> Hi Pawel, On Sunday 27 July 2008 14:54:13 Pawel Jakub Dawidek wrote: > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. nice! > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). Is this supposed to help with memory pressure on i386, too? Or do the caveats from the wiki still apply? I heard some anecdotal evidence that it would indeed help. Everybody, remember to use "patch -p0" - just bit me ... again. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News From tom at hur.st Sun Jul 27 20:26:57 2008 From: tom at hur.st (Thomas Hurst) Date: Sun Jul 27 20:27:04 2008 Subject: Asynchronous writing to zvols (ZFS) In-Reply-To: <200807272028.22674.peter.schuller@infidyne.com> References: <200807262005.54235.peter.schuller@infidyne.com> <20080726210319.GA57383@voi.aagh.net> <200807272028.22674.peter.schuller@infidyne.com> Message-ID: <20080727200814.GA19914@voi.aagh.net> * Peter Schuller (peter.schuller@infidyne.com) wrote: > > WARNING: ZFS is considered to be an experimental feature in > > FreeBSD. 
ZFS filesystem version 6 > > > > http://opensolaris.org/os/community/zfs/version/6/ > > I know of the 'version 6' bit, but that's just the on-disk format, not > the version of the code base (unless I am misunderstanding something). Sure, but since FreeBSD's using version 6, chances are it's using code from around where it appeared in Solaris Nevada. snv_59 was using version 3, which didn't even support compression. It would be nice if this were documented, though, so we don't have to make vague guesses. -- Thomas 'Freaky' Hurst http://hur.st/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080727/1205e352/attachment.pgp From ivoras at freebsd.org Sun Jul 27 21:24:28 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Sun Jul 27 21:24:36 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. I'm trying to test it, and my build fails at an unusual place - dtrace. I've verified that a clean cvsup tree compiles the dtrace modules, and it fails with the same tree patched with the above patch. Any ideas? 
The exact command for applying the patch was: wbench:/usr/src# patch -p0 < ~ivoras/zfs_20080727.patch The failure is: ===> dtmalloc (all) Warning: Object directory not changed from original /usr/src/sys/modules/dtrace/dtmalloc cc -O2 -fno-strict-aliasing -pipe -Werror -D_KERNEL -DKLD_MODULE -std=c99 -nostdinc -I/usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/compat/opensolaris -I/usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/contrib/opensolaris/uts/common -I/usr/src/sys/modules/dtrace/dtmalloc/../../.. -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestanding -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wno-unknown-pragmas -c /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/dev/dtmalloc/dtmalloc.c In file included from /usr/src/sys/modules/dtrace/dtmalloc/../../../sys/vnode.h:541, from /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/contrib/opensolaris/uts/common/sys/vnode.h:44, from /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/compat/opensolaris/sys/vnode.h:43, from /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/compat/opensolaris/sys/kobj.h:41, from /usr/src/sys/modules/dtrace/dtmalloc/../../../sys/linker.h:35, from /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/compat/opensolaris/sys/modctl.h:34, from /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/contrib/opensolaris/uts/common/sys/dtrace.h:50, from /usr/src/sys/modules/dtrace/dtmalloc/../../../cddl/dev/dtmalloc/dtmalloc.c:35: ./vnode_if.h:1161: error: expected specifier-qualifier-list before 'acl_type_t' ./vnode_if.h:1174: error: expected declaration specifiers or '...' 
before 'acl_type_t' cc1: warnings being treated as errors ./vnode_if.h:1177: warning: 'struct acl' declared inside parameter list ./vnode_if.h:1177: warning: its scope is only this definition or declaration, which is probably not what you want ./vnode_if.h: In function 'VOP_GETACL': ./vnode_if.h:1183: error: 'struct vop_getacl_args' has no member named 'a_type' ./vnode_if.h:1183: error: 'type' undeclared (first use in this function) This is -current from a few minutes ago, i386, GENERIC kernel with INVARIANTS and WITNESS (and their supporting options) removed. Excerpt from vnode_if.h: 1158 struct vop_getacl_args { 1159 struct vop_generic_args a_gen; 1160 struct vnode *a_vp; 1161 acl_type_t a_type; 1162 struct acl *a_aclp; 1163 struct ucred *a_cred; 1164 struct thread *a_td; 1165 }; -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080727/ce61a285/signature.pgp From roberto at keltia.freenix.fr Sun Jul 27 21:46:52 2008 From: roberto at keltia.freenix.fr (Ollivier Robert) Date: Sun Jul 27 21:47:11 2008 Subject: ZFS patches. In-Reply-To: <20080727153144.GB3336@nobby.studby.ntnu.no> References: <20080727125413.GG1345@garage.freebsd.pl> <20080727153144.GB3336@nobby.studby.ntnu.no> Message-ID: <20080727214649.GA98433@keltia.freenix.fr> According to Ulf Lilleengen: > > Is this patch against -stable or -current? > -current Has 7 & 8 diverged so much that it will not apply to 7-STABLE? -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr Darwin sidhe.keltia.net Version 9.2.0: Tue Feb 5 16:13:22 PST 2008; i386 From morganw at chemikals.org Sun Jul 27 22:56:34 2008 From: morganw at chemikals.org (Wes Morgan) Date: Sun Jul 27 22:56:40 2008 Subject: ZFS patches. 
In-Reply-To: <20080727214649.GA98433@keltia.freenix.fr>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<20080727153144.GB3336@nobby.studby.ntnu.no>
	<20080727214649.GA98433@keltia.freenix.fr>
Message-ID: 

On Sun, 27 Jul 2008, Ollivier Robert wrote:

> According to Ulf Lilleengen:
>>> Is this patch against -stable or -current?
>> -current
>
> Has 7 & 8 diverged so much that it will not apply to 7-STABLE?
>

Well, the patch doesn't apply. Not sure how far they have diverged. I'd love
to test it, but I stayed with 7 when it was branched to -current.

From nork at FreeBSD.org  Mon Jul 28 03:03:05 2008
From: nork at FreeBSD.org (Norikatsu Shigemura)
Date: Mon Jul 28 03:03:11 2008
Subject: ZFS patches.
In-Reply-To: <20080728031115.b0ac0d07.nork@FreeBSD.org>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<20080728031115.b0ac0d07.nork@FreeBSD.org>
Message-ID: <20080728120300.4196ea62.nork@FreeBSD.org>

Hi pjd!

I could upgrade to your new zfs environment with NO WORRY and NO TROUBLE:-).
I can still boot my PC from zfs. I'll try to stress test.

Thank you!

From malus.x at gmail.com  Mon Jul 28 05:21:29 2008
From: malus.x at gmail.com (David Grochowski)
Date: Mon Jul 28 05:21:36 2008
Subject: ZFS patches.
In-Reply-To: <20080728032427.GN79560@egr.msu.edu>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<20080728032427.GN79560@egr.msu.edu>
Message-ID: 

Hey,

On Sun, Jul 27, 2008 at 11:24 PM, Adam McDougall wrote:
>
> On Sun, Jul 27, 2008 at 02:54:13PM +0200, Pawel Jakub Dawidek wrote:
>
>   Hi.
>
>   http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>
> The patch applied fine for me, but I get a compile error part way through
> a buildworld. My world/kernel is from -current from Thursday, I also
> tried csupping HEAD first, cleaning up my source tree, doing things with
> headers I shouldn't (which I will revert). To the best of my weak knowledge
> of C, it seems like ace_t should be fine (I tried to trace it through the
> includes). Am I doing something wrong?
Also, is this patch expected to > apply to 7? (I can find out for myself if I don't hear). Thanks. > > cc -O2 -pipe -I/usr/src/cddl/lib/libzpool/../../../sys/cddl/compat/opensolaris > -I/usr/src/cddl/lib/libzpool/../../../cddl/compat/opensolaris/include > -I/usr/src/cddl/lib/libzpool/../../../cddl/compat/opensolaris/lib/libumem > -I/usr/src/cddl/lib/libzpool/../../../cddl/contrib/opensolaris/lib/libzpool/common > -I/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys > -I/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs > -I/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/common/zfs > -I/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common > -I/usr/src/cddl/lib/libzpool/../../../cddl/contrib/opensolaris/head > -I/usr/src/cddl/lib/libzpool/../../../cddl/lib/libumem > -I/usr/src/cddl/lib/libzpool/../../../cddl/contrib/opensolaris/lib/libnvpair -DWANTS_MUTEX_OWNED > -I/usr/src/cddl/lib/libzpool/../../../lib/libpthread/thread > -I/usr/src/cddl/lib/libzpool/../../../lib/libpthread/sys > -I/usr/src/cddl/lib/libzpool/../../../lib/libthr/arch/amd64/include -fstack-protector > -Wno-unknown-pragmas -c > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:35: > error: expected ')' before '*' token > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c: > In function 'zfs_oldacl_byteswap': > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:127: > error: 'ace_t' undeclared (first use in this function) > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:127: > error: (Each undeclared identifier is reported only once > 
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:127: > error: for each function it appears in.) > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:129: > error: expected expression before ')' token > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c: > In function 'zfs_znode_byteswap': > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:177: > error: 'ace_t' undeclared (first use in this function) > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:177: > error: expected expression before ')' token > *** Error code 1 > > Stop in /usr/src/cddl/lib/libzpool. > *** Error code 1 > > Stop in /usr/src/cddl/lib. > *** Error code 1 > > Stop in /usr/src. > *** Error code 1 > > Stop in /usr/src. > *** Error code 1 > > Stop in /usr/src. > *** Error code 1 > > Stop in /usr/src. I had the same issue. Try deleting "/usr/src/sys/cddl/compat/opensolaris/sys/acl.h" and "/usr/src/sys/cddl/compat/opensolaris/sys/callb.h" (make sure that these files have a length of zero first!). When patching, these files are supposed to be deleted, but were instead left as empty files. Since these files are included before the actual ones in "/usr/src/sys/cddl/contrib/opensolaris/uts/common/sys", this will cause a problem. Also, I would like to note that the patch has been working for me without any problems. Sincerely, Dave Grochowski From pjd at FreeBSD.org Mon Jul 28 08:33:06 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Jul 28 08:33:12 2008 Subject: ZFS patches. 
In-Reply-To: <200807272034.01290.max@love2party.net>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<200807272034.01290.max@love2party.net>
Message-ID: <20080728083303.GD2953@garage.freebsd.pl>

On Sun, Jul 27, 2008 at 08:34:00PM +0200, Max Laier wrote:
> Hi Pawel,
>
> On Sunday 27 July 2008 14:54:13 Pawel Jakub Dawidek wrote:
> > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
> >
> > The patch above contains the most recent ZFS version that could be found
> > in OpenSolaris as of today. Apart for large amount of new functionality,
> > I belive there are many stability (and also performance) improvements
> > compared to the version from the base system.
>
> nice!
>
> > Check out OpenSolaris website to find out the differences between base
> > system version and patch version.
> >
> > Please test, test, test. If I get enough positive feedback, I may be
> > able to squeeze it into 7.1-RELEASE, but this might be hard.
> >
> > If you have any questions, please use mailing lists
> > (freebsd-fs@FreeBSD.org would be the best).
>
> Is this supposed to help with memory pressure on i386, too? Or do the caveats
> from the wiki still apply? I heard some anecdotal evidence that it would
> indeed help.

Yes, it should fix most if not all 'kmem_map too small' panics, at least
from what I tried. Tuning kmem_size is still needed to get better
performance.

> Everybody, remember to use "patch -p0" - just bit me ... again.

Grr, forgot to mention that, sorry.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080728/e18c442f/attachment.pgp

From bugmaster at FreeBSD.org  Mon Jul 28 11:06:56 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Jul 28 11:07:40 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200807281106.m6SB6tUK078894@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t

7 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/124621  fs         [ext3] Cannot mount ext2fs partition
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li

8 problems total.

From max at love2party.net  Mon Jul 28 12:54:42 2008
From: max at love2party.net (Max Laier)
Date: Mon Jul 28 12:54:56 2008
Subject: ZFS patches.
In-Reply-To: <20080728083303.GD2953@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> <200807272034.01290.max@love2party.net> <20080728083303.GD2953@garage.freebsd.pl> Message-ID: <200807281454.36892.max@love2party.net> On Monday 28 July 2008 10:33:03 Pawel Jakub Dawidek wrote: > On Sun, Jul 27, 2008 at 08:34:00PM +0200, Max Laier wrote: > > Hi Pawel, > > > > On Sunday 27 July 2008 14:54:13 Pawel Jakub Dawidek wrote: > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > > > The patch above contains the most recent ZFS version that could be > > > found in OpenSolaris as of today. Apart for large amount of new > > > functionality, I belive there are many stability (and also performance) > > > improvements compared to the version from the base system. > > > > nice! > > > > > Check out OpenSolaris website to find out the differences between base > > > system version and patch version. > > > > > > Please test, test, test. If I get enough positive feedback, I may be > > > able to squeeze it into 7.1-RELEASE, but this might be hard. > > > > > > If you have any questions, please use mailing lists > > > (freebsd-fs@FreeBSD.org would be the best). > > > > Is this supposed to help with memory pressure on i386, too? Or do the > > caveats from the wiki still apply? I heard some anecdotal evidence that > > it would indeed help. > > Yes, it should fix most if not all 'kmem_map too small' panics, at least > from what I tried. Tunning kmem_size is still needed to get better > performance. With the i386 default settings it was not too hard to get the attached panic. Some cpdup and rm cycles of src and ports to a single disk zfs pool. With 512M I haven't been able to kill it, yet. > > Everybody, remember to use "patch -p0" - just bit me ... again. > > Grr, forgot to mention that, sorry. No problem, wasn't meant as criticism but as a community service rather. 
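For anyone trying to reproduce this: the kmem_size tuning discussed above is normally done through loader tunables. A minimal sketch follows, assuming /boot/loader.conf is used and assuming the 512M figure above refers to vm.kmem_size. The vm.kmem_size_max line is an assumption added so that the cap does not sit below the requested size. Suitable values depend on installed RAM and on the kernel address space available on i386, so treat these as illustrative only.

```
# /boot/loader.conf (illustrative values, not a recommendation)
# Assumption: 512M matches the setting mentioned in this message.
vm.kmem_size="512M"
# Assumption: keep the upper bound at least as large as vm.kmem_size.
vm.kmem_size_max="512M"
```

After a reboot, `sysctl vm.kmem_size` should show whether the value took effect.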
-- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News -------------- next part -------------- Script started on Mon Jul 28 13:55:02 2008 You have mail. fbsd8# kgdb -n 0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd"... Unread portion of the kernel message buffer: panic: kmem_malloc(131072): kmem_map too small: 333963264 total allocated cpuid = 0 KDB: enter: panic Physical memory: 2026 MB Dumping 456 MB: 441 425 409 393 377 361 345 329 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41 25 9 Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. done. Loaded symbols for /boot/kernel/opensolaris.ko Reading symbols from /boot/kernel/snd_hda.ko...Reading symbols from /boot/kernel/snd_hda.ko.symbols...done. done. Loaded symbols for /boot/kernel/snd_hda.ko Reading symbols from /boot/kernel/sound.ko...Reading symbols from /boot/kernel/sound.ko.symbols...done. done. Loaded symbols for /boot/kernel/sound.ko Reading symbols from /boot/modules/nvidia.ko...done. Loaded symbols for /boot/modules/nvidia.ko Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot/kernel/linux.ko.symbols...done. done. Loaded symbols for /boot/kernel/linux.ko Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done. done. 
Loaded symbols for /boot/kernel/acpi.ko Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/linprocfs.ko #0 doadump () at pcpu.h:196 196 pcpu.h: No such file or directory. in pcpu.h (kgdb) where #0 doadump () at pcpu.h:196 #1 0xc0496429 in db_fncall (dummy1=-1060091392, dummy2=0, dummy3=3, dummy4=0xe85ef49c "\200l\223?") at /usr/src/sys/ddb/db_command.c:516 #2 0xc04969d8 in db_command (last_cmdp=0xc0c56ef0, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413 #3 0xc0496b0a in db_command_loop () at /usr/src/sys/ddb/db_command.c:466 #4 0xc04982fd in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228 #5 0xc07beea6 in kdb_trap (type=3, code=0, tf=0xe85ef644) at /usr/src/sys/kern/subr_kdb.c:534 #6 0xc0ab5b4b in trap (frame=0xe85ef644) at /usr/src/sys/i386/i386/trap.c:683 #7 0xc0a999ab in calltrap () at /usr/src/sys/i386/i386/exception.s:165 #8 0xc07bf02a in kdb_enter (why=0xc0b46703 "panic", msg=0xc0b46703 "panic") at cpufunc.h:60 #9 0xc0791dfc in panic ( fmt=0xc0b6c49b "kmem_malloc(%ld): kmem_map too small: %ld total allocated") at /usr/src/sys/kern/kern_shutdown.c:556 #10 0xc09cd520 in kmem_malloc (map=0xc1c71084, size=131072, flags=2) at /usr/src/sys/vm/vm_kern.c:303 #11 0xc09c4c77 in page_alloc (zone=0x0, bytes=131072, pflag=0xe85ef73b "\002", wait=2) at /usr/src/sys/vm/uma_core.c:959 #12 0xc09c5cd0 in uma_large_malloc (size=131072, wait=2) at /usr/src/sys/vm/uma_core.c:2713 #13 0xc0780978 in malloc (size=131072, mtp=0xc0f4c9a0, flags=2) at /usr/src/sys/kern/kern_malloc.c:393 ---Type to continue, or q to quit--- #14 0xc0e5a980 in zfs_kmem_alloc (size=131072, kmflags=2) at /usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/kern/opensolaris_kmem.c:74 #15 0xc0ecd7a9 in zio_buf_alloc (size=131072) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:248 #16 0xc0e6dda7 in arc_get_data_buf (buf=0xd0fd3a28) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2134 #17 0xc0e6df15 in arc_buf_alloc (spa=0xc6001000, size=131072, tag=0xd79db088, type=ARC_BUFC_METADATA) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1143 #18 0xc0e73004 in dbuf_read (db=0xd79db088, zio=0xd974c480, flags=2) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:488 #19 0xc0e7787a in dmu_buf_hold (os=0xc6471c18, object=44, offset=10616832, tag=0xc660d820, dbp=0xc660d850) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:107 #20 0xc0e70b4b in bplist_cache (bpl=0xc660d820, blkid=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/bplist.c:138 #21 0xc0e70be6 in bplist_enqueue (bpl=0xc660d820, bp=0xceaa3040, ---Type to continue, or q to quit--- tx=0xccade200) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/bplist.c:199 #22 0xc0e8e111 in dsl_dataset_block_kill (ds=0xc660d800, bp=0xceaa3040, pio=0xd95196c0, tx=0xccade200) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:170 #23 0xc0e86cc5 in free_blocks (dn=0xd7fe71c8, bp=0xceaa3040, num=1, tx=0xccade200) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:125 #24 0xc0e8786c in dnode_sync (dn=0xd7fe71c8, tx=0xccade200) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:320 #25 0xc0e7dcf7 in dmu_objset_sync_dnodes (list=0xc660db7c, tx=0xccade200) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:791 #26 0xc0e7de66 in dmu_objset_sync (os=0xc660da00, pio=0xd8138240, tx=0xccade200) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:882 #27 0xc0e92378 in dsl_pool_sync (dp=0xc670f400, txg=Unhandled dwarf expression opcode 0x93 ) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:295 ---Type to continue, or q to quit--- #28 0xc0ea533e in spa_sync (spa=0xc6001000, txg=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3936 #29 0xc0eae66d in txg_sync_thread (arg=0xc670f400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:360 #30 0xc0770ba8 in fork_exit (callout=0xc0eae280 , arg=0xc670f400, frame=0xe85efd38) at /usr/src/sys/kern/kern_fork.c:810 #31 0xc0a99a20 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 (kgdb) q fbsd8# ^Dexit Script done on Mon Jul 28 13:55:27 2008 From pjd at FreeBSD.org Mon Jul 28 12:57:18 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Jul 28 12:57:24 2008 Subject: ZFS patches. In-Reply-To: <200807281454.36892.max@love2party.net> References: <20080727125413.GG1345@garage.freebsd.pl> <200807272034.01290.max@love2party.net> <20080728083303.GD2953@garage.freebsd.pl> <200807281454.36892.max@love2party.net> Message-ID: <20080728125711.GH2953@garage.freebsd.pl> On Mon, Jul 28, 2008 at 02:54:36PM +0200, Max Laier wrote: > On Monday 28 July 2008 10:33:03 Pawel Jakub Dawidek wrote: > > Yes, it should fix most if not all 'kmem_map too small' panics, at least > > from what I tried. Tunning kmem_size is still needed to get better > > performance. > > With the i386 default settings it was not too hard to get the attached panic. > Some cpdup and rm cycles of src and ports to a single disk zfs pool. With > 512M I haven't been able to kill it, yet. I was probably too optimistic. The default kmem_size is probably just too low. I'm quite sure it would be too low even for Solaris. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080728/8147d421/attachment.pgp From outbackdingo at gmail.com Mon Jul 28 13:26:29 2008 From: outbackdingo at gmail.com (OutBackdingo) Date: Mon Jul 28 13:26:35 2008 Subject: ZFS patches. In-Reply-To: <20080728125711.GH2953@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> <200807272034.01290.max@love2party.net> <20080728083303.GD2953@garage.freebsd.pl> <200807281454.36892.max@love2party.net> <20080728125711.GH2953@garage.freebsd.pl> Message-ID: <1217250051.6657.0.camel@dingo-laptop> So are we saying that i386 with a default kmem of 512MB has gotten psuedo stable with some load? On Mon, 2008-07-28 at 14:57 +0200, Pawel Jakub Dawidek wrote: > On Mon, Jul 28, 2008 at 02:54:36PM +0200, Max Laier wrote: > > On Monday 28 July 2008 10:33:03 Pawel Jakub Dawidek wrote: > > > Yes, it should fix most if not all 'kmem_map too small' panics, at least > > > from what I tried. Tunning kmem_size is still needed to get better > > > performance. > > > > With the i386 default settings it was not too hard to get the attached panic. > > Some cpdup and rm cycles of src and ports to a single disk zfs pool. With > > 512M I haven't been able to kill it, yet. > > I was probably too optimistic. The default kmem_size is probably just > too low. I'm quite sure it would be too low even for Solaris. > From lists at jnielsen.net Mon Jul 28 16:15:37 2008 From: lists at jnielsen.net (John Nielsen) Date: Mon Jul 28 16:15:44 2008 Subject: ZFS patches. In-Reply-To: References: <20080727125413.GG1345@garage.freebsd.pl> <20080728032427.GN79560@egr.msu.edu> Message-ID: <200807281139.45771.lists@jnielsen.net> On Monday 28 July 2008, David Grochowski wrote: > Hey, > > On Sun, Jul 27, 2008 at 11:24 PM, Adam McDougall wrote: > > On Sun, Jul 27, 2008 at 02:54:13PM +0200, Pawel Jakub Dawidek wrote: > > > > > > Hi. 
> > > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > Stop in /usr/src. > > I had the same issue. Try deleting > "/usr/src/sys/cddl/compat/opensolaris/sys/acl.h" and > "/usr/src/sys/cddl/compat/opensolaris/sys/callb.h" (make sure that > these files have a length of zero first!). When patching, these files > are supposed to be deleted, but were instead left as empty files. > Since these files are included before the actual ones in > "/usr/src/sys/cddl/contrib/opensolaris/uts/common/sys", this will > cause a problem. > > Also, I would like to note that the patch has been working for me > without any problems. Thanks for pointing this out David, I had been scratching my head too. (Also thanks to those who posted reminders to use patch -p0). I'm now up and running with the patch and an upgraded zpool. No issues thus far. I even tried to reproduce the UDP NFS write lockup issue I reported recently and was unable to. Thanks PJD! JN From sfourman at gmail.com Mon Jul 28 16:19:28 2008 From: sfourman at gmail.com (Sam Fourman Jr.) Date: Mon Jul 28 16:19:34 2008 Subject: ZFS patches. In-Reply-To: <20080728125711.GH2953@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> <200807272034.01290.max@love2party.net> <20080728083303.GD2953@garage.freebsd.pl> <200807281454.36892.max@love2party.net> <20080728125711.GH2953@garage.freebsd.pl> Message-ID: <11167f520807280853u135eb813r20eb6d78734344b@mail.gmail.com> > I was probably too optimistic. The default kmem_size is probably just > too low. I'm quite sure it would be too low even for Solaris. In your estimation what is a good setting for kmem_size ? Sam Fourman Jr. From pgollucci at p6m7g8.com Mon Jul 28 19:50:04 2008 From: pgollucci at p6m7g8.com (Philip M. Gollucci) Date: Mon Jul 28 19:50:21 2008 Subject: ZFS patches. 
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488E2090.6020707@p6m7g8.com> Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). I have the go-ahead to try this on svn.apache.org and/or svn.eu.apache.org. I won't have anytime to do it until Late next week at the earliest though. From beat at chruetertee.ch Mon Jul 28 20:06:00 2008 From: beat at chruetertee.ch (=?ISO-8859-1?Q?Beat_G=E4tzi?=) Date: Mon Jul 28 20:06:07 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488E246D.2030508@chruetertee.ch> Hi, Pawel Jakub Dawidek wrote: > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. Thanks for the great work! > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. I have a amd64 box with 8GB RAM running CURRENT-200806 snapshot. I get the latest version of the sources with csup, applied your patch and build the world/kernel. 
/usr/src and /usr/obj are located on a zfs file system.

After "make installkernel" and a reboot into single user mode I had to
start the zfs file system, but it failed:

# fsck
# mount -a
# /etc/rc.d/hostid start
Setting hostuuid: ...
Setting hostid: ...
# /etc/rc.d/zfs start
lock order reversal:
 1st 0xffffff0004832620 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2053
 2nd 0xffffffff80b09da0 kernel linker (kernel linker) @ /usr/src/sys/kern/kern_linker.c:693
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
witness_checkorder() at witness_checkorder+0x609
_sx_xlock() at _sx_xlock+0x52
linker_file_lookup_set() at linker_file_lookup_set+0xe1
linker_file_register_sysctls() at linker_file_register_sysctls+0x20
linker_load_module() at linker_load_module+0x919
linker_load_dependencies() at linker_load_dependencies+0x1bc
link_elf_load_file() at link_elf_load_file+0xa96
linker_load_module() at linker_load_module+0x8cf
kern_kldload() at kern_kldload+0xac
kldload() at kldload+0x84
syscall() at syscall+0x1bf
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x80068561c, rsp = 0x7fffffffec88, rbp = 0 ---
This module (opensolaris) contains code covered by the
Common Development and Distribution License (CDDL)
see http://opensolaris.org/os/licensing/opensolaris_license/
WARNING: ZFS is considered to be an experimental feature in FreeBSD.
ZFS filesystem version 11
ZFS storage pool version 11
internal error: out of memory
internal error: out of memory
internal error: out of memory
internal error: out of memory

Running "zpool list" shows no available pool and the "internal error:
out of memory" error message. The same problem occurs in multi-user mode.

loader.conf is set to:
vm.kmem_size_max="2147483648"
vm.kmem_size="2147483648"

Increasing or removing the kmem_size values didn't change anything.

To solve the problem I had to boot kernel.old and run make installworld/mergemaster.
After rebooting with the new kernel the pool was available again and everything work without a problem. Did I do something wrong when I upgraded the server? Regards, Beat From jorn at wcborstel.com Mon Jul 28 20:23:00 2008 From: jorn at wcborstel.com (Jorn Argelo) Date: Mon Jul 28 20:23:13 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488E26C8.3040306@wcborstel.com> Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). > > Thank you in advance! > > First of all PJD and all of the people involved with ZFS for FreeBSD, thanks a lot for all your efforts. I'm a happy user of ZFS :-) Anyway, I was wondering ... is this patch also applicable for 7.0-RELEASE, or is it only for -CURRENT? If it's the former I'll go ahead and apply the patch to see if I run into any problems. I just don't have a kernel debugger enabled, nor do I have WITNESS in the kernel (for obvious reasons). I don't know if this is a major problem or not. Sorry for the perhaps RTFM questions - I'm usually not really into this sort of stuff, but I'd like to help out where possible. Thanks, Jorn From pjd at FreeBSD.org Mon Jul 28 20:53:33 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Jul 28 20:53:40 2008 Subject: ZFS patches. 
In-Reply-To: <488E26C8.3040306@wcborstel.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488E26C8.3040306@wcborstel.com> Message-ID: <20080728205324.GC2740@garage.freebsd.pl> On Mon, Jul 28, 2008 at 10:06:32PM +0200, Jorn Argelo wrote: > Pawel Jakub Dawidek wrote: > >Hi. > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > >The patch above contains the most recent ZFS version that could be found > >in OpenSolaris as of today. Apart for large amount of new functionality, > >I belive there are many stability (and also performance) improvements > >compared to the version from the base system. > > > >Check out OpenSolaris website to find out the differences between base > >system version and patch version. > > > >Please test, test, test. If I get enough positive feedback, I may be > >able to squeeze it into 7.1-RELEASE, but this might be hard. > > > >If you have any questions, please use mailing lists > >(freebsd-fs@FreeBSD.org would be the best). > > > >Thank you in advance! > > > > > First of all PJD and all of the people involved with ZFS for FreeBSD, > thanks a lot for all your efforts. I'm a happy user of ZFS :-) > > Anyway, I was wondering ... is this patch also applicable for > 7.0-RELEASE, or is it only for -CURRENT? If it's the former I'll go > ahead and apply the patch to see if I run into any problems. I just > don't have a kernel debugger enabled, nor do I have WITNESS in the > kernel (for obvious reasons). I don't know if this is a major problem or > not. > > Sorry for the perhaps RTFM questions - I'm usually not really into this > sort of stuff, but I'd like to help out where possible. The patch is against HEAD and HEAD only. Don't expect patch against 7-STABLE soon. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080728/7b72fd96/attachment.pgp From sven at dmv.com Mon Jul 28 20:55:32 2008 From: sven at dmv.com (Sven W) Date: Mon Jul 28 20:55:44 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488E2B51.2010709@dmv.com> Pawel Jakub Dawidek presumably uttered the following on 07/27/08 08:54: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). > > Thank you in advance! > Is there anyway to apply this to a 7.0-RELEASE-p2 cleanly? (i.e. previous patches, caveats, etc?) I would be interested in seeing if this patch fixes the issue of system locks when trying to do a large write (100+MB) to a zpool with ggate when the remote device is down). Sven From max at love2party.net Mon Jul 28 21:16:41 2008 From: max at love2party.net (Max Laier) Date: Mon Jul 28 21:16:48 2008 Subject: allow vs. usermount [Re: ZFS patches.] 
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID: <200807282316.37722.max@love2party.net>

On Sunday 27 July 2008 14:54:13 Pawel Jakub Dawidek wrote:
> If you have any questions, please use mailing lists
> (freebsd-fs@FreeBSD.org would be the best).

Short exercise:
| $ whoami
| mlaier
| $ zfs list
| NAME          USED  AVAIL  REFER  MOUNTPOINT
| tank          104K   228G    19K  /tank
| tank/mlaier    18K   228G    18K  /tank/mlaier
| $ zfs allow tank/mlaier
| -------------------------------------------------------------
| Local+Descendent permissions on (tank/mlaier)
|         user mlaier create,destroy,mount,snapshot
| -------------------------------------------------------------
| $ zfs create tank/mlaier/test
| cannot mount 'tank/mlaier/test': Insufficient privileges
| filesystem successfully created, but not mounted

This is obviously due to the check in vfs_mount.c patched line 851:

        if (jailed(td->td_ucred) || usermount == 0) {

the question is, should this be tuned to allow for the finer grained zfs
permissions to take effect or will we force usermount to use zfs allow mount?

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

From fbsd-fs at mawer.org Mon Jul 28 21:28:23 2008
From: fbsd-fs at mawer.org (Antony Mawer)
Date: Mon Jul 28 21:28:36 2008
Subject: ZFS patches.
In-Reply-To: <1217250051.6657.0.camel@dingo-laptop>
References: <20080727125413.GG1345@garage.freebsd.pl> <200807272034.01290.max@love2party.net> <20080728083303.GD2953@garage.freebsd.pl> <200807281454.36892.max@love2party.net> <20080728125711.GH2953@garage.freebsd.pl> <1217250051.6657.0.camel@dingo-laptop>
Message-ID: <488E378B.40406@mawer.org>

I have a single-disk i386 system with 1GB RAM, 512MB kmem, which now
appears to be perfectly stable... It's a Via Epia system (C3 1GHz) so
it's not exactly high-end hardware.
This is running 7-STABLE from ~16 May 2008 (without the most recent
patches applied), and the following configuration in /boot/loader.conf:

    # Root on ZFS
    zfs_load="YES"
    vfs.root.mountfrom="zfs:tank"

    # Tune kernel KVA space
    vm.kmem_size="512M"
    vm.kmem_size_max="512M"

    # Tune ZFS arc and vdev cache sizes
    vfs.zfs.arc_min="16M"
    vfs.zfs.arc_max="40M"
    vfs.zfs.vdev.cache.size="5M"

    # Disable prefetch to improve performance
    vfs.zfs.prefetch_disable="1"

Some of these tunings may be superfluous, but until I added them
recently the box never lasted more than 2-3 days. It's now been up for 3
weeks without a panic. Kernel memory usage is comfortable:

    # kmem
    TEXT=6285036, 5.99388 MB
    DATA=93330432, 89.0068 MB
    TOTAL=99615468, 95.0007 MB

The kmem script is the one posted on the FreeBSD wiki. The highest I
have seen it climb is ~102 MB, which to me suggests I can afford to tune
the vfs.zfs.arc_max value higher (I started out with a conservatively
low value and planned to tune and tweak from there based on observing
kernel memory usage).

--Antony

OutBackdingo wrote:
> So are we saying that i386 with a default kmem of 512MB has gotten
> pseudo stable with some load?
>
> On Mon, 2008-07-28 at 14:57 +0200, Pawel Jakub Dawidek wrote:
>> On Mon, Jul 28, 2008 at 02:54:36PM +0200, Max Laier wrote:
>>> On Monday 28 July 2008 10:33:03 Pawel Jakub Dawidek wrote:
>>>> Yes, it should fix most if not all 'kmem_map too small' panics, at least
>>>> from what I tried. Tuning kmem_size is still needed to get better
>>>> performance.
>>> With the i386 default settings it was not too hard to get the attached panic.
>>> Some cpdup and rm cycles of src and ports to a single disk zfs pool. With
>>> 512M I haven't been able to kill it, yet.
>> I was probably too optimistic. The default kmem_size is probably just
>> too low. I'm quite sure it would be too low even for Solaris.
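
[Editorial sketch] The TEXT/DATA/TOTAL figures in the message above come from a "kmem" script that is not reproduced in this thread. The following is only a rough illustration of how such numbers could be derived, not the wiki script itself. It assumes that TEXT is the sum of the module sizes reported by kldstat (hexadecimal, fourth column) and that DATA is the sum of the MemUse column of vmstat -m (values ending in "K"). The function names kld_text_bytes and malloc_data_bytes are made up for this example, and both read the command output from standard input.

```shell
#!/bin/sh
# Hypothetical helpers, not the FreeBSD wiki script.

# Sum the Size column of kldstat-style output (assumed to be hex, column 4).
kld_text_bytes() {
    total=0
    while read -r id refs addr size name; do
        # Skip blank lines and the "Id Refs Address Size Name" header.
        case "$id" in ''|Id) continue ;; esac
        total=$((total + 0x$size))
    done
    echo "$total"
}

# Sum the MemUse column of vmstat -m style output and print it in bytes.
# Type names may contain spaces, so take the first field of the form "<n>K".
malloc_data_bytes() {
    kb=$(awk 'NR > 1 {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]+K$/) {
                kb += substr($i, 1, length($i) - 1)
                break
            }
        }
    }
    END { printf "%d\n", kb }')
    echo $((kb * 1024))
}
```

On a FreeBSD host these might be combined as text=$(kldstat | kld_text_bytes) and data=$(vmstat -m | malloc_data_bytes), with the sum compared against the configured vm.kmem_size to judge how much headroom is left before raising vfs.zfs.arc_max. The column layouts are assumptions and should be checked against the actual output on the release in use.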
From pjd at FreeBSD.org Mon Jul 28 21:35:06 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Jul 28 21:35:12 2008 Subject: allow vs. usermount [Re: ZFS patches.] In-Reply-To: <200807282316.37722.max@love2party.net> References: <20080727125413.GG1345@garage.freebsd.pl> <200807282316.37722.max@love2party.net> Message-ID: <20080728213500.GD2740@garage.freebsd.pl> On Mon, Jul 28, 2008 at 11:16:37PM +0200, Max Laier wrote: > On Sunday 27 July 2008 14:54:13 Pawel Jakub Dawidek wrote: > > If you have any questions, please use mailing lists > > (freebsd-fs@FreeBSD.org would be the best). > > Short exercise: > | $ whoami > | mlaier > | $ zfs list > | NAME USED AVAIL REFER MOUNTPOINT > | tank 104K 228G 19K /tank > | tank/mlaier 18K 228G 18K /tank/mlaier > | $ zfs allow tank/mlaier > | ------------------------------------------------------------- > | Local+Descendent permissions on (tank/mlaier) > | user mlaier create,destroy,mount,snapshot > | ------------------------------------------------------------- > | $ zfs create tank/mlaier/test > | cannot mount 'tank/mlaier/test': Insufficient privileges > | filesystem successfully created, but not mounted > > This is obviously due to the check in vfs_mount.c patched line 851: > > if (jailed(td->td_ucred) || usermount == 0) { > > the question is, should this be tuned to allow for the finer grained zfs > permissions to take effect or will we force usermount to use zfs allow mount? Current plan is to document it in the same way ZFS within a jail is documented in zfs(8). Yes, one needs to set vfs.usermount=1 by hand. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080728/e92a1b6a/attachment.pgp

From ivoras at freebsd.org Tue Jul 29 00:15:07 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Tue Jul 29 00:15:14 2008
Subject: ZFS patches.
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID:

Pawel Jakub Dawidek wrote:
> Hi.
>
> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>
> The patch above contains the most recent ZFS version that could be found
> in OpenSolaris as of today. Apart for large amount of new functionality,
> I believe there are many stability (and also performance) improvements
> compared to the version from the base system.
>
> Check out OpenSolaris website to find out the differences between base
> system version and patch version.
>
> Please test, test, test. If I get enough positive feedback, I may be
> able to squeeze it into 7.1-RELEASE, but this might be hard.

I currently don't have high-end (4 CPU+) AMD64 machines to test, but
with a 1 CPU i386 virtual machine in VMWare, with 1 GB of memory,
kmem_size=kmem_size_max=512M and no other tuning, with the latest zpool
format (11) it took about 15 minutes to get a "kmem_map too small" panic
on a mixed load (buildkernel + blogbench + bonnie++).

I then tried the same load on the "real" hardware, 2 CPU, 2 GB
memory, kmem_size=kmem_size_max=512M, and no other tuning. With the
older zpool format (6) I get the same panic, though it takes about twice
as long to happen.

In both cases, iostat was running and I noticed there's about 30 seconds
of complete inactivity (CPU 100% idle, no IO on any drives) just before
the panic. Locking issue?
In the second case I was also monitoring the system more closely, and
before the inactivity period the IO bandwidth gets really slow
considering the type of load I'm generating: circa 2 MB/s, with all tasks
except bonnie++ stopped (SIGSTOP), and bonnie++ generating large-block
writes. This is what provoked the panic in the second case.

Core dumps are available, as always.

But, overall, I see a definite improvement here. Before the new patch I
could panic the machine within a minute and now it can survive much more
beating. If the other problems (deadlocks) are solved, I'd say it's
worth the effort to get it in 7.1 - considering what's in 7.0, any
improvement helps.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 250 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080729/68a617e8/signature.pgp

From fbsd-fs at mawer.org Tue Jul 29 00:42:05 2008
From: fbsd-fs at mawer.org (Antony Mawer)
Date: Tue Jul 29 00:42:22 2008
Subject: ZFS patches.
In-Reply-To:
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID: <488E647C.7@mawer.org>

Ivan Voras wrote:
> Pawel Jakub Dawidek wrote:
>> Hi.
>>
>> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>>
>> The patch above contains the most recent ZFS version that could be found
>> in OpenSolaris as of today. Apart for large amount of new functionality,
>> I believe there are many stability (and also performance) improvements
>> compared to the version from the base system.
>>
>> Check out OpenSolaris website to find out the differences between base
>> system version and patch version.
>>
>> Please test, test, test. If I get enough positive feedback, I may be
>> able to squeeze it into 7.1-RELEASE, but this might be hard.
> > I currently don't have high-end (4 CPU+) AMD64 machines to test, but > with 1 CPU i386 virtual machine in VMWare, with 1 GB of memory, > kmem_size=kmem_size_max=512M and no other tuning, with latest zpool > format (11) it took about 15 minutes to get a "kmem_map too small" panic > on a mixed load (buildkernel + blogbench + bonnie++). > > I've then tried the same load on the "real" hardware, 2 CPU, 2 GB > memory, kmem_size=kmem_size_max=512M, and no other tuning, with the > older zpool format (6) i get the same panic, though it takes about twice > as long to happen. Have you tried tuning arc_max and/or monitoring vmstat -m to see what is happening? What does arc_max get auto-tuned to at the moment (ie. without manually specifying)? One of the things I recall reading that arc_max is more like a guide, as some ZFS threads can exceed the max whilst other thread(s) go around cleaning up and freeing memory once the limit is hit. Maybe some better smarts are needed in auto-tuning arc_max so that it leaves more of a buffer zone than it does at the moment...? --Antony From ivoras at freebsd.org Tue Jul 29 00:51:10 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Tue Jul 29 00:51:16 2008 Subject: ZFS patches. In-Reply-To: <488E647C.7@mawer.org> References: <20080727125413.GG1345@garage.freebsd.pl> <488E647C.7@mawer.org> Message-ID: Antony Mawer wrote: > Ivan Voras wrote: >> I currently don't have high-end (4 CPU+) AMD64 machines to test, but >> with 1 CPU i386 virtual machine in VMWare, with 1 GB of memory, >> kmem_size=kmem_size_max=512M and no other tuning, with latest zpool >> format (11) it took about 15 minutes to get a "kmem_map too small" >> panic on a mixed load (buildkernel + blogbench + bonnie++). >> >> I've then tried the same load on the "real" hardware, 2 CPU, 2 GB >> memory, kmem_size=kmem_size_max=512M, and no other tuning, with the >> older zpool format (6) i get the same panic, though it takes about >> twice as long to happen. 
>
> Have you tried tuning arc_max and/or monitoring vmstat -m to see what is
> happening? What does arc_max get auto-tuned to at the moment (i.e.
> without manually specifying)?
>
> One of the things I recall reading is that arc_max is more like a guide, as
> some ZFS threads can exceed the max whilst other thread(s) go around
> cleaning up and freeing memory once the limit is hit.
>
> Maybe some better smarts are needed in auto-tuning arc_max so that it
> leaves more of a buffer zone than it does at the moment...?

I think speculation in the same direction was discussed with the
original port of ZFS - I don't know the details, but if arc_max could
be better auto-tuned, I think it would have been by now.

I'm more concerned about the "quiet period" before the panic. I notice
ZFS threads have the same priority (PRI field in top) as userland
threads (e.g. 44, 55...), while GEOM threads have a different one (-8). I
don't have McKusick's book about it so I don't know how priorities
should work, but is this situation OK?

I will monitor vmstat -m if it will help Pawel. Should I?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 250 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080729/3fe9e39a/signature.pgp

From mcdouga9 at egr.msu.edu Tue Jul 29 01:52:37 2008
From: mcdouga9 at egr.msu.edu (Adam McDougall)
Date: Tue Jul 29 01:52:44 2008
Subject: ZFS patches.
In-Reply-To:
References: <20080727125413.GG1345@garage.freebsd.pl> <20080728032427.GN79560@egr.msu.edu>
Message-ID: <20080729013623.GF79560@egr.msu.edu>

Thanks, that worked! My laptop is upgraded and running from it with root
mounted from zfs. There seems to be less of the compulsory disk activity
that used to happen every ~3 seconds when idle, which is welcome. I've only
run my laptop with it for about 30 minutes so far, across two different boots.
The first time I shut down I think I saw around 10-15 unexpected messages something like zfs_umount: force unmount not supported, removing FORCE flag. No problems though, it was probably one for each of my 13 zfs mounts. You got my vote for this to be committed to -current. I'll be loading this onto a few more systems as I get a chance, all of which I'll have to upgrade to -current and don't mind doing for a worthwhile reason, including a download mirror server and a backups server. I'll probably hold off on upgrading the fs version to 11 until I don't need to patch the source, incase I forget to on an upgrade. I do have some others running ZFS without problem that will have to wait for a 7.x patch but since they are running fine, I can wait as long as needed. On Mon, Jul 28, 2008 at 12:55:49AM -0400, David Grochowski wrote: Hey, On Sun, Jul 27, 2008 at 11:24 PM, Adam McDougall wrote: > > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:35: > error: expected ')' before '*' token > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c: > In function 'zfs_oldacl_byteswap': > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c:127: > error: 'ace_t' undeclared (first use in this function) I had the same issue. Try deleting "/usr/src/sys/cddl/compat/opensolaris/sys/acl.h" and "/usr/src/sys/cddl/compat/opensolaris/sys/callb.h" (make sure that these files have a length of zero first!). When patching, these files are supposed to be deleted, but were instead left as empty files. Since these files are included before the actual ones in "/usr/src/sys/cddl/contrib/opensolaris/uts/common/sys", this will cause a problem. Also, I would like to note that the patch has been working for me without any problems. 
Sincerely,
Dave Grochowski
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

From remy.nonnenmacher at activnetworks.com Tue Jul 29 08:34:04 2008
From: remy.nonnenmacher at activnetworks.com (Remy Nonnenmacher)
Date: Tue Jul 29 08:34:16 2008
Subject: ZFS patches.
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID: <488ED5F9.3090004@activnetworks.com>

Pawel Jakub Dawidek wrote:
> Hi.
>
> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>...
> Please test, test, test. If I get enough positive feedback, I may be
> able to squeeze it into 7.1-RELEASE, but this might be hard.
>
> If you have any questions, please use mailing lists
> (freebsd-fs@FreeBSD.org would be the best).
>
> Thank you in advance!
>

Hello,

I have updated a test machine with the patch. Before the patch, the
machine was fairly stable using vm.kmem_size=1024M and
vfs.zfs.arc_max=200M. After the patch, I ran a few tests with the
following loader.conf:

vm.kmem_size="512M"
vm.kmem_size_max="512M"
zfs_load="YES"
vfs.zfs.arc_max="100M"
kern.maxvnodes="400000"

I am now getting "kmem_map too small" panics again, within a few minutes
of a cvs update of ports.
If I have a look at kstat.zfs.misc.arcstats.size in the meantime, I see
the following:

kstat.zfs.misc.arcstats.size: 275762656
kstat.zfs.misc.arcstats.size: 279666312
kstat.zfs.misc.arcstats.size: 284994776
kstat.zfs.misc.arcstats.size: 298142184
kstat.zfs.misc.arcstats.size: 304219168
kstat.zfs.misc.arcstats.size: 312289376
kstat.zfs.misc.arcstats.size: 318243832
kstat.zfs.misc.arcstats.size: 331942168
kstat.zfs.misc.arcstats.size: 335262560
kstat.zfs.misc.arcstats.size: 344793136
kstat.zfs.misc.arcstats.size: 359504168
kstat.zfs.misc.arcstats.size: 334877376
kstat.zfs.misc.arcstats.size: 334877376
kstat.zfs.misc.arcstats.size: 334877376
kstat.zfs.misc.arcstats.size: 334877376
kstat.zfs.misc.arcstats.size: 334877376
((panic here))

(2 seconds between reads)

It seems that arc_max is ignored or arc_reclaim is not working as expected.

Thanks for your work, Pawel.

From gary.jennejohn at freenet.de Tue Jul 29 10:57:14 2008
From: gary.jennejohn at freenet.de (Gary Jennejohn)
Date: Tue Jul 29 10:57:20 2008
Subject: ZFS patches.
Message-ID: <20080729125711.563b3a9a@peedub.jennejohn.org>

I now see a weird problem after upgrading to the latest ZFS.

I have a filesystem which is NFS exported.

Before the upgrade I could write to this FS w/o any problems. Now,
after the upgrade, I see the following very strange behavior:

[on the client]
cp 0001-Change-from-Keymile-to-support-CS4.patch /u7/garyj/proj/coge
cp: cannot create regular file `/u7/garyj/proj/coge/0001-Change-from-Keymile-to-support-CS4.patch': Input/output error

[on the server]
garyj:peedub:coge:-bash:9> ll *CS4*
---------- 1 garyj tty 0 Jul 29 12:48 0001-Change-from-Keymile-to-support-CS4.patch

The uid on both machines is identical. I have not changed any settings
in /etc/exports or the ZFS. I also can't explain where the GID tty comes
from, since earlier the GID was always garyj.

If I touch the file on the server first then the cp succeeds. Very
inconvenient.

Any help?
--- Gary Jennejohn From gary.jennejohn at freenet.de Tue Jul 29 11:59:25 2008 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Tue Jul 29 11:59:32 2008 Subject: ZFS patches. In-Reply-To: <20080729125711.563b3a9a@peedub.jennejohn.org> References: <20080729125711.563b3a9a@peedub.jennejohn.org> Message-ID: <20080729135921.208f8925@peedub.jennejohn.org> On Tue, 29 Jul 2008 12:57:11 +0200 Gary Jennejohn wrote: > I now see a weird problem after upgrading to the latest ZFS. > > I have a filesystem which is NFS exported. > > Before the upgrade I could write to this FS w/o any problems. Now, > after the upgrade, I see the following very strang behzvior: > > [on the client] > cp 0001-Change-from-Keymile-to-support-CS4.patch /u7/garyj/proj/coge > cp: cannot create regular file `/u7/garyj/proj/coge/0001-Change-from-Keymile-to-support-CS4.patch': Input/output error > > [on the server] > garyj:peedub:coge:-bash:9> ll *CS4* > ---------- 1 garyj tty 0 Jul 29 12:48 0001-Change-from-Keymile-to-support-CS4.patch > > The uid on both machines is identical. I have not changed any settings > in /etc/exports or the ZFS. I also can't explain where the GID tty comes > from, since earlier the GID was always garyj. > > If I touch the file on the server first then the cp succeeds. Very > inconvenient. > > Any help? > I got it working by setting mapall in /etc/exports, but I'd still like to know why it suddenly stopped working. Maybe it had nothing to do with ZFS at all. --- Gary Jennejohn From stefan.lambrev at moneybookers.com Tue Jul 29 12:41:38 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 29 12:41:49 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <488F0C71.9010902@moneybookers.com> Greetings, I just got new server where I can experiment. Any ideas how to install current using only ZFS? 
Or I should start with UFS for root partition and then move to ZFS only? Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). > > Thank you in advance! > > -- Best Wishes, Stefan Lambrev ICQ# 24134177 From koitsu at FreeBSD.org Tue Jul 29 12:55:52 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Tue Jul 29 12:56:04 2008 Subject: ZFS patches. In-Reply-To: <488F0C71.9010902@moneybookers.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> Message-ID: <20080729125551.GA70379@eos.sc1.parodius.com> On Tue, Jul 29, 2008 at 03:26:25PM +0300, Stefan Lambrev wrote: > Greetings, > > I just got new server where I can experiment. > Any ideas how to install current using only ZFS? > Or I should start with UFS for root partition and then move to ZFS only? I've written a doc on how to do this, at least for RELENG_7, although I'm willing to bet the procedure is 100% identical for CURRENT: http://wiki.freebsd.org/JeremyChadwick/FreeBSD_7.x_on_a_ZFS_pool -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From des at des.no Tue Jul 29 13:00:37 2008 From: des at des.no (Dag-Erling Smørgrav) Date: Tue Jul 29 13:00:43 2008 Subject: Which GSSAPI library does FreeBSD use? In-Reply-To: (Rick Macklem's message of "Wed\, 16 Jul 2008 18\:35\:07 -0400 \(EDT\)") References: Message-ID: <86myk06e18.fsf@ds4.des.no> Rick Macklem writes: > Hope this isn't too simplistic for this list, but I need to know which > GSSAPI library sources are being used. They don't appear to be either > vanilla MIT nor Heimdal. Homegrown (by Doug Rabson, dfr@) with portions borrowed from Heimdal. DES -- Dag-Erling Smørgrav - des@des.no From outbackdingo at gmail.com Tue Jul 29 13:41:00 2008 From: outbackdingo at gmail.com (OutBackdingo) Date: Tue Jul 29 13:41:07 2008 Subject: ZFS patches. In-Reply-To: <20080729125551.GA70379@eos.sc1.parodius.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> Message-ID: <1217338852.10413.1.camel@dingo-laptop> As in the whole including boot from ZFS, not sure that code is in here yet?? had the boot code been migrated to allow for booting from a ZFS partition?? or are we still in the recommended / or /boot being on UFS On Tue, 2008-07-29 at 05:55 -0700, Jeremy Chadwick wrote: > On Tue, Jul 29, 2008 at 03:26:25PM +0300, Stefan Lambrev wrote: > > Greetings, > > > > I just got new server where I can experiment. > > Any ideas how to install current using only ZFS? > > Or I should start with UFS for root partition and then move to ZFS only? > > I've written a doc on how to do this, at least for RELENG_7, although > I'm willing to bet the procedure is 100% identical for CURRENT: > > http://wiki.freebsd.org/JeremyChadwick/FreeBSD_7.x_on_a_ZFS_pool > From randy at psg.com Tue Jul 29 13:51:58 2008 From: randy at psg.com (Randy Bush) Date: Tue Jul 29 13:52:04 2008 Subject: ZFS patches. 
In-Reply-To: <1217338852.10413.1.camel@dingo-laptop> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> Message-ID: <488F2078.708@psg.com> OutBackdingo wrote: > As in the whole including boot from ZFS, not sure that code is in here > yet?? had the boot code been migrated to allow for booting from a ZFS > partition?? or are we still in the recommended / or /boot being on UFS >> http://wiki.freebsd.org/JeremyChadwick/FreeBSD_7.x_on_a_ZFS_pool i guess you did not follow the url. boot from zfs is not supported (yet). imiho, jeremy's instrs should be combined with those on creating sliced boot gmirror. randy From rmacklem at uoguelph.ca Tue Jul 29 14:17:06 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Jul 29 14:17:13 2008 Subject: Which GSSAPI library does FreeBSD use? In-Reply-To: <86myk06e18.fsf@ds4.des.no> References: <86myk06e18.fsf@ds4.des.no> Message-ID: On Tue, 29 Jul 2008, Dag-Erling Smørgrav wrote: > Rick Macklem writes: >> Hope this isn't too simplistic for this list, but I need to know which >> GSSAPI library sources are being used. They don't appear to be either >> vanilla MIT nor Heimdal. > > Homegrown (by Doug Rabson, dfr@) with portions borrowed from Heimdal. > Ok, thanks. I was able to work around my problem by statically linking my gssd against libraries built from vanilla Heimdal sources. It looks like it inherited the heimdal-0.6 bug, which ignores the lack of the GSS_C_SEQUENCE_FLAG and checks it even if it wasn't specified. This breaks the client side of RPCSEC_GSS, since somewhat out-of-order Sun RPCs, is normal. (RPCSEC_GSS uses a window of recent seq#s to protect against replay attempts.) Should I email Doug or submit a bug report, to see if someone is willing to work on fixing this? 
Thanks again, rick From des at des.no Tue Jul 29 15:10:10 2008 From: des at des.no (Dag-Erling Smørgrav) Date: Tue Jul 29 15:10:18 2008 Subject: Which GSSAPI library does FreeBSD use? In-Reply-To: (Rick Macklem's message of "Tue\, 29 Jul 2008 10\:27\:58 -0400 \(EDT\)") References: <86myk06e18.fsf@ds4.des.no> Message-ID: <86ej5c681b.fsf@ds4.des.no> Rick Macklem writes: > [...] It looks like [FreeBSD's libgssapi] inherited the heimdal-0.6 > bug, which ignores the lack of the GSS_C_SEQUENCE_FLAG and checks it > even if it wasn't specified. This breaks the client side of > RPCSEC_GSS, since somewhat out-of-order Sun RPCs, is normal. > (RPCSEC_GSS uses a window of recent seq#s to protect against replay > attempts.) > > Should I email Doug or submit a bug report, to see if someone is willing > to work on fixing this? You should contact Doug directly. I wonder what this has to do with filesystems, though... DES -- Dag-Erling Smørgrav - des@des.no From outbackdingo at gmail.com Tue Jul 29 16:11:38 2008 From: outbackdingo at gmail.com (OutBackdingo) Date: Tue Jul 29 16:11:44 2008 Subject: ZFS patches. In-Reply-To: <488F2078.708@psg.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> <488F2078.708@psg.com> Message-ID: <1217347882.10413.5.camel@dingo-laptop> Maybe i should have rephrased that. Ive had a running ZFS, i thought as i was reading your version of the install guide, if the newer code drop included the boot from ZFS. seems i read it was in the perforce tree, so i guess ive answered my own question, that it is not in fact in this patch On Tue, 2008-07-29 at 14:51 +0100, Randy Bush wrote: > OutBackdingo wrote: > > As in the whole including boot from ZFS, not sure that code is in here > > yet?? had the boot code been migrated to allow for booting from a ZFS > > partition?? 
or are we still in the recommended / or /boot being on UFS > >> http://wiki.freebsd.org/JeremyChadwick/FreeBSD_7.x_on_a_ZFS_pool > > i guess you did not follow the url. > > boot from zfs is not supported (yet). > > imiho, jeremy's instrs should be combined with those on creating sliced > boot gmirror. > > randy From hartzell at alerce.com Tue Jul 29 16:24:59 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue Jul 29 16:25:06 2008 Subject: ZFS patches. [Problem with root on zfs and upgrading] In-Reply-To: <488E246D.2030508@chruetertee.ch> References: <20080727125413.GG1345@garage.freebsd.pl> <488E246D.2030508@chruetertee.ch> Message-ID: <18575.17497.248521.461931@almost.alerce.com> Beat Gätzi writes: > Hi, > > Pawel Jakub Dawidek wrote: > > The patch above contains the most recent ZFS version that could be found > > in OpenSolaris as of today. Apart for large amount of new functionality, > > I belive there are many stability (and also performance) improvements > > compared to the version from the base system. > > Thanks for the great work! > > > Please test, test, test. If I get enough positive feedback, I may be > > able to squeeze it into 7.1-RELEASE, but this might be hard. > > I have a amd64 box with 8GB RAM running CURRENT-200806 snapshot. I get > the latest version of the sources with csup, applied your patch and > build the world/kernel. > /usr/src and /usr/obj are located on a zfs file system. After "make > installkernel" and reboot into single user mode I had to start the zfs > file system but it failed: > > # fsck > # mount -a > # /etc/rc.d/hostid start > Setting hostuuid: ... > Setting hostid: ... 
> # /etc/rc.d/zfs start > lock order reversal: > 1st 0xffffff0004832620 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2053 > 2nd 0xffffffff80b09da0 kernel linker (kernel linker) @ > /usr/src/sys/kern/kern_linker.c:693 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > witness_checkorder() at witness_checkorder+0x609 > _sx_xlock() at _sx_xlock+0x52 > linker_file_lookup_set() at linker_file_lookup_set+0xe1 > linker_file_register_sysctls() at linker_file_register_sysctls+0x20 > linker_load_module() at linker_load_module+0x919 > linker_load_dependencies() at linker_load_dependencies+0x1bc > link_elf_load_file() at link_elf_load_file+0xa96 > linker_load_module() at linker_load_module+0x8cf > kern_kldload() at kern_kldload+0xac > kldload() at kldload+0x84 > syscall() at syscall+0x1bf > Xfast_syscall() at Xfast_syscall+0xab > --- syscall (304, FreeBSD ELF64, kldload), rip = 0x80068561c, rsp = > 0x7fffffffec88, rbp = 0 --- > This module (opensolaris) contains code covered by the > Common Development and Distribution License (CDDL) > see http://opensolaris.org/os/licensing/opensolaris_license/ > WARNING: ZFS is considered to be an experimental feature in FreeBSD. > ZFS filesystem version 11 > ZFS storage pool version 11 > internal error: out of memory > internal error: out of memory > internal error: out of memory > internal error: out of memory > > Running "zpool list" shows no available pool and the "internal error: > out of memory" error message. > > The same problem occurs in multi-user mode. loader.conf is set to: > vm.kmem_size_max="2147483648" > vm.kmem_size="2147483648" > > Increase/remove the kmem_size-values didn't change anything. > > To solve the problem I had to boot kernel.old and run make > installworld/mergemaster. After rebooting with the new kernel the pool > was available again and everything work without a problem. > > Did I do something wrong when I upgraded the server? I'm being bitten by the problem that bit Beat, but worse. 
I'm running a root on zfs system, built using variations of Yarema's tools (which do a great job of rounding up and automating all of the little tips and tricks about putting your root on a zfs filesystem, you should read and understand what they're doing though, you'll probably need to adapt them a bit... [ http://yds.coolrat.org/zfsboot.shtml ]). I moved a computer from -STABLE up to -CURRENT via csup and rebuilt everything to convince myself that the upgrade went well. Then I applied Pawel's patch (-p0 -E), and: make buildworld make buildkernel KERNCONF=BLUETOO make installkernel KERNCONF=BLUETOO and rebooted. I planned to drop down to single user and do the mergemaster/installworld. When I try to boot multi user things go south and it's clear that /usr et al. is missing. I can boot my new kernel single user and my root gets mounted from my zpool, but none of my other zfs filesystems are mounted, and when I try to run zfs list or zpool status I got the same out of memory message that Beat sees. The ZFS filesystem and pool are at version 11 (seen scrolling by on the console). I suspect that my newer kernel isn't cooperating with the older userland utilities which prevents the filesystems from being mounted. I tried to boot from kernel.old, but I end up at the mountroot prompt and can't mount my root. Presumably since my pool has been automagically upgraded to version 11 I can no longer mount my root using kernel.old, so Beat's end-run won't help me. There's nothing I care about on the machine, just the time it took to csup and build and such, so if I have to scrag it and start over it's not a the end of the world. Maybe someone could make an patched copy of /sbin/zfs (and whatever dependencies it has into /lib, etc...) available and I could drop them onto a usb key and use some combination of PATH and LD_LIBRARY_PATH to use them to get my /usr etc... mounted? 
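A minimal sketch of the PATH/LD_LIBRARY_PATH idea floated in the paragraph above; it was not posted in the thread. It assumes a hypothetical layout in which the USB key is mounted at /mnt/usb and carries a patched zfs binary under sbin/ with its shared libraries under lib/. The KEY and DRY_RUN variables and the run_patched helper are made up for illustration.

```shell
#!/bin/sh
# Hypothetical layout (an assumption, not from the thread): the USB key is
# mounted at $KEY and holds patched binaries in $KEY/sbin and the shared
# libraries they need in $KEY/lib.
KEY="${KEY:-/mnt/usb}"

# Run a command with the key's sbin directory first in PATH and the key's
# lib directory in LD_LIBRARY_PATH, so the patched tools are found before
# the stale userland. With DRY_RUN=1 the command line is only printed.
run_patched() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "env PATH=$KEY/sbin:$PATH LD_LIBRARY_PATH=$KEY/lib $*"
    else
        env PATH="$KEY/sbin:$PATH" LD_LIBRARY_PATH="$KEY/lib" "$@"
    fi
}
```

With the key mounted, something like `run_patched zfs mount -a` would then try to mount the remaining filesystems with the patched userland, and `DRY_RUN=1` shows the command without running it. Whether patched binaries built elsewhere actually run against the rest of an older userland is untested here.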
Or I could build up another machine to the same patched point, do the buildworld and buildkernel, then use that to make a patched bootable usb drive. That'll take a while to free up the extra hardware though. g. From hartzell at alerce.com Tue Jul 29 16:36:31 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue Jul 29 16:36:43 2008 Subject: ZFS patches. In-Reply-To: <488F0C71.9010902@moneybookers.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> Message-ID: <18575.18190.242505.359259@almost.alerce.com> Stefan Lambrev writes: > Greetings, > > I just got new server where I can experiment. > Any ideas how to install current using only ZFS? > Or I should start with UFS for root partition and then move to ZFS only? > Check out the tools that Yarema's put together: http://yds.coolrat.org/zfsboot.shtml You should read through them, you may need to change e.g. where you have the install CD mounted, and you'll probably want to change the list of zfs filesystems that get built, but the tools do a great job of collecting all of the little secrets for building a zfs on root system. BUT, be careful about trying to pick up and apply Pawel's latest patches. I'm currently wedged with a new kernel and automagically upgraded zfs pool and an old userland that can't cope. I can't mount my various /usr, /usr/src, etc... filesystems so I can't installworld with the patched stuff. Catch-22. Fortunately the box was set up for just this experiment.... g. From hartzell at alerce.com Tue Jul 29 17:49:06 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue Jul 29 17:49:13 2008 Subject: ZFS patches. 
[Problem with root on zfs and upgrading] In-Reply-To: <18575.17497.248521.461931@almost.alerce.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488E246D.2030508@chruetertee.ch> <18575.17497.248521.461931@almost.alerce.com> Message-ID: <1217352365.3554.3.camel@delicious> On Tue, 2008-07-29 at 09:24 -0700, George Hartzell wrote: > Beat Gätzi writes: > > Hi, > > > > Pawel Jakub Dawidek wrote: > > > The patch above contains the most recent ZFS version that could be found > > > in OpenSolaris as of today. Apart for large amount of new functionality, > > > I belive there are many stability (and also performance) improvements > > > compared to the version from the base system. > > > > Thanks for the great work! > > > > > Please test, test, test. If I get enough positive feedback, I may be > > > able to squeeze it into 7.1-RELEASE, but this might be hard. > > > > I have a amd64 box with 8GB RAM running CURRENT-200806 snapshot. I get > > the latest version of the sources with csup, applied your patch and > > build the world/kernel. > > /usr/src and /usr/obj are located on a zfs file system. After "make > > installkernel" and reboot into single user mode I had to start the zfs > > file system but it failed: > > > > # fsck > > # mount -a > > # /etc/rc.d/hostid start > > Setting hostuuid: ... > > Setting hostid: ... 
> > # /etc/rc.d/zfs start > > lock order reversal: > > 1st 0xffffff0004832620 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2053 > > 2nd 0xffffffff80b09da0 kernel linker (kernel linker) @ > > /usr/src/sys/kern/kern_linker.c:693 > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > witness_checkorder() at witness_checkorder+0x609 > > _sx_xlock() at _sx_xlock+0x52 > > linker_file_lookup_set() at linker_file_lookup_set+0xe1 > > linker_file_register_sysctls() at linker_file_register_sysctls+0x20 > > linker_load_module() at linker_load_module+0x919 > > linker_load_dependencies() at linker_load_dependencies+0x1bc > > link_elf_load_file() at link_elf_load_file+0xa96 > > linker_load_module() at linker_load_module+0x8cf > > kern_kldload() at kern_kldload+0xac > > kldload() at kldload+0x84 > > syscall() at syscall+0x1bf > > Xfast_syscall() at Xfast_syscall+0xab > > --- syscall (304, FreeBSD ELF64, kldload), rip = 0x80068561c, rsp = > > 0x7fffffffec88, rbp = 0 --- > > This module (opensolaris) contains code covered by the > > Common Development and Distribution License (CDDL) > > see http://opensolaris.org/os/licensing/opensolaris_license/ > > WARNING: ZFS is considered to be an experimental feature in FreeBSD. > > ZFS filesystem version 11 > > ZFS storage pool version 11 > > internal error: out of memory > > internal error: out of memory > > internal error: out of memory > > internal error: out of memory > > > > Running "zpool list" shows no available pool and the "internal error: > > out of memory" error message. > > > > The same problem occurs in multi-user mode. loader.conf is set to: > > vm.kmem_size_max="2147483648" > > vm.kmem_size="2147483648" > > > > Increase/remove the kmem_size-values didn't change anything. > > > > To solve the problem I had to boot kernel.old and run make > > installworld/mergemaster. After rebooting with the new kernel the pool > > was available again and everything work without a problem. 
> > > > Did I do something wrong when I upgraded the server? > > I'm being bitten by the problem that bit Beat, but worse. > > I'm running a root on zfs system, built using variations of Yarema's > tools (which do a great job of rounding up and automating all of the > little tips and tricks about putting your root on a zfs filesystem, > you should read and understand what they're doing though, you'll > probably need to adapt them a bit... > [ http://yds.coolrat.org/zfsboot.shtml ]). > > I moved a computer from -STABLE up to -CURRENT via csup and rebuilt > everything to convince myself that the upgrade went well. > > Then I applied Pawel's patch (-p0 -E), and: > > make buildworld > make buildkernel KERNCONF=BLUETOO > make installkernel KERNCONF=BLUETOO > > and rebooted. I planned to drop down to single user and do the > mergemaster/installworld. > > When I try to boot multi user things go south and it's clear that /usr > et al. is missing. > > I can boot my new kernel single user and my root gets mounted from my > zpool, but none of my other zfs filesystems are mounted, and when I > try to run zfs list or zpool status I got the same out of memory > message that Beat sees. > > The ZFS filesystem and pool are at version 11 (seen scrolling by on > the console). > > I suspect that my newer kernel isn't cooperating with the older > userland utilities which prevents the filesystems from being mounted. > > I tried to boot from kernel.old, but I end up at the mountroot prompt > and can't mount my root. Presumably since my pool has been > automagically upgraded to version 11 I can no longer mount my root > using kernel.old, so Beat's end-run won't help me. > > There's nothing I care about on the machine, just the time it took to > csup and build and such, so if I have to scrag it and start over it's > not a the end of the world. > > Maybe someone could make an patched copy of /sbin/zfs (and whatever > dependencies it has into /lib, etc...) 
available and I could drop them > onto a usb key and use some combination of PATH and LD_LIBRARY_PATH to > use them to get my /usr etc... mounted? > > Or I could build up another machine to the same patched point, do the > buildworld and buildkernel, then use that to make a patched bootable > usb drive. That'll take a while to free up the extra hardware though. It turns out that I can boot into single user with the new kernel and then mount the zfs filesystems by hand, like this: mount -t zfs z/usr /usr Just need to do it (little scripting on a similar system helps) for the 43 zfs filesystems that yarema's tool set up and I'm booted multi-user with Pawel's new patches. phew. g. From zbeeble at gmail.com Tue Jul 29 19:49:30 2008 From: zbeeble at gmail.com (Zaphod Beeblebrox) Date: Tue Jul 29 19:49:36 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <5f67a8c40807291223j52f0ccf7r27021bf882b13ad6@mail.gmail.com> On Sun, Jul 27, 2008 at 8:54 AM, Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. If the near term goal is to have this in 7.1, it may help to post a patch that works with 7-STABLE to test. I've tried several bits of advice from this thread to compile it. I even nuked my src tree and pulled a virgin one from cvsup. 
Right now, on 7-STABLE, it stops at: cc -O2 -fno-strict-aliasing -pipe -DZFS_NO_ACL -I/usr/src/cddl/lib/libzfs/../../../sbin/mount -I/usr/src/cddl/lib/libzfs/../../../cddl/lib/libumem -I/usr/src/cddl/lib/libzfs/../../../sys/cddl/compat/opensolaris -I/usr/src/cddl/lib/libzfs/../../../cddl/compat/opensolaris/include -I/usr/src/cddl/lib/libzfs/../../../cddl/compat/opensolaris/lib/libumem -I/usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libzpool/common -I/usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/common/zfs -I/usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs -I/usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/sys -I/usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/head -I/usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common -I/usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libnvpair -I/usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libuutil/common -I/usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libzfs/common -D_SOLARIS_C_SOURCE -c /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c In file included from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/sys/u8_textprep.h:31, from /usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h:83, from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h:32, from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c:28: /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/sys/isa_defs.h:232:1: warning: "_LITTLE_ENDIAN" redefined In file included from /usr/src/cddl/lib/libzfs/../../../sys/cddl/compat/opensolaris/machine/endian.h:32, from /usr/obj/usr/src/tmp/usr/include/sys/types.h:44, from /usr/src/cddl/lib/libzfs/../../../sys/cddl/compat/opensolaris/sys/types.h:37, from 
/usr/obj/usr/src/tmp/usr/include/unistd.h:41, from /usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h:53, from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h:32, from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c:28: /usr/obj/usr/src/tmp/usr/include/machine/endian.h:53:1: warning: this is the location of the previous definition In file included from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_acl.h:34, from /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c:31: /usr/src/cddl/lib/libzfs/../../../sys/cddl/contrib/opensolaris/uts/common/sys/acl.h:42: error: expected specifier-qualifier-list before 'o_mode_t' *** Error code 1 Stop in /usr/src/cddl/lib/libzfs. *** Error code 1 From koitsu at FreeBSD.org Tue Jul 29 20:17:25 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Tue Jul 29 20:17:32 2008 Subject: ZFS patches. In-Reply-To: <1217347882.10413.5.camel@dingo-laptop> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> <488F2078.708@psg.com> <1217347882.10413.5.camel@dingo-laptop> Message-ID: <20080729201725.GA89512@eos.sc1.parodius.com> On Tue, Jul 29, 2008 at 11:11:22PM +0700, OutBackdingo wrote: > Maybe i should have rephrased that. Ive had a running ZFS, i thought as > i was reading your version of the install guide, if the newer code drop > included the boot from ZFS. seems i read it was in the perforce tree, so > i guess ive answered my own qurestion, that it is not in fact in this > patch > > On Tue, 2008-07-29 at 14:51 +0100, Randy Bush wrote: > > OutBackdingo wrote: > > > As in the whole including boot from ZFs, not sure that code is in here > > > yet?? 
had the boot code been migrated to allow for booting from a ZFS > > > partition?? or are we still in the recommended / or /boot being on UFS > > >> http://wiki.freebsd.org/JeremyChadwick/FreeBSD_7.x_on_a_ZFS_pool > > > > i guess you did not follow the url. > > > > boot from zfs is not supported (yet). > > > > imiho, jeremy's instrs should be combined with those on creating sliced > > boot gmirror. > > > > randy I believe it is possible (with or without the patch) to boot purely off of ZFS. The ish.com.au document describes how to do this in "Step Three: solving the ZFS boot problem". https://www.ish.com.au/solutions/articles/freebsdzfs I simply choose not to utilise that method. I'm a bit paranoid about non-UFS root filesystems. My main concern revolves around booting into single-user, which is an important part of the whole build/install world process -- does it actually work with ZFS as a root fs, and if so, is any sort of craziness required to accomplish it? -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From lulf at freebsd.org Tue Jul 29 21:11:56 2008 From: lulf at freebsd.org (Ulf Lilleengen) Date: Tue Jul 29 21:12:09 2008 Subject: ZFS patches. In-Reply-To: <1217347882.10413.5.camel@dingo-laptop> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> <488F2078.708@psg.com> <1217347882.10413.5.camel@dingo-laptop> Message-ID: <20080729211137.GA52154@nobby.studby.ntnu.no> On tir, jul 29, 2008 at 11:11:22pm +0700, OutBackdingo wrote: > Maybe i should have rephrased that. Ive had a running ZFS, i thought as > i was reading your version of the install guide, if the newer code drop > included the boot from ZFS. 
seems i read it was in the perforce tree, so > i guess ive answered my own question, that it is not in fact in this > patch > This patch does include ZFS boot support (for i386 only; look in sys/boot/i386/zfsboot and boot/zfs). I was unable to make it work though, but I was able to install a ZFS-supporting loader, by building the loader with LOADER_ZFS_SUPPORT=yes. However, this feature is a bit undocumented yet, and it didn't work correctly for me. But you can always test it out. > On Tue, 2008-07-29 at 14:51 +0100, Randy Bush wrote: > > OutBackdingo wrote: > > > As in the whole including boot from ZFS, not sure that code is in here > > > yet?? had the boot code been migrated to allow for booting from a ZFS > > > partition?? or are we still in the recommended / or /boot being on UFS > > >> http://wiki.freebsd.org/JeremyChadwick/FreeBSD_7.x_on_a_ZFS_pool > > > > i guess you did not follow the url. > > > > boot from zfs is not supported (yet). > > > > imiho, jeremy's instrs should be combined with those on creating sliced > > boot gmirror. > > > > randy From lulf at FreeBSD.org Tue Jul 29 21:17:10 2008 From: lulf at FreeBSD.org (Ulf Lilleengen) Date: Tue Jul 29 21:17:22 2008 Subject: ZFS patches. 
In-Reply-To: <5f67a8c40807291223j52f0ccf7r27021bf882b13ad6@mail.gmail.com> References: <20080727125413.GG1345@garage.freebsd.pl> <5f67a8c40807291223j52f0ccf7r27021bf882b13ad6@mail.gmail.com> Message-ID: <20080729211653.GA28692@nobby.studby.ntnu.no> On Tue, Jul 29, 2008 at 03:23:13PM -0400, Zaphod Beeblebrox wrote: > On Sun, Jul 27, 2008 at 8:54 AM, Pawel Jakub Dawidek wrote: > > > Hi. > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > > Please test, test, test. If I get enough positive feedback, I may be > > able to squeeze it into 7.1-RELEASE, but this might be hard. > > > If the near term goal is to have this in 7.1, it may help to post a patch > that works with 7-STABLE to test. I've tried several bits of advice from > this thread to compile it. I even nuked my src tree and pulled a virgin one > from cvsup. > Quoting Pawel: "The patch is against HEAD and HEAD only. Don't expect patch against 7-STABLE soon." I agree it's preferable with a patch against 7-STABLE, but from a developers view, it's easier to debug one branch at a time :) -- Ulf Lilleengen From rmacklem at uoguelph.ca Tue Jul 29 22:33:20 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Jul 29 22:33:27 2008 Subject: Which GSSAPI library does FreeBSD use? In-Reply-To: <86ej5c681b.fsf@ds4.des.no> References: <86myk06e18.fsf@ds4.des.no> <86ej5c681b.fsf@ds4.des.no> Message-ID: On Tue, 29 Jul 2008, Dag-Erling Smørgrav wrote: > Rick Macklem writes: >> >> Should I email Doug or submit a bug report, to see if someone is willing >> to work on fixing this? > > You should contact Doug directly. > Thanks. > I wonder what this has to do with filesystems, though... > Only tangentially. 
NFSv4 uses a gssd daemon which is what I need it for. (I'll admit I was trying to avoid joining a mailing list, just to ask one question, and I succeeded:-) rick From rmacklem at uoguelph.ca Wed Jul 30 01:19:36 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Wed Jul 30 01:19:43 2008 Subject: NFSv4 client and server for FreeBSD7 needs testing Message-ID: I have just put a patch for FreeBSD7.0 up anonymous ftp that includes client and server NFS code. It support NFSv4 (as well as NFSv2 and NFSv3) and includes support for Kerberized NFSv3 as well as NFSv4. The client port should be considered Beta test at this point, although this client seems to be working well in OpenBSD4.2 and Mac OS X 10.5 Leopard. (This client port has nothing to do with the NFSv4 client currently in FreeBSD7.0, but borrows heavily from FreeBSD7's generic NFSv2 and NFSv3 client.) I will be creating a similar patch for FreeBSD-CURRENT soon (within a week, maybe). Testing would be appreciated. There is an email list called openbsd-nfsv4@sfobug.org for questions, comments, bugs, etc. (I don't know if others would mind posts to freebsd-fs@freebsd.org or not. I'll see any posts made there, as well.) If you are interested in trying it out, please go to: ftp://ftp.cis.uoguelph.ca/pub/nfsv4/FreeBSD7 Have fun with it, if you try it, rick From 000.fbsd at quip.cz Wed Jul 30 08:02:52 2008 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Wed Jul 30 08:02:58 2008 Subject: ZFS on whole disk vs. slice vs. partition? Message-ID: <48902042.3030609@quip.cz> Hi all, I am preparing myself to next try with ZFS and I would like to know if there are any recomendations / performance differences between using whole disk device (ad0) or slice (ad0s2) or partition (ad0s1e). 
For example, if I have a machine with 2 disks and I want to set up a small part of each disk as a gmirror with UFS2 (/ + /usr) and use the rest of the space for data on a ZFS mirror - is it better to use ad0s1 + ad1s1 for gmirror and ad0s2 + ad1s2 for the ZFS mirror? Or is it better to use ad0s1e + ad1s1e for the ZFS mirror? The next example could be a machine with 4 disks (1TB disks in RAIDZ / RAIDZ2 as an array for backups). It would be nice to use ad0 + ad1 + ad2 + ad3, but then I cannot boot off it, so again - I can use a small piece of each disk as a bootable UFS2 root with a gmirror of 4 drives (first slice of each disk - ad0s1, ad1s1, ad2s1, ad3s1) and the rest for ZFS. Or is there a significant reason not to split the disks, and instead use the whole device for the ZFS pool and set up the UFS2 root on some other medium, like a CF card with a CF-to-IDE converter? Thanks for any useful information, tips, tricks, links etc. Miroslav Lachman From ari at ish.com.au Wed Jul 30 09:44:16 2008 From: ari at ish.com.au (Aristedes Maniatis) Date: Wed Jul 30 09:44:23 2008 Subject: ZFS patches. In-Reply-To: <20080729201725.GA89512@eos.sc1.parodius.com> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> <488F2078.708@psg.com> <1217347882.10413.5.camel@dingo-laptop> <20080729201725.GA89512@eos.sc1.parodius.com> Message-ID: On 30/07/2008, at 6:17 AM, Jeremy Chadwick wrote: > I believe it is possible (with or without the patch) to boot purely > off > of ZFS. The ish.com.au document describes how to do this in "Step > Three: solving the ZFS boot problem". > > https://www.ish.com.au/solutions/articles/freebsdzfs > > I simply choose not to utilise that method. I'm a bit paranoid about > non-UFS root filesystems.
My main concern revolves around booting > into > single-user, which is an important part of the whole build/install > world > process -- does it actually work with ZFS as a root fs, and if so, is > any sort of craziness required to accomplish it? Our article referenced above does still involve a UFS root filesystem, but once the boot process gets under way it is moved out of the way and replaced with the live ZFS root partition. We've had no problem booting into single user mode with this setup. The main down side is that you have a small extra bootable UFS partition (ours are 1Gb) with a kernel and absolutely basic system which is used for nothing more than bootstrapping the system. Oh, and it takes a while to wrap your brain around the whole concept. Having the extra partition means that every time you do make installkernel you'll also need to copy that kernel from the live ZFS root into the UFS partition. That's a nuisance to have to remember. On the plus side, you get to have your entire live filesystem under ZFS and whatever snapshot/RAID/backup/encryption/other ZFS neat feature you care to throw at it. Ari Maniatis --------------------------> ish http://www.ish.com.au Level 1, 30 Wilson Street Newtown 2042 Australia phone +61 2 9550 5001 fax +61 2 9550 4001 GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A From phoemix at harmless.hu Wed Jul 30 09:45:05 2008 From: phoemix at harmless.hu (CZUCZY Gergely) Date: Wed Jul 30 09:45:13 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <20080730114502.41b3a655@twoflower.in.publishing.hu> May I ask whether these patches include the opensolaris crypto extension? http://opensolaris.org/os/project/zfs-crypto/ On Sun, 27 Jul 2008 14:54:13 +0200 Pawel Jakub Dawidek wrote: > Hi. 
> > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). > > Thank you in advance! > -- ?dv?lettel, Czuczy Gergely Harmless Digital Bt mailto: gergely.czuczy@harmless.hu Tel: +36-30-9702963 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080730/896db493/signature.pgp From pjd at FreeBSD.org Wed Jul 30 09:54:04 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Wed Jul 30 09:54:31 2008 Subject: ZFS patches. In-Reply-To: <20080730114502.41b3a655@twoflower.in.publishing.hu> References: <20080727125413.GG1345@garage.freebsd.pl> <20080730114502.41b3a655@twoflower.in.publishing.hu> Message-ID: <20080730095402.GD4543@garage.freebsd.pl> On Wed, Jul 30, 2008 at 11:45:02AM +0200, CZUCZY Gergely wrote: > May I ask whether these patches include the opensolaris crypto extension? > http://opensolaris.org/os/project/zfs-crypto/ I'll work on zfs-crypto once it is integrated into OpenSolaris source, not before. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
From nork at FreeBSD.org Wed Jul 30 16:32:32 2008 From: nork at FreeBSD.org (Norikatsu Shigemura) Date: Wed Jul 30 16:32:38 2008 Subject: ZFS patches. In-Reply-To: <20080729211137.GA52154@nobby.studby.ntnu.no> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> <488F2078.708@psg.com> <1217347882.10413.5.camel@dingo-laptop> <20080729211137.GA52154@nobby.studby.ntnu.no> Message-ID: <20080731013229.9d342ee5.nork@FreeBSD.org> On Tue, 29 Jul 2008 23:11:37 +0200 Ulf Lilleengen wrote: > This patch does include ZFS boot support (for i386 only. Look in > sys/boot/i386/zfsboot and boot/zfs) I was unable to make it work though, but > I was able to install a ZFS-supporting loader, by building the loader with > LOADER_ZFS_SUPPORT=yes . > However, this feature is a bit undocumented yet, and it didn't work correctly > for me. But you can always test it out. I'm using zfsboot on my notebook PC, without UFS. I know of several problems with it :-). 1. The zpool configuration is too limited: only single-disk and mirror pools are usable. If you want to use zfsboot, you can't use RAIDZ, striping, or cache devices (zpool add ... cache ...) :-(. 2. In some environments (old BIOS?), zfsboot1 can't chain to zfsboot2, because zfsboot1, limited by its size (512 bytes), doesn't support CHS mode. 3. Yes, it is a bit undocumented, so you must be careful. 4. I tried to build a zfsboot-capable liveCD, but I haven't managed it yet, because the ZFS-bootable loader can't boot from CD.
SEE ALSO: http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004895.html http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/125878 From caelian at gmail.com Wed Jul 30 19:05:29 2008 From: caelian at gmail.com (Pascal Hofstee) Date: Wed Jul 30 19:06:46 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: On Sun, Jul 27, 2008 at 2:54 PM, Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). > > Thank you in advance! Well .. so far everything runs smoothly on my FreeBSD/amd64 8.0-CURRENT. I upgraded a simple one-disk ZPOOL and all ZFS filesystems on it without problems. The only thing that caught my eye were the console warnings like the ones below: WARNING pid 1413 (zpool): ioctl sign-extension ioctl ffffffffcc285a09 WARNING pid 1473 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 1473 (zfs): ioctl sign-extension ioctl ffffffffcc285a15 I only saw the zpool one once the zfs ones are mostly the ffffffffcc285a12. I am not exactly sure what those warnings are trying to tell me but thought i should at least mention them here. -- Pascal Hofstee From kometen at gmail.com Thu Jul 31 15:34:58 2008 From: kometen at gmail.com (Claus Guttesen) Date: Thu Jul 31 15:35:04 2008 Subject: ZFS patches. 
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: On Sun, Jul 27, 2008 at 2:54 PM, Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard. > > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). I applied your patch to a -CURRENT tree as of July 31st. I had to remove /usr/src, perform a clean csup, and remove the two empty files as mentioned in this thread. I have an Areca ARC-1680 SAS card and an external SAS cabinet with 16 SAS drives of 1 TB each (931 binary GB). They have been set up as three raidz groups of five disks each in one zpool, plus one spare. There does seem to be a speed improvement. I NFS-mounted a partition from Solaris 9 on SPARC and am copying approx. 400 GB using rsync. I saw writes of 429 MB/s. The spikes occurred every 10 secs. to begin with. After some minutes I get writes almost every sec. (watching zpool iostat 1). The limit is clearly the network connection between the two hosts. I'll do some internal copying later. It's too early to say whether ZFS is stable (enough), although I haven't been able to make it halt unless I removed a disk. This was with version 6. I'll remove a disk tomorrow and see how it goes. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner.
Shakespeare From kevinxlinuz at 163.com Thu Jul 31 16:19:34 2008 From: kevinxlinuz at 163.com (kevin) Date: Thu Jul 31 16:20:15 2008 Subject: ZFS patches. In-Reply-To: References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <4891E27B.4010205@163.com> Claus Guttesen wrote: > On Sun, Jul 27, 2008 at 2:54 PM, Pawel Jakub Dawidek wrote: > >> Hi. >> >> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 >> >> The patch above contains the most recent ZFS version that could be found >> in OpenSolaris as of today. Apart for large amount of new functionality, >> I belive there are many stability (and also performance) improvements >> compared to the version from the base system. >> >> Check out OpenSolaris website to find out the differences between base >> system version and patch version. >> >> Please test, test, test. If I get enough positive feedback, I may be >> able to squeeze it into 7.1-RELEASE, but this might be hard. >> >> If you have any questions, please use mailing lists >> (freebsd-fs@FreeBSD.org would be the best). >> > > I applied your patch to a current as of July the 31'st. I had to > remove /usr/src and perform a clean csup and remove the two empty > files as mentioned in this thread. > > I have a areca arc-1680 sas-card and an external sas-cabinet with 16 > sas-drives each 1 TB (931 binary GB). They have been setup in three > raidz-partitions with five disks each in one zpool and one spare. > > There does seem to be a speed-improvement. I nfs-mounted a partition > from solaris 9 on sparc and is copying approx.400 GB using rsync. I > saw write of 429 MB/s. The spikes occured every 10 secs. to begin > with. After some minutes I get writes almost every sec. (watching > zpool iostat 1). The limit is clearly the network-connection between > the two hosts. I'll do some internal copying later. > > It's to early to say whether zfs is stable (enough) allthough I > haven't been able to make it halt unless I removed a disk. 
This was > with version 6. I'll remove a disk tomorrow and see how it goes. > > Hi, I think the new patch still has some problems. I run ZFS on my laptop, and it panics on zfs umount. Is the problem ( http://www.freebsd.org/cgi/query-pr.cgi?pr=124200 ) related to ZFS? It always panics in spa_zio_intr_1 and txg_thread_enter. Benjsc is working on it. If anyone is interested in problem 124200, you can visit http://www.clearchain.com/~benjsc/downloads/FreeBSD/ . From brooks at freebsd.org Thu Jul 31 18:07:16 2008 From: brooks at freebsd.org (Brooks Davis) Date: Thu Jul 31 18:07:23 2008 Subject: NFSv4 client and server for FreeBSD7 needs testing In-Reply-To: References: Message-ID: <20080731180754.GA13820@lor.one-eyed-alien.net> On Tue, Jul 29, 2008 at 09:30:34PM -0400, Rick Macklem wrote: > I have just put a patch for FreeBSD7.0 up anonymous ftp that includes client > and server NFS code. It support NFSv4 (as well as NFSv2 and NFSv3) and includes > support for Kerberized NFSv3 as well as NFSv4. The client port should be > considered Beta test at this point, although this client seems to be working > well in OpenBSD4.2 and Mac OS X 10.5 Leopard. (This client port has nothing to > do with the NFSv4 client currently in FreeBSD7.0, but borrows heavily from > FreeBSD7's generic NFSv2 and NFSv3 client.) > > I will be creating a similar patch for FreeBSD-CURRENT soon (within a week, > maybe). > > Testing would be appreciated. There is an email list called > openbsd-nfsv4@sfobug.org for questions, comments, bugs, etc. > (I don't know if others would mind posts to freebsd-fs@freebsd.org or not. I'll > see any posts made there, as well.) > > If you are interested in trying it out, please go to: > ftp://ftp.cis.uoguelph.ca/pub/nfsv4/FreeBSD7 > > Have fun with it, if you try it, rick I've done some very basic testing on amd64. I had to make a few changes to get it to compile, but they were mostly straightforward.
Replacing the various incarnations of %q with %j and casts to (intmax_t) handled most of it. I also had to change the third argument of nfsvno_pathconf() to register_t to match the old code and percolate the change through. I've only done some auth sys mounts so far and some very basic reading of files etc. One feature that seems to be missing relative to other systems is what I'd describe as recursive mounting with a single mount entry. For example, if you export these file systems on Solaris: /export/home /export/home/foo /export/home/bar and then mount /export/home with NFSv4 on a Linux system, you can access the contents of /export/home/foo and /export/home/bar (find breaks interestingly on RHEL 5.1 Server, but that's another story :). -- Brooks From rmacklem at uoguelph.ca Thu Jul 31 19:09:12 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Thu Jul 31 19:09:19 2008 Subject: NFSv4 client and server for FreeBSD7 needs testing In-Reply-To: <20080731180754.GA13820@lor.one-eyed-alien.net> References: <20080731180754.GA13820@lor.one-eyed-alien.net> Message-ID: On Thu, 31 Jul 2008, Brooks Davis wrote: > > I've done some very basic testing on amd64. I had to make a few > changes to get it to compile, but they were mostly straightforward. > Replacing the various incarnations of %q with %j and casts to (intmax_t) > handled most of it. I also had to change the third argument of > nfsvno_pathconf() to register_t to match the old code and perculate the > change through. > If you email me the diffs, that would be appreciated, since I only have i386 hardware to test with. > I've only done some auth sys mounts so far and some very basic reading of files > etc.
One feature that seems to be missing relative to other systems is > what I'd describe as recusive mounting with a single mount entry. For example > if you export these files systems on solaris: > > /export/home > /export/home/foo > /export/home/bar > > and then mount /export/home with nfsv4 on a linux system, you can access the > contents of /export/home/foo and /export/home/bar (find breaks interestingly on > RHEL 5.1 Server, but that's another story :). > Yes, unlike NFSv2 and v3, NFSv4 allows the server to cross mount points on the server. I haven't implemented that in my server for two reasons: 1 - The Solaris 10 client gets confused at mount point crossings. I don't know if the current release of OpenSolaris has this fixed? 2 - It makes the file systems look exactly the same for NFSv4 as v2 and v3, if you don't allow mount point crossings. (ie. You can mount each separately, if you want to.) However, good point. Maybe it should be a "sysctl" option to enable it. (It's been a while since I tried it, but I think all I have to do is set the CROSSMOUNT flag on VOP_LOOKUP() to make it work.) I'll admit I don't get around to testing using the Linux client often, but if I can get it to work, I'll add a "sysctl" variable to enable it. Thanks and good luck with the testing, rick From rmacklem at uoguelph.ca Thu Jul 31 20:19:34 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Thu Jul 31 20:19:41 2008 Subject: NFSv4 Referrals, was Re: NFSv4 client and server... Message-ID: I see that, along with what they call Mirror mounts, the OpenSolaris team is building Referral support into their client. I put some code for Referrals in a long time ago, but it is incomplete and untested. (A referral can be thought of as a symbolic link/pointer to a file system on another server.) What is needed is an entry in the local file system that can be recognized as an nfsv4 referral and stores the referral data (just something like "otherserver:/path"). 
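As a rough illustration only (this is not code from the patch), the referral data described above could be as small as a "server:/path" string. The sketch below assumes that form; the names Referral and parse_referral are made up for the example, and it assumes the first ':' separates the host from the path, so a bare IPv6 address as the server would need some other encoding.

```python
from typing import NamedTuple


class Referral(NamedTuple):
    """Hypothetical in-memory form of the stored referral data."""
    server: str
    path: str


def parse_referral(data: str) -> Referral:
    """Split a 'server:/path' string into its two parts.

    Assumption: the first ':' separates host from path, and the path
    is absolute on the other server.
    """
    server, sep, path = data.partition(":")
    if not sep or not server or not path.startswith("/"):
        raise ValueError(f"not a referral string: {data!r}")
    return Referral(server, path)


print(parse_referral("otherserver:/path"))
```

However the entry ends up being stored on disk, the server only needs to be able to get these two fields back out of it.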
As a dirty hack, I had thought of using a symbolic link, but with a funny set of mode bits. (Local access on the server would just see a symbolic link with funny mode bits and no extant file, but the nfs server would recognize it as a referral and use it.) But there must be a better way? Any ideas? rick From matt at corp.spry.com Thu Jul 31 20:58:35 2008 From: matt at corp.spry.com (Matt Simerson) Date: Thu Jul 31 20:58:47 2008 Subject: ZFS hang issue and prefetch_disable - UPDATE References: <20253C48-38CB-4A77-9C59-B993E7E5D78A@corp.spry.com> Message-ID: <62D3072A-E41A-4CFC-971D-9924958F38C7@corp.spry.com> My announcement that vfs.zfs.prefetch_disable=1 resulted in a stable system was premature. One of my backup servers (see specs below) hung. When I got onto the console via KVM, it looked normal with no errors but didn't respond to Control-Alt-Delete. After a power cycle, zpool status showed 8 disks FAULTED and the action state was: http://www.sun.com/msg/ZFS-8000-5E Basically, that meant my ZFS file system and 7.5TB of data were gone. Ouch. I'm using a pair of ARECA 1231ML RAID controllers. Previously, I had them configured in JBOD with raidz2. This time around, I configured both controllers with one 12 disk RAID 6 volume. Now FreeBSD just sees two 10TB disks which I stripe with ZFS: zpool create back01 /dev/da0 /dev/da1 I also did a bit more fiddling with /boot/loader.conf. Previously I had: vm.kmem_size="1536M" vm.kmem_size_max="1536M" vfs.zfs.prefetch_disable=1 This resulted in ZFS using 1.1GB of RAM (as measured using the technique described on the wiki) during normal use. The system in question hung during the nightly processing (which backs up some other systems via rsync) and my suspicion is that when I/O load picked up, it exhausted the available kernel memory and hung the system.
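For anyone comparing the size strings above with raw byte counts, here is a small sketch of the arithmetic. It is not a FreeBSD tool: the helper name size_to_bytes is made up for illustration, and it assumes the K/M/G suffixes are multiples of 1024, which is consistent with the original report quoted below giving the same kmem limit as 1610612736.

```python
# Hypothetical helper: convert a loader.conf-style size string to bytes.
# Assumption: K, M and G are binary multiples (1024-based).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(value: str) -> int:
    """Return the byte count for strings such as '1536M' or '1610612736'."""
    value = value.strip().strip('"')
    if not value:
        raise ValueError("empty size string")
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)  # plain number: already in bytes


# The two kmem settings quoted above.
settings = {
    "vm.kmem_size": "1536M",
    "vm.kmem_size_max": "1536M",
}

for name, raw in settings.items():
    print(f"{name} = {size_to_bytes(raw)} bytes")
```

Under that assumption, "1536M" and 1610612736 describe the same 1.5 GB limit, so the two forms of the setting are interchangeable.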
So now I have these settings on one system: vm.kmem_size="1536M" vm.kmem_size_max="1536M" vfs.zfs.arc_min="16M" vfs.zfs.arc_max="64M" vfs.zfs.prefetch_disable=1 and the same except vfs.zfs.arc_max="256M" on the other. The one with 64M uses 256MB of RAM for ZFS and the one set at 256M uses 600MB of RAM. These are measured under heavy network and disk IO load being generated by multiple rsync processes pulling backups from remote nodes and storing it on ZFS. I am using ZFS compression. I get much better performance now with RAID 6 on the controller and ZFS striping than using raidz2. Unless tuning the arc_ settings made the difference. Either way, the system I just rebuilt is now quite a bit faster with RAID 6 than JBOD + raidz2. Hopefully tuning vfs.zfs.arc_max will result in stability. If it doesn't, my next choice is upgrading to -HEAD with the recent ZFS patch or ditching ZFS entirely and using geom_stripe. I don't like either option. Matt > From: Matt Simerson > Date: July 22, 2008 1:25:42 PM PDT > To: freebsd-fs@freebsd.org > Subject: ZFS hang issue and prefetch_disable > > Symptoms > > Deadlocks under heavy IO load on the ZFS file system with > prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a > stable system. > > Configuration > > Two machines. Identically built. Both exhibit identical behavior. > 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks. > FreeBSD 7.0 amd64 > dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt > > Boot disk is a read only 1GB compact flash > # cat /etc/fstab > /dev/ad0s1a / ufs ro,noatime 2 2 > > # df -h / > Filesystem 1K-blocks Used Avail Capacity Mounted on > /dev/ad0s1a 939M 555M 309M 64% / > > RAM has been boosted as suggested in ZFS Tuning Guide > # cat /boot/loader.conf > vm.kmem_size= 1610612736 > vm.kmem_size_max= 1610612736 > vfs.zfs.prefetch_disable=1 > > I haven't mucked much with the other memory settings as I'm using > amd64 and according to the FreeBSD ZFS wiki, that isn't necessary. 
> I've tried higher settings for kmem but that resulted in a failed > boot. I have ample RAM And would love to use as much as possible for > network and disk I/O buffers as that's principally all this system > does. > > Disks & ZFS options > > Sun's "Best Practices" suggests limiting the number of disks in a > raidz pool to no more than 6-10, IIRC. ZFS is configured as shown: http://matt.simerson.net/computing/zfs/zpool.txt > > I'm using all of the ZFS default properties except: atime=off, > compression=on. > > Environment > > I'm using these machines as backup servers. I wrote an application > that generates a list of the thousands of VPS accounts we host. For > each host, it generates a rsnapshot configuration file and backs up > up their VPS to these systems via rsync. The application manages > concurrency and will spawn additional rsync processes if system i/o > load is below a defined threshhold. Which is to say, I can crank up > or down the amount of disk IO the system sees. > > With vfs.zfs.prefetch_disable=0, I can trigger a hang within a few > hours (no more than a day). If I keep the i/o load (measured via > iostat) down to a low level (< 200 iops) then I still get hangs but > less frequently (1-6 days). The only way I have found to prevent > the hangs is by setting vfs.zfs.prefetch_disable=1. From pfgshield-freebsd at yahoo.com Thu Jul 31 23:37:08 2008 From: pfgshield-freebsd at yahoo.com (Pedro Giffuni) Date: Thu Jul 31 23:37:38 2008 Subject: Should we change dirent for 64 bit directory cookies ? Message-ID: <809288.56058.qm@web32703.mail.mud.yahoo.com> Hello fs gurus; I've been sort of following the DragonFly list wrt to the changes Matt made for his HAMMER fs. I don't know if anyone is considering a port: he added a lot of stuff to the base system that will be a pain to port, but he also triggered some bugs in the old BSD code that would be nice to fix on FreeBSD too. 
One of the not-*too*-tough things to consider adopting would be 64-bit directory cookies: Main commit: http://leaf.dragonflybsd.org/mailarchive/commits/2007-11/msg00151.html Follow up for the linuxulator: http://leaf.dragonflybsd.org/mailarchive/commits/2007-11/msg00153.html Here is an excerpt of a discussion from the DragonFly Kernel ML (Re: [Tux3] Comparison to Hammer fs design) that pretty much sums up the issues: ________ ... :> The cookies are 64 bits in DragonFly. I'm not sure why Linux would :> still be using 32 bit cookies, file offsets are 64 bits so you :> should be able to use 64 bit cookies. : :It is not Linux that perpetrates this outrage, it is NVFS v2. We can't :just tell everybody that their NFS v2 clients are now broken. Oh, we don't care about NFSv2 all that much any more. NFSv3 is the bare minimum. NFSv2 is extremely old, nobody should be using it any more. Even NFSv3 is getting fairly long in the tooth now. :> For NFS in DragonFly I use a 64 bit cookie where 32 bits is a hash key :> and 32 bits is an iterator to deal with hash collisions. Poof, :> problem solved. : :Which was my original proposal to solve the problem. Then Ted told me :about NFS v2 :-O : :Actually, NFS hands you a 62 bit cookie with the high bits of both s32 :parts unused. NFS v2 gives you a 31 bit cookie. Bleah. I'd recommend dropping support for NFSv2. It is not really worth supporting any more. Does it even support 64 bit inodes? (I don't remember), or 64 bit file offsets? NFSv2 is garbage. You should be able to use 63 bits of the cookie, I don't know why you wouldn't use the high bit of the lsb 32 bit part. There is no requirement that that bit be 0. In fact, the RFC says the cookie is a 64 bit unsigned integer and you should be able to use all 64 bits. If linux is not allowing all 64 bits to be used then it's a serious bug in linux. The cookies are supposed to be opaque, just like the file handle. ... _________
From Benjamin.Close at clearchain.com Thu Jul 31 23:40:28 2008 From: Benjamin.Close at clearchain.com (Benjamin Close) Date: Thu Jul 31 23:40:40 2008 Subject: ZFS patches. In-Reply-To: <4891E27B.4010205@163.com> References: <20080727125413.GG1345@garage.freebsd.pl> <4891E27B.4010205@163.com> Message-ID: <489249D0.2000203@clearchain.com> kevin wrote: > Claus Guttesen wrote: >> On Sun, Jul 27, 2008 at 2:54 PM, Pawel Jakub Dawidek >> wrote: >> >>> Hi. >>> >>> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 >>> >>> The patch above contains the most recent ZFS version that could be >>> found >>> in OpenSolaris as of today. Apart for large amount of new >>> functionality, >>> I belive there are many stability (and also performance) improvements >>> compared to the version from the base system. >>> >>> Check out OpenSolaris website to find out the differences between base >>> system version and patch version. >>> >>> Please test, test, test. If I get enough positive feedback, I may be >>> able to squeeze it into 7.1-RELEASE, but this might be hard. >>> >>> If you have any questions, please use mailing lists >>> (freebsd-fs@FreeBSD.org would be the best). >>> >> >> I applied your patch to a current as of July the 31'st. I had to >> remove /usr/src and perform a clean csup and remove the two empty >> files as mentioned in this thread. >> >> I have a areca arc-1680 sas-card and an external sas-cabinet with 16 >> sas-drives each 1 TB (931 binary GB). They have been setup in three >> raidz-partitions with five disks each in one zpool and one spare. >> >> There does seem to be a speed-improvement. I nfs-mounted a partition >> from solaris 9 on sparc and is copying approx.400 GB using rsync. I >> saw write of 429 MB/s. The spikes occured every 10 secs. to begin >> with. After some minutes I get writes almost every sec. (watching >> zpool iostat 1).
The limit is clearly the network-connection between >> the two hosts. I'll do some internal copying later. >> >> It's to early to say whether zfs is stable (enough) allthough I >> haven't been able to make it halt unless I removed a disk. This was >> with version 6. I'll remove a disk tomorrow and see how it goes. >> >> > Hi, > I think the new patch still have some problem.I run zfs on my > laptop,and it panic on zfs umount. > The problem ( http://www.freebsd.org/cgi/query-pr.cgi?pr=124200 ) > relate to zfs? It alway panic in spa_zio_intr_1 and txg_thread_enter. > Benjsc is working on it.If any one interest in problem 124200, you can > visit http://www.clearchain.com/~benjsc/downloads/FreeBSD/ . This issue is not ZFS-related; ZFS, however, being such a heavy user of threads and condvars, triggers it more often. Cheers, Benjamin