From bugmaster at FreeBSD.org Mon Mar 2 03:07:32 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Mar 2 03:11:04 2009 Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org Message-ID: <200903021106.n22B6pll057292@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/131575 geom [geom_label] [msdosfs] [umass] Immediate crash after p o kern/131037 geom [geli] Unable to create disklabel on .eli-Device o kern/130528 geom gjournal fsck during boot o kern/129674 geom [geom] gjournal root did not mount on boot o kern/129645 geom gjournal(8): GEOM_JOURNAL causes system to fail to boo o kern/129245 geom [geom] gcache is more suitable for suffix based provid o bin/128398 geom [patch] glabel(8): teach geom_label to recognise gpt l f kern/128276 geom [gmirror] machine lock up when gmirror module is used o kern/126902 geom [geom] [geom_label] Kernel panic during install boot o kern/124973 geom [gjournal] [patch] boot order affects geom_journal con o kern/124969 geom gvinum(8): gvinum raid5 plex does not detect missing s o kern/124294 geom [geom] gmirror(8) have inappropriate logic when workin o kern/124130 geom [gmirror] [usb] gmirror fails to start usb devices tha o kern/123962 geom [panic] [gjournal] gjournal (455Gb data, 8Gb journal), o kern/123630 geom [patch] [gmirror] gmirror doesnt allow the original dr o kern/123122 geom [geom] GEOM / gjournal kernel lock f kern/122415 geom [geom] UFS labels are being constantly created and rem o kern/122067 geom [geom] [panic] Geom crashed during boot o kern/121559 geom [patch] [geom] geom label class allows to create inacc o kern/121364 geom 
[gmirror] Removing all providers create a "zombie" mir o kern/120231 geom [geom] GEOM_CONCAT error adding second drive o kern/120044 geom [msdosfs] [geom] incorrect MSDOSFS label fries adminis o kern/120021 geom [geom] [panic] net-p2p/qbittorrent crashes system when o kern/119743 geom [geom] geom label for cds is keeped after dismount and f kern/115547 geom [geom] [patch] [request] let GEOM Eli get password fro o kern/114532 geom [geom] GEOM_MIRROR shows up in kldstat even if compile o kern/113957 geom [gmirror] gmirror is intermittently reporting a degrad o kern/113837 geom [geom] unable to access 1024 sector size storage o kern/113419 geom [geom] geom fox multipathing not failing back p bin/110705 geom gmirror(8) control utility does not exit with correct o kern/107707 geom [geom] [patch] [request] add new class geom_xbox360 to o kern/104389 geom [geom] [patch] sys/geom/geom_dump.c doesn't encode XML o kern/98034 geom [geom] dereference of NULL pointer in acd_geom_detach o kern/94632 geom [geom] Kernel output resets input while GELI asks for o kern/90582 geom [geom] [panic] Restore cause panic string (ffs_blkfree o bin/90093 geom fdisk(8) incapable of altering in-core geometry a kern/89660 geom [vinum] [patch] [panic] due to g_malloc returning null o kern/89546 geom [geom] GEOM error s kern/89102 geom [geom] [panic] panic when forced unmount FS from unplu o kern/87544 geom [gbde] mmaping large files on a gbde filesystem deadlo o kern/84556 geom [geom] [panic] GBDE-encrypted swap causes panic at shu o kern/79251 geom [2TB] newfs fails on 2.6TB gbde device o kern/79035 geom [vinum] gvinum unable to create a striped set of mirro o bin/78131 geom gbde(8) "destroy" not working. s kern/73177 geom kldload geom_* causes panic due to memory exhaustion 45 problems total. 
From linimon at FreeBSD.org Mon Mar 2 10:30:48 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Mon Mar 2 10:31:00 2009
Subject: kern/132242: [gmirror] gmirror.ko fails to fully initialize
Message-ID: <200903021830.n22IUhuk001923@freefall.freebsd.org>

Old Synopsis: gmirror.ko fails to fully initialize
New Synopsis: [gmirror] gmirror.ko fails to fully initialize

Responsible-Changed-From-To: freebsd-bugs->freebsd-geom
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Mar 2 18:30:24 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=132242

From jh at saunalahti.fi Mon Mar 2 12:10:27 2009
From: jh at saunalahti.fi (Jaakko Heinonen)
Date: Mon Mar 2 12:10:34 2009
Subject: gpart probing problem
Message-ID: <20090302195126.GA6974@a91-153-125-115.elisa-laajakaista.fi>

I noticed a regression after gpart (GEOM_PART_*) was made the default.

I have a disk which has remnants of an old GPT table but which has a
valid MBR table. Previously I got the following messages in the log, but
the MBR partition da0s1 was properly detected.

GEOM: da0: corrupt or invalid GPT detected.
GEOM: da0: GPT rejected -- may not be recoverable.
GEOM_LABEL: Label for provider da0s1 is msdosfs/FOO.

Now, with gpart as the default, the MBR table is not detected and I can't
access the da0s1 partition. These messages appear in the log:

GEOM: da0: corrupt or invalid GPT detected.
GEOM: da0: GPT rejected -- may not be recoverable.

g_part_gpt_probe() only checks for the existence of the GPT header
signature; it doesn't check whether the table is actually valid. gpart
doesn't try other schemes after it has decided to use GPT.
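
[Illustration] The difference between "the signature exists" and "the header is actually valid" can be sketched in C as below. This is a minimal sketch, not the g_part_gpt_probe() code: the function names, the 512-byte sector size and the buffer handling are assumptions made for this example. It relies only on the GPT header layout (signature "EFI PART" at offset 0, header size at offset 12, header CRC32 at offset 16, with the CRC computed over the header while the CRC field is zeroed). A complete validation would also have to check the partition entry array and the backup header, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_SIG          "EFI PART"
#define GPT_SIG_LEN      8
#define GPT_OFF_HDRSIZE  12   /* header size, little-endian 32 bit */
#define GPT_OFF_HDRCRC   16   /* header CRC32, little-endian 32 bit */
#define GPT_MIN_HDRSIZE  92
#define SECTOR_SIZE      512  /* assumption: 512-byte sectors */

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t
le32_read(const uint8_t *p)
{
	return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), as GPT uses. */
uint32_t
crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return (~crc);
}

/* Weak probe: only looks for the signature (hypothetical helper). */
int
gpt_signature_present(const uint8_t *sector)
{
	return (memcmp(sector, GPT_SIG, GPT_SIG_LEN) == 0);
}

/* Stronger probe: signature, sane header size and matching CRC32. */
int
gpt_header_valid(const uint8_t *sector)
{
	uint8_t copy[SECTOR_SIZE];
	uint32_t size, stored;

	if (!gpt_signature_present(sector))
		return (0);
	size = le32_read(sector + GPT_OFF_HDRSIZE);
	if (size < GPT_MIN_HDRSIZE || size > SECTOR_SIZE)
		return (0);
	stored = le32_read(sector + GPT_OFF_HDRCRC);
	/* The CRC is defined over the header with the CRC field zeroed. */
	memcpy(copy, sector, SECTOR_SIZE);
	memset(copy + GPT_OFF_HDRCRC, 0, 4);
	return (crc32_ieee(copy, size) == stored);
}
```

A disk like the one described above would pass gpt_signature_present() but fail gpt_header_valid(). Whether a failed validation should lead to trying another scheme is a separate policy question and is not something this sketch decides.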
-- Jaakko From gavin at FreeBSD.org Tue Mar 3 08:13:28 2009 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Tue Mar 3 08:15:05 2009 Subject: kern/132273: [glabel] failing on journaled partition Message-ID: <200903031613.n23GDRi5015019@freefall.freebsd.org> Old Synopsis: glabel failing on journaled partition New Synopsis: [glabel] failing on journaled partition Responsible-Changed-From-To: freebsd-bugs->freebsd-geom Responsible-Changed-By: gavin Responsible-Changed-When: Tue Mar 3 16:12:45 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=132273 From xcllnt at mac.com Tue Mar 3 08:58:35 2009 From: xcllnt at mac.com (Marcel Moolenaar) Date: Tue Mar 3 08:58:41 2009 Subject: gpart probing problem In-Reply-To: <20090302195126.GA6974@a91-153-125-115.elisa-laajakaista.fi> References: <20090302195126.GA6974@a91-153-125-115.elisa-laajakaista.fi> Message-ID: <6E283F9A-30B1-431E-B6F1-142E17647FB2@mac.com> On Mar 2, 2009, at 11:51 AM, Jaakko Heinonen wrote: > I noticed a regression after gpart (GEOM_PART_*) was made default. > > I have a disk which has remnants of an old GPT table but which has a > valid MBR table. Previously I got following messages to the log but > the > MBR partition was da0s1 was properly detected. > > GEOM: da0: corrupt or invalid GPT detected. > GEOM: da0: GPT rejected -- may not be recoverable. > GEOM_LABEL: Label for provider da0s1 is msdosfs/FOO. > > Now with gpart as default the MBR table is not detected and I can't > access the da0s1 partition. These messages appear to the log: > > GEOM: da0: corrupt or invalid GPT detected. > GEOM: da0: GPT rejected -- may not be recoverable. > > g_part_gpt_probe() only does a check for GPT header signature > existence > but it doesn't check if the table is actually valid. gpart doesn't try > other schemes after it has decided to use GPT. It's actually not a regression. 
There's always a MBR in front of a GPT and a corrupted GPT should not be tossed aside and ignored. The behaviour of gpart is correct in that the operator/user needs to remove the ambiguity. Either by fixing the GPT or otherwise by removing it altogether. Under no circumstance should the kernel use the MBR and pretend nothing is wrong. FYI, -- Marcel Moolenaar xcllnt@mac.com From nsayer at kfu.com Tue Mar 3 18:40:04 2009 From: nsayer at kfu.com (Nick Sayer) Date: Tue Mar 3 18:40:14 2009 Subject: kern/132273: [glabel] [patch] failing on journaled partition Message-ID: <200903040240.n242e40Z079107@freefall.freebsd.org> The following reply was made to PR kern/132273; it has been noted by GNATS. From: Nick Sayer To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/132273: [glabel] [patch] failing on journaled partition Date: Tue, 3 Mar 2009 18:37:49 -0800 So I actually modified the patch to exclude both G_PF_ORPHAN and G_PF_WITHER flags from the duplicate check. That worked. I am not 100% sure *which* one actually did the trick. I suspect now that it's the WITHER check that actually fixed the problem, but I'll leave that to the experts. But as it is now, the patch that checks both the ORPHAN and WITHER flag and continues if either is set fixes my problem. From jh at saunalahti.fi Wed Mar 4 07:14:24 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Wed Mar 4 07:14:31 2009 Subject: gpart probing problem In-Reply-To: <6E283F9A-30B1-431E-B6F1-142E17647FB2@mac.com> References: <20090302195126.GA6974@a91-153-125-115.elisa-laajakaista.fi> <6E283F9A-30B1-431E-B6F1-142E17647FB2@mac.com> Message-ID: <20090304151420.GA804@a91-153-125-115.elisa-laajakaista.fi> Thanks for your reply. On 2009-03-03, Marcel Moolenaar wrote: > There's always a MBR in front of a GPT and a corrupted GPT should not > be tossed aside and ignored. I see the point. 
However this could cause problems for people moving
disks between operating systems (as it caused for me) because some
popular operating systems show MBR partition(s) in this case.

I tested these operating systems:

  Mac OS X 10.4
  Linux (Ubuntu kernel 2.6.27-7-generic)
  OpenSolaris (2008.11 release)
  Windows XP/2003

> Under no circumstance should the kernel use the MBR and pretend
> nothing is wrong.

Is this behaviour defined in the EFI specification? Are the OSes listed
above buggy and/or do they violate the specification?

--
Jaakko

From xcllnt at mac.com Wed Mar 4 10:11:25 2009
From: xcllnt at mac.com (Marcel Moolenaar)
Date: Wed Mar 4 10:11:31 2009
Subject: gpart probing problem
In-Reply-To: <20090304151420.GA804@a91-153-125-115.elisa-laajakaista.fi>
References: <20090302195126.GA6974@a91-153-125-115.elisa-laajakaista.fi>
	<6E283F9A-30B1-431E-B6F1-142E17647FB2@mac.com>
	<20090304151420.GA804@a91-153-125-115.elisa-laajakaista.fi>
Message-ID: <325ECB30-003C-41BB-A3B3-FC8A684E4F0E@mac.com>

On Mar 4, 2009, at 7:14 AM, Jaakko Heinonen wrote:

>
> Thanks for your reply.
>
> On 2009-03-03, Marcel Moolenaar wrote:
>> There's always a MBR in front of a GPT and a corrupted GPT should not
>> be tossed aside and ignored.
>
> I see the point. However this could cause problems for people moving
> disks between operating systems (as it caused for me) because some
> popular operating systems show MBR partition(s) in this case.

That's a mistake IMO.

> I tested these operating systems:
>
> Mac OS X 10.4
> Linux (Ubuntu kernel 2.6.27-7-generic)
> OpenSolaris (2008.11 release)
> Windows XP/2003
>
>> Under no circumstance should the kernel use the MBR and pretend
>> nothing is wrong.
>
> Is this behaviour defined in the EFI specification? Are the OSes
> listed above buggy and/or do they violate the specification?

The EFI specification has been obsoleted by events to some extent.
For one, the MBR in front of a GPT was there *only* for backward
compatibility with MBR tools that don't know about GPT. Those tools
would see a disk that's entirely in use rather than a disk that's
unpartitioned. This is not the case anymore. The MBR now contains
valid partitions for use by OSes that don't know anything about GPT.

However: the key statement there is "OSes that don't know about GPT".
Any OS that knows about GPT should use the GPT. That, if not
explicitly stated in the EFI specification, is definitely the intent
and is actually current practice: the dual MBR+GPT setup can only
work if GPT-aware OSes use the GPT and GPT-unaware OSes use the MBR.

As such: if a GPT is found, but corrupted, the OS should *not*
silently use the MBR (it can only find a GPT if it's a GPT-aware OS).
It should use the GPT if possible (the redundancy has been added by
design so that you can recover from the most common forms of
corruption), or fail if it's inconsistent to the extent that there's
no clear interpretation (i.e. conflicting/ambiguous).

So yes, the OSes you listed above are buggy. I can't claim they
violate the specification, because the specification is hardly
applicable anymore. This, of course, is the root cause of the problem...

--
Marcel Moolenaar
xcllnt@mac.com

From linimon at FreeBSD.org Wed Mar 4 17:38:35 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Wed Mar 4 17:38:41 2009
Subject: kern/131353: [geom] gjournal(8) kernel lock
Message-ID: <200903050138.n251cYuG053483@freefall.freebsd.org>

Synopsis: [geom] gjournal(8) kernel lock

Responsible-Changed-From-To: freebsd-fs->freebsd-geom
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Mar 5 01:38:19 UTC 2009
Responsible-Changed-Why:
Probably a more appropriate assignment.

http://www.freebsd.org/cgi/query-pr.cgi?pr=131353

From jespasac at minibofh.org Thu Mar 5 01:46:35 2009
From: jespasac at minibofh.org (Jordi Espasa Clofent)
Date: Thu Mar 5 01:46:42 2009
Subject: gmirror(8): round-robin vs. load algorithm
Message-ID: <49AF9B6C.8090908@minibofh.org>

Hi all,

The 'round-robin' algorithm is supposed to have the best performance in
gmirror(8). However, looking directly at the code (1), I see that the
only difference between the 'round-robin' and 'load' algorithms is:

[...]
	binuptime(&curtime);
	/*
	 * Find a disk which the smallest load.
	 */
	disk = NULL;
	LIST_FOREACH(dp, &sc->sc_disks, d_next) {
		if (dp->d_state != G_MIRROR_DISK_STATE_ACTIVE)
			continue;
		/* If disk wasn't used for more than 2 sec, use it. */
		if (curtime.sec - dp->d_last_used.sec >= 2) {
			disk = dp;
			break;
		}
		if (disk == NULL ||
		    bintime_cmp(&dp->d_delay, &disk->d_delay) < 0) {
			disk = dp;
		}
	}
[...]

The key is in "/* If disk wasn't used for more than 2 sec, use it. */";
for the rest, the 'load' algorithm implementation seems to be the same as
round-robin.

If I'm right, my question is: where is the use of 'load' recommended?

I understand that 'load' will be good on servers with a lot of disk
activity (heavy disk I/O could cause gmirror to kick out a mirror member,
or even reboot or panic the system) because it always tries the
least-loaded disk.

(1)
http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/sys/geom/mirror/g_mirror.c?rev=1.93.6.1;content-type=text%2Fplain

--
Thanks,
Jordi Espasa Clofent

From jespasac at minibofh.org Thu Mar 5 03:31:57 2009
From: jespasac at minibofh.org (Jordi Espasa Clofent)
Date: Thu Mar 5 03:32:04 2009
Subject: gmirror(8): round-robin vs.
load algorithm
In-Reply-To: <49AFB41F.8030204@freebsd.org>
References: <49AF9B6C.8090908@minibofh.org> <49AFB41F.8030204@freebsd.org>
Message-ID: <49AFB7C4.4030202@minibofh.org>

Ivan Voras wrote:

> There are some performance-improving patches for gmirror out there, for
> example here:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/113885

Yes, I know about it (indeed, I read it before writing my post). But
according to my own benchmark (using the postmark tool), the performance
of 'load' is the same as that of 'round-robin'.

So I tend to view the 'load' algorithm (even in its current
implementation) as a kind of 'secure round-robin' for servers with high
I/O.

> Maybe you should contact pjd.

He's so busy. I hope he can read this message on the list.

--
Thanks,
Jordi Espasa Clofent

From ivoras at freebsd.org Thu Mar 5 03:37:47 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Thu Mar 5 03:37:53 2009
Subject: gmirror(8): round-robin vs. load algorithm
In-Reply-To: <49AF9B6C.8090908@minibofh.org>
References: <49AF9B6C.8090908@minibofh.org>
Message-ID: <49AFB41F.8030204@freebsd.org>

Jordi Espasa Clofent wrote:
> Hi all,
>
> It's supposed that 'round-robin' algorithm has the best performance in
> gmirror(8).

There are some performance-improving patches for gmirror out there, for
example here:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/113885

Maybe you should contact pjd.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-geom/attachments/20090305/016c0810/signature.pgp From rizzo at iet.unipi.it Sun Mar 8 13:51:26 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Sun Mar 8 13:51:33 2009 Subject: updated geom_sched code In-Reply-To: References: <20090308005515.GA68934@onelab2.iet.unipi.it> Message-ID: <20090308203811.GA4042@onelab2.iet.unipi.it> On Sun, Mar 08, 2009 at 07:38:30PM +0100, Ivan Voras wrote: > Luigi Rizzo wrote: ... > > Apart from that (which needs to be fixed by adding a field to the > > struct bio), we believe the code to be quite stable now, so future ... > Hi, > > Do you have some documentation about the long-term plans? You will have > to add an additional field to bio, of course, but there is one more > thing: the schedulers will have to be integrated into the GEOM in an > "invisible" way. I.e. instead of /dev/ad0-sched-s1f the users should see > only /dev/ad0 like they're used to, and get the schedulers by default. > Otherwise, your work will not get much use. well, if one is comfortable with the schedulers he can just put the right names in /etc/fstab. I think the usefulness of 'invisible' classes is more when adding or removing geom classes on the fly, but perhaps this requires support in the geom nodes themselves. In any case I was hoping to discuss this issue later, either at bsdcan or in the mailing list, once I understand the implications a bit more. cheers luigi From bugmaster at FreeBSD.org Mon Mar 9 10:15:04 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Mar 9 10:16:05 2009 Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org Message-ID: <200903091715.n29HF3t0045237@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. 
These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/132273 geom [glabel] [patch] failing on journaled partition o kern/132242 geom [gmirror] gmirror.ko fails to fully initialize o kern/131575 geom [geom_label] [msdosfs] [umass] Immediate crash after p o kern/131353 geom [geom] gjournal(8) kernel lock o kern/131037 geom [geli] Unable to create disklabel on .eli-Device o kern/130528 geom gjournal fsck during boot o kern/129674 geom [geom] gjournal root did not mount on boot o kern/129645 geom gjournal(8): GEOM_JOURNAL causes system to fail to boo o kern/129245 geom [geom] gcache is more suitable for suffix based provid o bin/128398 geom [patch] glabel(8): teach geom_label to recognise gpt l f kern/128276 geom [gmirror] machine lock up when gmirror module is used o kern/126902 geom [geom] [geom_label] Kernel panic during install boot o kern/124973 geom [gjournal] [patch] boot order affects geom_journal con o kern/124969 geom gvinum(8): gvinum raid5 plex does not detect missing s o kern/124294 geom [geom] gmirror(8) have inappropriate logic when workin o kern/124130 geom [gmirror] [usb] gmirror fails to start usb devices tha o kern/123962 geom [panic] [gjournal] gjournal (455Gb data, 8Gb journal), o kern/123630 geom [patch] [gmirror] gmirror doesnt allow the original dr o kern/123122 geom [geom] GEOM / gjournal kernel lock f kern/122415 geom [geom] UFS labels are being constantly created and rem o kern/122067 geom [geom] [panic] Geom crashed during boot o kern/121559 geom [patch] [geom] geom label class allows to create inacc o kern/121364 geom [gmirror] Removing all providers create a "zombie" mir o kern/120231 geom [geom] GEOM_CONCAT error adding second drive o kern/120044 geom [msdosfs] [geom] incorrect MSDOSFS label fries adminis o kern/120021 geom [geom] [panic] net-p2p/qbittorrent 
crashes system when o kern/119743 geom [geom] geom label for cds is keeped after dismount and f kern/115547 geom [geom] [patch] [request] let GEOM Eli get password fro o kern/114532 geom [geom] GEOM_MIRROR shows up in kldstat even if compile o kern/113957 geom [gmirror] gmirror is intermittently reporting a degrad o kern/113837 geom [geom] unable to access 1024 sector size storage o kern/113419 geom [geom] geom fox multipathing not failing back p bin/110705 geom gmirror(8) control utility does not exit with correct o kern/107707 geom [geom] [patch] [request] add new class geom_xbox360 to o kern/104389 geom [geom] [patch] sys/geom/geom_dump.c doesn't encode XML o kern/98034 geom [geom] dereference of NULL pointer in acd_geom_detach o kern/94632 geom [geom] Kernel output resets input while GELI asks for o kern/90582 geom [geom] [panic] Restore cause panic string (ffs_blkfree o bin/90093 geom fdisk(8) incapable of altering in-core geometry a kern/89660 geom [vinum] [patch] [panic] due to g_malloc returning null o kern/89546 geom [geom] GEOM error s kern/89102 geom [geom] [panic] panic when forced unmount FS from unplu o kern/87544 geom [gbde] mmaping large files on a gbde filesystem deadlo o kern/84556 geom [geom] [panic] GBDE-encrypted swap causes panic at shu o kern/79251 geom [2TB] newfs fails on 2.6TB gbde device o kern/79035 geom [vinum] gvinum unable to create a striped set of mirro o bin/78131 geom gbde(8) "destroy" not working. s kern/73177 geom kldload geom_* causes panic due to memory exhaustion 48 problems total. From guido at gvr.org Tue Mar 10 09:00:14 2009 From: guido at gvr.org (Guido van Rooij) Date: Tue Mar 10 09:00:21 2009 Subject: kern/113957: [gmirror] gmirror is intermittently reporting a degraded mirror array upon reboot. Message-ID: <200903101600.n2AG0DPr046258@freefall.freebsd.org> The following reply was made to PR kern/113957; it has been noted by GNATS. 
From: Guido van Rooij To: bug-followup@FreeBSD.org, ayochum@pair.com Cc: Subject: Re: kern/113957: [gmirror] gmirror is intermittently reporting a degraded mirror array upon reboot. Date: Tue, 10 Mar 2009 16:27:55 +0100 Does http://www.freebsd.org/cgi/cvsweb.cgi/src/etc/rc.d/swap1.diff?r1=1.11;r2=1.12 fix the problem you reported? (see http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/113957) -Guido From claytonf at bitheaven.net Thu Mar 12 01:13:19 2009 From: claytonf at bitheaven.net (Clayton Fuller) Date: Thu Mar 12 01:13:26 2009 Subject: Safe to ignore GEOM warnings on ZFS raidz? Message-ID: <6D5ADA2B-9471-4404-892D-1DFAB9011C09@bitheaven.net> I am setting up a new home file server on CURRENT-8.0 using 4 1.5 TB SATA drives in a ZFS raidz pool I created the pool without first partitioning or labeling the drives. during the boot sequence, I get the following warnings for each of the drives: GEOM: ad6: corrupt or invalid GPT detected. GEOM: ad6: GPT rejected -- may not be recoverable. here's the drive info (I have 3 identical drives at ad8, ad12 and ad14) ad6: 1430799MB at ata3-master SATA300 The storage pool seems to work fine, as tested by transferring nearly a terabyte of data, simulating a failed drive, resilvering, etc and all seems to be working well. invoking the zpool scrub command will bring up the corrupt GPT errors again, but all other read/write operations on my storage pool seem to be fine. Is GEOM in any way necessary for managing this ZFS pool? Can I safely ignore the warnings? Is there a better way to configure this to keep GEOM happy? I'd be happy to supply more specifics if germane, but thought I'd try to keep it simple. Any suggestions would be greatly appreciated. -Clayton From ulf.lilleengen at gmail.com Thu Mar 12 03:19:05 2009 From: ulf.lilleengen at gmail.com (Ulf Lilleengen) Date: Thu Mar 12 03:19:11 2009 Subject: Safe to ignore GEOM warnings on ZFS raidz? 
In-Reply-To: <6D5ADA2B-9471-4404-892D-1DFAB9011C09@bitheaven.net>
References: <6D5ADA2B-9471-4404-892D-1DFAB9011C09@bitheaven.net>
Message-ID: <917871cf0903120257u40537e4fp47b6d3ab973f706@mail.gmail.com>

On Thu, Mar 12, 2009 at 8:52 AM, Clayton Fuller wrote:

> I am setting up a new home file server on CURRENT-8.0 using 4 1.5 TB SATA
> drives in a ZFS raidz pool
> I created the pool without first partitioning or labeling the drives.
>
> during the boot sequence, I get the following warnings for each of the
> drives:
> GEOM: ad6: corrupt or invalid GPT detected.
> GEOM: ad6: GPT rejected -- may not be recoverable.
>
> here's the drive info (I have 3 identical drives at ad8, ad12 and ad14)
> ad6: 1430799MB at ata3-master SATA300
>
> The storage pool seems to work fine, as tested by transferring nearly a
> terabyte of data, simulating a failed drive, resilvering, etc and all seems
> to be working well.
>
> invoking the zpool scrub command will bring up the corrupt GPT
> errors again, but all other read/write operations on my storage pool seem to
> be fine.
>
> Is GEOM in any way necessary for managing this ZFS pool? Can I safely
> ignore the warnings? Is there a better way to configure this to keep GEOM
> happy?
>

This does not necessarily have anything to do with the pool itself; gpart
displays this message when it finds the remains of an old GPT partition.
I think you can ignore the warnings. At least, they have posed no problem
for me for a while now.

>
> I'd be happy to supply more specifics if germane, but thought I'd try to
> keep it simple.
>
> Any suggestions would be greatly appreciated.
> > -Clayton > _______________________________________________ > freebsd-geom@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-geom > To unsubscribe, send any mail to "freebsd-geom-unsubscribe@freebsd.org" > -- Ulf Lilleengen From avg at freebsd.org Fri Mar 13 10:57:43 2009 From: avg at freebsd.org (Andriy Gapon) Date: Fri Mar 13 10:57:50 2009 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <4922FB81.50608@icyb.net.ua> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> <49227875.6090902@icyb.net.ua> <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> <4922FB81.50608@icyb.net.ua> Message-ID: <49BA9E93.4060609@freebsd.org> Very belated but somebody reading archives might find this useful. Only today I discovered zdb command and its -C option: $ zdb -C tank version=6 name='tank' state=0 txg=1997530 pool_guid=15723282379537418671 hostid=714261228 hostname='' vdev_tree type='root' id=0 guid=15723282379537418671 children[0] type='disk' id=0 guid=1732303387090405178 path='/dev/ad6s2d' devid='ad:GEA534RF0TK35A' whole_disk=0 metaslab_array=14 metaslab_shift=32 ashift=9 asize=493659881472 DTL=182 on 18/11/2008 19:29 Andriy Gapon said the following: > I just remembered that I saved old zpool.cache file before "migrating" > the pool. > I looked at the diff of hexdumps and there are a number of differences, > it's hard to understand them because the file is binary (actually it > seems to contain serialized name-value pairs), but one difference is > prominent: > ... > 00000260 64 65 76 69 64 00 00 00 00 00 00 09 00 00 00 01 > |devid...........| > ... > -00000270 00 00 00 15 61 64 3a 47 45 41 35 33 34 52 46 30 > |....ad:GEA534RF0| > -00000280 54 4b 33 35 41 73 31 73 33 00 00 00 00 00 00 28 > |TK35As1s3......(| > ... > +00000270 00 00 00 11 61 64 3a 47 45 41 35 33 34 52 46 30 > |....ad:GEA534RF0| > +00000280 54 4b 33 35 41 00 00 00 00 00 00 28 00 00 00 28 > |TK35A......(...(| > ... 
> > It looks like old "devid" value is "ad:GEA534RF0TK35As1s3" and new one > is "ad:GEA534RF0TK35A". Just a reminder: actual zpool device is ad6s2d. > > The new value is what is reported by diskinfo: > $ diskinfo -v ad6 > ad6 > ... > ad:GEA534RF0TK35A # Disk ident. > > $ diskinfo -v ad6s2 > ad6s2 > ... > ad:GEA534RF0TK35A # Disk ident. > > $ diskinfo -v ad6s2d > ad6s2d > ... > ad:GEA534RF0TK35A # Disk ident. > > Hmm, "indent" is reported to be the same for all three entities. > > I don't remember what diskinfo reported with pre-gpart kernel, but I > suspect that it was something different. > Could anybody please check this? (on 7.X machine without GEOM_PART). > > I quickly glimpsed through sources and it seems that this comes from > DIOCGIDENT GEOM ioctl i.e. "GEOM::ident" attribute. It seems that > geom_slice.c code has some special handling for that. > -- Andriy Gapon From bugmaster at FreeBSD.org Mon Mar 16 04:06:55 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Mar 16 04:08:00 2009 Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org Message-ID: <200903161106.n2GB6sGV043238@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/132273 geom [glabel] [patch] failing on journaled partition o kern/132242 geom [gmirror] gmirror.ko fails to fully initialize o kern/131575 geom [geom_label] [msdosfs] [umass] Immediate crash after p o kern/131353 geom [geom] gjournal(8) kernel lock o kern/131037 geom [geli] Unable to create disklabel on .eli-Device o kern/130528 geom gjournal fsck during boot o kern/129674 geom [geom] gjournal root did not mount on boot o kern/129645 geom gjournal(8): GEOM_JOURNAL causes system to fail to boo o kern/129245 geom [geom] gcache is more suitable for suffix based provid o bin/128398 geom [patch] glabel(8): teach geom_label to recognise gpt l f kern/128276 geom [gmirror] machine lock up when gmirror module is used o kern/126902 geom [geom] [geom_label] Kernel panic during install boot o kern/124973 geom [gjournal] [patch] boot order affects geom_journal con o kern/124969 geom gvinum(8): gvinum raid5 plex does not detect missing s o kern/124294 geom [geom] gmirror(8) have inappropriate logic when workin o kern/124130 geom [gmirror] [usb] gmirror fails to start usb devices tha o kern/123962 geom [panic] [gjournal] gjournal (455Gb data, 8Gb journal), o kern/123630 geom [patch] [gmirror] gmirror doesnt allow the original dr o kern/123122 geom [geom] GEOM / gjournal kernel lock f kern/122415 geom [geom] UFS labels are being constantly created and rem o kern/122067 geom [geom] [panic] Geom crashed during boot o kern/121559 geom [patch] [geom] geom label class allows to create inacc o kern/121364 geom [gmirror] Removing all providers create a "zombie" mir o kern/120231 geom [geom] GEOM_CONCAT error adding second drive o kern/120044 geom [msdosfs] [geom] incorrect MSDOSFS label fries adminis o kern/120021 geom [geom] [panic] net-p2p/qbittorrent crashes system when o kern/119743 geom [geom] geom label for cds is keeped after dismount and f kern/115547 geom [geom] [patch] [request] 
let GEOM Eli get password fro o kern/114532 geom [geom] GEOM_MIRROR shows up in kldstat even if compile o kern/113957 geom [gmirror] gmirror is intermittently reporting a degrad o kern/113837 geom [geom] unable to access 1024 sector size storage o kern/113419 geom [geom] geom fox multipathing not failing back p bin/110705 geom gmirror(8) control utility does not exit with correct o kern/107707 geom [geom] [patch] [request] add new class geom_xbox360 to o kern/104389 geom [geom] [patch] sys/geom/geom_dump.c doesn't encode XML o kern/98034 geom [geom] dereference of NULL pointer in acd_geom_detach o kern/94632 geom [geom] Kernel output resets input while GELI asks for o kern/90582 geom [geom] [panic] Restore cause panic string (ffs_blkfree o bin/90093 geom fdisk(8) incapable of altering in-core geometry a kern/89660 geom [vinum] [patch] [panic] due to g_malloc returning null o kern/89546 geom [geom] GEOM error s kern/89102 geom [geom] [panic] panic when forced unmount FS from unplu o kern/87544 geom [gbde] mmaping large files on a gbde filesystem deadlo o kern/84556 geom [geom] [panic] GBDE-encrypted swap causes panic at shu o kern/79251 geom [2TB] newfs fails on 2.6TB gbde device o kern/79035 geom [vinum] gvinum unable to create a striped set of mirro o bin/78131 geom gbde(8) "destroy" not working. s kern/73177 geom kldload geom_* causes panic due to memory exhaustion 48 problems total. From lulf at FreeBSD.org Mon Mar 16 09:58:00 2009 From: lulf at FreeBSD.org (Ulf Lilleengen) Date: Mon Mar 16 09:58:18 2009 Subject: [HEADS UP] Merge of projects/gvinum to head Message-ID: <20090316155800.GA2257@carrot> Hello, This is a heads-up for a merge of gvinum project code into HEAD. This means that gvinum implementation will be changed some. The code is based on the work done by Lukas Ertl as well as the work I did before/during Google SoC 2007 and afterwards. It has been staying in p4 for some time, and then moved to the subversion project repository not long ago. 
The main reason for the delay in getting this into HEAD has been the lack of
reviewers of the code, but after some discussion and help from testers, I've
decided this is a good time to get it in (it should perhaps have been merged
a bit earlier). Testers have spotted several differences from the original
gvinum, and I've tried to make it behave like the old implementation wherever
that seemed the best way to go. Luckily, the work has gotten a bit of
attention lately, thanks to Rick C. Petty for helping out with testing and
bugfixing, as well as all others who have dared to run the new gvinum.

So, what does this update offer?

From the user aspect:
- Implementation of many of the missing commands from the old vinum:
  attach/detach, start (was partially implemented), stop (was partially
  implemented), concat, mirror, stripe, raid5 (shortcuts for creating volumes
  with one plex of these organizations).
- Support for fixing degraded plexes while mounted.
- Support for growing striped and raid5 plexes, meaning that one can extend
  the volumes for these plex types in addition to the concat type. This also
  works while the volume is mounted.
- The parity check and rebuild no longer go between userland and kernel,
  meaning that your gvinum command will not stay and wait forever for the
  rebuild to finish. You can instead watch the status with the list command.
- Many problems with gvinum have been reported since 5.x, and some have been
  hard to fix due to the complicated architecture. Hopefully, it should be
  more stable and better handle edge cases that previously made gvinum crash.
- Failed drives no longer disappear entirely, but now leave behind a dummy
  drive that makes sure the original state is not forgotten in case the
  system is rebooted between drive failures/swaps.

From the technical aspect:
- Gvinum now uses one single worker thread instead of one thread for each
  volume and each plex. The reason for this is that the previous scheme was
  very complex, and was the cause of many of the bugs discovered in gvinum.
  Instead, gvinum now uses one worker thread with an event queue, quite
  similar to what is used in gmirror.
- The rebuild/grow/initialize/parity check routines no longer run in
  separate threads, but are run as regular I/O requests with special flags.
  This made it easier to support on-mount growing and parity rebuild.

Probably, there are many other issues that have been fixed, and perhaps new
issues introduced. This is why this is dropped in HEAD now, to give it more
attention from the users before the 8.0 branch. All in all, this will make
gvinum an easier beast to handle for users. If you have any issues related to
this, send in problem reports or drop me an e-mail, and I'll take a look.

For interested reviewers, the code resides in
svn://svn.freebsd.org/base/projects/gvinum

--
Ulf Lilleengen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-geom/attachments/20090316/8961a2ac/attachment.pgp

From lulf at FreeBSD.org Mon Mar 16 09:59:58 2009
From: lulf at FreeBSD.org (Ulf Lilleengen)
Date: Mon Mar 16 10:00:04 2009
Subject: [HEADS UP] Merge of projects/gvinum to head
In-Reply-To: <20090316155800.GA2257@carrot>
References: <20090316155800.GA2257@carrot>
Message-ID: <20090316155957.GA2385@carrot>

On Mon, Mar 16, 2009 at 04:58:00PM +0100, Ulf Lilleengen wrote:
> Hello,
>
> This is a heads-up for a merge of gvinum project code into HEAD. This means
> that gvinum implementation will be changed some. The code is based on the
> work done by Lukas Ertl as well as the work I did before/during Google SoC
> 2007 and afterwards. It has been staying in p4 for some time, and then moved
> to the subversion project repository not long ago.
The main reason for the > delay of getting this into HEAD have been the lack of reviewers of the code, > but after some discussion and help from testers, I've decided this is a good > time to get it in (should perhaps have been merged a bit earlier) Testers > have spotted several differences from the original gvinum, and I've tried to > make it behave as the old implementation wherever that seemed the best way to > go. Luckily, the work has gotten a bit of attention lately, thanks to Rick C. > Petty for helping out with testing and bugfixing, as well as all others who > have dared to run the new gvinum. So, what does this update offer? And I plan on importing it within 1-2 weeks :) -- Ulf Lilleengen From avg at freebsd.org Wed Mar 18 08:51:35 2009 From: avg at freebsd.org (Andriy Gapon) Date: Wed Mar 18 08:51:41 2009 Subject: gpart bootcode and /boot/boot In-Reply-To: References: <4978C24D.9040706@icyb.net.ua> <49A6C33E.70402@freebsd.org> Message-ID: <49C11880.8000506@freebsd.org> on 26/02/2009 18:54 Marcel Moolenaar said the following: > > On Feb 26, 2009, at 8:28 AM, Andriy Gapon wrote: > >> on 22/01/2009 21:00 Andriy Gapon said the following: >>> Sorry for being lazy - what is "gpart bootcode" equivalent of bsdlabel >>> -B [-b boot] ? >>> >> >> Marcel, guys, I am still curious. >> >> disklabel -B refuses do to anything because of what it thinks is >> incorrect >> partition info. >> >> And the following command gets "Operation not permitted" when it opens >> ad6s1 for >> writing: >> gpart bootcode -p /boot/boot -i 1 ad6 > > Don't you mean "gpart bootcode -b /boot/boot ad6s1" > > If you say "-i 1 ad6", then what you want is bootcode > in a partition that is controlled by the scheme on ad6 > (the scheme on ad6 is the MBR). You can't do that, > because that partition is sub-partitioned by the BSD > scheme. > > Since you want bootcode in the BSD scheme, you need > to add bootcode to ad6s1. Thank you very much for the explanation! 
>> >> This is amd64, stable/7, gpart-only kernel (gpart_mbr, gpart_bsd). > > Bootcode for the BSD scheme is not present in 7-STABLE yet. > I just recently added that to -CURRENT (I must have missed > it) and I need to get around MFCing it. Feel free to do the > MFC for me. The code has been in -CURRENT long enough to do > a MFC. Thank you for the offer and sorry that I haven't used the opportunity. And thank you for the MFC - I am using the code in stable/7 and successfully updated boot2 blocks via the command you suggested. Just for the record - I did it to use jhb's fix for real<->protected switch in boot code. BTW, is there a way to read/backup bootcode (for BSD scheme)? -- Andriy Gapon From xcllnt at mac.com Wed Mar 18 09:23:54 2009 From: xcllnt at mac.com (Marcel Moolenaar) Date: Wed Mar 18 09:24:00 2009 Subject: gpart bootcode and /boot/boot In-Reply-To: <49C11880.8000506@freebsd.org> References: <4978C24D.9040706@icyb.net.ua> <49A6C33E.70402@freebsd.org> <49C11880.8000506@freebsd.org> Message-ID: <0B27FB91-9F2B-4F96-B4A0-330CAD725AF1@mac.com> On Mar 18, 2009, at 8:51 AM, Andriy Gapon wrote: > on 26/02/2009 18:54 Marcel Moolenaar said the following: >> >> On Feb 26, 2009, at 8:28 AM, Andriy Gapon wrote: >> >>> on 22/01/2009 21:00 Andriy Gapon said the following: >>>> Sorry for being lazy - what is "gpart bootcode" equivalent of >>>> bsdlabel >>>> -B [-b boot] ? >>>> >>> >>> Marcel, guys, I am still curious. >>> >>> disklabel -B refuses do to anything because of what it thinks is >>> incorrect >>> partition info. >>> >>> And the following command gets "Operation not permitted" when it >>> opens >>> ad6s1 for >>> writing: >>> gpart bootcode -p /boot/boot -i 1 ad6 >> >> Don't you mean "gpart bootcode -b /boot/boot ad6s1" >> >> If you say "-i 1 ad6", then what you want is bootcode >> in a partition that is controlled by the scheme on ad6 >> (the scheme on ad6 is the MBR). You can't do that, >> because that partition is sub-partitioned by the BSD >> scheme. 
>> >> Since you want bootcode in the BSD scheme, you need >> to add bootcode to ad6s1. > > > Thank you very much for the explanation! > >>> >>> This is amd64, stable/7, gpart-only kernel (gpart_mbr, gpart_bsd). >> >> Bootcode for the BSD scheme is not present in 7-STABLE yet. >> I just recently added that to -CURRENT (I must have missed >> it) and I need to get around MFCing it. Feel free to do the >> MFC for me. The code has been in -CURRENT long enough to do >> a MFC. > > Thank you for the offer and sorry that I haven't used the opportunity. > And thank you for the MFC - I am using the code in stable/7 and > successfully > updated boot2 blocks via the command you suggested. > Just for the record - I did it to use jhb's fix for real<->protected > switch in > boot code. > > BTW, is there a way to read/backup bootcode (for BSD scheme)? There's no gpart function for it yet, but you can use dd as a work-around. Unfortunately, that means you need to know how much to read: MBR: 1 sector (512B) starting at 0 BSD: 16 sectors (8KB) starting at 0 These raw dumps can be restored using gpart: gpart bootcode -b $mbr.dd ad6 gpart bootcode -b $bsd.dd ad6s1 No, it will not destroy any partitioning data :-) -- Marcel Moolenaar xcllnt@mac.com From avg at freebsd.org Wed Mar 18 09:53:35 2009 From: avg at freebsd.org (Andriy Gapon) Date: Wed Mar 18 09:53:42 2009 Subject: gpart bootcode and /boot/boot In-Reply-To: <0B27FB91-9F2B-4F96-B4A0-330CAD725AF1@mac.com> References: <4978C24D.9040706@icyb.net.ua> <49A6C33E.70402@freebsd.org> <49C11880.8000506@freebsd.org> <0B27FB91-9F2B-4F96-B4A0-330CAD725AF1@mac.com> Message-ID: <49C12708.6050702@freebsd.org> on 18/03/2009 18:23 Marcel Moolenaar said the following: > > On Mar 18, 2009, at 8:51 AM, Andriy Gapon wrote: >> BTW, is there a way to read/backup bootcode (for BSD scheme)? > > There's no gpart function for it yet, but you can use dd > as a work-around. 
Unfortunately, that means you need to
> know how much to read:
>     MBR: 1 sector (512B) starting at 0
>     BSD: 16 sectors (8KB) starting at 0
>
> These raw dumps can be restored using gpart:
>     gpart bootcode -b $mbr.dd ad6
>     gpart bootcode -b $bsd.dd ad6s1
>
> No, it will not destroy any partitioning data :-)

Oh great, thanks!
So gpart simply ignores bytes of "input" data that correspond to
partition/label data. Good to know.

--
Andriy Gapon

From linimon at FreeBSD.org Wed Mar 18 23:07:09 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Wed Mar 18 23:07:14 2009
Subject: kern/131575: [geom_label] [msdosfs] [umass] Immediate crash after plug of an USB key
Message-ID: <200903190607.n2J678n1041174@freefall.freebsd.org>

Synopsis: [geom_label] [msdosfs] [umass] Immediate crash after plug of an USB key

State-Changed-From-To: open->closed
State-Changed-By: linimon
State-Changed-When: Thu Mar 19 06:06:26 UTC 2009
State-Changed-Why:
Committed and MFCed to 7 by lulf.

Responsible-Changed-From-To: freebsd-geom->lulf
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Mar 19 06:06:26 UTC 2009
Responsible-Changed-Why:

http://www.freebsd.org/cgi/query-pr.cgi?pr=131575

From rizzo at iet.unipi.it Thu Mar 19 01:33:14 2009
From: rizzo at iet.unipi.it (Luigi Rizzo)
Date: Thu Mar 19 01:33:20 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
Message-ID: <20090319081936.GA32750@onelab2.iet.unipi.it>

Hi,

Fabio Checconi and I have been thinking about how to implement "proxy"
geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer
port, do not do any data transformation, and can be transparently
inserted or removed on top of a provider port while the tree is
actively moving data.

Our immediate need was to add/remove a scheduler while a disk is
mounted, but there are possibly other uses, e.g. if one wants to
"sniff" the traffic through a disk, or do other ops that are
transparent for the data stream.

We would like opinions on the following solution, which seems
extremely simple in terms of implementation.

The idea is to intercept requests coming on a provider port, pp, and
redirect them to a geom node acting as a proxy if the port
is configured in this way:

    +=====...===...======+
    H                    H
    H                    H
    H                    H
    +=====...====== cp ==+
                    |          +---------------+
                    V          |               V
    +=====.....==== pp ==+     |    +======= proxy_pp ==+
    H           'ad0s1'  H     |    H                   H
    H                ------->--+    H                   H
    H        gp      -------<--+    H    proxy_node     H
    H                    H     |    H                   H
    +=======....===...===+     |    +======= proxy_cp ==+
                               |               V
                               +---------------+

Normally the proxy does not exist, and the geom tree works as it does now.

When we create a 'proxy' node, with something like

        geom my_proxy_class proxy ad0s1

we do something very similar to a 'create', but:

- the proxy node is marked specially in gp->flags, so the core will
  not generate a g_new_provider_event when the provider port is created
  (this means there is no taste() call and nobody should be able
  to attach to the port).

- the provider port we attach to is linked, with two pointers,
  to the provider and consumer ports of the proxy_node.

In this situation, g_io_request() finds that port pp has a proxy attached
to it, and immediately redirects the requests to the proxy, which
does everything a geom node does (cloning requests, etc).
When the proxy wants to pass the request down, it sends it again to pp,
but now there is no redirection because the source can be identified
as the proxy. The pointers in the bio ensure a correct flow of the
requests on the reverse path.

Disconnecting a proxy is almost trivial: apart from handling possible
races on the data path, we just need to clear pp->proxy_pp and pp->proxy_cp.
After that, we can just send the regular destroy events to the proxy
node, which will have to take care of flushing any pending bio's (e.g.
see our geom_sched node that already does this).
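
The redirect and disconnect steps described above can be modelled roughly in userland C. This is only a sketch under stated assumptions, not the attached patch: the structs below are simplified stand-ins for the kernel's struct g_provider, struct g_consumer and struct bio, the proxy_pp/proxy_cp fields follow the names used in the text, and io_request(), proxy_attach(), proxy_detach() and requests_seen are hypothetical names introduced for illustration. Locking and the race handling mentioned above are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; NOT the kernel's GEOM structures. */
struct consumer;

struct provider {
    const char      *name;
    struct provider *proxy_pp;      /* proxy's provider port, NULL if no proxy */
    struct consumer *proxy_cp;      /* proxy's consumer port, NULL if no proxy */
    int              requests_seen; /* illustration only: requests delivered here */
};

struct consumer {
    struct provider *provider;      /* provider port this consumer is attached to */
};

struct bio {
    struct consumer *bio_from;      /* source, used for the reverse path */
    struct provider *bio_to;        /* destination actually chosen */
};

/*
 * Model of the check described for g_io_request(): a request arriving at a
 * provider that has a proxy is redirected to the proxy's provider port,
 * unless the request comes from the proxy's own consumer port.
 */
static void
io_request(struct bio *bp, struct consumer *cp)
{
    struct provider *pp = cp->provider;

    assert(pp != NULL);
    if (pp->proxy_pp != NULL && cp != pp->proxy_cp)
        pp = pp->proxy_pp;          /* redirect into the proxy */
    bp->bio_from = cp;
    bp->bio_to = pp;
    pp->requests_seen++;
}

/* Link provider pp to the two ports of a proxy node (one proxy at most). */
static void
proxy_attach(struct provider *pp, struct provider *proxy_pp,
    struct consumer *proxy_cp)
{
    assert(pp->proxy_pp == NULL && pp->proxy_cp == NULL);
    proxy_cp->provider = pp;        /* the proxy sends requests back down to pp */
    pp->proxy_pp = proxy_pp;
    pp->proxy_cp = proxy_cp;
}

/* Disconnecting only clears the two pointers; pending bios are not modelled. */
static void
proxy_detach(struct provider *pp)
{
    pp->proxy_pp = NULL;
    pp->proxy_cp = NULL;
}
```

In this model a request issued by an ordinary consumer after proxy_attach() ends up with bio_to pointing at the proxy's provider port, while a request issued through the proxy's own consumer port reaches the original provider directly; after proxy_detach() every request goes straight to the original provider again.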

Overall the change is very small (see attached patch):
a couple of lines in g_io_request, two extra fields in the g_provider,
and the addition of a flag to gp->flags to control the generation
of g_new_provider_event.
There is basically no overhead on regular operation, and only
a couple of extra pointers in struct g_provider (we use a spare
bit in gp->flags to mark G_GEOM_PROXY nodes).

The only things missing in the patch should be:

- a check to avoid races on creation and destruction of a proxy.
  I am not so sure how to achieve this, but creation and destruction
  are rare and can normally wait, so we could just piggyback the
  small critical section (manipulating pp->proxy_pp and pp->proxy_cp)
  into some other piece of code that is guaranteed to be race-free.

- a check to prevent attaching to a provider port of a proxy
  (not a problem, I believe);

- a check to prevent attaching a proxy to a provider port that already
  has one. Of course you can attach a proxy to another proxy, and
  if you want to change the order it is as simple as removing the
  existing proxy and reattaching it after the new one.

Feedback welcome.

cheers
luigi and fabio

From pjd at FreeBSD.org Thu Mar 19 03:12:00 2009
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Thu Mar 19 03:12:44 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: <20090319081936.GA32750@onelab2.iet.unipi.it>
References: <20090319081936.GA32750@onelab2.iet.unipi.it>
Message-ID: <20090319094505.GA1539@garage.freebsd.pl>

On Thu, Mar 19, 2009 at 09:19:36AM +0100, Luigi Rizzo wrote:
> Hi,
>
> Fabio Checconi and I have been thinking on how to implement "proxy"
> geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer
> port, do not do any data transformation, and can be transparently
> inserted or removed on top of a provider port while the tree is
> actively moving data.
>
> Our immediate need was to add/remove a scheduler while a disk is
> mounted, but there are possibly other uses e.g.
if one wants to > "sniff" the traffic through a disk, or do other ops that are > transparent for the data stream. > > We would like opinion on the following solution, which seems > extremely simple in terms of implementation. > > The idea is to intercept requests coming on a provider port, pp, and > redirect them to a geom node acting as a proxy if the port > is configured in this way: > > +=====...===...======+ > H H > H H > H H > +=====...====== cp ==+ > | +---------------+ > V | V > +=====.....==== pp ==+ | +======= proxy_pp ==+ > H 'ad0s1' H | H H > H ------->--+ H H > H gp -------<--+ H proxy_node H > H H | H H > +=======....===...===+ | +======= proxy_cp ==+ > | V > +---------------+ > > Normally the proxy does not exist, and the geom tree works as it does now. > > When we create a 'proxy' node, with something like > > geom my_proxy_class proxy ad0s1 > > we do something very similar to a 'create', but: > > - the proxy node is marked specially in gp->flags, so the core will > not generate a g_new_provider_event when the provider port is created > (this means there is no taste() call and nobody should be able > to attach to the port). > > - the provider port we attach to is linked, with two pointers, > to the provider and consumer ports of the proxy_node. > > In this situation, g_io_request() finds that port pp has a proxy attached > to it, and immediately redirects the requests to the proxy, which > does everything a geom node does (cloning requests, etc). > When the proxy wants to pass the request down, it sends it again to pp, > but now there is no redirection because the source can be identified > as the proxy. The pointers in the bio insure a correct flow of the > requests on the reverse path. The one advantage I see for this over using regular GEOM rules is that new consumers go through proxy automatically. 
When I was working on similar functionality I more wanted to do something like this: consumer1 consumer2 \ / \ / provider Insert the proxy in the middle of any provider-consumer pair: consumer1 consumer2 | | proxy_provider | | / proxy_consumer / \ / provider This can be done (almost I think) atomically: /* First attach to the destination provider. */ g_attach(proxy_consumer, provider); /* Then switch original consumer to use proxy_provider (should be almost atomic). */ consumer1->provider = proxy_provider; /* handle access counts */ In-flight I/O requests know how to go back, because they have source and destination stored in bio_from and bio_to fields, so no races here. > Disconnecting a proxy is almost trivial: apart from handling possible > races on the data path, we just need to clear pp->proxy_pp and pp->proxy_cp. > After that, we can just send the regular destroy events to the proxy > node, who will have to take care of flushing any pending bio's (e.g. > see our geom_sched node that already does this). > > Overall the change is very small (see attached patch): > a couple of lines in g_io_request, two extra fields in the g_provider, > and the addition of a flag to gp->flags to control the generation > of g_new_provider_event. > There is basically no overhead on regular operation, and only > a couple of extra pointers in struct g_provider (we use a spare > bit in gp->flags to mark G_GEOM_PROXY nodes). > > The only things missing in the patch should be: > > - a check to avoid races on creation&destruction of a proxy. > I am not so sure on how to achieve this, but creation and destruction > are rare and can normally wait, so we could just piggyback the > small critical section (manipulating pp->proxy_cp and pp->proxy_cp) > into some other piece of code that is guaranteed to be race-free. 
> > - a check to prevent attaching to a provider port of a proxy > (not a problem, i believe); > > - a check to prevent attaching a proxy to a provider port that already > has one. Of course you can attach a proxy to another proxy, and > if you want to change the order it is as simple as removing the > existing proxy and reattaching it after the new one. Could you provide link for the patch, as it was removed from your e-mail? -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-geom/attachments/20090319/8e89dfc7/attachment.pgp From rizzo at iet.unipi.it Thu Mar 19 04:08:53 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Thu Mar 19 04:09:00 2009 Subject: RFC: adding 'proxy' nodes to provider ports (with patch) In-Reply-To: <20090319094505.GA1539@garage.freebsd.pl> References: <20090319081936.GA32750@onelab2.iet.unipi.it> <20090319094505.GA1539@garage.freebsd.pl> Message-ID: <20090319111339.GA38075@onelab2.iet.unipi.it> On Thu, Mar 19, 2009 at 10:45:05AM +0100, Pawel Jakub Dawidek wrote: > > Hi, > > > > Fabio Checconi and I have been thinking on how to implement "proxy" > > geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer > > port, do not do any data transformation, and can be transparently > > inserted or removed on top of a provider port while the tree is > > actively moving data. ... > > Overall the change is very small (see attached patch): > > a couple of lines in g_io_request, two extra fields in the g_provider, > > and the addition of a flag to gp->flags to control the generation > > of g_new_provider_event. 
It seems that the attachment was removed so here it is http://info.iet.unipi.it/~luigi/FreeBSD/20090319-geom-proxy.patch cheers luigi From rizzo at iet.unipi.it Thu Mar 19 04:17:08 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Thu Mar 19 04:17:25 2009 Subject: RFC: adding 'proxy' nodes to provider ports (with patch) In-Reply-To: <20090319094505.GA1539@garage.freebsd.pl> References: <20090319081936.GA32750@onelab2.iet.unipi.it> <20090319094505.GA1539@garage.freebsd.pl> Message-ID: <20090319112152.GB38075@onelab2.iet.unipi.it> On Thu, Mar 19, 2009 at 10:45:05AM +0100, Pawel Jakub Dawidek wrote: > On Thu, Mar 19, 2009 at 09:19:36AM +0100, Luigi Rizzo wrote: ... > The one advantage I see for this over using regular GEOM rules is that > new consumers go through proxy automatically. When I was working on > similar functionality I more wanted to do something like this: > > consumer1 consumer2 > \ / > \ / > provider > > Insert the proxy in the middle of any provider-consumer pair: > > consumer1 consumer2 > | | > proxy_provider | > | / > proxy_consumer / > \ / > provider ok this is slightly different from what we have implemented, as we hook into the provider whereas you hook into the consumer. In our case we really need the hook to be in the provider so it intercepts accesses from all consumers, e.g. from /dev/ad0 and from the filesystems mounted on top of it. Given that the geom_disk node does not have a consumer on the bottom, we cannot do it differently. I can imagine that one might want to attach a proxy to a consumer port, but I cannot make a specific case where this would be needed. Also, I wonder how do i name a consumer port in the geom model ?? 
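
For comparison, the consumer-side insertion quoted above (attach the proxy's consumer to the provider, then re-point one consumer at the proxy's provider) can be sketched in userland C as follows. This is an illustrative model only: the structs are simplified stand-ins rather than the kernel's struct g_consumer and struct g_provider, insert_proxy() is a hypothetical name, and access counts and locking are ignored. It shows the property at issue in this exchange: only the re-pointed consumer goes through the proxy, while any other consumer of the same provider keeps its direct path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; NOT the kernel's GEOM structures. */
struct provider {
    const char *name;
};

struct consumer {
    const char      *name;
    struct provider *provider;  /* provider this consumer is attached to */
};

/*
 * Insert a proxy between one consumer and its provider:
 *   1. attach the proxy's consumer to the destination provider;
 *   2. switch the original consumer over to the proxy's provider.
 * Other consumers of the same provider are not touched.
 */
static void
insert_proxy(struct consumer *victim, struct provider *proxy_provider,
    struct consumer *proxy_consumer)
{
    assert(victim->provider != NULL);
    proxy_consumer->provider = victim->provider;
    victim->provider = proxy_provider;
}
```

With two consumers attached to one provider, calling insert_proxy() on the first leaves the second still pointing at the original provider, so intercepting every access requires either repeating the insertion for each consumer or hooking at the provider instead.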

cheers
luigi

From marius at nuenneri.ch Thu Mar 19 05:06:32 2009
From: marius at nuenneri.ch (Marius Nünnerich)
Date: Thu Mar 19 05:06:49 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: <20090319081936.GA32750@onelab2.iet.unipi.it>
References: <20090319081936.GA32750@onelab2.iet.unipi.it>
Message-ID:

2009/3/19 Luigi Rizzo :
> Hi,
>
> Fabio Checconi and I have been thinking on how to implement "proxy"
> geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer
> port, do not do any data transformation, and can be transparently
> inserted or removed on top of a provider port while the tree is
> actively moving data.
>
> Our immediate need was to add/remove a scheduler while a disk is
> mounted, but there are possibly other uses e.g. if one wants to
> "sniff" the traffic through a disk, or do other ops that are
> transparent for the data stream.
>
> We would like opinion on the following solution, which seems
> extremely simple in terms of implementation.
>
> The idea is to intercept requests coming on a provider port, pp, and
> redirect them to a geom node acting as a proxy if the port
> is configured in this way:
>
>     +=====...===...======+
>     H                    H
>     H                    H
>     H                    H
>     +=====...====== cp ==+
>                     |          +---------------+
>                     V          |               V
>     +=====.....==== pp ==+     |    +======= proxy_pp ==+
>     H           'ad0s1'  H     |    H                   H
>     H                ------->--+    H                   H
>     H        gp      -------<--+    H    proxy_node     H
>     H                    H     |    H                   H
>     +=======....===...===+     |    +======= proxy_cp ==+
>                                |               V
>                                +---------------+
>
> Normally the proxy does not exist, and the geom tree works as it does now.
>
> When we create a 'proxy' node, with something like
>
>         geom my_proxy_class proxy ad0s1
>
> we do something very similar to a 'create', but:
>
> - the proxy node is marked specially in gp->flags, so the core will
>   not generate a g_new_provider_event when the provider port is created
>   (this means there is no taste() call and nobody should be able
>   to attach to the port).
>
> - the provider port we attach to is linked, with two pointers,
>   to the provider and consumer ports of the proxy_node.
>
> In this situation, g_io_request() finds that port pp has a proxy attached
> to it, and immediately redirects the requests to the proxy, which
> does everything a geom node does (cloning requests, etc).
> When the proxy wants to pass the request down, it sends it again to pp,
> but now there is no redirection because the source can be identified
> as the proxy. The pointers in the bio insure a correct flow of the
> requests on the reverse path.
>
> Disconnecting a proxy is almost trivial: apart from handling possible
> races on the data path, we just need to clear pp->proxy_pp and pp->proxy_cp.
> After that, we can just send the regular destroy events to the proxy
> node, who will have to take care of flushing any pending bio's (e.g.
> see our geom_sched node that already does this).
>
> Overall the change is very small (see attached patch):
> a couple of lines in g_io_request, two extra fields in the g_provider,
> and the addition of a flag to gp->flags to control the generation
> of g_new_provider_event.
> There is basically no overhead on regular operation, and only
> a couple of extra pointers in struct g_provider (we use a spare
> bit in gp->flags to mark G_GEOM_PROXY nodes).
>
> The only things missing in the patch should be:
>
> - a check to avoid races on creation&destruction of a proxy.
>   I am not so sure on how to achieve this, but creation and destruction
>   are rare and can normally wait, so we could just piggyback the
>   small critical section (manipulating pp->proxy_cp and pp->proxy_cp)
>   into some other piece of code that is guaranteed to be race-free.
>
> - a check to prevent attaching to a provider port of a proxy
>   (not a problem, i believe);
>
> - a check to prevent attaching a proxy to a provider port that already
>   has one. Of course you can attach a proxy to another proxy, and
>   if you want to change the order it is as simple as removing the
>   existing proxy and reattaching it after the new one.
>
> Feedback welcome.

I wonder if it's really necessary to alter the GEOM infrastructure or
if it is possible to do this with what's there already. Just an idea:
Lock g_topology, put g_down and g_up to sleep, alter the consumer and
provider pointers where you need it so the everything is routed
through your proxy class (which isn't special in any way) and restart
g_down and g_up.

From rizzo at iet.unipi.it Thu Mar 19 05:56:52 2009
From: rizzo at iet.unipi.it (Luigi Rizzo)
Date: Thu Mar 19 05:57:03 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To:
References: <20090319081936.GA32750@onelab2.iet.unipi.it>
Message-ID: <20090319130137.GB40489@onelab2.iet.unipi.it>

On Thu, Mar 19, 2009 at 12:41:13PM +0100, Marius Nünnerich wrote:
> 2009/3/19 Luigi Rizzo :
> > Hi,
> >
> > Fabio Checconi and I have been thinking on how to implement "proxy"
> > geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer
> > port, do not do any data transformation, and can be transparently
> > inserted or removed on top of a provider port while the tree is
> > actively moving data.
...
> > The idea is to intercept requests coming on a provider port, pp, and
> > redirect them to a geom node acting as a proxy if the port
> > is configured in this way:
...
> I wonder if it's really necessary to alter the GEOM infrastructure or > if it is possible to do this with what's there already. Just an idea: > Lock g_topology, put g_down and g_up to sleep, alter the consumer and > provider pointers where you need it so the everything is routed > through your proxy class (which isn't special in any way) and restart > g_down and g_up. we'll look into this, thanks. If we can spare the extra fields in the g_provider, the thing is even easier to do. I just don't know how your suggestion interferes with the naming: if I change the pointers, the name of a provider will not be anymore a prefix of the name of the node attached above. But maybe that is not an architectural requirements but just a convenient convention. cheers luigi From linimon at FreeBSD.org Fri Mar 20 12:08:30 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Fri Mar 20 12:08:41 2009 Subject: bin/132845: [geom] [patch] ggated(8) does not close files opened after disconnect Message-ID: <200903201908.n2KJ8T6k039993@freefall.freebsd.org> Synopsis: [geom] [patch] ggated(8) does not close files opened after disconnect Responsible-Changed-From-To: freebsd-bugs->freebsd-geom Responsible-Changed-By: linimon Responsible-Changed-When: Fri Mar 20 19:08:18 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=132845 From ivoras at freebsd.org Fri Mar 20 17:08:22 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Mar 20 17:08:28 2009 Subject: RFC: adding 'proxy' nodes to provider ports (with patch) In-Reply-To: <20090319130137.GB40489__3492.42561865157$1237467521$gmane$org@onelab2.iet.unipi.it> References: <20090319081936.GA32750@onelab2.iet.unipi.it> <20090319130137.GB40489__3492.42561865157$1237467521$gmane$org@onelab2.iet.unipi.it> Message-ID: Luigi Rizzo wrote: > On Thu, Mar 19, 2009 at 12:41:13PM +0100, Marius N?nnerich wrote: >> 2009/3/19 Luigi Rizzo : >>> Hi, >>> >>> Fabio Checconi and I have been thinking on how to implement "proxy" >>> geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer >>> port, do not do any data transformation, and can be transparently >>> inserted or removed on top of a provider port while the tree is >>> actively moving data. > ... >>> The idea is to intercept requests coming on a provider port, pp, and >>> redirect them to a geom node acting as a proxy if the port >>> is configured in this way: > ... >> I wonder if it's really necessary to alter the GEOM infrastructure or >> if it is possible to do this with what's there already. Just an idea: >> Lock g_topology, put g_down and g_up to sleep, alter the consumer and >> provider pointers where you need it so the everything is routed >> through your proxy class (which isn't special in any way) and restart >> g_down and g_up. > > we'll look into this, thanks. If we can spare the extra fields > in the g_provider, the thing is even easier to do. > > I just don't know how your suggestion interferes with the naming: > if I change the pointers, the name of a provider will not > be anymore a prefix of the name of the node attached above. > But maybe that is not an architectural requirements but just > a convenient convention. 
Not only with naming and device creation - the proxy classes cannot be "normal" classes because it's common that "normal" classes do a lot of initialization in .taste. (i.e. there at least needs to be a flag for proxy classes) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 258 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-geom/attachments/20090321/dc97f518/signature.pgp From marius at nuenneri.ch Fri Mar 20 18:25:23 2009 From: marius at nuenneri.ch (=?ISO-8859-1?Q?Marius_N=FCnnerich?=) Date: Fri Mar 20 18:25:30 2009 Subject: RFC: adding 'proxy' nodes to provider ports (with patch) In-Reply-To: References: <20090319081936.GA32750@onelab2.iet.unipi.it> <20090319130137.GB40489__3492.42561865157$1237467521$gmane$org@onelab2.iet.unipi.it> Message-ID: On Sat, Mar 21, 2009 at 00:34, Ivan Voras wrote: > Luigi Rizzo wrote: >> On Thu, Mar 19, 2009 at 12:41:13PM +0100, Marius N?nnerich wrote: >>> 2009/3/19 Luigi Rizzo : >>>> Hi, >>>> >>>> Fabio Checconi and I have been thinking on how to implement "proxy" >>>> geom nodes, i.e. nodes that have exactly 1 provider and 1 consumer >>>> port, do not do any data transformation, and can be transparently >>>> inserted or removed on top of a provider port while the tree is >>>> actively moving data. >> ... >>>> The idea is to intercept requests coming on a provider port, pp, and >>>> redirect them to a geom node acting as a proxy if the port >>>> is configured in this way: >> ... >>> I wonder if it's really necessary to alter the GEOM infrastructure or >>> if it is possible to do this with what's there already. Just an idea: >>> Lock g_topology, put g_down and g_up to sleep, alter the consumer and >>> provider pointers where you need it so the everything is routed >>> through your proxy class (which isn't special in any way) and restart >>> g_down and g_up. >> >> we'll look into this, thanks. 
>> If we can spare the extra fields
>> in the g_provider, the thing is even easier to do.
>>
>> I just don't know how your suggestion interferes with the naming:
>> if I change the pointers, the name of a provider will no longer
>> be a prefix of the name of the node attached above.
>> But maybe that is not an architectural requirement but just
>> a convenient convention.
>
> Not only with naming and device creation - the proxy classes cannot be
> "normal" classes because it's common that "normal" classes do a lot of
> initialization in .taste. (i.e. there at least needs to be a flag for
> proxy classes)

Take a look at geom_nop, it doesn't taste. It is like a proxy already
as far as I can see. I don't know what to do about the naming though.

From ivoras at freebsd.org  Sat Mar 21 04:46:11 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Sat Mar 21 04:46:17 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To:
References: <20090319081936.GA32750@onelab2.iet.unipi.it>
    <20090319130137.GB40489__3492.42561865157$1237467521$gmane$org@onelab2.iet.unipi.it>
Message-ID: <9bbcef730903210420l132c5287yb0a474901424b7da@mail.gmail.com>

2009/3/21 Marius Nünnerich :
> Take a look at geom_nop, it doesn't taste. It is like a proxy already
> as far as I can see. I don't know what to do about the naming though.

Maybe I expressed it wrongly - what I meant to say is that existing
"normal" GEOM classes, even if they are proxy-like in nature (like
GELI) cannot simply "be used" as a proxy in this context.
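
The pointer-rerouting idea quoted in the messages above (take the
topology lock, re-point the consumers of a provider at a pass-through
node, attach that node's own consumer to the original provider) can be
pictured with a small toy model. This is only a sketch under loud
assumptions: the classes Provider, Consumer, DiskGeom and ProxyGeom and
the functions insert_proxy, remove_proxy and path_of are hypothetical
names invented for illustration. They are not the kernel's GEOM
structures or API, and the model ignores requests in flight and the
g_up/g_down threads, which are exactly the hard parts being debated.

```python
import threading

# Toy model only. These classes are hypothetical stand-ins for the ideas in
# the thread, not the kernel's struct g_provider / struct g_consumer, and
# the lock merely plays the role of the topology lock.
topology_lock = threading.Lock()


class Provider:
    def __init__(self, name, geom):
        self.name = name
        self.geom = geom        # geom that services requests on this provider
        self.consumers = []     # consumers currently attached


class Consumer:
    def __init__(self):
        self.provider = None

    def attach(self, provider):
        self.detach()
        self.provider = provider
        provider.consumers.append(self)

    def detach(self):
        if self.provider is not None:
            self.provider.consumers.remove(self)
            self.provider = None

    def request(self, op, trace):
        # Pass the request down to whichever provider we are attached to.
        return self.provider.geom.start(op, trace)


class DiskGeom:
    """Bottom of the stack: records the visit and completes the request."""
    def __init__(self, name):
        self.name = name

    def start(self, op, trace):
        trace.append(self.name)
        return "done:" + op


class ProxyGeom:
    """Pass-through node: one provider, one consumer, no transformation."""
    def __init__(self, name):
        self.name = name
        self.provider = Provider(name, self)
        self.consumer = Consumer()

    def start(self, op, trace):
        trace.append(self.name)
        return self.consumer.request(op, trace)


def insert_proxy(pp, proxy):
    """Re-point every consumer of pp at the proxy, then attach proxy to pp."""
    with topology_lock:
        for cp in list(pp.consumers):
            cp.attach(proxy.provider)
        proxy.consumer.attach(pp)


def remove_proxy(pp, proxy):
    """Undo insert_proxy: hand the proxy's consumers back to pp."""
    with topology_lock:
        proxy.consumer.detach()
        for cp in list(proxy.provider.consumers):
            cp.attach(pp)


def path_of(consumer):
    """Return the list of geoms one read request passes through."""
    trace = []
    consumer.request("read", trace)
    return trace


ad0 = Provider("ad0", DiskGeom("DISK"))
user = Consumer()
user.attach(ad0)
before = path_of(user)      # request goes straight to the disk geom

proxy = ProxyGeom("PROXY")
insert_proxy(ad0, proxy)
after = path_of(user)       # request now passes through the proxy first

remove_proxy(ad0, proxy)
restored = path_of(user)    # original path is back
```

In this model the existing consumer never learns that anything changed:
it keeps calling request() and only the attachment it holds is swapped.
The model says nothing about the naming question raised above, since it
does not represent /dev entries at all.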

From pjd at FreeBSD.org  Sat Mar 21 13:20:38 2009
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Sat Mar 21 13:20:51 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To:
References: <20090319081936.GA32750@onelab2.iet.unipi.it>
    <20090319130137.GB40489__3492.42561865157$1237467521$gmane$org@onelab2.iet.unipi.it>
Message-ID: <20090321200334.GB3102@garage.freebsd.pl>

On Sat, Mar 21, 2009 at 12:34:37AM +0100, Ivan Voras wrote:
> Not only with naming and device creation - the proxy classes cannot be
> "normal" classes because it's common that "normal" classes do a lot of
> initialization in .taste. (i.e. there at least needs to be a flag for
> proxy classes)

There are many leaf GEOM classes where taste is never used and also
non-leaf classes that don't use tasting (like gbde or geli). Those
aren't special GEOM classes in any way.

I recommend reading phk's GEOM tutorial from BSDCon 2003. You can find
an interesting slide which seems to sum up this thread quite nicely:

        Special GEOM classes.
        ---------------------

        - There are no special GEOM classes.

I wonder if phk changed his opinion over time. :)

Maybe instead of adding special providers and GEOM classes, the
infrastructure should be extended in some way, so that we won't use
provider term to describe something that isn't really a regular GEOM
provider.

On the other hand disk scheduler provides kind of transformation, the
only real problem is usability - doing it in the same way as the other
GEOM classes will make it much more problematic to use. Current patch
makes it much easier to use, but violates infrastructure.

Another thing came to my mind. Currently GEOM does nothing to prevent
a situation where there are two GEOM providers using the same name - we
just end up with two /dev/foo entries which is a bit surprising. I was
wondering if we couldn't clean this up (at least partially) and
implement insert/remove functionality for GEOM in one go.

For example we have /dev/ad0 provider created by the DISK class. We
create another provider /dev/ad0 of SCHED class. Then we insert
SCHED:/dev/ad0 on top of DISK:/dev/ad0. GEOM will reconnect all existing
consumers from DISK:/dev/ad0 to SCHED:/dev/ad0 and connect SCHED
consumer to DISK:/dev/ad0. Also, GEOM will show only one /dev/ad0 entry
in /dev/ - the one which comes from a geom with higher rank count. Now
whoever wants to open /dev/ad0 will end up opening SCHED:/dev/ad0 (so
there won't be a problem with new consumers).

Of course providers that come from geoms with the same rank count would
still be a problem, but... Also instead of rank count we could only
allow connected providers to cover their children.

What do you guys think?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

From phk at phk.freebsd.dk  Sat Mar 21 13:43:08 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sat Mar 21 13:43:14 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: Your message of "Sat, 21 Mar 2009 21:03:34 +0100."
    <20090321200334.GB3102@garage.freebsd.pl>
Message-ID: <42965.1237667050@critter.freebsd.dk>

In message <20090321200334.GB3102@garage.freebsd.pl>, Pawel Jakub Dawidek writes:

>       Special GEOM classes.
>       ---------------------
>
>       - There are no special GEOM classes.
>
>I wonder if phk changed his opinion over time. :)

He didn't.

>Maybe instead of adding special providers and GEOM classes, the
>infrastructure should be extended in some way, so that we won't use
>provider term to describe something that isn't really a regular GEOM
>provider.

I have not had time to read this entire thread, being somewhat
snowed under with work elsewhere.

First up, I am not sure I understand why the proxy nodes would
be the (or even 'a') right solution for I/O scheduling.

In fact, it is not very clear to me at all that scheduling should
happen inside geom at all.

I would tend to think that it belongs in the device driver, where
intelligent information about things like tagged queuing abilities
can be taken into account.

For any kind of scheduling to do anything non-trivial, requests
need to be piled up so they can be reordered, doing that in
places where bios don't naturally pile up would require a damn
good argument and strong numbers to convince me.

Where they already do pile up, the existing disksort mechanism
and API can be used. (If you want to mess with the disksort
*algorithm*, by all means do so, but that should not require
you to hack up any APIs, apart from the one to select algorithm).

With that said: I always envisioned the ability to insert and
delete transparent nodes, with the poster boy example being:

        insert a mirror geom
        add a mirror on some other provider
        sync them.
        delete the old mirror copy
        pull the mirror geom out again

and (tada!) you have migrated a live partition from one disk to
another.

For that to work, the new class has to end up between the
consumer(s) and the geom-class, and I generally planned to
stick a {geom-consumer-provider} combination in between
the provider and its class, rather than a {provider-geom-consumer}
between the consumer and its provider.

The reason for this is that it can be done without stalling
the I/O stream since bios all have built in return tickets.

So I think, my opinion on this proposal is:

1. Why would you do scheduling in the middle of the GEOM mesh ??
   Very strong arguments and numbers will have to be forwarded
   for that to be a reason to deviate from:

2. There still are not, and should not be created any special GEOM
   classes.
   GEOM derives much of its strength from the fact that
   there are no special cases to handle, that shouldn't be sold
   too cheaply.

3. Do it properly instead: Implement the general insert/remove
   properly, so that we can do things like the "move" example above.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From ivoras at freebsd.org  Sat Mar 21 14:47:26 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Sat Mar 21 14:47:34 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: <42965.1237667050@critter.freebsd.dk>
References: <20090321200334.GB3102@garage.freebsd.pl>
    <42965.1237667050@critter.freebsd.dk>
Message-ID:

Poul-Henning Kamp wrote:
> In message <20090321200334.GB3102@garage.freebsd.pl>, Pawel Jakub Dawidek writes:
>
>>      Special GEOM classes.
>>      ---------------------
>>
>>      - There are no special GEOM classes.
>>
>> I wonder if phk changed his opinion over time. :)
>
> He didn't.

Well, let's not call them "GEOM classes" then - call them "GEOM proxies" :)

Seriously, *if* there should be things like they are described in this
thread, we should write down what they should do and then decide if it's
worth reusing the "GEOM class" name and infrastructure. My attempt at
this (still under the *if* part) is:

* They should probably modify bios in-place, if it's possible, or in
  some other way ensure 1:1 mapping of the IO requests that pass through
  them
* The following "GEOM class" and "GEOM instance" methods are NOT needed:
  destroy_geom, taste, access, dumpconf
* ... the following ARE needed: ctlreq, start, stop
* ... the following may be needed: init, fini, orphan, spoiled

(I'm trying to keep them light-weight :) ).

>> Maybe instead of adding special providers and GEOM classes, the
>> infrastructure should be extended in some way, so that we won't use
>> provider term to describe something that isn't really a regular GEOM
>> provider.
>
> I have not had time to read this entire thread, being somewhat
> snowed under with work elsewhere.
>
> First up, I am not sure I understand why the proxy nodes would
> be the (or even 'a') right solution for I/O scheduling.
>
> In fact, it is not very clear to me at all that scheduling should
> happen inside geom at all.
>
> I would tend to think that it belongs in the device driver, where
> intelligent information about things like tagged queuing abilities
> can be taken into account.

Except for "dumb" controllers and/or drivers. AFAIK our ATA doesn't do NCQ.

> For any kind of scheduling to do anything non-trivial, requests
> need to be piled up so they can be reordered, doing that in
> places where bios don't naturally pile up would require a damn
> good argument and strong numbers to convince me.
>
> Where they already do pile up, the existing disksort mechanism
> and API can be used. (If you want to mess with the disksort
> *algorithm*, by all means do so, but that should not require
> you to hack up any APIs, apart from the one to select algorithm).

Well, each layer in common Intel servers today does its own scheduling:

* The OS - it was a big deal when Linux and Vista implemented them
* The (smart) disk controllers
* The drives themselves.

I don't know enough to claim it is good or bad, but there's probably
something in there.

> 2. There still are not, and should not be created any special GEOM
>    classes. GEOM derives much of its strength from the fact that
>    there are no special cases to handle, that shouldn't be sold
>    too cheaply.
> 3. Do it properly instead: Implement the general insert/remove
>    properly, so that we can do things like the "move" example above.

Doesn't the inclusion of "hot-swappiness" in the topology tree mean that
either the existing classes need to be modified extensively, or that
there must be some kind of flag telling which classes support this
mechanism and which don't, effectively segregating them? Also, some
classes don't have a meaning in the "proxy" context, like stripe,
concat, raid3, vinum, etc.

What I'm trying to say is that it isn't the classes (in the narrow
sense) that have the magical "no special cases" property, since they are
obviously constrained by the nature of their task, but the framework.
Any "GEOM proxy" (if we go that route...) will obviously be made usable
in any place in the topology.

From rizzo at iet.unipi.it  Sat Mar 21 18:16:45 2009
From: rizzo at iet.unipi.it (Luigi Rizzo)
Date: Sat Mar 21 18:16:51 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: <42965.1237667050@critter.freebsd.dk>
References: <20090321200334.GB3102@garage.freebsd.pl>
    <42965.1237667050@critter.freebsd.dk>
Message-ID:

On Sat, Mar 21, 2009 at 9:24 PM, Poul-Henning Kamp wrote:
> In message <20090321200334.GB3102@garage.freebsd.pl>, Pawel Jakub Dawidek writes:
>
>>       Special GEOM classes.
>>       ---------------------
>>
>>       - There are no special GEOM classes.
>>
>>I wonder if phk changed his opinion over time. :)
>
> He didn't.
>
>>Maybe instead of adding special providers and GEOM classes, the
>>infrastructure should be extended in some way, so that we won't use
>>provider term to describe something that isn't really a regular GEOM
>>provider.
>
> I have not had time to read this entire thread, being somewhat
> snowed under with work elsewhere.

...

> With that said: I always envisioned the ability to insert and
> delete transparent nodes, with the poster boy example being:
>
>        insert a mirror geom
>        add a mirror on some other provider
>        sync them.
>        delete the old mirror copy
>        pull the mirror geom out again
>
> and (tada!) you have migrated a live partition from one disk to
> another.
>
> For that to work, the new class has to end up between the
> consumer(s) and the geom-class, and I generally planned to
> stick a {geom-consumer-provider} combination in between
> the provider and its class, rather than a {provider-geom-consumer}
> between the consumer and its provider.
>
> The reason for this is that it can be done without stalling
> the I/O stream since bios all have built in return tickets.
>
> So I think, my opinion on this proposal is:
> ...
>
> 2. There still are not, and should not be created any special GEOM
>    classes. GEOM derives much of its strength from the fact that
>    there are no special cases to handle, that shouldn't be sold
>    too cheaply.
>
> 3. Do it properly instead: Implement the general insert/remove
>    properly, so that we can do things like the "move" example above.

With the scheduling issues hopefully addressed in the other
email/thread:

the only thing we asked in this thread is whether a transparent
insert/remove in GEOM is already possible, or it must be implemented.

It looks like we are in the latter case, so one option we suggested
(and implemented) was to stick "something" between the provider
and its class, with this "something" being a regular geom class.

    http://info.iet.unipi.it/~luigi/FreeBSD/20090319-geom-proxy.patch

This seems to be almost (see [1]) perfectly in line with your
suggestion above, does not cause deviations from the model, and
does not introduce special classes (see [2]). The only thing we
need is adding two pointers to decouple the provider from its geom.

I'd love to know if a better way exists, maybe the behaviour
described in note [1] below is what you had in mind ?

[1]: The way I can read your sentence

        ... stick a {geom-consumer-provider} combination in between
        the provider and its class,

    is the following: take the existing provider "pp" attached to
    geom "gp" and make it the provider of the new geom "new_gp".
    Then create a new provider, "new_pp", link it to "gp", and link
    the consumer of "new_gp" to "new_pp". So we have the following
    (each node is in square brackets):

    BEFORE   ---> [ pp --> gp ... ]
    AFTER    ---> [ pp --> new_gp --> new_cp ] ---> [ new_pp --> gp ... ]

    On removal, relink "pp" to "gp" and destroy all the new_* stuff.
    This should save the extra pointers in the struct g_provider,
    and perhaps not much harder to implement than what we did ?

[2] the GEOM_PROXY flag that we suggested is just an optimization to
    avoid calling taste() on a provider that nobody should be interested
    in attaching to. I think its presence does not change the model,
    but nothing bad happens if we don't use this flag.

How does it sound now ?

cheers
luigi

From rizzo at iet.unipi.it  Sat Mar 21 18:29:16 2009
From: rizzo at iet.unipi.it (Luigi Rizzo)
Date: Sat Mar 21 18:29:23 2009
Subject: disk scheduling (was: Re: RFC: adding 'proxy' nodes to provider ports (with patch))
Message-ID:

On Sat, Mar 21, 2009 at 9:24 PM, Poul-Henning Kamp wrote:
> In message <20090321200334.GB3102@garage.freebsd.pl>, Pawel Jakub Dawidek writes:
>
>>       Special GEOM classes.
>>       ---------------------
>>
>>       - There are no special GEOM classes.
>>
>>I wonder if phk changed his opinion over time. :)
>
> He didn't.
>
>>Maybe instead of adding special providers and GEOM classes, the
>>infrastructure should be extended in some way, so that we won't use
>>provider term to describe something that isn't really a regular GEOM
>>provider.
>
> I have not had time to read this entire thread, being somewhat
> snowed under with work elsewhere.
>
> First up, I am not sure I understand why the proxy nodes would
> be the (or even 'a') right solution for I/O scheduling.
>
> In fact, it is not very clear to me at all that scheduling should
> happen inside geom at all.
>
> I would tend to think that it belongs in the device driver, where
> intelligent information about things like tagged queuing abilities
> can be taken into account.
>
> For any kind of scheduling to do anything non-trivial, requests
> need to be piled up so they can be reordered, doing that in
> places where bios don't naturally pile up would require a damn
> good argument and strong numbers to convince me.
>
> Where they already do pile up, the existing disksort mechanism
> and API can be used. (If you want to mess with the disksort
> *algorithm*, by all means do so, but that should not require
> you to hack up any APIs, apart from the one to select algorithm).

The thread was meant to be on inserting transparent nodes in GEOM.

Scheduling was just an example on where the problem came out,
but since you ask let's take a short diversion (and let me relabel
this thread so we can discuss things separately).

+ nobody objects that the ideal place for scheduling is where
  requests naturally "pile up". Too bad that this ideal
  place is sometimes one we cannot access, i.e. the firmware
  of the disk drive.

+ some scheduling algorithms are "non work conserving",
  and they work by delaying some requests in the hope to save
  some seeks. They can be very effective (we sent numbers in our
  previous posting in January, but you can look at the
  literature on anticipatory scheduling for more).
  For the way they work, these algorithms artificially cause
  queues to build up. As such you can implement them effectively
  even above the device driver.

+ changing disksort can do some things but not all one would want.
  E.g. if you need to delay requests (as you do in several
  disk schedulers) then you need to interact heavily with the
  driver, e.g. to make sure it does not assume that the scheduler
  is work-conserving (some do, we found out in the GSoC 2005
  work on disk schedulers), and to find out which kind of
  locking to use when it is time to reinject delayed requests.
  So, implementing certain scheduling algorithms in the device
  driver requires specific code on each and every driver.

+ of course adding or not a disk scheduler in one's system is
  completely optional, and there is no intention to change any
  current default.

if you want a quick example on how you can fix some severe
problems with the current disk scheduler even doing scheduling
above the device driver, try the same experiments we did,
first without scheduler, then with the geom_sched module that
we posted:

1. run a few 'dd' in parallel on top of an ATA or SATA disk, and look at
   the overall throughput with and without scheduler;

2. run a cvs update (or other seeky application) in parallel with a
   sequential dd reader, and look at how slowly 'dd' runs without
   scheduler;

3. run a cvs update (or other seeky application) in parallel with a
   sequential dd writer, and look at how slowly cvs goes without
   scheduler. This is mostly an effect of

Examples #1 and #2 are a direct result of the request patterns issued
by readers, and cannot be fixed with work-conserving changes
to disksort. Readers only have one pending request each,
so the disk is doing a seek on each request, and the throughput
degrades heavily.
With anticipation, after one request you give
the process a little bit of time to present another one,
so you can serve a short burst of requests from each reader,
boosting both individual and overall throughput.

Example #3 is a result of the "capture effect" of our disksort:
writers have many pending requests and if they are for contiguous
blocks, once one of them is served the disk keeps serving the
same process starving the others.
Here you can do a lot of useful stuff even above the device
driver, e.g. do not serve more than so many contiguous requests
in a row.

cheers
luigi

From phk at phk.freebsd.dk  Sun Mar 22 01:13:04 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sun Mar 22 01:13:11 2009
Subject: disk scheduling (was: Re: RFC: adding 'proxy' nodes to provider ports (with patch))
In-Reply-To: Your message of "Sun, 22 Mar 2009 02:00:59 +0100."
Message-ID: <45710.1237709582@critter.freebsd.dk>

In message , Luigi Rizzo writes:

>The thread was meant to be on inserting transparent nodes in GEOM.
>
>Scheduling was just an example on where the problem came out,

Scheduling is the *only* application I have seen mentioned for
this special case geom construct ?

>+ nobody objects that the ideal place for scheduling is where
>  requests naturally "pile up". Too bad that this ideal
>  place is sometimes one we cannot access, i.e. the firmware
>  of the disk drive.

Do you seriously propose that we could compete in scheduling
quality, with the disk drive's firmware on drives that can
have multiple outstanding requests ?

>+ [anticipatory scheduling]
>  As such you can implement them effectively even above the device driver.

I have yet to see any study propose that they could do any good
inside the geom mesh, as opposed to right in front of the device
driver ?

>+ changing disksort can do some things but not all one would want.
>  [...]

Then the correct answer is to insert a perfectly normal geom
class above the disk drive to implement that.
I totally fail to see what special kind of classes would buy you ?

>if you want a quick example [...]

I know what anticipatory disk-scheduling is, what it does,
and what the downsides of it are. (I also know that with
SSD's it becomes all but pointless).

The question is not if we should improve disksorting, the question
is if we need to hack up GEOM for it.

The answer is "no".

Poul-Henning

From phk at phk.freebsd.dk  Sun Mar 22 01:22:03 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sun Mar 22 01:22:09 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: Your message of "Sun, 22 Mar 2009 02:10:46 +0100."
Message-ID: <45752.1237710121@critter.freebsd.dk>

In message , Luigi Rizzo writes:

>    BEFORE   ---> [ pp --> gp ... ]
>    AFTER    ---> [ pp --> new_gp --> new_cp ] ---> [ new_pp --> gp ... ]

Correct.

There are many reasons for doing it this way, but the two major ones
are:

Providers see essentially one-way traffic (going down), because the
bios have their return-path recorded (admittedly: for this very
reason), whereas consumers see two way traffic.

If you wanted to substitute another provider, you would have to stall
I/O activity on the consumers in order to get all the pointers set
up right to not derail any bios while doing so.

If instead you insert under the provider, you can hold topology,
fiddle the pointers in the right order, and release topology
all while bios zip up and down over the construction site.

>[2] the GEOM_PROXY flag that we suggested is just an optimization to
>    avoid calling taste() on a provider that nobody should be interested
>    in attaching to. I think its presence does not change the model,
>    but nothing bad happens if we don't use this flag.

You would not call taste() anyway, because all the new stuff is
already open and active.

But you need to add a new g_ctl verb to instantiate a transparent
instance of the class, and this is where you can tell if inserting
a given class is even possible: classes that cannot will error out.

Similarly, you need a verb to remove a transparent geom, which
will fail if the class doesn't understand this, or does not consider
that geom to be transparent.

From rizzo at iet.unipi.it  Sun Mar 22 02:51:32 2009
From: rizzo at iet.unipi.it (Luigi Rizzo)
Date: Sun Mar 22 02:51:37 2009
Subject: disk scheduling (was: Re: RFC: adding 'proxy' nodes to provider ports (with patch))
In-Reply-To: <45710.1237709582@critter.freebsd.dk>
References: <45710.1237709582@critter.freebsd.dk>
Message-ID:

On Sun, Mar 22, 2009 at 9:13 AM, Poul-Henning Kamp wrote:
> In message , Luigi Rizzo writes:
>
>>The thread was meant to be on inserting transparent nodes in GEOM.
>>
>>Scheduling was just an example on where the problem came out,
>
> Scheduling is the *only* application I have seen mentioned for
> this special case geom construct ?

man 4 geom has a section which explicitly mentions this construct,
with the same example that you posted in the thread:

  ...
  SPECIAL TOPOLOGICAL MANEUVERS

     INSERT/DELETE are very special operations which allow a new geom to be
     instantiated between a consumer and a provider attached to each other
     and to remove it again.

     To understand the utility of this, imagine a provider being mounted as a
     file system. Between the DEVFS geom's consumer and its provider we
     insert a mirror module which configures itself with one mirror copy and
     consequently is transparent to the I/O requests on the path.
     We can now configure yet a mirror copy on the mirror geom, request a
     synchronization, and finally drop the first mirror copy. We have now,
     in essence, moved a mounted file system from one disk to another while
     it was being used. At this point the mirror geom can be deleted from
     the path again; it has served its purpose.

>>+ changing disksort can do some things but not all one would want.
>>  [...]
>
> Then the correct answer is to insert a perfectly normal geom
> class above the disk drive to implement that. I totally fail
> to see what special kind of classes would buy you ?
>
>>if you want a quick example [...]
>
> I know what anticipatory disk-scheduling is, what it does,
> and what the downsides of it are. (I also know that with
> SSD's it becomes all but pointless).
>
> The question is not if we should improve disksorting, the question
> is if we need to hack up GEOM for it.
>
> The answer is "no".

Ok good, we are back on track on the geom architecture:
then the question was just whether the INSERT/DELETE mentioned
in the manpage was already supported or not, and how to implement
it in a clean way.
Hopefully the discussion in the main thread now contains enough
detail to do it the right way.

cheers
luigi

From ivoras at freebsd.org  Sun Mar 22 06:02:37 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Sun Mar 22 06:02:43 2009
Subject: disk scheduling (was: Re: RFC: adding 'proxy' nodes to provider ports (with patch))
In-Reply-To: <45710.1237709582@critter.freebsd.dk>
References: <45710.1237709582@critter.freebsd.dk>
Message-ID: <9bbcef730903220602q736b96dflab447e2d6d996754@mail.gmail.com>

2009/3/22 Poul-Henning Kamp :
> In message , Luigi Rizzo writes:
>
>>The thread was meant to be on inserting transparent nodes in GEOM.
>>
>>Scheduling was just an example on where the problem came out,
>
> Scheduling is the *only* application I have seen mentioned for
> this special case geom construct ?

I've joined this thread because once upon a time I was working on what
has grown into gjournal, and one aspect of the original project was a
logging "safety net" mode. The idea was to insert this class (or
whatever) just before a file system consumer then do risky things with
the file system metadata (like fsck-ing a badly damaged file system),
with the option of committing it or rolling it back. It has even grown
into another SoC project.

I see now it doesn't comply with my idea of a "lightweight" proxy (the
first item, about 1:1 mappings) - so proxies look more and more like
they should be classes.

Also, gcache looks like a candidate.

From pjd at FreeBSD.org  Sun Mar 22 23:02:56 2009
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Sun Mar 22 23:03:03 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: <45752.1237710121@critter.freebsd.dk>
References: <45752.1237710121@critter.freebsd.dk>
Message-ID: <20090323060325.GN3102@garage.freebsd.pl>

On Sun, Mar 22, 2009 at 08:22:01AM +0000, Poul-Henning Kamp wrote:
> In message , Luigi Rizzo writes:
>
> >    BEFORE   ---> [ pp --> gp ... ]
> >    AFTER    ---> [ pp --> new_gp --> new_cp ] ---> [ new_pp --> gp ... ]
>
> Correct.
>
> There are many reasons for doing it this way, but the two major ones
> are:
>
> Providers see essentially one-way traffic (going down), because the
> bios have their return-path recorded (admittedly: for this very
> reason), whereas consumers see two way traffic.
>
> If you wanted to substitute another provider, you would have to stall
> I/O activity on the consumers in order to get all the pointers set
> up right to not derail any bios while doing so.
>
> If instead you insert under the provider, you can hold topology,
> fiddle the pointers in the right order, and release topology
> all while bios zip up and down over the construction site.

There is still a naming problem. pp and new_pp will end up with the same
name.
I'd suggest instructing GEOM to expose only parent in /dev/.

> >[2] the GEOM_PROXY flag that we suggested is just an optimization to
> >    avoid calling taste() on a provider that nobody should be interested
> >    in attaching to. I think its presence does not change the model,
> >    but nothing bad happens if we don't use this flag.
>
> You would not call taste() anyway, because all the new stuff is
> already open and active.

The taste is still going to be sent on new class arrival and on the last
pp write close.

> But you need to add a new g_ctl verb to instantiate a transparent
> instance of the class, and this is where you can tell if inserting
> a given class is even possible: classes that cannot will error out.
>
> Similarly, you need a verb to remove a transparent geom, which
> will fail if the class doesn't understand this, or does not consider
> that geom to be transparent.

From phk at phk.freebsd.dk  Sun Mar 22 23:58:19 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sun Mar 22 23:58:26 2009
Subject: RFC: adding 'proxy' nodes to provider ports (with patch)
In-Reply-To: Your message of "Mon, 23 Mar 2009 07:03:25 +0100."
    <20090323060325.GN3102@garage.freebsd.pl>
Message-ID: <42618.1237791496@critter.freebsd.dk>

In message <20090323060325.GN3102@garage.freebsd.pl>, Pawel Jakub Dawidek writes:

>There is still a naming problem. pp and new_pp will end up with the same
>name. I'd suggest instructing GEOM to expose only parent in /dev/.

who said the new provider had to have same name ?

>The taste is still going to be sent on new class arrival and on the last
>pp write close.
We decide that. Since we are inserting in an already open path, I think it makes very good sense to supress tasting, at least until close. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From phk at phk.freebsd.dk Mon Mar 23 02:42:22 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon Mar 23 02:42:28 2009 Subject: disk scheduling (was: Re: RFC: adding 'proxy' nodes to provider ports (with patch)) In-Reply-To: Your message of "Sun, 22 Mar 2009 10:51:29 +0100." Message-ID: <43415.1237801339@critter.freebsd.dk> In message , Luigi Rizzo writes: >>>Scheduling was just an example on where the problem came out, >> >> Scheduling is the *only* application I have seen mentioned for >> this special case geom construct ? > >man 4 geom has a section which explicitly mentions this construct, >with the same example that you posted in the thread: You will notice that there is no mention of "special classes" or "proxy nodes with special properties". If you want to do it, do it right. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From bugmaster at FreeBSD.org Mon Mar 23 04:06:57 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Mar 23 04:08:00 2009 Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org Message-ID: <200903231106.n2NB6tb6003992@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o bin/132845 geom [geom] [patch] ggated(8) does not close files opened a o kern/132273 geom glabel(8): [patch] failing on journaled partition o kern/132242 geom [gmirror] gmirror.ko fails to fully initialize o kern/131353 geom [geom] gjournal(8) kernel lock o kern/131037 geom [geli] Unable to create disklabel on .eli-Device o kern/130528 geom gjournal fsck during boot o kern/129674 geom [geom] gjournal root did not mount on boot o kern/129645 geom gjournal(8): GEOM_JOURNAL causes system to fail to boo o kern/129245 geom [geom] gcache is more suitable for suffix based provid o bin/128398 geom [patch] glabel(8): teach geom_label to recognise gpt l f kern/128276 geom [gmirror] machine lock up when gmirror module is used o kern/126902 geom [geom] geom_label: kernel panic during install boot o kern/124973 geom [gjournal] [patch] boot order affects geom_journal con o kern/124969 geom gvinum(8): gvinum raid5 plex does not detect missing s o kern/124294 geom [geom] gmirror(8) have inappropriate logic when workin o kern/124130 geom [gmirror] [usb] gmirror fails to start usb devices tha o kern/123962 geom [panic] [gjournal] gjournal (455Gb data, 8Gb journal), o kern/123630 geom [patch] [gmirror] gmirror doesnt allow the original dr o kern/123122 geom [geom] GEOM / gjournal kernel lock f kern/122415 geom [geom] UFS labels are being constantly created and rem o kern/122067 geom [geom] [panic] Geom crashed during boot o kern/121559 geom [patch] [geom] geom label class allows to create inacc o kern/121364 geom [gmirror] Removing all providers create a "zombie" mir o kern/120231 geom [geom] GEOM_CONCAT error adding second drive o kern/120044 geom [msdosfs] [geom] incorrect MSDOSFS label fries adminis o kern/120021 geom [geom] [panic] net-p2p/qbittorrent crashes system when o kern/119743 geom [geom] geom label for cds is keeped after dismount and f kern/115547 geom [geom] [patch] [request] 
let GEOM Eli get password fro o kern/114532 geom [geom] GEOM_MIRROR shows up in kldstat even if compile o kern/113957 geom [gmirror] gmirror is intermittently reporting a degrad o kern/113837 geom [geom] unable to access 1024 sector size storage o kern/113419 geom [geom] geom fox multipathing not failing back p bin/110705 geom gmirror(8) control utility does not exit with correct o kern/107707 geom [geom] [patch] [request] add new class geom_xbox360 to o kern/104389 geom [geom] [patch] sys/geom/geom_dump.c doesn't encode XML o kern/98034 geom [geom] dereference of NULL pointer in acd_geom_detach o kern/94632 geom [geom] Kernel output resets input while GELI asks for o kern/90582 geom [geom] [panic] Restore cause panic string (ffs_blkfree o bin/90093 geom fdisk(8) incapable of altering in-core geometry a kern/89660 geom [vinum] [patch] [panic] due to g_malloc returning null o kern/89546 geom [geom] GEOM error s kern/89102 geom [geom] [panic] panic when forced unmount FS from unplu o kern/87544 geom [gbde] mmaping large files on a gbde filesystem deadlo o kern/84556 geom [geom] [panic] GBDE-encrypted swap causes panic at shu o kern/79251 geom [2TB] newfs fails on 2.6TB gbde device o kern/79035 geom [vinum] gvinum unable to create a striped set of mirro o bin/78131 geom gbde(8) "destroy" not working. s kern/73177 geom kldload geom_* causes panic due to memory exhaustion 48 problems total. From rizzo at iet.unipi.it Mon Mar 23 13:02:22 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Mon Mar 23 13:02:29 2009 Subject: RFC: adding 'proxy' nodes to provider ports (with patch) In-Reply-To: <42618.1237791496@critter.freebsd.dk> References: <20090323060325.GN3102@garage.freebsd.pl> <42618.1237791496@critter.freebsd.dk> Message-ID: <20090323200712.GA28660@onelab2.iet.unipi.it> On Mon, Mar 23, 2009 at 06:58:16AM +0000, Poul-Henning Kamp wrote: > In message <20090323060325.GN3102@garage.freebsd.pl>, Pawel Jakub Dawidek write > s: > > >There is still a naming problem. 
pp and new_pp will end up with the same
> >name. I'd suggest instructing GEOM to expose only parent in /dev/.
>
> who said the new provider had to have same name ?
>
> >The taste is still going to be sent on new class arrival and on the last
> >pp write close.
>
> We decide that.
>
> Since we are inserting in an already open path, I think it makes very
> good sense to suppress tasting, at least until close.

To summarize, here is how I have implemented a node that
supports both regular "create" and the transparent "insert"
we are discussing.
Say we want to attach to an existing provider "pp" whose name is "ad0"

    BEFORE ---> [ pp --> old_gp ...]

Then we can do either "geom xx create ad0" which results in

    AFTER create ---> [ newpp --> gp --> cp ] ---> [ pp --> old_gp ... ]

or "geom xx insert ad0", which results in

    AFTER insert ---> [ pp --> gp --> cp ] ---> [ newpp --> old_gp ... ]

The names of the various objects are the same in both cases so

    old_gp->name = "ad0"
    pp->name = "ad0"
    gp->name = "ad0.xx."
    newpp->name = "ad0.xx."

This lets new clients connect to provider "ad0" without having to
know about any insertion.
Also, to remove the newly inserted pieces, in both cases you can
run the same command "geom xx destroy ad0.xx." (remembering that
in this case you are naming the geom, not the provider).

In terms of code, no changes to the infrastructure, and the
create/insert and destroy functions are the following (error checking
removed for clarity)

g_xx_create(struct g_provider *pp, struct g_class *mp, int insert ...)
{
	snprintf(name, sizeof(name), "%s%s", pp->name, MY_SUFFIX);
	gp = g_new_geomf(mp, name);
	... allocate and fill softc and geom...
	newpp = g_new_providerf(insert ? pp->geom : gp, gp->name);
	... initialize mediasize and sectorsize
	cp = g_new_consumer(gp);
	g_attach(cp, insert ?
	    newpp : pp);
	if (insert) {
		g_cancel_event(newpp);	/* no taste() on this */
		/* move pp from old_gp to the new geom gp */
		LIST_REMOVE(pp, provider);
		pp->geom = gp;
		LIST_INSERT_HEAD(&gp->provider, pp, provider);
		g_access(cp, 1, 1, 1);	/* we can move data */
		sc->sc_insert = 1;	/* remember for the destroy */
	}
	g_error_provider(newpp, 0);
}

Here it is a bit inefficient to have to call g_cancel_event(),
but short of changing g_new_providerf() there is no way to
avoid the g_new_provider event.

g_xx_destroy(struct g_geom *gp)
{
	...
	if (sc->sc_insert) {
		pp = LIST_FIRST(&gp->provider);
		cp = LIST_FIRST(&gp->consumer);
		newpp = cp->provider;
		/* Link provider to the original geom. */
		LIST_REMOVE(pp, provider);
		pp->geom = newpp->geom;
		LIST_INSERT_HEAD(&pp->geom->provider, pp, provider);
		g_access(cp, -1, -1, -1);
		/* I am not sure if we need the following 3 */
		g_detach(cp);
		LIST_REMOVE(newpp, provider);
		g_destroy_provider(newpp);
	}
	...
	/* regular destroy path */
}

Above, I am not totally sure if we need to explicitly call g_detach()
and destroy the provider, or if it will come for free as a result of
the regular destroy code.

The block "if (sc->sc_insert) {..}" is reasonably generic
(and large, when you put in the error checking) to possibly deserve
a function in geom_subr.c, but as long as there are no other clients
that makes no sense.

As usual, feedback welcome.
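The relinking done by the insert and destroy paths above can be modelled in userland. What follows is only a toy sketch under simplifying assumptions: the toy_* structures and functions are hypothetical stand-ins (not the real struct g_geom, g_provider or g_consumer), each geom has exactly one provider and one consumer, and topology locking, access counts and events are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

struct toy_geom;

/* Hypothetical stand-in for a provider. */
struct toy_provider {
	const char *name;
	struct toy_geom *geom;		/* geom that serves this provider */
};

/* Hypothetical stand-in for a consumer. */
struct toy_consumer {
	struct toy_provider *provider;	/* provider we are attached to */
};

/* Hypothetical stand-in for a geom with one provider and one consumer. */
struct toy_geom {
	const char *name;
	struct toy_provider *provider;	/* the one provider this geom offers */
	struct toy_consumer consumer;	/* the one consumer this geom owns */
};

/*
 * "insert": slide gp underneath pp.  pp keeps its name and its users but
 * is now served by gp; newpp takes pp's old place on top of the old geom.
 */
static void
toy_insert(struct toy_provider *pp, struct toy_geom *gp,
    struct toy_provider *newpp)
{
	struct toy_geom *old_gp = pp->geom;

	assert(old_gp != NULL && old_gp->provider == pp);
	newpp->geom = old_gp;		/* newpp belongs to the old geom */
	old_gp->provider = newpp;
	gp->consumer.provider = newpp;	/* plays the role of g_attach() */
	pp->geom = gp;			/* pp moves over to the new geom */
	gp->provider = pp;
}

/* "destroy" of an inserted geom: hand pp back to the geom below. */
static void
toy_remove(struct toy_geom *gp)
{
	struct toy_provider *pp = gp->provider;
	struct toy_provider *newpp = gp->consumer.provider;
	struct toy_geom *old_gp;

	assert(pp != NULL && newpp != NULL);
	old_gp = newpp->geom;
	pp->geom = old_gp;		/* link pp to the original geom */
	old_gp->provider = pp;
	gp->provider = NULL;		/* gp and newpp are now unlinked */
	gp->consumer.provider = NULL;
	newpp->geom = NULL;
}
```

In this model a user that holds a pointer to pp never has to change it: only pp->geom moves, which is the property the "insert" variant relies on.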
cheers luigi From pjd at FreeBSD.org Mon Mar 23 13:19:21 2009 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Mar 23 13:19:28 2009 Subject: RFC: adding 'proxy' nodes to provider ports (with patch) In-Reply-To: <20090323200712.GA28660@onelab2.iet.unipi.it> References: <20090323060325.GN3102@garage.freebsd.pl> <42618.1237791496@critter.freebsd.dk> <20090323200712.GA28660@onelab2.iet.unipi.it> Message-ID: <20090323201948.GA1723@garage.freebsd.pl> On Mon, Mar 23, 2009 at 09:07:12PM +0100, Luigi Rizzo wrote: > On Mon, Mar 23, 2009 at 06:58:16AM +0000, Poul-Henning Kamp wrote: > > In message <20090323060325.GN3102@garage.freebsd.pl>, Pawel Jakub Dawidek write > > s: > > > > >There is still a naming problem. pp and new_pp will end up with the same > > >name. I'd suggest instructing GEOM to expose only parent in /dev/. > > > > who said the new provider had to have same name ? > > > > >The taste is still going to be send on new class arrival and on the last > > >pp write close. > > > > We decide that. > > > > Since we are inserting in an already open path, I think it makes very > > good sense to supress tasting, at least until close. > > To summarize, here is how I have implemented a node that > supports both regular "create" and the transparent "insert" > we are discussing. > Say we want to attach to an existing provider "pp" whose name is "ad0" > > BEFORE ---> [ pp --> old_gp ...] > > Then we can do either "geom xx create ad0" which results in > > AFTER create ---> [ newpp --> gp --> cp ] ---> [ pp --> old_gp ... ] > > or "geom xx insert ad0", which results in > > AFTER insert ---> [ pp --> gp --> cp ] ---> [ newpp --> old_gp ... ] > > The names of the various objects are the same in both cases so > > old_gp->name = "ad0" > pp->name = "ad0" > gp->name = "ad0.xx." > newpp->name = "ad0.xx." > > This lets new clients connect to provider "ad0" without having to > know about any insertion. 
> Also, to remove the newly inserted pieces, in both cases you can > run the same command "geom xx destroy ad0.xx." (remembering that > in this case you are naming the geom, not the provider). > > In terms of code, no changes to the infrastructure, and the > create/insert and destroy functions are the following (error checking > removed for clarity) > > g_xx_create(struct g_provider *pp, struct g_class *mp, int insert ...) > { > snprintf(name, sizeof(name), "%s%s", pp->name, MY_SUFFIX); > gp = g_new_geomf(mp, name); > ... allocate and fill softc and geom... > newpp = g_new_providerf(insert ? pp->geom : gp, gp->name); > ... initialize mediasize and sectorsize > cp = g_new_consumer(gp); > g_attach(cp, insert ? newpp : pp); > if (insert) { > g_cancel_event(newpp); /* no taste() on this*/ > /* link pp to old_gp */ > LIST_REMOVE(pp, provider); > pp->geom = gp; newpp->private = pp->private; pp->private = NULL; newpp->index = pp->index; pp->index = 0; > LIST_INSERT_HEAD(&gp->provider, pp, provider); > g_access(cp, 1, 1, 1); /* we can move data */ > sc->sc_insert = 1; /* remember for the destroy */ > } > g_error_provider(newpp, 0); > } > > Here it is a bit inefficient to have to call g_cancel_event() > but short of changing g_new_providerf() there is no way to > avoid the g_new_provider event. > > g_xx_destroy(struct g_geom *gp) > { > ... > if (sc->sc_insert) { > pp = LIST_FIRST(&gp->provider); > cp = LIST_FIRST(&gp->consumer); > newpp = cp->provider; > /* Link provider to the original geom. */ > LIST_REMOVE(pp, provider); > pp->geom = newpp->geom; pp->private = newpp->private; newpp->private = NULL; pp->index = newpp->index; newpp->index = 0; > LIST_INSERT_HEAD(&pp->geom->provider, pp, provider); > g_access(cp, -1, -1, -1); > /* I am not sure if we need the following 3 */ > g_detach(cp); > LIST_REMOVE(newpp, provider); > g_destroy_provider(newpp); > } > ... 
> /* regular destroy path */ > } > > Above, I am not totally sure if we need to explicitly call g_detach() > and destroy the provider, or if it will come for free as a result of > the regular destoy code. > > The block "if (sc->sc_insert) {..}" is reasonably generic > (and large, when you put in the error checking) to possibly deserve > a function in geom_subr.c -- but until there are no other clients, > it makes no sense. > > As usual, feedback welcome. I don't think this is good idea to try to squeeze creation and insertion in one function. IMHO it would be better to have generic functions for insert/remove functionality: int g_insert(struct g_class *class, struct g_provider *oldpp); int g_remove(struct g_provider *oldpp); (In g_insert() class name can be attached to new provider's name for example.) -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-geom/attachments/20090323/83a8da72/attachment.pgp From bzeeb-lists at lists.zabbadoz.net Wed Mar 25 15:15:29 2009 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Wed Mar 25 15:15:37 2009 Subject: gpart on top of eli inside a slice is not working Message-ID: <20090325214318.Q67075@maildrop.int.zabbadoz.net> Hi, assume you get a laptop with the usual pre-install that you must not and cannot change but still want to add a freebsd to the wastefull emptyness at the end of the large disk. So you have 3 classic slices: 1 compaq recovery 2 ntfs 3 dos (free) So I tried to play a bit with that and tried to install freebsd on slice 3 inside eli. To try gjournal as well I thought I go with gpart directly as it will be the tool in the future instead od bsdlabel and created 3 paritions: 1 for the journal, 2 for swap and 2 for the data. 
All was fine. I rebooted, and there was garbage.

Here's a script to reproduce this on head. It will create a swap backed
memory disk and a $0.key file. If unsure, run the steps by hand to avoid
the script accidentally going wild. It's quickly hacked together so it's
not nice but did the job on an 8-current here. It does not yet create the
journal or newfs anything as I tried the minimum to reproduce this.

Leaving out the fdisk and the s=${md}s3 it will work fine. So I guess it
is a problem of stacking things. And before you are going to ask - changing
slice 3 to 165 (freebsd) does not change anything.

Any ideas?

----- 8< 8< 8<----------------------------------------------------------
#!/bin/sh

case `id -u` in
0)	;;
*)	echo "Run as super user" >&2
	exit 1
	;;
esac

md=`mdconfig -a -t swap -s 32901120 -x 63 -y 64`
case "${md}" in
md*)	echo "Created swapped backed memory disk ${md}" ;;
*)	echo "ERROR creating memory disk">&2
	exit 1
	;;
esac

echo "creating initial set"
echo " ..fdisk"
# cannot get this to work
fdisk -q -i -f - /dev/${md} <
References: <20090325214318.Q67075@maildrop.int.zabbadoz.net>
Message-ID:

On Mar 25, 2009, at 2:57 PM, Bjoern A. Zeeb wrote:

> Here's a script to reproduce this on head.

First of all: exemplary problem reporting!

> Any ideas?

The probe method of the GPT scheme explicitly disallows nesting.
This is inconsistent with the create method, which happily allows
creating a GPT underneath a MBR.

The bug is in the create method: GPT cannot be created inside a
MBR slice (or any other partitioning for that matter). I'll fix
that shortly.

FYI,

-- 
Marcel Moolenaar
xcllnt@mac.com

From bzeeb-lists at lists.zabbadoz.net Wed Mar 25 23:30:07 2009
From: bzeeb-lists at lists.zabbadoz.net (Bjoern A.
Zeeb) Date: Wed Mar 25 23:30:14 2009 Subject: gpart on top of eli inside a slice is not working In-Reply-To: References: <20090325214318.Q67075@maildrop.int.zabbadoz.net> Message-ID: <20090326062604.X67075@maildrop.int.zabbadoz.net> On Wed, 25 Mar 2009, Marcel Moolenaar wrote: > > On Mar 25, 2009, at 2:57 PM, Bjoern A. Zeeb wrote: > >> Here?s a script to reproduce this on head. > > First of all: exemplary problem reporting! > >> Any ideas? > > The probe method of the GPT scheme explicitly disallows nesting. > This is inconsistent with the create method, which happily allows > creating a GPT underneath a MBR. > > The bug is in the create method: GPT cannot be created inside a > MBR slice (or any other partioning for that matter). I'll fix > that shortly. Well technically it is created inside some random garbage from eli and not directly inside the MBR slice. So the only possible solutions for those would be: 1) Somehow convert the entire disk to part and then exposing the 3 freebsd-* partitions and have a dedicated eli inside each. 2) try (and stick with) bsdlabel on top of the eli inside the mbr slice? So can you explain why there is the restriction that part cannot be used inside a MBR slice or rather somewhere on top of such? /bz -- Bjoern A. Zeeb The greatest risk is not taking one. From rizzo at iet.unipi.it Thu Mar 26 03:55:52 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Thu Mar 26 03:55:58 2009 Subject: geom debugging tools ? Message-ID: <20090326110048.GA48516@onelab2.iet.unipi.it> do we have a tool that can list all active geoms, providers and consumers ? "geom list" does part of the job, but I don't know how to get the list of available classes. The following trick ls /lib/geom | sed 's/geom_//;s/\.so//' | xargs -n 1 -J % geom % list can only give a partial list of names. From rizzo at iet.unipi.it Thu Mar 26 04:30:03 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Thu Mar 26 04:30:16 2009 Subject: geom debugging tools ? 
In-Reply-To: <8425.1238066250@critter.freebsd.dk> References: <20090326110048.GA48516@onelab2.iet.unipi.it> <8425.1238066250@critter.freebsd.dk> Message-ID: <20090326113459.GA50106@onelab2.iet.unipi.it> On Thu, Mar 26, 2009 at 11:17:30AM +0000, Poul-Henning Kamp wrote: > In message <20090326110048.GA48516@onelab2.iet.unipi.it>, Luigi Rizzo writes: > >do we have a tool that can list all active geoms, providers > >and consumers ? > > > >"geom list" does part of the job, but I don't > >know how to get the list of available classes. > > sysctl -b kern.geom.confxml wonderful, thanks From phk at phk.freebsd.dk Thu Mar 26 04:34:28 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu Mar 26 04:34:34 2009 Subject: geom debugging tools ? In-Reply-To: Your message of "Thu, 26 Mar 2009 12:00:48 +0100." <20090326110048.GA48516@onelab2.iet.unipi.it> Message-ID: <8425.1238066250@critter.freebsd.dk> In message <20090326110048.GA48516@onelab2.iet.unipi.it>, Luigi Rizzo writes: >do we have a tool that can list all active geoms, providers >and consumers ? > >"geom list" does part of the job, but I don't >know how to get the list of available classes. sysctl -b kern.geom.confxml -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From alex.wilkinson at dsto.defence.gov.au Thu Mar 26 05:48:54 2009 From: alex.wilkinson at dsto.defence.gov.au (Wilkinson, Alex) Date: Thu Mar 26 05:49:00 2009 Subject: geom debugging tools ? In-Reply-To: <8425.1238066250@critter.freebsd.dk> References: <20090326110048.GA48516@onelab2.iet.unipi.it> <8425.1238066250@critter.freebsd.dk> Message-ID: <20090326112222.GE10080@stlux503.dsto.defence.gov.au> 0n Thu, Mar 26, 2009 at 11:17:30AM +0000, Poul-Henning Kamp wrote: >sysctl -b kern.geom.confxml Curios, why doesn't 'sysctl -a' display "kern.geom.confxml" ? e.g. 
   #sysctl -a | grep -i kern.geom
   kern.geom.collectstats: 1
   kern.geom.debugflags: 0
   kern.geom.label.debug: 0
   #

 -aW

From marius at nuenneri.ch Thu Mar 26 06:22:42 2009
From: marius at nuenneri.ch (Marius Nünnerich)
Date: Thu Mar 26 06:23:05 2009
Subject: geom debugging tools ?
In-Reply-To: <20090326112222.GE10080@stlux503.dsto.defence.gov.au>
References: <20090326110048.GA48516@onelab2.iet.unipi.it> <8425.1238066250@critter.freebsd.dk> <20090326112222.GE10080@stlux503.dsto.defence.gov.au>
Message-ID:

On Thu, Mar 26, 2009 at 12:22, Wilkinson, Alex wrote:
>
>    0n Thu, Mar 26, 2009 at 11:17:30AM +0000, Poul-Henning Kamp wrote:
>
>    >sysctl -b kern.geom.confxml
>
> Curious, why doesn't 'sysctl -a' display "kern.geom.confxml" ? e.g.
>
>   #sysctl -a | grep -i kern.geom
>   kern.geom.collectstats: 1
>   kern.geom.debugflags: 0
>   kern.geom.label.debug: 0
>   #
>

Because sysctls with binary data are not shown with the -a parameter.

From xcllnt at mac.com Thu Mar 26 09:06:05 2009
From: xcllnt at mac.com (Marcel Moolenaar)
Date: Thu Mar 26 09:06:11 2009
Subject: gpart on top of eli inside a slice is not working
In-Reply-To: <20090326062604.X67075@maildrop.int.zabbadoz.net>
References: <20090325214318.Q67075@maildrop.int.zabbadoz.net> <20090326062604.X67075@maildrop.int.zabbadoz.net>
Message-ID: <1FA0EF30-7FCC-4384-8151-36843EFBE01D@mac.com>

On Mar 25, 2009, at 11:29 PM, Bjoern A. Zeeb wrote:

> On Wed, 25 Mar 2009, Marcel Moolenaar wrote:
>
>>
>> On Mar 25, 2009, at 2:57 PM, Bjoern A. Zeeb wrote:
>>
>>> Here's a script to reproduce this on head.
>>
>> First of all: exemplary problem reporting!
>>
>>> Any ideas?
>>
>> The probe method of the GPT scheme explicitly disallows nesting.
>> This is inconsistent with the create method, which happily allows >> creating a GPT underneath a MBR. >> >> The bug is in the create method: GPT cannot be created inside a >> MBR slice (or any other partioning for that matter). I'll fix >> that shortly. > > Well technically it is created inside some random garbage from eli and > not directly inside the MBR slice. When I refer to nesting, I mean the on-disk layout. It's almost meaningless to talk in terms of GEOM nesting, because you can't assume anything. Thus: the fact that geli is in between the two gpart instances is irrelevant. > So the only possible solutions for those would be: > 1) Somehow convert the entire disk to part and then exposing the 3 > freebsd-* partitions and have a dedicated eli inside each. I don't understand what you're trying to say. Can you elaborate? > 2) try (and stick with) bsdlabel on top of the eli inside the mbr > slice? A different scheme, one that is allowed to be nested (again, from an on-disk layout PoV), is the right thing to do. > So can you explain why there is the restriction that part cannot be > used inside a MBR slice or rather somewhere on top of such? There's no such restriction in gpart. If there was, then gpart would not be able to implement the BSD scheme or the EBR scheme. FYI, -- Marcel Moolenaar xcllnt@mac.com From phk at phk.freebsd.dk Thu Mar 26 15:36:43 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu Mar 26 15:36:50 2009 Subject: geom debugging tools ? In-Reply-To: Your message of "Thu, 26 Mar 2009 20:22:22 +0900." <20090326112222.GE10080@stlux503.dsto.defence.gov.au> Message-ID: <1094.1238102193@critter.freebsd.dk> In message <20090326112222.GE10080@stlux503.dsto.defence.gov.au>, "Wilkinson, A lex" writes: > > 0n Thu, Mar 26, 2009 at 11:17:30AM +0000, Poul-Henning Kamp wrote: > > >sysctl -b kern.geom.confxml > >Curios, why doesn't 'sysctl -a' display "kern.geom.confxml" ? e.g. 
Because we don't want to spam the sysctl -a output with so much output. There are many other sysctls which also are not shown during sysctl -a because they return binary structures. The '-b' flag means "I know it may be binary" and we (ab)use that to supress the XML output from sysctl -a. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From bzeeb-lists at lists.zabbadoz.net Fri Mar 27 05:30:08 2009 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Fri Mar 27 05:30:15 2009 Subject: gpart on top of eli inside a slice is not working In-Reply-To: <1FA0EF30-7FCC-4384-8151-36843EFBE01D@mac.com> References: <20090325214318.Q67075@maildrop.int.zabbadoz.net> <20090326062604.X67075@maildrop.int.zabbadoz.net> <1FA0EF30-7FCC-4384-8151-36843EFBE01D@mac.com> Message-ID: <20090327092226.K67075@maildrop.int.zabbadoz.net> On Thu, 26 Mar 2009, Marcel Moolenaar wrote: Hi, >>> The bug is in the create method: GPT cannot be created inside a >>> MBR slice (or any other partioning for that matter). I'll fix >>> that shortly. >> >> Well technically it is created inside some random garbage from eli and >> not directly inside the MBR slice. > > When I refer to nesting, I mean the on-disk layout. It's almost > meaningless to talk in terms of GEOM nesting, because you can't > assume anything. > > Thus: the fact that geli is in between the two gpart instances is > irrelevant. ok. >> So the only possible solutions for those would be: >> 1) Somehow convert the entire disk to part and then exposing the 3 >> freebsd-* partitions and have a dedicated eli inside each. > > I don't understand what you're trying to say. Can you > elaborate? 
Well if you convert the entire thing to GPT (not part; still wrongly
using it as a synonym 'cause of the old gpt(8) name and being
confused ;-) ) you'd have

    md0p1   Compaq Recovery (however this would work)
    md0p2   NTFS (however this would work)
    md0p3   freebsd-ufs
    md0p4   freebsd-swap
    md0p5   freebsd-ufs

and then you would do

    md0p3.eli
    md0p4.eli
    md0p5.eli

In this case the "freebsd-*" is publicly exposed in GPT.

But in contrast, with the fdisk version, where slice 3 is "DOS"

    md0s1       "Compaq Recovery"
    md0s2       "NTFS"
    md0s3       "DOS"
    md0s3.eli   random garbage
    md0s3.elia  equivalent to md0p3
    md0s3.elib  equivalent to md0p4
    md0s3.elid  equivalent to md0p5

the freebsd parts are not publicly visible.

>> 2) try (and stick with) bsdlabel on top of the eli inside the mbr
>> slice?
>
> A different scheme, one that is allowed to be nested (again, from
> an on-disk layout PoV), is the right thing to do.

I went with gpart + BSD for now.

>> So can you explain why there is the restriction that part cannot be
>> used inside a MBR slice or rather somewhere on top of such?
>
> There's no such restriction in gpart. If there was, then gpart
> would not be able to implement the BSD scheme or the EBR scheme.

So again here, s,part,GPT, ;-)

What is the EBR scheme? Are the schemes documented somewhere in more
detail? Well, they are in some way, but perhaps gpart(8)[?] could say a
bit more about what a "scheme" is and how it relates to the geom classes
and options.

So I have to say I liked using gpart to create the traditional BSD
disklabels much more than the old bsdlabel. Things can be scripted more
easily etc.

Two things that would significantly improve usability, though, are:
1 the ability to give -s size in a human readable way instead of having
  to do all the math.
2 being able to give -b start to just say "start at next free offset"
  w/o looking it up or doing the math.

Example:
    gpart add -b next -s 64G -t freebsd-ufs da3

/bz

-- 
Bjoern A. Zeeb                      The greatest risk is not taking one.
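The two wishes above (a human readable -s size and a -b that means "start at next free offset") come down to a little arithmetic. The following is a minimal sketch in C, not gpart's actual code: size_to_sectors(), next_free_sector() and struct part_extent are hypothetical names, a bare number is assumed to be a sector count, and the K/M/G/T suffixes are assumed to be binary multiples of bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one existing partition, in sectors. */
struct part_extent {
	uint64_t start;
	uint64_t size;
};

/*
 * Convert a size such as "64G" into sectors of sectorsize bytes.
 * Returns 0 on success, -1 on a malformed, overflowing or unaligned size.
 */
static int
size_to_sectors(const char *s, uint64_t sectorsize, uint64_t *sectors)
{
	unsigned long long n;
	uint64_t mult;
	char *end;

	if (s == NULL || sectors == NULL || sectorsize == 0 ||
	    !isdigit((unsigned char)s[0]))
		return (-1);
	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno != 0)
		return (-1);
	switch (toupper((unsigned char)*end)) {
	case '\0':			/* assumption: bare number = sectors */
		*sectors = n;
		return (0);
	case 'K': mult = 1ULL << 10; break;
	case 'M': mult = 1ULL << 20; break;
	case 'G': mult = 1ULL << 30; break;
	case 'T': mult = 1ULL << 40; break;
	default:
		return (-1);
	}
	if (end[1] != '\0' || n > UINT64_MAX / mult)
		return (-1);
	if ((n * mult) % sectorsize != 0)	/* not a whole number of sectors */
		return (-1);
	*sectors = (n * mult) / sectorsize;
	return (0);
}

/*
 * "-b next": the first sector after the highest-ending partition, or
 * first_usable if there are no partitions yet.
 */
static uint64_t
next_free_sector(const struct part_extent *p, size_t n, uint64_t first_usable)
{
	uint64_t next = first_usable;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i].start + p[i].size > next)
			next = p[i].start + p[i].size;
	return (next);
}
```

With 512-byte sectors, "64G" works out to 64 * 2^30 / 512 = 134217728 sectors, which is the number one would otherwise have to compute by hand for -s.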
From rizzo at iet.unipi.it Fri Mar 27 05:56:11 2009 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Mar 27 05:56:18 2009 Subject: usage and format of kern.geom.conf* sysctl variables In-Reply-To: <20090326110048.GA48516@onelab2.iet.unipi.it> References: <20090326110048.GA48516@onelab2.iet.unipi.it> Message-ID: <20090327130108.GA96723@onelab2.iet.unipi.it> I have a few questions on the following sysctl variables, implemented in sys/geom/geom_kern.c kern.geom.conftxt kern.geom.confdot kern.geom.confxml QUESTION #1 All the variables return a trailing NUL when printed, because their handler is error = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb) + 1); I wonder if the trailing NUL is intentional in all cases, and if so, did you add it to hide the output in the default output, or because the string may contain non-low-ascii character ? (I am asking because on the console the trailing NUL is printed as an extra space; xterm output is fine). QUESTION #2 Is it reasonable to put in the variables also information accumulated at runtime (eg. usage stats and so on), or we should stick to pure "configuration" information ? QUESTION #3 (content) Should we limit the content of these variables to 'configuration' info (e.g. name, topology, media sizes) or is it reasonable to have fields for stats and other info accumulated at runtime ? QUESTION #4 (conftxt record separator) It seems that the format is one line per provider, so e.g. if a provider has to print a lot of info (e.g. an array of numbers) it still has to put everything on one line, right ? QUESTION #4 (conftxt format) Any reason why conftxt is limited to DISK and MD classes ? Also, for each provider, conftxt does not print the name of the geom the provider is attached to, but only its class; this doesn't let me figure out the linkage, e.g. in the case below how do I know that ntfs/WINXP is on ad0s1 and not on, say, another disk with the same mediasize ? ... 0 DISK ad0 160041885696 512 hd 16 sc 63 1 SCHED ad0.sched. 
160041885696 512 2 MBR ad0.sched.s3 113993842688 512 i 2 o 46046117888 ty 15 3 MBREXT ad0.sched.s5 113992794112 512 i 0 o 1048576 ty 7 2 MBR ad0.sched.s2 3093299200 512 i 1 o 42952818688 ty 27 2 MBR ad0.sched.s1 42952379904 512 i 0 o 32256 ty 7 1 MBR ad0s3 113993842688 512 i 2 o 46046117888 ty 15 2 MBREXT ad0s5 113992794112 512 i 0 o 1048576 ty 7 3 LABEL ntfs/DATA 113992794112 512 i 0 o 0 1 MBR ad0s2 3093299200 512 i 1 o 42952818688 ty 27 2 LABEL ntfs/RECOVERY 3093299200 512 i 0 o 0 1 MBR ad0s1 42952379904 512 i 0 o 32256 ty 7 2 LABEL ntfs/WINXP 42952379904 512 i 0 o 0 ... cheers luigi From phk at phk.freebsd.dk Fri Mar 27 06:31:54 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri Mar 27 06:32:04 2009 Subject: usage and format of kern.geom.conf* sysctl variables In-Reply-To: Your message of "Fri, 27 Mar 2009 14:01:08 +0100." <20090327130108.GA96723@onelab2.iet.unipi.it> Message-ID: <4844.1238160711@critter.freebsd.dk> In message <20090327130108.GA96723@onelab2.iet.unipi.it>, Luigi Rizzo writes: >QUESTION #1 > All the variables return a trailing NUL when printed, > because their handler is > > error = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb) + 1); > > I wonder if the trailing NUL is intentional in all cases. Yes. >QUESTION #2 > Is it reasonable to put in the variables also information > accumulated at runtime (eg. usage stats and so on), or > we should stick to pure "configuration" information ? No. Statistics are collected via the shared-memory interface which gstat(8) uses. This is much more efficient since there is no syscall overhead to update the values. >QUESTION #3 (content) > Should we limit the content of these variables to > 'configuration' info (e.g. name, topology, media sizes) or is > it reasonable to have fields for stats and other info accumulated > at runtime ? The intention is that confxml is definitive with respect to relevant configuration information. That is not the same as to say that _everything_ should be included in it. 
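On QUESTION #1: because the handlers export sbuf_len(sb) + 1 bytes, a userland reader gets the NUL as part of the returned length. A minimal sketch of trimming it before printing follows; conf_text_len() is a hypothetical helper, and the buffer is assumed to have been filled by something like sysctlbyname(3).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given a buffer and the length reported for it, return the number of
 * bytes that are actual text, i.e. without any trailing NUL bytes.
 */
static size_t
conf_text_len(const char *buf, size_t len)
{
	while (len > 0 && buf[len - 1] == '\0')
		len--;
	return (len);
}
```

Writing only conf_text_len(buf, len) bytes to the console would avoid the stray character that the trailing NUL otherwise produces there.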
>QUESTION #4 (conftxt record separator)
>  It seems that the format is one line per provider, so e.g.
>  if a provider has to print a lot of info (e.g. an array of
>  numbers) it still has to put everything on one line, right ?

conftxt is specifically and *only* for the use of sysinstall.

This use should be discontinued as soon as possible.

>QUESTION #4 (conftxt format)
>  Any reason why conftxt is limited to DISK and MD classes ?

See above.

I wrote a couple of "blue print" articles about this stuff for
daemonnews many years ago, I hope they still exist somewhere on
the net, because they are still relevant.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From xcllnt at mac.com Fri Mar 27 09:44:49 2009
From: xcllnt at mac.com (Marcel Moolenaar)
Date: Fri Mar 27 09:45:05 2009
Subject: gpart on top of eli inside a slice is not working
In-Reply-To: <20090327092226.K67075@maildrop.int.zabbadoz.net>
References: <20090325214318.Q67075@maildrop.int.zabbadoz.net> <20090326062604.X67075@maildrop.int.zabbadoz.net> <1FA0EF30-7FCC-4384-8151-36843EFBE01D@mac.com> <20090327092226.K67075@maildrop.int.zabbadoz.net>
Message-ID: <5BDA79FB-5678-4FF2-9BD1-D5915DDFC3C3@mac.com>

On Mar 27, 2009, at 5:26 AM, Bjoern A. Zeeb wrote:

>>> So the only possible solutions for those would be:
>>> 1) Somehow convert the entire disk to part and then exposing the 3
>>> freebsd-* partitions and have a dedicated eli inside each.
>>
>> I don't understand what you're trying to say. Can you
>> elaborate?
>
> Well if you convert the entire thing to GPT (not part; still wrongly
> using it as a synonym 'cause of the old gpt(8) name and being
> confuse;-) ) you'd have
>
> md0p1 Compaq Recovery (however this would work)
> md0p2 NTFS (however this would work)
> md0p3 freebsd-ufs
> md0p4 freebsd-swap
> md0p5 freebsd-ufs
>
> and then you would do
> md0p3.eli
> md0p4.eli
> md0p5.eli
>
> In this case the "freebsd-*" is publicly exposed in GPT.

I see. Yes, of course. This shows the downside of having
a flat partitioning. While it does the job, it may not be
the most aesthetically pleasing in some cases...

>
>>> 2) try (and stick with) bsdlabel on top of the eli inside the mbr
>>> slice?
>>
>> A different scheme, one that is allowed to be nested (again, from
>> an on-disk layout PoV), is the right thing to do.
>
> I went with gpart + BSD for now.

Sounds good. With gpart you can create BSD disklabels with
up to 20 partitions, so it can still be used when you want
to carve up in more than 7 (usable) partitions.

>>> So can you explain why there is the restriction that part cannot be
>>> used inside a MBR slice or rather somewhere on top of such?
>>
>> There's no such restriction in gpart. If there was, then gpart
>> would not be able to implement the BSD scheme or the EBR scheme.
>
> So again here, s,part,GPT, ;-)

Ah :-)

With at least 128 partition entries the need for nesting was
eliminated. It's explicitly allowed to sub-partition a GPT
partition with a MBR, but this was not so much done for the
sake of nesting I think, but rather virtualization.

Also: GPT was designed as part of EFI. You don't want to add
unnecessary complications to firmware and nested GPTs surely
do that.

> What is the EBR scheme?

The EBR scheme is used to create logical partitions:
http://en.wikipedia.org/wiki/Extended_boot_record

> Are the
> schemes somewhere documented in more detail?

They're as well documented as they were before :-)
In other words no.

> Well they are someway but perhaps gpart(8)[?]
could talk a bit more
> what a "scheme" is and what the affiliation to the geom classes
> and options.

I think the visibility has increased from before. Previously
partitioning was thought of in terms of utilities. There was
a 1-to-1 mapping between scheme and tool. Now there's a single
tool and users need to select a scheme when they create a
partitioning. It definitely makes sense to elaborate more in
the manpage for gpart(8), or even create gpart(9) pages.

I'll keep it in mind.

> Two things would significantly improve usability though are
> 1 ability to give -s size in human readable way instead of having to
> do all the math.
> 2 be able to give -b start to just say "start at next free offset" w/o
> looking it up or doing the math.

Yes on both accounts. We'll get that fleshed out and implemented
in time. It's just "syntactic sugaring"; UI padding...

Thanks for the feedback.

--
Marcel Moolenaar
xcllnt@mac.com

From bugmaster at FreeBSD.org Mon Mar 30 04:06:53 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Mar 30 04:07:52 2009
Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org
Message-ID: <200903301106.n2UB6qNm054732@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/132845   geom       [geom] [patch] ggated(8) does not close files opened a
o kern/132273  geom       glabel(8): [patch] failing on journaled partition
o kern/132242  geom       [gmirror] gmirror.ko fails to fully initialize
o kern/131353  geom       [geom] gjournal(8) kernel lock
o kern/131037  geom       [geli] Unable to create disklabel on .eli-Device
o kern/130528  geom       gjournal fsck during boot
o kern/129674  geom       [geom] gjournal root did not mount on boot
o kern/129645  geom       gjournal(8): GEOM_JOURNAL causes system to fail to boo
o kern/129245  geom       [geom] gcache is more suitable for suffix based provid
o bin/128398   geom       [patch] glabel(8): teach geom_label to recognise gpt l
f kern/128276  geom       [gmirror] machine lock up when gmirror module is used
o kern/126902  geom       [geom] geom_label: kernel panic during install boot
o kern/124973  geom       [gjournal] [patch] boot order affects geom_journal con
o kern/124969  geom       gvinum(8): gvinum raid5 plex does not detect missing s
o kern/124294  geom       [geom] gmirror(8) have inappropriate logic when workin
o kern/124130  geom       [gmirror] [usb] gmirror fails to start usb devices tha
o kern/123962  geom       [panic] [gjournal] gjournal (455Gb data, 8Gb journal),
o kern/123630  geom       [patch] [gmirror] gmirror doesnt allow the original dr
o kern/123122  geom       [geom] GEOM / gjournal kernel lock
f kern/122415  geom       [geom] UFS labels are being constantly created and rem
o kern/122067  geom       [geom] [panic] Geom crashed during boot
o kern/121559  geom       [patch] [geom] geom label class allows to create inacc
o kern/121364  geom       [gmirror] Removing all providers create a "zombie" mir
o kern/120231  geom       [geom] GEOM_CONCAT error adding second drive
o kern/120044  geom       [msdosfs] [geom] incorrect MSDOSFS label fries adminis
o kern/120021  geom       [geom] [panic] net-p2p/qbittorrent crashes system when
o kern/119743  geom       [geom] geom label for cds is keeped after dismount and
f kern/115547  geom       [geom] [patch] [request] let GEOM Eli get password fro
o kern/114532  geom       [geom] GEOM_MIRROR shows up in kldstat even if compile
o kern/113957  geom       [gmirror] gmirror is intermittently reporting a degrad
o kern/113837  geom       [geom] unable to access 1024 sector size storage
o kern/113419  geom       [geom] geom fox multipathing not failing back
p bin/110705   geom       gmirror(8) control utility does not exit with correct
o kern/107707  geom       [geom] [patch] [request] add new class geom_xbox360 to
o kern/104389  geom       [geom] [patch] sys/geom/geom_dump.c doesn't encode XML
o kern/98034   geom       [geom] dereference of NULL pointer in acd_geom_detach
o kern/94632   geom       [geom] Kernel output resets input while GELI asks for
o kern/90582   geom       [geom] [panic] Restore cause panic string (ffs_blkfree
o bin/90093    geom       fdisk(8) incapable of altering in-core geometry
a kern/89660   geom       [vinum] [patch] [panic] due to g_malloc returning null
o kern/89546   geom       [geom] GEOM error
s kern/89102   geom       [geom] [panic] panic when forced unmount FS from unplu
o kern/87544   geom       [gbde] mmaping large files on a gbde filesystem deadlo
o kern/84556   geom       [geom] [panic] GBDE-encrypted swap causes panic at shu
o kern/79251   geom       [2TB] newfs fails on 2.6TB gbde device
o kern/79035   geom       [vinum] gvinum unable to create a striped set of mirro
o bin/78131    geom       gbde(8) "destroy" not working.
s kern/73177   geom       kldload geom_* causes panic due to memory exhaustion

48 problems total.