From jamie at gritton.org Sat Jun 7 06:22:37 2008
From: jamie at gritton.org (James Gritton)
Date: Sat Jun 7 06:22:41 2008
Subject: R_xxx counterparts to the V_xxx macros
Message-ID: <484A23EE.8080308@gritton.org>

There are places where a variable has been replaced with a V_ macro,
only to be set explicitly to the "virtual" data from thread0 or the
like. For example, I know of a few places where V_hostname is set like
this.

It would make sense to have an R_hostname as well, an easy shortcut to
the real hostname instead of the virtual one. You'd need either a
static "vprocg0" structure, or a pointer somewhere to the main entry
(could be thread0 again, I suppose). Likewise with the other structures
where other globals may live.

Perhaps many (most?) variables will only ever be referred to in their
fully virtual state. But for those where the intention is to use the
machine's "true" parameter, it would be more clear if that was made
explicit in the macro.

- Jamie

From julian at elischer.org Sat Jun 7 16:51:29 2008
From: julian at elischer.org (Julian Elischer)
Date: Sat Jun 7 16:51:33 2008
Subject: R_xxx counterparts to the V_xxx macros
In-Reply-To: <484A23EE.8080308@gritton.org>
References: <484A23EE.8080308@gritton.org>
Message-ID: <484ABC90.9070301@elischer.org>

James Gritton wrote:
> There are places where a variable has been replaced with a V_ macro,
> only to be set explicitly to the "virtual" data from thread0 or the
> like. For example, I know of a few places where V_hostname is set like
> this.
>
> It would make sense to have an R_hostname as well, an easy shortcut to
> the real hostname instead of the virtual one. You'd need either a
> static "vprocg0" structure, or a pointer somewhere to the main entry
> (could be thread0 again, I suppose). Likewise with the other structures
> where other globals may live.
>
> Perhaps many (most?) variables will only ever be referred to in their
> fully virtual state.
> But for those where the intention is to use the
> machine's "true" parameter, it would be more clear if that was made
> explicit in the macro.

It's an interesting idea, however what is the "real" hostname?
what makes one vimage the 'real' one?

theoretically you can get it via:
INIT_VNET_INET(thread0.td_vnet) as you noted..
a macro that does that could be done I guess..

#define R_hostname ...

I wouldn't do it for all variables however..

let us keep it in mind and when we have vimage in we'll see how useful
it is.

>
> - Jamie
> _______________________________________________
> freebsd-virtualization@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
> To unsubscribe, send any mail to
> "freebsd-virtualization-unsubscribe@freebsd.org"

From jamie at gritton.org Sat Jun 7 17:04:00 2008
From: jamie at gritton.org (James Gritton)
Date: Sat Jun 7 17:04:05 2008
Subject: R_xxx counterparts to the V_xxx macros
In-Reply-To: <484ABC90.9070301@elischer.org>
References: <484A23EE.8080308@gritton.org> <484ABC90.9070301@elischer.org>
Message-ID: <484ABF7E.3050601@gritton.org>

Julian Elischer wrote:
> It's an interesting idea, however what is the "real" hostname?
> what makes one vimage the 'real' one?

You could ask this if all images were indeed equal. However, they're
hierarchical, and there's one that is there when the box starts up, and
can't be gotten rid of. If there weren't a "real" hostname, there would
never be a place in the code where you don't have a better choice than
using thread0's credential.

> I wouldn't do it for all variables however..

Quite true. I suspect that most variables replaced by vnet always have
a context that they naturally belong to, and would never need to know
about a "real" value. I think it's the few that are outside of vnet
that would most show this.
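
The distinction argued above can be sketched in a few lines of C. This
is only an illustration under stated assumptions, not the vimage code:
the structure layout, the curvprocg pointer, vprocg_enter() and the
demo function are all hypothetical names; only V_hostname, R_hostname
and "vprocg0" come from the thread. V_hostname follows whatever image
the current context belongs to, while R_hostname always names the image
that exists from boot.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-image container for variables that used to be global. */
struct vprocg {
	char hostname[64];
};

/* The image that exists from boot and can never be destroyed. */
static struct vprocg vprocg0 = { "boot-host" };

/* Hypothetical stand-in for "the image of the current context". */
static struct vprocg *curvprocg = &vprocg0;

/* Virtual view: resolves through the current context. */
#define V_hostname (curvprocg->hostname)
/* Real view: always the boot-time image, whatever the context. */
#define R_hostname (vprocg0.hostname)

/* Switch context; returns the previous image so the caller can restore it. */
static struct vprocg *
vprocg_enter(struct vprocg *vp)
{
	struct vprocg *prev = curvprocg;

	curvprocg = vp;
	return (prev);
}

/* Shows that only the V_ macro changes when the context changes. */
static void
hostname_macros_demo(void)
{
	struct vprocg jail = { "jail-host" };
	struct vprocg *prev = vprocg_enter(&jail);

	assert(strcmp(V_hostname, "jail-host") == 0);
	assert(strcmp(R_hostname, "boot-host") == 0);

	vprocg_enter(prev);
	assert(strcmp(V_hostname, R_hostname) == 0);
}
```

With an explicit R_ macro, code that wants the machine's own value no
longer has to reach through thread0 by hand, which is the readability
gain being asked for.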

My focus right now is jail integration, so these are the parts that I
notice, as vnet wouldn't be integrated with jails so much as placed on
top of them (as they are now placed on top of vimage which is almost but
not quite entirely vnet).

- Jamie

From julian at elischer.org Sat Jun 7 17:26:18 2008
From: julian at elischer.org (Julian Elischer)
Date: Sat Jun 7 17:26:24 2008
Subject: R_xxx counterparts to the V_xxx macros
In-Reply-To: <484ABF7E.3050601@gritton.org>
References: <484A23EE.8080308@gritton.org> <484ABC90.9070301@elischer.org> <484ABF7E.3050601@gritton.org>
Message-ID: <484AC4B9.2060207@elischer.org>

James Gritton wrote:
> Julian Elischer wrote:
>> It's an interesting idea, however what is the "real" hostname?
>> what makes one vimage the 'real' one?
>
> You could ask this if all images were indeed equal. However, they're
> hierarchical, and there's one that is there when the box starts up, and
> can't be gotten rid of. If there weren't a "real" hostname, there would
> never be a place in the code where you don't have a better choice than
> using thread0's credential.
>
>> I wouldn't do it for all variables however..
>
> Quite true. I suspect that most variables replaced by vnet always have
> a context that they naturally belong to, and would never need to know
> about a "real" value. I think it's the few that are outside of vnet
> that would most show this.
>
> My focus right now is jail integration, so these are the parts that I
> notice, as vnet wouldn't be integrated with jails so much as placed on
> top of them (as they are now placed on top of vimage which is almost but
> not quite entirely vnet).

excellent. Be aware that the patch
( http://www.freebsd.org/~julian/vimage.diff ) (updated today)
may not be EXACTLY what is committed as we are re-doing it in order
to abide by the plan worked out at the Ottawa devsummit.
This plan wants us to break up the patch into parts
so it may result in a very slightly different outcome in a few places..

I do believe we do want to integrate the jails and vimage a bit more..
let us know what you are thinking...

>
> - Jamie

From julian at elischer.org Mon Jun 9 05:58:42 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 05:58:50 2008
Subject: kinda headsup..
Message-ID: <484CC690.9020303@elischer.org>

At the BSDCan devsummit we discussed how to proceed with committing
Vimage to -current.

the Milestones included something like:

June 8 (today) Headsup....

June 15    commit changes that add macros for vnet
           (network module) and vinet(inet virtualisation)
           with macros defined in such a way to make 0 actual
           differences. provable by md5 etc.
           Documentat
           s/hostname/V_hostname/g
           #define V_hostname hostname
           2 weeks settle time, next step prepared, tested
           and reviewed.

June 29    Add changes to convert all globals to members of
           per-module structures. Done in a reversible way (i.e.
           compilable out). Macros defined so that depending on
           compile options structures or globals are used (one
           global structure).
           Performance implications of using structures are
           evaluated. Structures possibly tuned.
           Initialisation routines added, checked and tuned.
           example:
           #if VIMAGE_USE_STRUCTS
           #define V_hostname sys_globals.hostname
           ...
           #else
           #define V_hostname hostname
           ...
           #endif

July 13    globals removed in vnet, vinet. ifdefs and compile
           option removed or scaled back to make code clean to
           read again.
           Destructor routines added where needed.
           Remaining "NULL Macros" (compile to nothing at this
           point) committed to reduce the size of the MEAT diffs.
           Review of Meat diffs formally under way for final comment.
           example:
           #define INIT_VNET_INET(x) /* nothing */
           add "INIT_VNET_INET(curvnet);" (and similar) where needed.
           remove globals (e.g.
           'hostname')

July 21    JAIL+Vimage framework committed. e.g. add new syscall,
           program, etc. (part one of meat diffs)
           structures still only global instances.
           vimage enhanced jails can be created but act just like
           normal jails?

July 28    Ability to create > 1 vimage enabled.
           Vimage enhanced jails now have private network stacks etc.

August     start on converting more modules as needed and time allows.

Marko and I have been working towards splitting up the current diffs
(which do the whole thing) to allow this schedule to be followed.
We may or may not be ready for the June 15 step by then, but if not it
may be a week thereafter. So this should be considered the heads-up.

discussion will be on freebsd-virtualization@ and the perforce branch
that we have as a current working system is branch 'vimage'.
//depot/projects/vimage/...

diffs can be found at:
http://www.freebsd.org/~julian/vimage.diff and they are usually
fairly up to date.

From kris at FreeBSD.org Mon Jun 9 09:57:03 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Mon Jun 9 09:57:08 2008
Subject: kinda headsup..
In-Reply-To: <484CC690.9020303@elischer.org>
References: <484CC690.9020303@elischer.org>
Message-ID: <484CFE6E.7040305@FreeBSD.org>

Julian Elischer wrote:
> At the BSDCan devsummit we discussed how to proceed with committing
> Vimage to -current.
>
> the Milestones included something like:
>
> June 8 (today) Headsup....
>
> June 15    commit changes that add macros for vnet
>            (network module) and vinet(inet virtualisation)
>            with macros defined in such a way to make 0 actual
>            differences. provable by md5 etc.
>            Documentat
>            s/hostname/V_hostname/g
>            #define V_hostname hostname
>            2 weeks settle time, next step prepared, tested
>            and reviewed.

...

> diffs can be found at:
> http://www.freebsd.org/~julian/vimage.diff and they are usually
> fairly up to date.

Did Marko fix the panic I saw back in May?
I wasn't even able to boot a vimage kernel yet, let alone begin
testing it :)

Kris

From jamie at gritton.org Mon Jun 9 17:07:20 2008
From: jamie at gritton.org (James Gritton)
Date: Mon Jun 9 17:07:24 2008
Subject: jail_set
Message-ID: <484D6342.1080901@gritton.org>

I've gotten the first stage working of the extensible name-based jail
settings framework, with a patch available at
http://gritton.org/jail_set.diff

This is based around a new jail_set() system call, much like nmount() -
in fact it even uses the same vfs options calls. It allows for modules
(the existing "prison services" hooks that zfs uses) to be controlled
via this interface, both to enable or disable the entire module, or to
have their own module-specific parameters.

The old jail() system call still exists and is compatible with this
setup - it just becomes a stub to jail_set with the "path", "hostname",
and "ip_number" parameters.

There's also a sysctl tree security.jail.jid, that shows all parameters
for current jails, once again with hooks for per-module parameters.

The expectation is that vimage's vnet and vinet will become prison
services under this framework, and the other more minor vimage bits
will be rolled in as well. This would fit in with the goals of the
21 Jul deadline in Julian's recently posted schedule.

Work still to do:

Allow for hierarchical jails (which vimage needs).

Actually integrate this with vimage.

Integrate with other subsystems, more for proof of concept than
anything else. SYSV IPC perhaps, since I've already done similar work
on them. Or replace the one-off "pr_linux" hook on the prison
structure with the standard services hook.

Perhaps add a jail_get() system call, to read jail parameters.
Currently, they can be read via sysctl, but that might not be the best
way around this. If there's both a jail_set and jail_get, there may be
no need for the extra effort of the sysctl tree.
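
To make the "much like nmount()" comparison concrete, here is a rough
sketch of how name/value parameters can be packed as alternating
struct iovec entries, which is the convention nmount() uses for its
options. It does not call jail_set() and is not taken from the patch:
param_add() and param_get() are hypothetical helpers, and encoding
every value as a NUL-terminated string is an assumption made only for
this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Append one name/value pair to an iovec array of capacity max.
 * Both strings are stored with their terminating NUL counted in iov_len.
 * Returns the new entry count, or -1 if the pair does not fit.
 */
static int
param_add(struct iovec *iov, int niov, int max, const char *name,
    const char *value)
{
	if (niov + 2 > max)
		return (-1);
	iov[niov].iov_base = (void *)name;	/* iov_base is a plain void * */
	iov[niov].iov_len = strlen(name) + 1;
	iov[niov + 1].iov_base = (void *)value;
	iov[niov + 1].iov_len = strlen(value) + 1;
	return (niov + 2);
}

/* Look a parameter up by name; returns its value or NULL if absent. */
static const char *
param_get(const struct iovec *iov, int niov, const char *name)
{
	int i;

	for (i = 0; i + 1 < niov; i += 2)
		if (strcmp(iov[i].iov_base, name) == 0)
			return (iov[i + 1].iov_base);
	return (NULL);
}
```

Because parameters are looked up by name rather than by position in a
fixed structure, a module can add its own parameters without changing
the system call's signature, which is the extensibility described
above.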

- Jamie

From julian at elischer.org Mon Jun 9 17:41:13 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 17:41:20 2008
Subject: kinda headsup..
In-Reply-To: <484CFE6E.7040305@FreeBSD.org>
References: <484CC690.9020303@elischer.org> <484CFE6E.7040305@FreeBSD.org>
Message-ID: <484D6B38.3020207@elischer.org>

Kris Kennaway wrote:
> Julian Elischer wrote:
>> At the BSDCan devsummit we discussed how to proceed with committing
>> Vimage to -current.
>>
>> the Milestones included something like:
>>
>> June 8 (today) Headsup....
>>
>> June 15    commit changes that add macros for vnet
>>            (network module) and vinet(inet virtualisation)
>>            with macros defined in such a way to make 0 actual
>>            differences. provable by md5 etc.
>>            Documentat
>>            s/hostname/V_hostname/g
>>            #define V_hostname hostname
>>            2 weeks settle time, next step prepared, tested
>>            and reviewed.
> ...
>
>> diffs can be found at:
>> http://www.freebsd.org/~julian/vimage.diff and they are usually
>> fairly up to date.
>
> Did Marko fix the panic I saw back in May? I wasn't even able to boot a
> vimage kernel yet, let alone begin testing it :)

is this the ipv6 panic?
I must admit I have not heard..
he was looking at it back then and has committed stuff since then..

I use it with IPV4 quite successfully quite often.

note that the first commits are pretty much guaranteed to not have that
problem.. as they are effectively NOPs

I'll get back to you on it..

>
> Kris

From kris at FreeBSD.org Mon Jun 9 18:34:59 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Mon Jun 9 18:35:05 2008
Subject: kinda headsup..
In-Reply-To: <484D6B38.3020207@elischer.org>
References: <484CC690.9020303@elischer.org> <484CFE6E.7040305@FreeBSD.org> <484D6B38.3020207@elischer.org>
Message-ID: <484D77D2.9050102@FreeBSD.org>

Julian Elischer wrote:
> Kris Kennaway wrote:
>> Julian Elischer wrote:
>>> At the BSDCan devsummit we discussed how to proceed with committing
>>> Vimage to -current.
>>>
>>> the Milestones included something like:
>>>
>>> June 8 (today) Headsup....
>>>
>>> June 15    commit changes that add macros for vnet
>>>            (network module) and vinet(inet virtualisation)
>>>            with macros defined in such a way to make 0 actual
>>>            differences. provable by md5 etc.
>>>            Documentat
>>>            s/hostname/V_hostname/g
>>>            #define V_hostname hostname
>>>            2 weeks settle time, next step prepared, tested
>>>            and reviewed.
>> ...
>>
>>> diffs can be found at:
>>> http://www.freebsd.org/~julian/vimage.diff and they are usually
>>> fairly up to date.
>>
>> Did Marko fix the panic I saw back in May? I wasn't even able to boot
>> a vimage kernel yet, let alone begin testing it :)
>
> is this the ipv6 panic?
> I must admit I have not heard..
> he was looking at it back then and has committed stuff since then..
>
> I use it with IPV4 quite successfully quite often.
>
> note that the first commits are pretty much guaranteed to not have that
> problem.. as they are effectively NOPs
>
>
> I'll get back to you on it..

Yes, the panic occurs when one runs a vimage kernel on a CVS world.
It's presumably a case of incomplete validation of the input from
userland.

I'd still like someone to validate the initial commits and establish a
framework for ongoing testing, because we've seen cases recently where
things as simple as structure alignment changes can have >30%
performance impact, so if it's not entirely a NOP then there is still
potential for trouble.

Kris

From julian at elischer.org Mon Jun 9 18:53:44 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 18:54:01 2008
Subject: kinda headsup..
In-Reply-To: <484D77D2.9050102@FreeBSD.org>
References: <484CC690.9020303@elischer.org> <484CFE6E.7040305@FreeBSD.org> <484D6B38.3020207@elischer.org> <484D77D2.9050102@FreeBSD.org>
Message-ID: <484D7C36.2030607@elischer.org>

Kris Kennaway wrote:
> Julian Elischer wrote:
>> Kris Kennaway wrote:
>>> Julian Elischer wrote:
>>>> At the BSDCan devsummit we discussed how to proceed with committing
>>>> Vimage to -current.
>>>>
>>>> the Milestones included something like:
>>>>
>>>> June 8 (today) Headsup....
>>>>
>>>> June 15    commit changes that add macros for vnet
>>>>            (network module) and vinet(inet virtualisation)
>>>>            with macros defined in such a way to make 0 actual
>>>>            differences. provable by md5 etc.
>>>>            Documentat
>>>>            s/hostname/V_hostname/g
>>>>            #define V_hostname hostname
>>>>            2 weeks settle time, next step prepared, tested
>>>>            and reviewed.
>>> ...
>>>
>>>> diffs can be found at:
>>>> http://www.freebsd.org/~julian/vimage.diff and they are usually
>>>> fairly up to date.
>>>
>>> Did Marko fix the panic I saw back in May? I wasn't even able to
>>> boot a vimage kernel yet, let alone begin testing it :)
>>
>> is this the ipv6 panic?
>> I must admit I have not heard..
>> he was looking at it back then and has committed stuff since then..
>>
>> I use it with IPV4 quite successfully quite often.
>>
>> note that the first commits are pretty much guaranteed to not have
>> that problem.. as they are effectively NOPs
>>
>>
>> I'll get back to you on it..
>
> Yes, the panic occurs when one runs a vimage kernel on a CVS world. It's
> presumably a case of incomplete validation of the input from userland.
>
> I'd still like someone to validate the initial commits and establish a
> framework for ongoing testing, because we've seen cases recently where
> things as simple as structure alignment changes can have >30%

we have set aside a step to confirm this.. but initial testing has
shown no impact.

> performance impact, so if it's not entirely a NOP then there is still
> potential for trouble.
>
> Kris

From zec at fer.hr Mon Jun 9 19:03:23 2008
From: zec at fer.hr (Marko Zec)
Date: Mon Jun 9 19:10:05 2008
Subject: kinda headsup..
In-Reply-To: <484CFE6E.7040305@FreeBSD.org>
References: <484CC690.9020303@elischer.org> <484CFE6E.7040305@FreeBSD.org>
Message-ID: <200806092046.02847.zec@fer.hr>

On Monday 09 June 2008 11:57:02 Kris Kennaway wrote:
> Julian Elischer wrote:
> > At the BSDCan devsummit we discussed how to proceed with committing
> > Vimage to -current.
> >
> > the Milestones included something like:
> >
> > June 8 (today) Headsup....
> >
> > June 15    commit changes that add macros for vnet
> >            (network module) and vinet(inet virtualisation)
> >            with macros defined in such a way to make 0 actual
> >            differences. provable by md5 etc.
> >            Documentat
> >            s/hostname/V_hostname/g
> >            #define V_hostname hostname
> >            2 weeks settle time, next step prepared, tested
> >            and reviewed.
>
> ...
>
> > diffs can be found at:
> > http://www.freebsd.org/~julian/vimage.diff and they are usually
> > fairly up to date.
>
> Did Marko fix the panic I saw back in May? I wasn't even able to
> boot a vimage kernel yet, let alone begin testing it :)

I just submitted a p4 change that allows for a machine to boot with
rpc.lockd enabled, but haven't tested it with any NFS mounts yet. But
do you really need to have rpc.lockd running to be able to do any
testing?

Pls. note that the vimage branch tracking HEAD has other mysterious
lockups that I haven't been able to track down yet. The
user/zec/vimage_7 branch is in much better shape...

Marko

From kris at FreeBSD.org Mon Jun 9 19:11:04 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Mon Jun 9 19:11:06 2008
Subject: kinda headsup..
In-Reply-To: <200806092046.02847.zec@fer.hr>
References: <484CC690.9020303@elischer.org> <484CFE6E.7040305@FreeBSD.org> <200806092046.02847.zec@fer.hr>
Message-ID: <484D8046.7020403@FreeBSD.org>

Marko Zec wrote:
> On Monday 09 June 2008 11:57:02 Kris Kennaway wrote:
>> Julian Elischer wrote:
>>> At the BSDCan devsummit we discussed how to proceed with committing
>>> Vimage to -current.
>>>
>>> the Milestones included something like:
>>>
>>> June 8 (today) Headsup....
>>>
>>> June 15    commit changes that add macros for vnet
>>>            (network module) and vinet(inet virtualisation)
>>>            with macros defined in such a way to make 0 actual
>>>            differences. provable by md5 etc.
>>>            Documentat
>>>            s/hostname/V_hostname/g
>>>            #define V_hostname hostname
>>>            2 weeks settle time, next step prepared, tested
>>>            and reviewed.
>> ...
>>
>>> diffs can be found at:
>>> http://www.freebsd.org/~julian/vimage.diff and they are usually
>>> fairly up to date.
>> Did Marko fix the panic I saw back in May? I wasn't even able to
>> boot a vimage kernel yet, let alone begin testing it :)
>
> I just submitted a p4 change that allows for a machine to boot with
> rpc.lockd enabled, but haven't tested it with any NFS mounts yet. But
> do you really need to have rpc.lockd running to be able to do any
> testing?

Probably not in an absolute sense, but it presumably is something that
must be fixed anyway :)

> Pls. note that the vimage branch tracking HEAD has other mysterious
> lockups that I haven't been able to track down yet. The
> user/zec/vimage_7 branch is in much better shape...

OK, I will wait until the HEAD branch is in good shape. Please let me
know when I can start testing.

Kris

From bz at FreeBSD.org Mon Jun 9 19:25:09 2008
From: bz at FreeBSD.org (Bjoern A. Zeeb)
Date: Mon Jun 9 19:25:14 2008
Subject: kinda headsup..
In-Reply-To: <484CC690.9020303@elischer.org>
References: <484CC690.9020303@elischer.org>
Message-ID: <20080609174826.Q83875@maildrop.int.zabbadoz.net>

On Sun, 8 Jun 2008, Julian Elischer wrote:

> At the BSDCan devsummit we discussed how to proceed with committing Vimage to
> -current.
>
> the Milestones included something like:
>
> June 8 (today) Headsup....
>
> June 15    commit changes that add macros for vnet
>            (network module) and vinet(inet virtualisation)
>            with macros defined in such a way to make 0 actual
>            differences. provable by md5 etc.
>            Documentat
>            s/hostname/V_hostname/g
>            #define V_hostname hostname
>            2 weeks settle time, next step prepared, tested
>            and reviewed.

For which part were you talking about a sed/awk script to use?
Can we have a diff for just this part (once it is avail?)


[schedule]

* I am missing the BIG HEADS UP somewhere for all the people with
  outstanding work so that they will not re-do any integration multiple
  times.

* I am missing the developers and users documentation in the schedule.


> diffs can be found at:
> http://www.freebsd.org/~julian/vimage.diff and they are usually
> fairly up to date.

I am just starting to skip through the patch, not doing a close review
atm (not checking functional changes, etc. at all), and even this is
hard at the end of the day...

Are sys/ddb/db_command.c related in any way to this?

sys/kern/init_main.c has an extra whitespace before the LIST_FIRST.

sys/kern/kern_linker.c Isizeof(lookup) should be 4 space indent not 2
tabs.

Do we need those changes like sys/kern/kern_switch.c ?

sys/kern/kern_sysctl.c has indentation problems in the
@@ -1322,7 +1421,17 @@ junk

sys/kern/kern_timeout.c has an extra whitespace

sys/kern/kern_vimage.c says "XXX RCS tag goes here" so add it.
sys/kern/kern_vimage.c has // comments no-no
- " - s,#define NAME,#define\tNAME,g
- " - vnet_mod_register,vnet_mod_register_multi,(more)... declarations
- " - adds a new suser() call.
- " - in vi_symlookup() 2nd line of for, remove a space
- " - things like this scare me:
  /* A brute force check whether there's enough mem for a new vimage */
  especially if it's freed again instantly
- " - near Detach / free per-module state instances remove whitespace
- " - vi_free() remove a \t before the break;
- " - db_show_vnets should probably check for db_pager_quit

sys/kern/kern_xxx.c the printf looks like debugging?

sys/kern/sys_socket.c has an unrelated whitespace change

sys/kern/uipc_domain.c removes a comment I am entirely sure it can be
removed.
- " - why do we need to change net_init_domain(?here?) just to cast
  again?

sys/kern/uipc_socket.c junk @@ -1284,13 +1314,17 @@ s,\t, ,
- " - if (how != SHUT_RD) { int error; add \n

sys/kern/vfs_lookup.c adds something called IMUNES_SYMLINK_HACK which
should either be renamed or removed.

sys/modules/Makefile does not look like it belongs there

sys/modules/netgraph/Makefile looks really strange, can we fix that?

sys/modules/netgraph/pipe/Makefile has an extra space

sys/modules/netgraph/wormhole/Makefile has an extra space

sys/net/bpf.c adds an IMUNES_BPF_HACK, and defines it - either
rename or remove; also has whitespace issues and debugging
printfs in there (that should not compile).

sys/net/if.c @@ -292,31 +317,73 @@ junk if (IS_DEFAULT_VNET(curvnet)) {
... needs an extra \t, no? doesn't look nice; there are more
of those in this file; maybe not yet; not before the #ifdefs go.
- " - SYSINT .. if_attachdomain was a wrong ws change
- " - junk @@ -1842,6 +1971,24 @@ adds another suser()
- " - at the end there are two unrelated/wrong ws changes

sys/net/if_ethersubr.c ether_reassign() has whitespace issues
- " - SYSCTL_V_INT for ether_ipfw 2nd line indent looks wrong

sys/net/if_gif.c SYSCTL_V_INT 2nd line, parallel_tunnels indent
- " - gifmodevent() empty line wrongly removed

sys/net/if_gif.h #define\tNAME

sys/net/if_gre.c is there a reason to rename the local variables?

sys/net/if_loop.c I cannot see a difference for vnet_loif_iattach
w/ or w/o the #ifdef. Should the outer one go?
- " - is there a need to move the loif check up in lo_clone_destroy?
- " - junk @@ -190,7 +266,7 @@ use 4 spaces

sys/net/if_mib.c SYSCTL_V_INT fix ws

sys/net/if_var.h do we need to move if_index?

sys/net/route.c static uma_zone_t rtzone; has an unneeded ws change
- " - rtable_init() ws wrong
- " - is that related to more MRT changes or why are functions split
  and shuffled?
- " - there were and still are more ws problems around V_rt_tables
- " - return 0; ws problem
- " - rtable_idetach() ws problem and more and the return

sys/net/rtsock.c rnh =\n ... whitespace next line

sys/net/vnet.h XXX RCS tag goes here do so
- " - struct vnet_net has ws issues with the _ether_ipfw line
- " - #define\tNAME


I am running out of battery, so I am going to continue with the
next ~20%+- in sys/net80211/**, l 6556 tomorrow.


General: values in return statements should be enclosed in parentheses.

General: function declarations K&R vs. ANSI vs ...

General: you are adding 92 lines with XXX, 18 say "locking", 2 say
WRONG, 10 say RCS, (other), ... can we get (most of) them fixed before
committing? (fixed, not removed)


/bz

--
Bjoern A. Zeeb          Stop bit received. Insert coin for new game.

From julian at elischer.org Mon Jun 9 19:47:12 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 19:47:16 2008
Subject: vimage include files
Message-ID: <484D88AC.2000402@elischer.org>

The current vimage code adds a handful of new include files..

e.g.
vnet.h for vimage related defines that are related to general
networking stuff
vinet.h for vimage related defines that are related to inet.

however eventually these defines would move to other files.

For example I think every single file that includes vinet.h already
includes netinet/in.h so these definitions move into that file.

My question however comes with vnet.h

95% of the files that use it also include <net/if.h>
so possibly they could go there, but a few of them don't.
they are:
uipc_socket.c        (sets a reference counter in the vnet structure)
raw_cb.c             accesses V_rawcb_list
raw_usrreq.c         accesses V_rawcb_list
tcp_output.c         lots of stuff of course but doesn't use if.h
tcp_timer.c          ditto
                     vnet appears to be needed just for the
                     SYSCTL_V_ stuff. (marko?)
netipsec/keysock.c   no need for if.h

so there is no one place where all of the vnet structure is in scope
but if.h is the closest.

so should we:
keep vnet.h?
split it up a bit to make it more in scope?
Find/make an include file for "networking in general?"
is there such a file? As I said if.h seems the closest.

From julian at elischer.org Mon Jun 9 20:06:53 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 20:06:57 2008
Subject: kinda headsup..
In-Reply-To: <20080609174826.Q83875@maildrop.int.zabbadoz.net>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net>
Message-ID: <484D8D5B.9010500@elischer.org>

Bjoern A. Zeeb wrote:
> On Sun, 8 Jun 2008, Julian Elischer wrote:
>
>> At the BSDCan devsummit we discussed how to proceed with committing
>> Vimage to -current.
>>
>> the Milestones included something like:
>>
>> June 8 (today) Headsup....
>>
>> June 15    commit changes that add macros for vnet
>>            (network module) and vinet(inet virtualisation)
>>            with macros defined in such a way to make 0 actual
>>            differences. provable by md5 etc.
>>            Documentat
>>            s/hostname/V_hostname/g
>>            #define V_hostname hostname
>>            2 weeks settle time, next step prepared, tested
>>            and reviewed.
>
> For which part were you talking about a sed/awk script to use?
> Can we have a diff for just this part (once it is avail?)

We are generating this now (the script..it'll be tcl it looks like :-)
I have a "hand made" version of the diff that I'm generating for
comparison.
It is made by getting the diffs in the "vimage-commit" branch.
run the perl script in the base of that branch to get the diff
generated for you.

What I'm doing is creating a diff that will compile with no changes,
and yet includes as much of the full diff as possible.
note: that branch doesn't compile yet as I haven't defined all the
macros yet, but the aim is that it will, and should produce a
"No differences" binary..

>
>
> [schedule]
>
> * I am missing the BIG HEADS UP somewhere for all the people with
>   outstanding work so that they will not re-do any integration multiple
>   times.
>
> * I am missing the developers and users documentation in the schedule.
>
>
>
>> diffs can be found at:
>> http://www.freebsd.org/~julian/vimage.diff and they are usually
>> fairly up to date.
>
> I am just starting to skip through the patch, not doing a close review
> atm (not checking functional changes, etc. at all), and even this is
> hard at the end of the day...
>
> Are sys/ddb/db_command.c related in any way to this?
>
> sys/kern/init_main.c has an extra whitespace before the LIST_FIRST.
>
> sys/kern/kern_linker.c Isizeof(lookup) should be 4 space indent not 2
> tabs.
>
> Do we need those changes like sys/kern/kern_switch.c ?
>
> sys/kern/kern_sysctl.c has indentation problems in the
> @@ -1322,7 +1421,17 @@ junk
>
> sys/kern/kern_timeout.c has an extra whitespace
>
> sys/kern/kern_vimage.c says "XXX RCS tag goes here" so add it.
> sys/kern/kern_vimage.c has // comments no-no
> - " - s,#define NAME,#define\tNAME,g
> - " - vnet_mod_register,vnet_mod_register_multi,(more)... declarations
> - " - adds a new suser() call.
> - " - in vi_symlookup() 2nd line of for, remove a space
> - " - things like this scare me:
>   /* A brute force check whether there's enough mem for a new vimage */
>   especially if it's freed again instantly
> - " - near Detach / free per-module state instances remove whitespace
> - " - vi_free() remove a \t before the break;
> - " - db_show_vnets should probably check for db_pager_quit
>
> sys/kern/kern_xxx.c the printf looks like debugging?
>
> sys/kern/sys_socket.c has an unrelated whitespace change
>
> sys/kern/uipc_domain.c removes a comment I am entirely sure it can be
> removed.
> - " - why do we need to change net_init_domain(?here?) just to cast
>   again?
>
> sys/kern/uipc_socket.c junk @@ -1284,13 +1314,17 @@ s,\t, ,
> - " - if (how != SHUT_RD) { int error; add \n
>
> sys/kern/vfs_lookup.c adds something called IMUNES_SYMLINK_HACK which
> should either be renamed or removed.
>
> sys/modules/Makefile does not look like it belongs there
>
> sys/modules/netgraph/Makefile looks really strange, can we fix that?
>
> sys/modules/netgraph/pipe/Makefile has an extra space
>
> sys/modules/netgraph/wormhole/Makefile has an extra space
>
> sys/net/bpf.c adds an IMUNES_BPF_HACK, and defines it - either
> rename or remove; also has whitespace issues and debugging
> printfs in there (that should not compile).
>
> sys/net/if.c @@ -292,31 +317,73 @@ junk if (IS_DEFAULT_VNET(curvnet)) {
> ... needs an extra \t, no? doesn't look nice; there are more
> of those in this file; maybe not yet; not before the #ifdefs go.
> - " - SYSINT ..
if_attachdomain was a wrong ws change
> - " - junk @@ -1842,6 +1971,24 @@ adds another suser()
> - " - at the end there are two unrelated/wrong ws changes
>
> sys/net/if_ethersubr.c ether_reassign() has whitespace issues
> - " - SYSCTL_V_INT for ether_ipfw 2nd line indent looks wrong
>
> sys/net/if_gif.c SYSCTL_V_INT 2nd line, parallel_tunnels indent
> - " - gifmodevent() empty line wrongly removed
>
> sys/net/if_gif.h #define\tNAME
>
> sys/net/if_gre.c is there a reason to rename the local variables?
>
> sys/net/if_loop.c I cannot see a difference for vnet_loif_iattach
> w/ or w/o the #ifdef. Should the outer one go?
> - " - is there a need to move the loif check up in lo_clone_destroy?
> - " - junk @@ -190,7 +266,7 @@ use 4 spaces
>
> sys/net/if_mib.c SYSCTL_V_INT fix ws
>
> sys/net/if_var.h do we need to move if_index?
>
> sys/net/route.c static uma_zone_t rtzone; has an unneeded ws change
> - " - rtable_init() ws wrong
> - " - is that related to more MRT changes or why are functions split
>   and shuffled?
> - " - there were and still are more ws problems around V_rt_tables
> - " - return 0; ws problem
> - " - rtable_idetach() ws problem and more and the return
>
> sys/net/rtsock.c rnh =\n ... whitespace next line
>
> sys/net/vnet.h XXX RCS tag goes here do so
> - " - struct vnet_net has ws issues with the _ether_ipfw line
> - " - #define\tNAME
>
>
>
> I am running out of battery, so I am going to continue with the
> next ~20%+- in sys/net80211/**, l 6556 tomorrow.
>
>
> General: values in return statements should be enclosed in parentheses.
>
> General: function declarations K&R vs. ANSI vs ...
>
> General: you are adding 92 lines with XXX, 18 say "locking", 2 say
> WRONG, 10 say RCS, (other), ... can we get (most of) them fixed before
> committing? (fixed, not removed)
>
>
> /bz
>

From julian at elischer.org Mon Jun 9 20:13:18 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 20:13:21 2008
Subject: kinda headsup..
In-Reply-To: <20080609174826.Q83875@maildrop.int.zabbadoz.net> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> Message-ID: <484D8EDD.3040103@elischer.org> Bjoern A. Zeeb wrote: > On Sun, 8 Jun 2008, Julian Elischer wrote: > >> At the BSDCAn devsummit we discussed how to proceed with committing >> Vimage to -current. >> >> the Milestones included something like: >> >> June 8 (today) Headsup.... >> >> June 15 commit changes that add macros for vnet >> (network module) and vinet(inet virtualisation) >> with macros defined in such a way to make 0 actual >> differences. provable by md5 etc. >> Documentat >> s/hostname/g//V_hostname/ >> #define V_hostname hostname >> 2 weeks settle time, next step prepared, tested >> and reviewed. > > For which part were you talking about a sed/awk script to use? > Can we have a diff for just this part (once it is avail?) > > > [schedule] > > * I am missing the BIG HEADS UP somewhere for all the people with > outstanding work so that they will not re-do any integration multiple > times. > > * I am missing the developers and users documentation in the schedule. > > > >> diffs can be found at: >> http://www.freebsd.org/~julian/vimage.diff and it are usually >> fairly up to date. > > I am just starting to skip through the patch, not doing a close review > atm (not checking functional changes, etc. at all), and even this is > hard at the end of the day... > > Are sys/ddb/db_command.c related in any way to this? > > sys/kern/init_main.c has an extra whitespace before the LIST_FIRST. > > sys/kern/kern_linker.c Isizeof(lookup) should be 4 space indent not 2 > tabs. > > Do we need those changes like sys/kern/kern_switch.c ? This diff there includes experimental changes to virtualise things like load average, and they will not be part of the commit. 
so ignore anything that smells like "scheduler" > > sys/kern/kern_sysctl.c has indentation problems in the > @@ -1322,7 +1421,17 @@ junk > > sys/kern/kern_timeout.c has an extra whitespace yes we (I) will be trying to clean that sort of thing.. > > sys/kern/kern_vimage.c says "XXX RCS tag goes here" so add it. SNV tag? > sys/kern/kern_vimage.c has // comments no-no > - " - s,#define NAME,#define\tNAME,g > - " - vnet_mod_register,vnet_mod_register_multi,(more)... declarations > - " - adds a new suser() call. > - " - in vi_symlookup() 2nsd line of for, remove a space > - " - thinks like this scare me: > /* A brute force check whether there's enough mem for a new vimage */ > especially if its freed again instantly > - " - near Detach / free per-module state instances remove whitespace > - " - vi_free() remove a \t before the break; > - " - db_show_vnets should probably check for db_pager_quit > > sys/kern/kern_xxx.c the printf looks like debugging? > > sys/kern/sys_socket.c has an unrelated whitespace change > > sys/kern/uipc_domain.c removes a comment I am entirely sure it can be > removed. > - " - why do we need to change net_init_domain(?here?) just to cast > again? > > sys/kern/uipc_socket.c junk @@ -1284,13 +1314,17 @@ s,\t, , > - " - if (how != SHUT_RD) { int error; add \n > > sys/kern/vfs_lookup.c adds something called IMUNES_SYMLINK_HACK which > should either be renamed or removed. > > sys/modules/Makefile does not look like it belongs there > > sys/modules/netgraph/Makefile looks really strange, can we fix that? > > sys/modules/netgraph/pipe/Makefile has an extra space > > sys/modules/netgraph/wormhole/Makefile has an extra space > > sys/net/bpf.c adds an IMUNES_BPF_HACK, and defines it - either > rename or remove; also has whitespace issues and debugging > printfs in there (that should not compile). > > sys/net/if.c @@ -292,31 +317,73 @@ junk if (IS_DEFAULT_VNET(curvnet)) { > ... needs an extra \t, no? 
doesn't look nice; there are more > of those in this file; maybe not yet; not before the #ifdefs go. > - " - SYSINT .. if_attachdomain was a wrong ws change > - " - junk @@ -1842,6 +1971,24 @@ adds another suser() > - " - at the end there are two unrelated/wrong ws changes > > sys/net/if_ethersubr.c ether_reassign() has whitespace issues > - " - SYSCTL_V_INT for ether_ipfw 2nd line indent looks wrong > > sys/net/if_gif.c SYSCTL_V_INT 2nd line, parallel_tunnels indent > - " - gifmodevent() empty line wrongly removed > > sys/net/if_gif.h #define\tNAME > > sys/net/if_gre.c is there a reason to rename the local variables? > > sys/net/if_loop.c I cannot see a difference for vnet_loif_iattach > w/ or w/o the #ifdef. Should the outer one go? > - " - is there a need to move the loif check up in lo_clone_destroy? > - " - junk @@ -190,7 +266,7 @@ use 4 spaces > > sys/net/if_mib.c SYSCTL_V_INT fix ws > > sys/net/if_var.h do we need to move if_index? > > sys/net/route.c static uma_zone_t rtzone; has an uneeded ws change > - " - rtable_init() ws wrong > - " - is that realted to more MRT changes or why are functions split > and shuffled? > - " - there were and still are more ws problems around V_rt_tables > - " - return 0; ws problem > - " - rtable_idetach() ws problem and more and the return > > sys/net/rtsock.c rnh =\n ... whitespace next line > > sys/net/vnet.h XXX RCS tag goes here do so > - " - struct vnet_net has ws issues with the _ether_ipfw line > - " - #define\tNAME > > > > I am running out of battery, so I am going to continue with the > next ~20%+- in sys/net80211/**, l 6556 tomorrow. > > > General: values in return statements should be enclosed in parentheses. > > General: function declarations K&R vs. ANSI vs ... > > General: you are adding 92 lines with XXX, 18 say "locking", 2 say > WRONG, 10 say RCS, (other), ... can we get (most of) them fixed before > committing? 
(fixed, not removed) thanks max Our hope is to generate a set of patches derived from that we have now rather than commit what we have exactly, so we hope we can get your cleanups included as we create those diffs. feel free to use p4 to fix things yourself if you want to.. > > > /bz > From julian at elischer.org Mon Jun 9 20:15:01 2008 From: julian at elischer.org (Julian Elischer) Date: Mon Jun 9 20:15:06 2008 Subject: kinda headsup.. In-Reply-To: <484D8EDD.3040103@elischer.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> Message-ID: <484D8F44.60603@elischer.org> Julian Elischer wrote: > Bjoern A. Zeeb wrote: >> On Sun, 8 Jun 2008, Julian Elischer wrote: >> >>> At the BSDCAn devsummit we discussed how to proceed with committing >>> Vimage to -current. >>> >>> the Milestones included something like: >>> >>> June 8 (today) Headsup.... >>> >>> June 15 commit changes that add macros for vnet >>> (network module) and vinet(inet virtualisation) >>> with macros defined in such a way to make 0 actual >>> differences. provable by md5 etc. >>> Documentat >>> s/hostname/g//V_hostname/ >>> #define V_hostname hostname >>> 2 weeks settle time, next step prepared, tested >>> and reviewed. >> >> For which part were you talking about a sed/awk script to use? >> Can we have a diff for just this part (once it is avail?) >> >> >> [schedule] >> >> * I am missing the BIG HEADS UP somewhere for all the people with >> outstanding work so that they will not re-do any integration multiple >> times. >> >> * I am missing the developers and users documentation in the schedule. >> >> >> >>> diffs can be found at: >>> http://www.freebsd.org/~julian/vimage.diff and it are usually >>> fairly up to date. >> >> I am just starting to skip through the patch, not doing a close review >> atm (not checking functional changes, etc. at all), and even this is >> hard at the end of the day... 
>> >> Are sys/ddb/db_command.c related in any way to this? >> >> sys/kern/init_main.c has an extra whitespace before the LIST_FIRST. >> >> sys/kern/kern_linker.c Isizeof(lookup) should be 4 space indent not 2 >> tabs. >> >> Do we need those changes like sys/kern/kern_switch.c ? > > This diff there includes experimental changes to virtualise things > like load average, and they will not be part of the commit. > so ignore anything that smells like "scheduler" > >> >> sys/kern/kern_sysctl.c has indentation problems in the >> @@ -1322,7 +1421,17 @@ junk >> >> sys/kern/kern_timeout.c has an extra whitespace > > yes we (I) will be trying to clean that sort of thing.. > >> >> sys/kern/kern_vimage.c says "XXX RCS tag goes here" so add it. > > SNV tag? > >> sys/kern/kern_vimage.c has // comments no-no >> - " - s,#define NAME,#define\tNAME,g >> - " - vnet_mod_register,vnet_mod_register_multi,(more)... declarations >> - " - adds a new suser() call. >> - " - in vi_symlookup() 2nsd line of for, remove a space >> - " - thinks like this scare me: >> /* A brute force check whether there's enough mem for a new vimage */ >> especially if its freed again instantly >> - " - near Detach / free per-module state instances remove whitespace >> - " - vi_free() remove a \t before the break; >> - " - db_show_vnets should probably check for db_pager_quit >> >> sys/kern/kern_xxx.c the printf looks like debugging? >> >> sys/kern/sys_socket.c has an unrelated whitespace change >> >> sys/kern/uipc_domain.c removes a comment I am entirely sure it can be >> removed. >> - " - why do we need to change net_init_domain(?here?) just to cast >> again? >> >> sys/kern/uipc_socket.c junk @@ -1284,13 +1314,17 @@ s,\t, , >> - " - if (how != SHUT_RD) { int error; add \n >> >> sys/kern/vfs_lookup.c adds something called IMUNES_SYMLINK_HACK which >> should either be renamed or removed. 
>> >> sys/modules/Makefile does not look like it belongs there >> >> sys/modules/netgraph/Makefile looks really strange, can we fix that? >> >> sys/modules/netgraph/pipe/Makefile has an extra space >> >> sys/modules/netgraph/wormhole/Makefile has an extra space >> >> sys/net/bpf.c adds an IMUNES_BPF_HACK, and defines it - either >> rename or remove; also has whitespace issues and debugging >> printfs in there (that should not compile). >> >> sys/net/if.c @@ -292,31 +317,73 @@ junk if (IS_DEFAULT_VNET(curvnet)) { >> ... needs an extra \t, no? doesn't look nice; there are more >> of those in this file; maybe not yet; not before the #ifdefs go. >> - " - SYSINT .. if_attachdomain was a wrong ws change >> - " - junk @@ -1842,6 +1971,24 @@ adds another suser() >> - " - at the end there are two unrelated/wrong ws changes >> >> sys/net/if_ethersubr.c ether_reassign() has whitespace issues >> - " - SYSCTL_V_INT for ether_ipfw 2nd line indent looks wrong >> >> sys/net/if_gif.c SYSCTL_V_INT 2nd line, parallel_tunnels indent >> - " - gifmodevent() empty line wrongly removed >> >> sys/net/if_gif.h #define\tNAME >> >> sys/net/if_gre.c is there a reason to rename the local variables? >> >> sys/net/if_loop.c I cannot see a difference for vnet_loif_iattach >> w/ or w/o the #ifdef. Should the outer one go? >> - " - is there a need to move the loif check up in lo_clone_destroy? >> - " - junk @@ -190,7 +266,7 @@ use 4 spaces >> >> sys/net/if_mib.c SYSCTL_V_INT fix ws >> >> sys/net/if_var.h do we need to move if_index? >> >> sys/net/route.c static uma_zone_t rtzone; has an uneeded ws change >> - " - rtable_init() ws wrong >> - " - is that realted to more MRT changes or why are functions split >> and shuffled? >> - " - there were and still are more ws problems around V_rt_tables >> - " - return 0; ws problem >> - " - rtable_idetach() ws problem and more and the return >> >> sys/net/rtsock.c rnh =\n ... 
whitespace next line >> >> sys/net/vnet.h XXX RCS tag goes here do so >> - " - struct vnet_net has ws issues with the _ether_ipfw line >> - " - #define\tNAME >> >> >> >> I am running out of battery, so I am going to continue with the >> next ~20%+- in sys/net80211/**, l 6556 tomorrow. >> >> >> General: values in return statements should be enclosed in parentheses. >> >> General: function declarations K&R vs. ANSI vs ... >> >> General: you are adding 92 lines with XXX, 18 say "locking", 2 say >> WRONG, 10 say RCS, (other), ... can we get (most of) them fixed before >> committing? (fixed, not removed) > > thanks max I'm an idiot.. I got mixed up between two emails.. "thanks Bjoern" I should have said.. > > Our hope is to generate a set of patches derived from that we have now > rather than commit what we have exactly, so we hope we can get your > cleanups included as we create those diffs. > > feel free to use p4 to fix things yourself if you want to.. > > >> >> >> /bz >> > > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to > "freebsd-virtualization-unsubscribe@freebsd.org" From bz at FreeBSD.org Mon Jun 9 20:30:08 2008 From: bz at FreeBSD.org (Bjoern A. Zeeb) Date: Mon Jun 9 20:30:11 2008 Subject: kinda headsup.. In-Reply-To: <484D8EDD.3040103@elischer.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> Message-ID: <20080609202807.V83875@maildrop.int.zabbadoz.net> On Mon, 9 Jun 2008, Julian Elischer wrote: > feel free to use p4 to fix things yourself if you want to.. to which of the --n++ branches? vimage as in //depot/projects/vimage/src/sys/... ? -- Bjoern A. Zeeb Stop bit received. Insert coin for new game. 
From julian at elischer.org Mon Jun 9 20:53:30 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 20:53:35 2008
Subject: kinda headsup..
In-Reply-To: <20080609202807.V83875@maildrop.int.zabbadoz.net>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <20080609202807.V83875@maildrop.int.zabbadoz.net>
Message-ID: <484D9849.4030403@elischer.org>

Bjoern A. Zeeb wrote:
> On Mon, 9 Jun 2008, Julian Elischer wrote:
>
>> feel free to use p4 to fix things yourself if you want to..
>
> to which of the --n++ branches?
>
> vimage as in //depot/projects/vimage/src/sys/... ?

yep

>
>

From jamie at gritton.org Mon Jun 9 21:49:10 2008
From: jamie at gritton.org (James Gritton)
Date: Mon Jun 9 21:49:14 2008
Subject: kinda headsup..
In-Reply-To: <484D8EDD.3040103@elischer.org>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org>
Message-ID: <484DA546.9060005@gritton.org>

Could we have a list of what isn't expected to actually commit? So the
scheduler stuff is out. Is that all of the struct vcpu? Parts of
struct vprocg? I see some scheduling bits in both.

Aside from vnet/vinet and the doomed scheduling bits, I see not much
besides the hostname, domain name, and morphing symlinks. Are these
staying? The hostname is already in jails, and the domainname makes
sense in my new jail framework - the morphing symlinks might be
something best left for later.

Ideally, for integration purposes, the vnet/vinet would hang off jails
that have pretty much the same capability as the vimage structure, and
then other bits could be added later. I don't want to worry about
trying to integrate features that aren't in the final cut anyway.

- Jamie

Julian Elischer wrote:
> ...
>
> This diff there includes experimental changes to virtualise things
> like load average, and they will not be part of the commit.
> so ignore anything that smells like "scheduler"

From julian at elischer.org Mon Jun 9 22:15:36 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 22:15:40 2008
Subject: kinda headsup..
In-Reply-To: <484DA546.9060005@gritton.org>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org>
Message-ID: <484DAB87.6040706@elischer.org>

James Gritton wrote:
> Could we have a list of what isn't expected to actually commit? So the
> scheduler stuff is out. Is that all of the struct vcpu? Parts of
> struct vprocg? I see some scheduling bits in both.
>
> Aside from vnet/vinet and the doomed scheduling bits, I see not much
> besides the hostname, domain name, and morphing symlinks. Are these
> staying? The hostname is already in jails ,and the domainname makes
> sense in my new jail framework - the morphing symlinks might be
> something best left for later.

Domain name and hostname both stay. Hostname is tricky because both
jail and vimage expect to change it, though jail only really expects
it to be virtualised to the user rather than REALLY VIRTUALISED.

The morphing symlinks are an experimental feature. The Verio guys have
some work in that direction too that they want to work on.... hmmm,
that's not you, is it?

Loadavg etc. is not "out for ever", just "not in the first commit
set", as they have not been extensively tested and probably need more
work.

>
> Ideally, for integration purposes, the vnet/vinet would hang off jails
> that have pretty much the same capability as the vimage structure, and
> then other bits could be added later. I don't want to worry about
> trying to integrate features that aren't in the final cut anyway.

The aim is that vimage and jail structures would merge.

As for the "final cut", the schedule only covers initial commits of
the vnet code, but once the framework is in place more functionality
would be added.
> > - Jamie > > > Julian Elischer wrote: >> ... >> >> This diff there includes experimental changes to virtualise things >> like load average, and they will not be part of the commit. >> so ignore anything that smells like "scheduler" From jamie at gritton.org Mon Jun 9 22:20:53 2008 From: jamie at gritton.org (James Gritton) Date: Mon Jun 9 22:20:56 2008 Subject: kinda headsup.. In-Reply-To: <484DAB87.6040706@elischer.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org> <484DAB87.6040706@elischer.org> Message-ID: <484DACBD.50109@gritton.org> Julian Elischer wrote: > James Gritton wrote: >> Could we have a list of what isn't expected to actually commit? So >> the scheduler stuff is out. Is that all of the struct vcpu? Parts >> of struct vprocg? I see some scheduling bits in both. >> >> Aside from vnet/vinet and the doomed scheduling bits, I see not much >> besides the hostname, domain name, and morphing symlinks. Are these >> staying? The hostname is already in jails ,and the domainname makes >> sense in my new jail framework - the morphing symlinks might be >> something best left for later. > > domain name and hostname both stay.. > Hostname is tricky because both jail and vimage expect to change it.. > though jail only really expects it to be virtualised to the user > rather than REALLY VIRTUALISED. I notice there are some differences between the two approaches, and plan to keep the hostname as virtualized as possible. But really, the differences are few and easily merged. > The morphing symlinks are an experimental feature. > The verio guys have some work in that direction too that they want to > work on.... hmmm that's not you is it? Yeah, could be. So while it's a feature I understand and like, I still prefer it remain for later. > Loadavg etc. is not "out for ever" just "not in the first commit set." 
> as they have not been extensively tested, and probably need more work. > > >> >> Ideally, for integration purposes, the vnet/vinet would hang off >> jails that have pretty much the same capability as the vimage >> structure, and then other bits could be added later. I don't want to >> worry about trying to integrate features that aren't in the final cut >> anyway. > > the aim is that vimage and jail structures would merge. > > as the for "final cut", the schedule only covers initial commits of > the vnet code, but once the framework is in place more functionality > would be added. "Final cut" was a poor choice of words - I too am talking about the first commit that covers the vnet code. - Jamie From 000.fbsd at quip.cz Mon Jun 9 23:03:30 2008 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Mon Jun 9 23:03:34 2008 Subject: kinda headsup.. In-Reply-To: <484DAB87.6040706@elischer.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org> <484DAB87.6040706@elischer.org> Message-ID: <484DB357.80103@quip.cz> Julian Elischer wrote: > James Gritton wrote: > >> Could we have a list of what isn't expected to actually commit? So >> the scheduler stuff is out. Is that all of the struct vcpu? Parts of >> struct vprocg? I see some scheduling bits in both. >> >> Aside from vnet/vinet and the doomed scheduling bits, I see not much >> besides the hostname, domain name, and morphing symlinks. Are these >> staying? The hostname is already in jails ,and the domainname makes >> sense in my new jail framework - the morphing symlinks might be >> something best left for later. > > > domain name and hostname both stay.. > Hostname is tricky because both jail and vimage expect to change it.. > though jail only really expects it to be virtualised to the user > rather than REALLY VIRTUALISED. > > The morphing symlinks are an experimental feature. 
> The verio guys have some work in that direction too that they want to > work on.... hmmm that's not you is it? Are there somebody who can shed some light on "what is planned in Jail / Vimage" for near future? I read many times "we talked about it at the developers summit" or "we will publish them on a wiki", but it is a long time ago and real informations are still kind of secret. I am working on page http://wiki.freebsd.org/Jails and I will be glad to publish as more informations as I can. For example, what is morphing symlinks? Or I know Verio has VPS on FreeBSD with fair-share resource management - are there some plans to have it in the src tree? Are you in contact with them? I think there are many developers working / thinking on some virtualization stuff but diverged and users (potential testers) know almost nothing about this 'work-in-progress'. > Loadavg etc. is not "out for ever" just "not in the first commit set." > as they have not been extensively tested, and probably need more work. What we can expect from loadavg and "scheduler" stuff? >> Ideally, for integration purposes, the vnet/vinet would hang off jails >> that have pretty much the same capability as the vimage structure, and >> then other bits could be added later. I don't want to worry about >> trying to integrate features that aren't in the final cut anyway. > > > the aim is that vimage and jail structures would merge. Are there some good examples of how things "will work" in vimage+jail world? (not from developers view, but from users view) > as the for "final cut", the schedule only covers initial commits of > the vnet code, but once the framework is in place more functionality > would be added. > > >> >> - Jamie >> >> >> Julian Elischer wrote: >> >>> ... >>> >>> This diff there includes experimental changes to virtualise things >>> like load average, and they will not be part of the commit. 
>>> so ignore anything that smells like "scheduler"

Miroslav Lachman

From julian at elischer.org Mon Jun 9 23:11:10 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 23:11:14 2008
Subject: Vimage commit
In-Reply-To: <484DACBD.50109@gritton.org>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org> <484DAB87.6040706@elischer.org> <484DACBD.50109@gritton.org>
Message-ID: <484DB88C.1020403@elischer.org>

I have another branch in p4, called vimage-commit:

//depot/projects/vimage-commit/src/sys/...

It is currently SUPPOSED to create a binary identical to -current.
It contains A LARGE PART of the commits, but only those parts that
compile away to nothing if defined that way.

I always update vimage and vimage-commit together, so that a diff
between the two branches produces all the changes in vimage that do
NOT just evaluate to nothing. A perl script in the base directory,
makemeat.pl, produces such a diff.

So to recap:

//depot/projects/vimage/...
is the tree that currently contains the full vimage diff. It is
derived from -current. In the base directory is update.sh, which keeps
it merged with -current, and makediff.pl, which generates a diff from
-current.

//depot/projects/vimage-commit/...
is the tree that currently contains the partial vimage diff that
"evaluates to nothing". It too is derived from -current. In the base
directory is update.sh, which keeps it merged with -current, and
makediff.pl, which generates a diff from -current. There is also a
makemeat.pl that generates a diff between the two branches. This shows
all the interesting stuff.

From julian at elischer.org Mon Jun 9 23:30:44 2008
From: julian at elischer.org (Julian Elischer)
Date: Mon Jun 9 23:30:49 2008
Subject: kinda headsup..
In-Reply-To: <484DB357.80103@quip.cz> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org> <484DAB87.6040706@elischer.org> <484DB357.80103@quip.cz> Message-ID: <484DBD23.6000501@elischer.org> Miroslav Lachman wrote: > Julian Elischer wrote: > >> James Gritton wrote: >> >>> Could we have a list of what isn't expected to actually commit? So >>> the scheduler stuff is out. Is that all of the struct vcpu? Parts >>> of struct vprocg? I see some scheduling bits in both. >>> >>> Aside from vnet/vinet and the doomed scheduling bits, I see not much >>> besides the hostname, domain name, and morphing symlinks. Are these >>> staying? The hostname is already in jails ,and the domainname makes >>> sense in my new jail framework - the morphing symlinks might be >>> something best left for later. >> >> >> domain name and hostname both stay.. >> Hostname is tricky because both jail and vimage expect to change it.. >> though jail only really expects it to be virtualised to the user >> rather than REALLY VIRTUALISED. >> >> The morphing symlinks are an experimental feature. >> The verio guys have some work in that direction too that they want to >> work on.... hmmm that's not you is it? > > Are there somebody who can shed some light on "what is planned in Jail / > Vimage" for near future? I read many times "we talked about it at the > developers summit" or "we will publish them on a wiki", but it is a long > time ago and real informations are still kind of secret. > I am working on page http://wiki.freebsd.org/Jails and I will be glad to > publish as more informations as I can. I am sorry that I didn't realise that page existed earlier, and it reminds me that maybe we should be using the jails mailing list if we plan to make this all merged in with jails.. > For example, what is morphing symlinks? 
Or I know Verio has VPS on
> FreeBSD with fair-share resource management - are there some plans to
> have it in the src tree? Are you in contact with them?

I will try to edit the (2 hour) video of the discussion on the topic
and get it up on the web asap. Basically, we are trying to pull
everyone together on the basis of a merged vimage/jail implementation.
It seems a big job, having read your wiki page!

>
> I think there are many developers working / thinking on some
> virtualization stuff but diverged and users (potential testers) know
> almost nothing about this 'work-in-progress'.

Yes, I agree.

>
>> Loadavg etc. is not "out for ever" just "not in the first commit set."
>> as they have not been extensively tested, and probably need more work.
>
> What we can expect from loadavg and "scheduler" stuff?

Being able, for example, to see which jails have which load averages,
and thus to see who is using the resources. Scheduler partitioning is
a bigger problem and I wouldn't care to try to do it yet. The Verio
guys have some interesting stuff in this direction which may be useful
to us, but we need to do some more talking.

>
>>> Ideally, for integration purposes, the vnet/vinet would hang off
>>> jails that have pretty much the same capability as the vimage
>>> structure, and then other bits could be added later. I don't want to
>>> worry about trying to integrate features that aren't in the final cut
>>> anyway.
>>
>>
>> the aim is that vimage and jail structures would merge.
>
> Are there some good examples of how things "will work" in vimage+jail
> world? (not from developers view, but from users view)

There is no such document at the moment, but I plan to try to write
some docs over the next couple of weeks. Basically, it is much like
being in a jail except that you have your own firewall, routing tables
and interfaces. You can run your own ipsec tunnels and whatever.
You can also set the tcp sysctls yourself (for example) to turn on or
off things like mtu discovery or random port numbers (etc. etc.). You
can also make a jail inside your jail if you want to and let it run in
a different virtual machine, with a different IP space etc.

Which brings up a point: we need to provide a way to limit the damage
that a vimage-based jail can do by assigning IP addresses, i.e. we
need to add some constraint, based on the current jail code's IP
constraints, so that a sub-jail is only allowed to set some addresses.
We discussed this at BSDCan but didn't resolve it.

>

From jamie at gritton.org Tue Jun 10 00:55:31 2008
From: jamie at gritton.org (James Gritton)
Date: Tue Jun 10 00:55:36 2008
Subject: kinda headsup..
In-Reply-To: <484DB357.80103@quip.cz>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org> <484DAB87.6040706@elischer.org> <484DB357.80103@quip.cz>
Message-ID: <484DD100.6030904@gritton.org>

Miroslav Lachman wrote:
> Are there somebody who can shed some light on "what is planned in Jail
> / Vimage" for near future? I read many times "we talked about it at
> the developers summit" or "we will publish them on a wiki", but it is
> a long time ago and real informations are still kind of secret.
> I am working on page http://wiki.freebsd.org/Jails and I will be glad
> to publish as more informations as I can.
> For example, what is morphing symlinks? Or I know Verio has VPS on
> FreeBSD with fair-share resource management - are there some plans to
> have it in the src tree? Are you in contact with them?

My own place in things is currently not adding features so much as
working on an underlying framework where these different things can be
added in an extensible and hopefully backward-compatible way.
Yes, this comes from talk at the recent developer summit, where there
was concern (including my own) that there's already a "jail" system
and that new and improved virtualization should remain part of that.
Vimage is where the momentum is to get something actually committed in
the near future, so I'm especially gearing up to work with that -
also, I want to use what it has to offer.

As for the Verio stuff (that's me), last I heard there are still
lawyers involved in slowing things down, but after that there will be
two steps. One is just showing what we have to give, and the next,
more important step is integrating it with the setup I'm working on
now, instead of our own like-jail-but-not-quite configuration. Vimage
will be in place by then, so this is something for later, but to plan
for now.

- Jamie

From bz at FreeBSD.org Tue Jun 10 20:45:08 2008
From: bz at FreeBSD.org (Bjoern A. Zeeb)
Date: Tue Jun 10 20:45:11 2008
Subject: kinda headsup..
In-Reply-To: <484DBD23.6000501@elischer.org>
References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <484D8EDD.3040103@elischer.org> <484DA546.9060005@gritton.org> <484DAB87.6040706@elischer.org> <484DB357.80103@quip.cz> <484DBD23.6000501@elischer.org>
Message-ID: <20080610204001.M83875@maildrop.int.zabbadoz.net>

On Mon, 9 Jun 2008, Julian Elischer wrote:

Hi,

> I am sorry that I didn't realise that page existed earlier,

as most people haven't realised that these wiki pages have existed
(for years) ...
http://wiki.freebsd.org/Image

> and it reminds me that maybe we should be using the jails mailing list if we
> plan to make this all merged in with jails..

For a plan for the mailing lists see my very first posting to this one:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2008-May/000000.html

/bz

--
Bjoern A. Zeeb Stop bit received. Insert coin for new game.
From zec at freebsd.org Wed Jun 11 14:56:43 2008 From: zec at freebsd.org (Marko Zec) Date: Wed Jun 11 15:36:31 2008 Subject: kinda headsup.. In-Reply-To: <20080609174826.Q83875@maildrop.int.zabbadoz.net> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> Message-ID: <200806111622.58194.zec@freebsd.org> On Monday 09 June 2008 21:24:42 Bjoern A. Zeeb wrote: > On Sun, 8 Jun 2008, Julian Elischer wrote: > > At the BSDCAn devsummit we discussed how to proceed with committing > > Vimage to -current. > > > > the Milestones included something like: > > > > June 8 (today) Headsup.... > > > > June 15 commit changes that add macros for vnet > > (network module) and vinet(inet virtualisation) > > with macros defined in such a way to make 0 actual > > differences. provable by md5 etc. > > Documentat > > s/hostname/g//V_hostname/ > > #define V_hostname hostname > > 2 weeks settle time, next step prepared, tested > > and reviewed. > > For which part were you talking about a sed/awk script to use? > Can we have a diff for just this part (once it is avail?) There's now a script in p4 projects/vimage/var_rename.tcl - that one is done in TCL, doing it in sed/awk was easier said than done... In projects/vimage/misc/ there's an machine-generated diff plus another small one manually forged that has to be applied afterwards for clean compile. > [schedule] > > * I am missing the BIG HEADS UP somewhere for all the people with > outstanding work so that they will not re-do any integration multiple > times. > > * I am missing the developers and users documentation in the > schedule. Users doc: vimage(8) is not entirely in sync with most recent code but not too misleading either, and there's a reasonably up-to-date cookbook here: http://imunes.net/virtnet/eurobsdcon07_tutorial.pdf Developers doc: yes I know it's a showstopper, it's in a pipeline... 
> > diffs can be found at: > > http://www.freebsd.org/~julian/vimage.diff and it are usually > > fairly up to date. > > I am just starting to skip through the patch, not doing a close > review atm (not checking functional changes, etc. at all), and even > this is hard at the end of the day... > > Are sys/ddb/db_command.c related in any way to this? No, the change there merely displays the list of possible command completions in a sorted order. This is nothing vimage-specific. > sys/kern/init_main.c has an extra whitespace before the LIST_FIRST. OK. Thanks for reporting this and others style violations bellow, will do a sweeping pass over the vimage branch today. > sys/kern/kern_linker.c Isizeof(lookup) should be 4 space indent not 2 > tabs. > > Do we need those changes like sys/kern/kern_switch.c ? Not really, they were / are experimental, primarily to check out how the virtualization framework can deal with other subsystems unrelated to networking. There's no plan to push for commiting these bits. > sys/kern/kern_sysctl.c has indentation problems in the > @@ -1322,7 +1421,17 @@ junk > > sys/kern/kern_timeout.c has an extra whitespace > > sys/kern/kern_vimage.c says "XXX RCS tag goes here" so add it. > sys/kern/kern_vimage.c has // comments no-no > - " - s,#define NAME,#define\tNAME,g > - " - vnet_mod_register,vnet_mod_register_multi,(more)... > declarations - " - adds a new suser() call. Yes priv() should be used here instead. > - " - in vi_symlookup() 2nsd line of for, remove a space > - " - thinks like this scare me: > /* A brute force check whether there's enough mem for a new vimage > */ especially if its freed again instantly This is a huge problem that needs much work. In general if any kernel subsystem fails to allocate resources at boot time it typically panics. 
With vimage at any point in time we might be calling those same
per-subsystem initialization vectors at run time, but with a running
system we need a more graceful back-out mechanism in resource
shortages than panics. So all of the existing initialization functions
will have to be extended to potentially return an error, and we'll have
to extend the virtualization framework to roll back partially
initialized vnets in cases of such failures.

> - " - near Detach / free per-module state instances remove
> whitespace - " - vi_free() remove a \t before the break;
> - " - db_show_vnets should probably check for db_pager_quit

OK

> sys/kern/kern_xxx.c the printf looks like debugging?

yup

> sys/kern/sys_socket.c has an unrelated whitespace change
>
> sys/kern/uipc_domain.c removes a comment I am entirely sure it can be
> removed. - " - why do we need to change net_init_domain(?here?) just
> to cast again?

OK the original comment can be restored. Casting: net_init_domain() is
now of vnet_attach_fn type which passes generic void *arg from the
caller, this arg may be something other than struct domain * in other
cases.

The idea is that we can register a single initialization function
multiple times with different arguments (different struct domain * in
this case). When a new vnet gets instantiated, the vimage framework
takes care to invoke each registration of net_init_domain() with proper
struct domain *arg in the proper order.

> sys/kern/uipc_socket.c junk @@ -1284,13 +1314,17 @@ s,\t, ,
> - " - if (how != SHUT_RD) { int error; add \n

OK true

> sys/kern/vfs_lookup.c adds something called IMUNES_SYMLINK_HACK which
> should either be renamed or removed.

Yes as the name implies this is a HACK not intended to be committed.

> sys/modules/Makefile does not look like it belongs there

Hm I had some issues compiling zfs module long time ago so disabled
this, will see if zfs + VIMAGE can be compiled now.

> sys/modules/netgraph/Makefile looks really strange, can we fix
> that?
This was my best shot hack at compiling ng_wormhole only if options
VIMAGE is defined. ng_wormhole makes no sense and doesn't even compile
on non-vimage kernels - it provides an explicit "tunnel" from one vnet
to another at netgraph layer.

> sys/modules/netgraph/pipe/Makefile has an extra space
>
> sys/modules/netgraph/wormhole/Makefile has an extra space
>
> sys/net/bpf.c adds an IMUNES_BPF_HACK, and defines it - either
> rename or remove; also has whitespace issues and debugging
> printfs in there (that should not compile).

A HACK -> not to be committed...

> sys/net/if.c @@ -292,31 +317,73 @@ junk if (IS_DEFAULT_VNET(curvnet))
> { ... needs an extra \t, no? doesn't look nice; there are more of
> those in this file; maybe not yet; not before the #ifdefs go. - " -
> SYSINT .. if_attachdomain was a wrong ws change
> - " - junk @@ -1842,6 +1971,24 @@ adds another suser()
> - " - at the end there are two unrelated/wrong ws changes
>
> sys/net/if_ethersubr.c ether_reassign() has whitespace issues
> - " - SYSCTL_V_INT for ether_ipfw 2nd line indent looks wrong
>
> sys/net/if_gif.c SYSCTL_V_INT 2nd line, parallel_tunnels indent
> - " - gifmodevent() empty line wrongly removed
>
> sys/net/if_gif.h #define\tNAME
>
> sys/net/if_gre.c is there a reason to rename the local variables?

ip_id is a global, and if_gre reuses the same name as local, hence
renaming to gre_ip_id was done to reduce ambiguity in face of automated
variable renaming scripts. Not sure about ip_tos -> gre_ip_tos to be
honest...

> sys/net/if_loop.c I cannot see a difference for vnet_loif_iattach
> w/ or w/o the #ifdef. Should the outer one go?

Hmm the difference is clearly visible to me, are we looking at the same
code chunk? We could probably resolve IS_DEFAULT_VNET() to always true
for non-vimage kernels though...

+static int vnet_loif_iattach(unused)
+       const void *unused;
+{
+       INIT_VNET_NET(curvnet);
+
+       LIST_INIT(&V_lo_list);
+#ifdef VIMAGE
+       if (IS_DEFAULT_VNET(curvnet))
+               if_clone_attach(&lo_cloner);
+       else
+               lo_cloner.ifc_attach(&lo_cloner);
+#else
+       if_clone_attach(&lo_cloner);
+#endif
+       return 0;
+}

> - " - is there a need to move the loif check up in
> lo_clone_destroy? - " - junk @@ -190,7 +266,7 @@ use 4 spaces

Hmm:

 static void
 lo_clone_destroy(struct ifnet *ifp)
 {
        struct lo_softc *sc;
+#ifdef INVARIANTS
+       INIT_VNET_NET(ifp->if_vnet);
+#endif

        sc = ifp->if_softc;

        /* XXX: destroying lo0 will lead to panics. */
        KASSERT(V_loif != ifp, ("%s: destroying lo0", __func__));

+       mtx_lock(&lo_mtx);
+       LIST_REMOVE(sc, sc_next);
+       mtx_unlock(&lo_mtx);
        bpfdetach(ifp);
        if_detach(ifp);
        if_free(ifp);

What exactly do you see as problematic there?

> sys/net/if_mib.c SYSCTL_V_INT fix ws
>
> sys/net/if_var.h do we need to move if_index?

Yes each vnet should have a private if_index if that was the question?

> sys/net/route.c static uma_zone_t rtzone; has an unneeded ws change
> - " - rtable_init() ws wrong
> - " - is that related to more MRT changes or why are functions
> split and shuffled?

route_init() is called only once at boot time, and rtable_init() on each
vnet instantiation. In nooptions VIMAGE configs route_init() hence
directly calls rtable_init().

> - " - there were and still are more ws problems around
> V_rt_tables - " - return 0; ws problem
> - " - rtable_idetach() ws problem and more and the return
>
> sys/net/rtsock.c rnh =\n ... whitespace next line
>
> sys/net/vnet.h XXX RCS tag goes here do so
> - " - struct vnet_net has ws issues with the _ether_ipfw line
> - " - #define\tNAME
>
>
>
> I am running out of battery, so I am going to continue with the
> next ~20%+- in sys/net80211/**, l 6556 tomorrow.
>
>
> General: values in return statements should be enclosed in
> parentheses.
>
> General: function declarations K&R vs. ANSI vs ...
> > General: you are adding 92 lines with XXX, 18 say "locking", 2 say > WRONG, 10 say RCS, (other), ... can we get (most of) them fixed > before committing? (fixed, not removed) OK thanks for looking at the code, will do a sweeping round... Cheers, Marko From julian at elischer.org Wed Jun 11 22:57:18 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 11 22:57:24 2008 Subject: kinda headsup.. In-Reply-To: <200806111622.58194.zec@freebsd.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <200806111622.58194.zec@freebsd.org> Message-ID: <4850584C.1070109@elischer.org> Marko Zec wrote: > On Monday 09 June 2008 21:24:42 Bjoern A. Zeeb wrote: >> On Sun, 8 Jun 2008, Julian Elischer wrote: >>> At the BSDCAn devsummit we discussed how to proceed with committing >>> Vimage to -current. >>> >>> the Milestones included something like: >>> >>> June 8 (today) Headsup.... >>> >>> June 15 commit changes that add macros for vnet >>> (network module) and vinet(inet virtualisation) >>> with macros defined in such a way to make 0 actual >>> differences. provable by md5 etc. >>> Documentat >>> s/hostname/g//V_hostname/ >>> #define V_hostname hostname >>> 2 weeks settle time, next step prepared, tested >>> and reviewed. >> For which part were you talking about a sed/awk script to use? >> Can we have a diff for just this part (once it is avail?) > > There's now a script in p4 projects/vimage/var_rename.tcl - that one is > done in TCL, doing it in sed/awk was easier said than done... > In projects/vimage/misc/ there's an machine-generated diff plus another > small one manually forged that has to be applied afterwards for clean > compile. for compile of LINT some more were needed.. I have checked the output of that script and all fixups into: //repos/projects/vimage-commit2/... LINT compiles in that tree. It's probably what will be committed for the first step. 
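
As a rough illustration of the first step in the schedule quoted above
(this is not taken from the actual patch), a userland sketch of a V_
macro that simply expands to the existing global might look like the
following. The buffer size, the sample value and the helper name
hostname_len() are assumptions made up for the example; only
"#define V_hostname hostname" comes from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Userland stand-in for the kernel global; size and value are arbitrary. */
char hostname[256] = "example.org";

/*
 * Step one: the V_ name is only an alias for the existing global, so
 * code written against V_hostname preprocesses to what it was before.
 */
#define	V_hostname	hostname

/* Hypothetical caller, written against the V_ name. */
size_t
hostname_len(void)
{
	return (strlen(V_hostname));
}
```

Because the macro expands to the same token the code used before, the
compiler sees unchanged source, which is presumably why the schedule
expects the result to be checkable by comparing md5 sums.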
> >> [schedule] >> >> * I am missing the BIG HEADS UP somewhere for all the people with >> outstanding work so that they will not re-do any integration multiple >> times. hmm you seem to be responding to it. so I don't see how you missed it.. >> >> * I am missing the developers and users documentation in the >> schedule. Docs are on the way. > > Users doc: vimage(8) is not entirely in sync with most recent code but > not too misleading either, and there's a reasonably up-to-date cookbook > here: http://imunes.net/virtnet/eurobsdcon07_tutorial.pdf > > Developers doc: yes I know it's a showstopper, it's in a pipeline... > >>> diffs can be found at: >>> http://www.freebsd.org/~julian/vimage.diff and it are usually >>> fairly up to date. >> I am just starting to skip through the patch, not doing a close >> review atm (not checking functional changes, etc. at all), and even >> this is hard at the end of the day... >> >> Are sys/ddb/db_command.c related in any way to this? > > No, the change there merely displays the list of possible command > completions in a sorted order. This is nothing vimage-specific. > >> sys/kern/init_main.c has an extra whitespace before the LIST_FIRST. > > OK. Thanks for reporting this and others style violations bellow, will > do a sweeping pass over the vimage branch today. > >> sys/kern/kern_linker.c Isizeof(lookup) should be 4 space indent not 2 >> tabs. >> >> Do we need those changes like sys/kern/kern_switch.c ? > > Not really, they were / are experimental, primarily to check out how the > virtualization framework can deal with other subsystems unrelated to > networking. There's no plan to push for commiting these bits. > >> sys/kern/kern_sysctl.c has indentation problems in the >> @@ -1322,7 +1421,17 @@ junk >> >> sys/kern/kern_timeout.c has an extra whitespace >> >> sys/kern/kern_vimage.c says "XXX RCS tag goes here" so add it. 
>> sys/kern/kern_vimage.c has // comments no-no >> - " - s,#define NAME,#define\tNAME,g >> - " - vnet_mod_register,vnet_mod_register_multi,(more)... >> declarations - " - adds a new suser() call. > > Yes priv() should be used here instead. > >> - " - in vi_symlookup() 2nsd line of for, remove a space >> - " - thinks like this scare me: >> /* A brute force check whether there's enough mem for a new vimage >> */ especially if its freed again instantly > > This is a huge problem that needs much work. In general if any kernel > subsystem fails to allocate resources at boot time it typically panics. > With vimage at any point in time we might be calling those same > per-subsystem initialization vectors at run time, but with a running > system we need a more gratitious back-out mechanism in resource > shortages than panics. So all of the existing initialization functions > will have to be extended to potentially return an error, and we'll have > to extend the virtualization framework to roll back partially > initialized vnets in cases of such failures. fixing all the init routines is a task that will take time and it will be something people will be fixing long after this commit is done. It is not however something to stop this work. If you want to not use vimage, things are as before. > >> - " - near Detach / free per-module state instances remove >> whitespace - " - vi_free() remove a \t before the break; >> - " - db_show_vnets should probably check for db_pager_quit > > OK > >> sys/kern/kern_xxx.c the printf looks like debugging? > > yup > >> sys/kern/sys_socket.c has an unrelated whitespace change >> >> sys/kern/uipc_domain.c removes a comment I am entirely sure it can be >> removed. - " - why do we need to change net_init_domain(?here?) just >> to cast again? > > OK the original comment can be restored. 
Casting: net_init_domain() is > now of vnet_attach_fn type which passes generic void *arg from the > caller, this arg may be something other than struct domain * in other > cases. > > The idea is that we can register a single initialization function > multiple times with different arguments (different struct domain * in > this case). When a new vnet gets instantiated, the vimage framework > takes care to invoke each registration of net_init_domain() with proper > struct domain *arg in the proper order. > >> sys/kern/uipc_socket.c junk @@ -1284,13 +1314,17 @@ s,\t, , >> - " - if (how != SHUT_RD) { int error; add \n > > OK true > >> sys/kern/vfs_lookup.c adds something called IMUNES_SYMLINK_HACK which >> should either be renamed or removed. > > Yes as the name implies this is a HACK not intended to be commited. > >> sys/modules/Makefile does not look like it belongs there > > Hm I had some issues compiling zfs module long time ago so disabled > this, will see if zfs + VIMAGE can be compiled now. > >> sys/modules/netgraph/Makefile looks really strange, can we fix >> that? > > This was my best shot hack at compiling ng_wormhole only if options > VIMAGE is defined. ng_wormhole makes no sense and doesn't even compile > on non-vimage kernels - it provides an explicit "tunnel" from one vnet > to another at netgraph layer. > >> sys/modules/netgraph/pipe/Makefile has an extra space >> >> sys/modules/netgraph/wormhole/Makefile has an extra space >> >> sys/net/bpf.c adds an IMUNES_BPF_HACK, and defines it - either >> rename or remove; also has whitespace issues and debugging >> printfs in there (that should not compile). > > A HACK -> not to be commited... the vimage branch is the 'head of development branch. From that we are extracting actual commit branches... the first commit will come from teh branch I mentioned at the top I think. the second commit will probably come from that same branch after some more changes have been moved to it.. 
At least that is how I'm planning it. Your comments are all however, valid.. > >> sys/net/if.c @@ -292,31 +317,73 @@ junk if (IS_DEFAULT_VNET(curvnet)) >> { ... needs an extra \t, no? doesn't look nice; there are more of >> those in this file; maybe not yet; not before the #ifdefs go. - " - >> SYSINT .. if_attachdomain was a wrong ws change >> - " - junk @@ -1842,6 +1971,24 @@ adds another suser() >> - " - at the end there are two unrelated/wrong ws changes >> >> sys/net/if_ethersubr.c ether_reassign() has whitespace issues >> - " - SYSCTL_V_INT for ether_ipfw 2nd line indent looks wrong >> >> sys/net/if_gif.c SYSCTL_V_INT 2nd line, parallel_tunnels indent >> - " - gifmodevent() empty line wrongly removed >> >> sys/net/if_gif.h #define\tNAME >> >> sys/net/if_gre.c is there a reason to rename the local variables? > > ip_id is a global, and if_gre reuses the same name as local, hence > renaming to gre_ip_id was done to reduce ambiguity in face of automated > variable renaming scripts. Not sure about ip_tos -> gre_ip_tos to be > honest... > >> sys/net/if_loop.c I cannot see a difference for vnet_loif_iattach >> w/ or w/o the #ifdef. Should the outer one go? > > Hmm the difference is clearly visible to me, are we looking at the same > code chunk? We could probably resolve IS_DEFAULT_VNET() to always true > for non-vimage kernels though... > > +static int vnet_loif_iattach(unused) > + const void *unused; > +{ > + INIT_VNET_NET(curvnet); > + > + LIST_INIT(&V_lo_list); > +#ifdef VIMAGE > + if (IS_DEFAULT_VNET(curvnet)) > + if_clone_attach(&lo_cloner); > + else > + lo_cloner.ifc_attach(&lo_cloner); > +#else > + if_clone_attach(&lo_cloner); > +#endif > + return 0; > +} > >> - " - is there a need to move the loif check up in >> lo_clone_destroy? 
- " - junk @@ -190,7 +266,7 @@ use 4 spaces > > Hmm: > > static void > lo_clone_destroy(struct ifnet *ifp) > { > struct lo_softc *sc; > +#ifdef INVARIANTS > + INIT_VNET_NET(ifp->if_vnet); > +#endif > > sc = ifp->if_softc; > > /* XXX: destroying lo0 will lead to panics. */ > KASSERT(V_loif != ifp, ("%s: destroying lo0", __func__)); > > + mtx_lock(&lo_mtx); > + LIST_REMOVE(sc, sc_next); > + mtx_unlock(&lo_mtx); > bpfdetach(ifp); > if_detach(ifp); > if_free(ifp); > > What excatly do you see as problematic there? > >> sys/net/if_mib.c SYSCTL_V_INT fix ws >> >> sys/net/if_var.h do we need to move if_index? > > Yes each vnet should have a private if_index if that was the question? > >> sys/net/route.c static uma_zone_t rtzone; has an uneeded ws change >> - " - rtable_init() ws wrong >> - " - is that realted to more MRT changes or why are functions >> split and shuffled? > > route_init() is called only once at boot time, and rtable_init() on each > vnet instantiation. In nooptions VIMAGE configs route_init() hence > directly calls rtable_init(). > >> - " - there were and still are more ws problems around >> V_rt_tables - " - return 0; ws problem >> - " - rtable_idetach() ws problem and more and the return >> >> sys/net/rtsock.c rnh =\n ... whitespace next line >> >> sys/net/vnet.h XXX RCS tag goes here do so >> - " - struct vnet_net has ws issues with the _ether_ipfw line >> - " - #define\tNAME >> >> >> >> I am running out of battery, so I am going to continue with the >> next ~20%+- in sys/net80211/**, l 6556 tomorrow. >> >> >> General: values in return statements should be enclosed in >> parentheses. >> >> General: function declarations K&R vs. ANSI vs ... >> >> General: you are adding 92 lines with XXX, 18 say "locking", 2 say >> WRONG, 10 say RCS, (other), ... can we get (most of) them fixed >> before committing? (fixed, not removed) > > OK thanks for looking at the code, will do a sweeping round... 
> > Cheers, > > Marko > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to "freebsd-virtualization-unsubscribe@freebsd.org" From julian at elischer.org Thu Jun 12 16:49:20 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jun 12 16:49:23 2008 Subject: PERFORCE change 143354 for review In-Reply-To: <48514C4B.4070709@gritton.org> References: <200806120757.m5C7vm90085356@repoman.freebsd.org> <48514C4B.4070709@gritton.org> Message-ID: <4851538E.7030004@elischer.org> James Gritton wrote: > Julian Elischer wrote: >> Use the real hostname for dumps >> it has been suggested that we define a R_hostname to mean Real >> hostname >> > > Another option, particularly for hostname, is just to leave the > "hostname" global variable. Right now both jail and vimage hostnames > are fixed arrays in their structures, but I'm considering going to > pointers instead (as a jail may or may not have a virtual hostname). > Then the "root jail" could just point to the static hostname[] array > which can continue to exist under its own name. > > It could be that hostname is the special case here, as far as having a > significant number of "global" references. Load average may have that > case too, but I'm not sure yet. Load average would be later on.. we are not planning on doing htat in this round. but you are right.. it is something where it makes sense in both global and local scope. Since vimage is hierarchical, I think however that it makes sense if the load average of a vimage includes the load average of children. thus the load average of the root vimage would be that of the whole machine. just an idea. > > - Jamie From obrien at FreeBSD.ORG Thu Jun 12 21:43:10 2008 From: obrien at FreeBSD.ORG (David O'Brien) Date: Thu Jun 12 21:58:56 2008 Subject: kinda headsup.. 
In-Reply-To: <200806111622.58194.zec@freebsd.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <200806111622.58194.zec@freebsd.org> Message-ID: <20080612214310.GA55195@hub.freebsd.org> On Wed, Jun 11, 2008 at 04:22:58PM +0200, Marko Zec wrote: > There's now a script in p4 projects/vimage/var_rename.tcl - that one is > done in TCL, doing it in sed/awk was easier said than done... Please post a fetch(1)able URL to all the conversion files a 3rd party FreeBSD user needs to make the same changes on their source base. thanks, -- -- David (obrien@FreeBSD.org) From julian at elischer.org Thu Jun 12 23:10:32 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jun 12 23:10:36 2008 Subject: kinda headsup.. In-Reply-To: <20080612214310.GA55195@hub.freebsd.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <200806111622.58194.zec@freebsd.org> <20080612214310.GA55195@hub.freebsd.org> Message-ID: <4851ACE7.2070505@elischer.org> David O'Brien wrote: > On Wed, Jun 11, 2008 at 04:22:58PM +0200, Marko Zec wrote: >> There's now a script in p4 projects/vimage/var_rename.tcl - that one is >> done in TCL, doing it in sed/awk was easier said than done... > > Please post a fetch(1)able URL to all the conversion files a 3rd party > FreeBSD user needs to make the same changes on their source base. I sent you one but you apparently didn't believe the fact that I included a trace of fetch actually doing it.. the script is not perfect however you have to go over the output and check that it hasn't converted 1) local variables with the same name as the globals in question. 2) In one case I saw "hostname" become "V_hostname" so there is something not quite right there. > > thanks, From obrien at freebsd.org Fri Jun 13 16:21:45 2008 From: obrien at freebsd.org (David O'Brien) Date: Fri Jun 13 16:30:55 2008 Subject: kinda headsup.. 
In-Reply-To: <4851ACE7.2070505@elischer.org> References: <484CC690.9020303@elischer.org> <20080609174826.Q83875@maildrop.int.zabbadoz.net> <200806111622.58194.zec@freebsd.org> <20080612214310.GA55195@hub.freebsd.org> <4851ACE7.2070505@elischer.org> Message-ID: <20080613161134.GA86001@dragon.NUXI.org> On Thu, Jun 12, 2008 at 04:10:31PM -0700, Julian Elischer wrote: > David O'Brien wrote: >> On Wed, Jun 11, 2008 at 04:22:58PM +0200, Marko Zec wrote: >>> There's now a script in p4 projects/vimage/var_rename.tcl - that one is >>> done in TCL, doing it in sed/awk was easier said than done... >> Please post a fetch(1)able URL to all the conversion files a 3rd party >> FreeBSD user needs to make the same changes on their source base. > > I sent you one but you apparently didn't believe the fact that I > included a trace of fetch actually doing it.. You mentioned a URL on IRC. That is very different from putting it in an email in a thread about reviewing this work. Also from the looks of http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/projects/vimage/... folks should get var_rename.tcl and vimage_globals. On IRC you only gave an example for one file. -- -- David (obrien@FreeBSD.org) From jamie at gritton.org Fri Jun 13 19:59:07 2008 From: jamie at gritton.org (James Gritton) Date: Fri Jun 13 19:59:11 2008 Subject: V_rootvnode Message-ID: <4852D184.9060301@gritton.org> In merging vimage into jails, there's one field prison has that vimage lacks: pr_root. In planning for the merge, it would make sense to treat rootvnode like the pr_root of the base prison, i.e. use a V_rootvnode macro. As with the hostname, there are many places in the kernel where there's no reasonable context for anything other than the real root vnode. So in addition to the V_rootvnode, we'd want a G_rootvnode like the recently introduced G_hostname. 
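
A minimal userland sketch of the V_rootvnode / G_rootvnode idea, under
stated assumptions: the structures are cut down to the fields the
example needs, cur_prison is a hypothetical stand-in for whatever
context the kernel would consult (for instance the current thread's
credential), and jail_a with its path is invented for illustration.
Only pr_root, prison0 and the two macro names come from the discussion.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (illustration only). */
struct vnode {
	const char	*v_path;
};

struct prison {
	struct prison	*pr_parent;
	struct vnode	*pr_root;
};

static struct vnode real_root = { "/" };
static struct vnode jail_root = { "/jails/a" };

/* prison0 is the base prison; jail_a is a made-up child. */
static struct prison prison0 = { NULL, &real_root };
static struct prison jail_a = { &prison0, &jail_root };

/* Hypothetical "current context" pointer. */
static struct prison *cur_prison = &prison0;

/* Virtual view: the root of whichever prison the caller runs in. */
#define	V_rootvnode	(cur_prison->pr_root)
/* Global view: always the root of the base prison. */
#define	G_rootvnode	(prison0.pr_root)

/* Returns 1 if the two views differ while "inside" jail_a. */
int
roots_differ_in_jail(void)
{
	struct prison *saved = cur_prison;
	int differ;

	cur_prison = &jail_a;
	differ = (V_rootvnode != G_rootvnode);
	cur_prison = saved;
	return (differ);
}
```

In the base prison both macros name the same vnode; they only diverge
once the context points at a jail, which is the case where code has to
say which of the two it means.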

- Jamie

From julian at elischer.org Sat Jun 14 07:00:42 2008
From: julian at elischer.org (Julian Elischer)
Date: Sat Jun 14 07:00:47 2008
Subject: Vimage headsup.. revised.
In-Reply-To: <484CC690.9020303@elischer.org>
References: <484CC690.9020303@elischer.org>
Message-ID: <48536C9A.8020801@elischer.org>

Julian Elischer wrote:
> At the BSDCAn devsummit we discussed how to proceed with committing
> Vimage to -current.
>
> the Milestones included something like:

However I think we need to delay by at least a week.
Several issues have come up, none of them critical, but together,
limiting the amount of testing and tweaking that we have been able to do.
Also some interesting work has turned up that may be relevant
at the later stages WRT combining vimage and jails.

So, I think we should add 2 weeks to each of these dates.
While I'd love to push ahead, common sense says that we will
probably regret rushing in too fast.

>
> June 8 (today) Headsup....
>
> June 15   commit changes that add macros for vnet
>           (network module) and vinet(inet virtualisation)
>           with macros defined in such a way to make 0 actual
>           differences. provable by md5 etc.
>           Documentat
>           s/hostname/g//V_hostname/
>           #define V_hostname hostname
>           2 weeks settle time, next step prepared, tested
>           and reviewed.
>
> June 29   Add changes to convert all globals to members of
>           per-module structures. Done in a reversible way
>           (i.e. compilable out). Macros defined so that
>           depending on compile options structures or globals
>           are used (one global structure).
>           Performance implications of using structures are
>           evaluated. Structures possibly tuned.
>           Initialisation routines added, checked and tuned.
>           example:
>           #if VIMAGE_USE_STRUCTS
>           #define V_hostname sys_globals.hostname
>           ...
>           #else
>           #define V_hostname hostname
>           ...
>           #endif
>
>
> July 13   globals removed in vnet, vinet.
>           ifdefs and compile option removed or scaled back
>           to make code clean to read again.
>           Destructor routines added where needed.
> Remaining "NULL Macros" (compile to nothing at this > point) committed to reduce the size of the > MEAT diffs. Review of Meat diffs formally under way > for final comment. > example: > #define INIT_VNET_INET(x) /* nothing */ > add "INIT_VNET_INET(curvnet);"(and similar) > where needed. > remove globals (e.g. 'hostname') > > > July 21 JAIL+Vimage framework committed. > e.g. add new syscall, program, etc. > (part one of meat diffs) structures still only > global instances. vimage inhansed jails can be created > but act jus tlike normal jails? > > July 28 Ability to created > 1 vimage enabled. > Vimage enhanced jails now have private network > stacks etc. > > > August start on converting more modules as needed and time > allows. > > Marko and I have been working towards splitting up the current diffs > (which do the whole thing) so allow this schedule to be followed. > > We may or may not be ready for the June 15 step by then, but if not > it may be a week there-after. So this should be considered the > heads-up. discussion will be on freebsd-virtualization@ > and the perforce branch that we have as a current working system > is branch 'vimage'. //depot/projects/vimage/... > > diffs can be found at: > http://www.freebsd.org/~julian/vimage.diff and it are usually > fairly up to date. > > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to > "freebsd-virtualization-unsubscribe@freebsd.org" From jamie at gritton.org Wed Jun 18 03:48:39 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 03:48:42 2008 Subject: V_* meta-symbols and locking Message-ID: <48588595.7020709@gritton.org> Like everything I have to say about the V_* issue, perhaps this doesn't apply to the vnet stuff. But to the two symbols I currently care about, hostname and rootvnode, locking is a problem. 

Current kernel code plays fast and loose with both these symbols. Check
out getcredhostname for example:

void
getcredhostname(struct ucred *cred, char *buf, size_t size)
{
        struct prison *pr;

        pr = cred->cr_prison;
        if (pr != &prison0) {
                mtx_lock(&pr->pr_mtx);
                strlcpy(buf, (pr->pr_flags & PR_NOHOST)
                    ? hostname : pr->pr_host, size);
                mtx_unlock(&pr->pr_mtx);
        } else
                strlcpy(buf, hostname, size);
}

In the prison case, it nicely locks the prison record. But for the
global hostname, it just copies it. The hostname sysctl is no better
about setting it. And rootvnode is referred to all over the place
without any sort of lock - pretty safe since it's not expected to change
(though it theoretically can).

This same no-locking assumption seems to be going on with V_hostname.
But now this macro applies not only to the "real" hostname but to the
"virtual" one as well - no locking the vimage record. As I try to add a
similar macro to my new jail framework, I find I can't. Instead of a
mere variable redirection, I need to lock-copy-unlock much like
getcredhostname does. Luckily, much hostname access is already
jail-aware. But anything using the "real" hostname should have the same
locking on prison0. Perhaps not wholly necessary since it's just a
string that we know will always have a null byte at the end of the
buffer, but still good form and unknown prevention. And in the case of
actually virtual hostnames, it's essential since they'll be changing
from fixed arrays in struct prison into pointers that may be freed.

Rootvnode is a stickier problem. There's much more code that refers to
it, and it's a more essential part of the system. I don't relish
digging in everywhere and changing the whole rootvnode paradigm with
locking. So instead my solution is to make the jail "path" parameter
(and thus root vnode) set-once. So as long as the V_rootvnode is taken
from a context that will remain for the duration of its use (curthread
is a good bet), it will be safe to access it without locks.
In particular, the real rootvnode that lives at prison0 isn't going anywhere. So in summary: I won't use V_hostname (or G_hostname), opting for explicit locking. I will use V_rootvnode (and perhaps G_rootvnode). All the other network-related V_stuff may deserve a look, but it's out of my purview. - Jamie From julian at elischer.org Wed Jun 18 06:40:06 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 18 06:40:12 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48588595.7020709@gritton.org> References: <48588595.7020709@gritton.org> Message-ID: <4858ADCC.1050909@elischer.org> James Gritton wrote: > Like everything I have to say about the V_* issue, perhaps this doesn't > apply to the vnet stuff. But to the two symbols I currently care about, > hostname and rootvnode, locking is a problem. > Yes, and I for one have probably not thought enough about it. > Current kernel code plays fast and loose with both these symbols. Check > out getcredhostname for example: > > void > getcredhostname(struct ucred *cred, char *buf, size_t size) > { > struct prison *pr; > > pr = cred->cr_prison; > if (pr != &prison0) { > mtx_lock(&pr->pr_mtx); > strlcpy(buf, (pr->pr_flags & PR_NOHOST) > ? hostname : pr->pr_host, size); > mtx_unlock(&pr->pr_mtx); > } else > strlcpy(buf, hostname, size); > } > > In the prison case, it nicely locks the prison record. But for the > global hostname, it just copies it. The hostname sysctl is no better > about setting it. And rootvnode is referred to all over the place > without any sort of lock - pretty safe since it's not expected to change > (though it theoretically can). I'm not sure there is much of a problem because the hostname associated with a virtual machine is a fixed array of bytes. It is true that one might be able (though unlikely) to get half of one hostname and half of another but you will never get invalid memory..
I think that the only readers of the hostname in a vm are processes in that VM so the VM is not going anywhere and thus the hostname is not going anywhere.. > > This same no-locking assumption seems to be going on with V_hostname. > But now this macro applies not only to the "real" hostname but to the > "virtual" one as well - no locking the vimage record. As I try to add a > similar macro to my new jail framework, I find I can't. Instead of a > mere variable redirection, I need to lock-copy-unlock much like > getcredhostname does. Luckily, much hostname access is already > jail-aware. But anything using the "real" hostname should have the same > locking on prison0. Perhaps not wholly necessary since it's just a > string that we know will always have a null byte at the end of the > buffer, but still good form and unknown prevention. And in the case of > actually virtual hostnames, it's essential since they'll be changing > from fixed arrays in struct prison into pointers that may be freed. I think in the vimage code it is not freeable unless the vimage is freed and in that case there is no-one to read the string. vimage0 is of course not going away under any situation. > > Rootvnode is a stickier problem. There's much more code that refers to > it, and it's a more essential part of the system. I don't relish > digging in everywhere and changing the whole rootvnode paradigm with > locking. So instead my solution is to make the jail "path" parameter > (and thus root vnode) set-once. So as long as the V_rootvnode is taken > from a context that will remain for the duration of its use (curthread > is a good bet), it will be safe to access it without locks. In > particular, the real rootvnode that lives at prison0 isn't going anywhere. teh man page for vimage(8) says for the chroot parameter: chroot Set the chroot directory for the virtual image. All new processes spawned into the target virtual image using the vimage command will be initially chrooted to that directory. 
This parameter can be changed only when no processes are running within the target virtual image. Note that it is not required to have a chrooted environment for a virtual image operate, which is also the default behavior. so the croot is fixed unless there is no-one using it. > > So in summary: > > I won't use V_hostname (or G_hostname), opting for explicit locking. I'm not sure you need this. > > I will V_rootvnode (and perhaps G_rootvnode). > > All the other network-related V_stuff may deserve a look, but it out of > my purview. > > - Jamie > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to > "freebsd-virtualization-unsubscribe@freebsd.org" From zec at icir.org Wed Jun 18 14:40:28 2008 From: zec at icir.org (Marko Zec) Date: Wed Jun 18 14:40:33 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48588595.7020709@gritton.org> References: <48588595.7020709@gritton.org> Message-ID: <200806181619.07026.zec@icir.org> On Wednesday 18 June 2008 05:48:37 James Gritton wrote: > Like everything I have to say about the V_* issue, perhaps this > doesn't apply to the vnet stuff. But to the two symbols I currently > care about, hostname and rootvnode, locking is a problem. You are most probably right about the current code not sufficiently protecting the "hostname" global from concurrent access, but I don't see how V_ macroization / virtualization adds or changes anything to this particular issue. The same goes for virtualizing the networking state - if there is a locking issue in a newtorking subsystem, virtualization should not make such issues any more or less pronounced. 
That doesn't mean we shouldn't be looking into solving such issues, but I'd prefer to decouple such efforts from the initial push for virtualizing the networking stack, which I think should keep the changes to existing subsystem's program logic / flow to the absolute minimum. Marko > Current kernel code plays fast and loose with both these symbols. > Check out getcredhostname for example: > > void > getcredhostname(struct ucred *cred, char *buf, size_t size) > { > struct prison *pr; > > pr = cred->cr_prison; > if (pr != &prison0) { > mtx_lock(&pr->pr_mtx); > strlcpy(buf, (pr->pr_flags & PR_NOHOST) > ? hostname : pr->pr_host, size); > mtx_unlock(&pr->pr_mtx); > } else > strlcpy(buf, hostname, size); > } > > In the prison case, it nicely locks the prison record. But for the > global hostname, it just copies it. The hostname sysctl is no better > about setting it. And rootvnode is referred to all over the place > without any sort of lock - pretty safe since it's not expected to > change (though it theoretically can). > > This same no-locking assumption seems to be going on with V_hostname. > But now this macro applies not only to the "real" hostname but to the > "virtual" one as well - no locking the vimage record. As I try to > add a similar macro to my new jail framework, I find I can't. > Instead of a mere variable redirection, I need to lock-copy-unlock > much like getcredhostname does. Luckily, much hostname access is > already jail-aware. But anything using the "real" hostname should > have the same locking on prison0. Perhaps not wholly necessary since > it's just a string that we know will always have a null byte at the > end of the buffer, but still good form and unknown prevention. And > in the case of actually virtual hostnames, it's essential since > they'll be changing from fixed arrays in struct prison into pointers > that may be freed. > > Rootvnode is a stickier problem. 
There's much more code that refers > to it, and it's a more essential part of the system. I don't relish > digging in everywhere and changing the whole rootvnode paradigm with > locking. So instead my solution is to make the jail "path" parameter > (and thus root vnode) set-once. So as long as the V_rootvnode is > taken from a context that will remain for the duration of its use > (curthread is a good bet), it will be safe to access it without > locks. In particular, the real rootvnode that lives at prison0 isn't > going anywhere. > > So in summary: > > I won't use V_hostname (or G_hostname), opting for explicit locking. > > I will V_rootvnode (and perhaps G_rootvnode). > > All the other network-related V_stuff may deserve a look, but it out > of my purview. > > - Jamie > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to > "freebsd-virtualization-unsubscribe@freebsd.org" From jamie at gritton.org Wed Jun 18 15:20:20 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 15:20:25 2008 Subject: V_* meta-symbols and locking In-Reply-To: <200806181619.07026.zec@icir.org> References: <48588595.7020709@gritton.org> <200806181619.07026.zec@icir.org> Message-ID: <485927AD.8010204@gritton.org> Marko Zec wrote: > You are most probably right about the current code not sufficiently > protecting the "hostname" global from concurrent access, but I don't > see how V_ macroization / virtualization adds or changes anything to > this particular issue. The change is that with the virtual hostname, I want to have the locking that is currently lacking. As that locking would be in a jail context, it doesn't make sense to go use these virtual-variable macros when the "jailness" of it is explicitly exposed anyway. 
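The lock-copy-unlock access described above can be sketched in userland roughly as follows. This is only an illustration, not the jail code: `struct fake_prison`, `fake_gethostname()` and `fake_sethostname()` are hypothetical names, a pthread mutex stands in for the kernel's `mtx_lock()`/`mtx_unlock()`, `snprintf()` stands in for `strlcpy()`, and `HOSTNAME_MAX` is an assumed buffer size.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define HOSTNAME_MAX 256	/* assumed size, not taken from the kernel */

/* Hypothetical userland stand-in for the hostname part of struct prison. */
struct fake_prison {
	pthread_mutex_t	pr_mtx;			/* protects pr_host */
	char		pr_host[HOSTNAME_MAX];
};

/* Stand-in for prison0: readers of the "real" hostname lock it too. */
struct fake_prison fake_prison0 = {
	.pr_mtx = PTHREAD_MUTEX_INITIALIZER,
	.pr_host = "host.example.org",
};

/* Lock, copy into the caller's buffer (truncating if needed), unlock. */
void
fake_gethostname(struct fake_prison *pr, char *buf, size_t size)
{
	assert(size > 0);
	pthread_mutex_lock(&pr->pr_mtx);
	snprintf(buf, size, "%s", pr->pr_host);
	pthread_mutex_unlock(&pr->pr_mtx);
}

/* Writers take the same lock, so a reader never sees a half-written name. */
void
fake_sethostname(struct fake_prison *pr, const char *name)
{
	pthread_mutex_lock(&pr->pr_mtx);
	snprintf(pr->pr_host, sizeof(pr->pr_host), "%s", name);
	pthread_mutex_unlock(&pr->pr_mtx);
}
```

The caller always works on its own copy, so nothing holds a reference to the shared buffer once the lock is dropped. That is the property that matters if the fixed array is later replaced by a pointer that can be freed.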
> The same goes for virtualizing the networking > state - if there is a locking issue in a networking subsystem, > virtualization should not make such issues any more or less pronounced. I suspect most, perhaps all, of the networking variables will work with their own locking, especially if the locks themselves are part of a virtualized structure. - Jamie From jamie at gritton.org Wed Jun 18 15:56:46 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 15:56:51 2008 Subject: V_* meta-symbols and locking In-Reply-To: <4858ADCC.1050909@elischer.org> References: <48588595.7020709@gritton.org> <4858ADCC.1050909@elischer.org> Message-ID: <48593036.60502@gritton.org> Julian Elischer wrote: > I'm not sure there is much of a problem because the hostname associated with a virtual machine is a fixed array of bytes. > > it is true that one might be able (though unlikely) to get half of one hostname and half of another but you will never get invalid memory.. > > I think that the only readers of the hostname in a vm are processes in that VM so the VM is not going anywhere and thus the hostname is not going anywhere.. This is true of current jail code, and of vimage. But one of the points made in the developer summit was that a jail should be able to virtualize some things and not others. That was really meant about modules, but it made sense to me that there should also be the option not to virtualize the non-module bits, i.e. perhaps have a jail that only had the vnet stuff but kept, for example, the same hostname as its parent. And I don't mean just inheriting the current hostname, but making it totally non-virtual so any change to the parent is reflected. I'm implementing this by replacing that fixed array with a pointer that may well be freed. That makes the concurrency issues less trivial than just the possibility of copying part of one hostname and part of another.
Now perhaps it would be better to keep the fixed array, making reading the virtual hostname safe, and complicating the setting issue (I'd have to set potentially multiple jail records). This makes sense, as setting is much less common, and is in line with a similar strategy I have for the securelevel. Even with that though, the mechanism is in place for safely reading a hostname (i.e. getcredhostname) and is just not universally used. Might as well clean that up. > teh man page for vimage(8) says for the chroot parameter: > > chroot > Set the chroot directory for the virtual image. All new processes > spawned into the target virtual image using the vimage command > will be initially chrooted to that directory. This parameter can > be changed only when no processes are running within the target > virtual image. Note that it is not required to have a chrooted > environment for a virtual image operate, which is also the > default behavior. > > so the croot is fixed unless there is no-one using it. That's a good idea - more flexible than my current strategy of only allowing setting the path on jail creation, but still not messing up current jails. And I can continue to ignore the locking implications of rootvnode. - Jamie From jeremie at le-hen.org Wed Jun 18 15:55:05 2008 From: jeremie at le-hen.org (Jeremie Le Hen) Date: Wed Jun 18 16:00:32 2008 Subject: Vimage headsup.. revised. In-Reply-To: <48536C9A.8020801@elischer.org> References: <484CC690.9020303@elischer.org> <48536C9A.8020801@elischer.org> Message-ID: <20080618151911.GB46885@obiwan.tataz.chchile.org> Hi all, On Sat, Jun 14, 2008 at 12:00:42AM -0700, Julian Elischer wrote: > Also some interesting work has turned up that may be relevent > at the later stages WRT cobining vimage and jails. Is it possible to follow this work/discussion somewhere? Besides, I'd like to join the freebsd-virtualization@ mailing-list, but I can't see it in Mailman. Is it a private one? 
Thanks and regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From julian at elischer.org Wed Jun 18 16:19:13 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 18 16:19:41 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48593036.60502@gritton.org> References: <48588595.7020709@gritton.org> <4858ADCC.1050909@elischer.org> <48593036.60502@gritton.org> Message-ID: <48593586.9040600@elischer.org> James Gritton wrote: > Julian Elischer wrote: > > > I'm not sure there is much of a problem because the hostname > associated with a virtual machine is a fixed array of bytes. > > > > it is true that one might be able (though unlikely) to get half of > one hostname and half of another but you will never get invalid memory.. > > > > I think that the only readers of the hostname in a vm are processes > in that VM so the VM is not going anywhere and thus the hostname is not > going anywhere.. > > This is true of current jail code, and of vimage. But one of the > points made in the developer summit was that a jail should be able to > virtualize some things and not others. That was really meant about > modules, but it made sense to me that there should also be the option > not to virtualize the non-module bits, i.e. perhaps have a jail that > only had the vnet stuff but kept, for example, the same hostname as > its parent. And I don't mean just inheriting the current hostname, > but making it totally non-virtual so any change to the parent is > reflected. Since vimage is hierarchical, and all hostnames are virtualised, the hostname you use is either yours or that of a parent vimage. Either way, since you cannot remove a vimage while it has child vimages, the logic still applies. > > I'm implementing this by replacing that fixed array with a pointer > that may well be freed. That makes the concurrency issues less > trivial than just the possibility of copying part of one hostname and > part of another.
Now perhaps it would be better to keep the fixed > array, making reading the virtual hostname safe, and complicating the > setting issue (I'd have to set potentially multiple jail records). > This makes sense, as setting is much less common, and is in line with > a similar strategy I have for the securelevel. Even with that though, > the mechanism is in place for safely reading a hostname (i.e. > getcredhostname) and is just not universally used. Might as well > clean that up. well if you want to do that then that is a separate thing but the reason is not because of what vimage is doing with hostname :-) > > > > the man page for vimage(8) says for the chroot parameter: > > > > chroot > > Set the chroot directory for the virtual image. All new processes > > spawned into the target virtual image using the vimage command > > will be initially chrooted to that directory. This parameter can > > be changed only when no processes are running within the target > > virtual image. Note that it is not required to have a chrooted > > environment for a virtual image operate, which is also the > > default behavior. > > > > so the croot is fixed unless there is no-one using it. > > That's a good idea - more flexible than my current strategy of only > allowing setting the path on jail creation, but still not messing up > current jails. And I can continue to ignore the locking implications > of rootvnode. Note that one could also read "or children images" I think in some of these checks.. > > - Jamie From julian at elischer.org Wed Jun 18 16:21:14 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 18 16:21:22 2008 Subject: Vimage headsup.. revised. 
In-Reply-To: <20080618151911.GB46885@obiwan.tataz.chchile.org> References: <484CC690.9020303@elischer.org> <48536C9A.8020801@elischer.org> <20080618151911.GB46885@obiwan.tataz.chchile.org> Message-ID: <485935FF.8040308@elischer.org> Jeremie Le Hen wrote: > Hi all, > > On Sat, Jun 14, 2008 at 12:00:42AM -0700, Julian Elischer wrote: >> Also some interesting work has turned up that may be relevent >> at the later stages WRT cobining vimage and jails. > > Is it possible to follow this work/discussion somewhere? > > Besides, I'd like to join the freebsd-virtualization@ mailing-list, but > I can't see it in Mailman. Is it a private one? > > Thanks and regards, I've asked teh admins to get the virtualization mailing list archived and more generally available.. From jamie at gritton.org Wed Jun 18 19:10:50 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 19:10:54 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48593586.9040600@elischer.org> References: <48588595.7020709@gritton.org> <4858ADCC.1050909@elischer.org> <48593036.60502@gritton.org> <48593586.9040600@elischer.org> Message-ID: <48595DB2.3030005@gritton.org> Julian Elischer wrote: >>> the man page for vimage(8) says for the chroot parameter: >>> >>> chroot >>> Set the chroot directory for the virtual image. All new processes >>> spawned into the target virtual image using the vimage command >>> will be initially chrooted to that directory. This parameter can >>> be changed only when no processes are running within the target >>> virtual image. Note that it is not required to have a chrooted >>> environment for a virtual image operate, which is also the >>> default behavior. >>> >>> so the croot is fixed unless there is no-one using it. > > Note that one could also read "or children images" I think in some of these checks.. The situation with setting the chroot path becomes more complicated the more I look at it. 
If I replicate the vimage behavior of being able to set jails more than one level below the current jail (i.e. create "foo.bar.baz" which would be placed under the current "foo.bar"), then there's not necessarily a connection between place in the prison hierarchy and the file hierarchy. I could create jail "foo.bar" rooted at /home/foo/bar and then create "foo.bar.baz" rooted at /home/baz. That's kind of nonintuitive. Or perhaps I could restrict the chroot pathname lookup not to the caller's root, but to the parent jail's root. But looking up pathnames with something other than the root of the process doing the looking is also rather counterintuitive. And then there's the possibility of changing the root path. Suppose I have "foo" at /home/foo and "foo.bar" at /home/foo/bar. If I then change foo's home to /jail/foo, does foo.bar's jail likewise change to /jail/foo/bar? What if /jail/foo/bar doesn't exist? Should the whole thing fail, or would I have foo now at /jail/foo and foo.bar at /home/foo/bar? I could just not recursively re-root child jails when I change a chroot path - except I still should if foo.bar isn't separately chrooted and also lives at /home/foo. Making things even worse, jail allows relative chroot paths. Those saved pathnames (used for prison_canseemount and prison_enforce_statfs) are totally useless when the erstwhile current directory is unknown. I'd just not allow them, but the current behavior of rendering all mount points essentially invisible under such circumstances seems reasonable. But there'd certainly be no way to relate a relative chroot pathname to its place in any parent jails. The upshot of all this is that for now, I'm sticking with only allowing the path to be set when a jail is created. The vimage implementation of all this seems to consist entirely of the quoted man page, so I can't just go there for answers.
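To make the parent-relative lookup considered above concrete, here is a minimal sketch of composing a child jail's root from its parent's root. It is purely illustrative and assumes string concatenation in place of a real vnode lookup: `child_root()` is a hypothetical helper, not an existing kernel function, and it rejects relative paths, in line with the inclination above not to allow them.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: express a child jail's chroot path in terms of
 * its parent's root.  Returns 0 on success, EINVAL for a relative
 * child path, ENAMETOOLONG if the result does not fit in "out".
 */
int
child_root(const char *parent_root, const char *child_path,
    char *out, size_t size)
{
	int n;

	assert(parent_root != NULL && child_path != NULL && out != NULL);

	/* A relative path cannot be related to any parent jail. */
	if (child_path[0] != '/')
		return (EINVAL);

	/* A parent rooted at "/" contributes no prefix. */
	if (strcmp(parent_root, "/") == 0)
		parent_root = "";

	n = snprintf(out, size, "%s%s", parent_root, child_path);
	if (n < 0 || (size_t)n >= size)
		return (ENAMETOOLONG);
	return (0);
}
```

Under this scheme, "foo.bar" rooted at /home/foo/bar creating "foo.bar.baz" with path /home/baz would yield /home/foo/bar/home/baz, so the file hierarchy follows the prison hierarchy. The sketch does nothing about re-rooting children when a parent's path changes, which is the harder question raised above.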
- Jamie From julian at elischer.org Wed Jun 18 19:21:17 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 18 19:21:21 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48595DB2.3030005@gritton.org> References: <48588595.7020709@gritton.org> <4858ADCC.1050909@elischer.org> <48593036.60502@gritton.org> <48593586.9040600@elischer.org> <48595DB2.3030005@gritton.org> Message-ID: <48596033.5070906@elischer.org> James Gritton wrote: > Julian Elischer wrote: > > >>> the man page for vimage(8) says for the chroot parameter: > >>> > >>> chroot > >>> Set the chroot directory for the virtual image. All new processes > >>> spawned into the target virtual image using the vimage command > >>> will be initially chrooted to that directory. This parameter can > >>> be changed only when no processes are running within the target > >>> virtual image. Note that it is not required to have a chrooted > >>> environment for a virtual image operate, which is also the > >>> default behavior. > >>> > >>> so the croot is fixed unless there is no-one using it. > > > > Note that one could also read "or children images" I think in some of > these checks.. > > > The situation with setting the chroot path becomes more complicated > the more I look at it. If I replicate the vimage behavior of being > able to set jails more than one level below the current jail > (i.e. create "foo.bar.baz" which would be placed under the current > "foo.bar"), then there's not necessarily a connection between place in > the prison hierarchy and the file hierarchy. I could create jail > "foo.bar" rooted at /home/foo/bar and then create "foo.bar.baz" rooted > at /home/baz. That's kind of nonintuitive. Or perhaps I could > restrict the chroot pathname lookup not to the caller's root, but to > the parent jail's root. But pathnames that are looked up with > something other that the root of the process doing the looking is also > rather counterintuitive. 
> > And then there's the possibility of changing the root path. Suppose I > have "foo" at /home/foo and "foo.bar" at /home/foo/bar. If I then change > foo's home to /jail/foo, does foo.bar's jail likewise change to > /jail/foo/bar? What if /jail/foo/bar doesn't exist? Should the whole > thing fail, or would I have foo now at /jail/foo and foo.bar at > /home/foo/bar? I could just not recursively re-root child jails when > I change a chroot path - except I still should if foo.bar isn't > separately chrooted and also lives at /home/foo. I believe you cannot change a chroot if there are 1/ running processes or 2/ child vimages. Effectively this means that you really only have one chance to reparent a vimage, and that is right after you have created it. Once it has children (either processes or vimages), you have "fixed" it. I do think that the chroot path needs to be expressed, when creating it, in terms of the parent vimage. If no chroot is specified then it gets what its parent had.. I have not actually looked (yet) to see what Marko has done here.. (wanders off to do so) > > Making things even worse, jail allows relative chroot paths. Those > saved pathnames (used for prison_canseemount and > prison_enforce_statfs) are totally useless when the erstwhile current > directory is unknown. I'd just not allow them, but the current > behavior of rendering all mount points essentially invisible under > such circumstances seems reasonable. But there'd certainly be no way > to relate a relative chroot pathname to its place in any parent jails. > > The upshot of all this is that for now, I'm sticking with only > allowing the path to be set when a jail is created. > > The vimage implementation of all this seems to consist entirely of the > quoted man page, so I can't just go there for answers.
> > - Jamie From zec at icir.org Wed Jun 18 19:40:44 2008 From: zec at icir.org (Marko Zec) Date: Wed Jun 18 19:40:49 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48595DB2.3030005@gritton.org> References: <48588595.7020709@gritton.org> <48593586.9040600@elischer.org> <48595DB2.3030005@gritton.org> Message-ID: <200806182140.23123.zec@icir.org> On Wednesday 18 June 2008 21:10:42 James Gritton wrote: > Julian Elischer wrote: > >>> the man page for vimage(8) says for the chroot parameter: > >>> > >>> chroot > >>> Set the chroot directory for the virtual image. All new > >>> processes spawned into the target virtual image using the vimage > >>> command will be initially chrooted to that directory. This > >>> parameter can be changed only when no processes are running > >>> within the target virtual image. Note that it is not required to > >>> have a chrooted environment for a virtual image operate, which > >>> is also the default behavior. > >>> > >>> so the croot is fixed unless there is no-one using it. > > > > Note that one could also read "or children images" I think in some > > of > > these checks.. > > > The situation with setting the chroot path becomes more complicated > the more I look at it. If I replicate the vimage behavior of being > able to set jails more than one level below the current jail > (i.e. create "foo.bar.baz" which would be placed under the current > "foo.bar"), then there's not necessarily a connection between place > in the prison hierarchy and the file hierarchy. I could create jail > "foo.bar" rooted at /home/foo/bar and then create "foo.bar.baz" > rooted at /home/baz. That's kind of nonintuitive. True. Chrooting "foo.bar.baz" at absolute path of /home/foo/bar/home/baz could be a more logical action. > Or perhaps I > could restrict the chroot pathname lookup not to the caller's root, > but to the parent jail's root. 
But looking up pathnames with > something other than the root of the process doing the looking is > also rather counterintuitive. > > And then there's the possibility of changing the root path. Suppose > I have "foo" at /home/foo and "foo.bar" at /home/foo/bar. If I then > change foo's home to /jail/foo, does foo.bar's jail likewise change > to /jail/foo/bar? What if /jail/foo/bar doesn't exist? Should the > whole thing fail, or would I have foo now at /jail/foo and foo.bar at > /home/foo/bar? I could just not recursively re-root child jails when > I change a chroot path - except I still should if foo.bar isn't > separately chrooted and also lives at /home/foo. > > Making things even worse, jail allows relative chroot paths. Those > saved pathnames (used for prison_canseemount and > prison_enforce_statfs) are totally useless when the erstwhile current > directory is unknown. I'd just not allow them, but the current > behavior of rendering all mount points essentially invisible under > such circumstances seems reasonable. But there'd certainly be no way > to relate a relative chroot pathname to its place in any parent > jails. > > The upshot of all this is that for now, I'm sticking with only > allowing the path to be set when a jail is created. > > The vimage implementation of all this seems to consist entirely of > the quoted man page, so I can't just go there for answers. vimage(8) simply invokes chroot(2) to the target directory (stored as a plaintext string in the kernel) before spawning a new process inside the target VM. Obviously such an approach has served only as a proof-of-concept hack (though there's some anecdotal evidence some ISPs have been making money using precisely such a vimage implementation on FreeBSD 4.11). Hence, don't feel constrained by the legacy of such kludges, and feel free to propose better alternatives. The only thing I'd like to have as an option is to be able to spawn a new process in the target VM _without_ making it chrooted...
Marko From jamie at gritton.org Wed Jun 18 19:46:46 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 19:46:50 2008 Subject: V_* meta-symbols and locking In-Reply-To: <200806182140.23123.zec@icir.org> References: <48588595.7020709@gritton.org> <48593586.9040600@elischer.org> <48595DB2.3030005@gritton.org> <200806182140.23123.zec@icir.org> Message-ID: <4859661E.9070502@gritton.org> Marko Zec wrote: > The only thing I'd like to have > as an option is to be able to spawn a new process in the target VM > _without_ making it chrooted... If you mean creating a jail that's not chrooted, that's no problem. If you mean creating a jail that *is* chrooted, and then placing a process into that jail without chrooting it, that would be a breakage of the jail paradigm. Hopefully you mean the former? - Jamie From bzeeb-lists at lists.zabbadoz.net Wed Jun 18 19:50:18 2008 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Wed Jun 18 19:50:24 2008 Subject: Vimage headsup.. revised. In-Reply-To: <485935FF.8040308@elischer.org> References: <484CC690.9020303@elischer.org> <48536C9A.8020801@elischer.org> <20080618151911.GB46885@obiwan.tataz.chchile.org> <485935FF.8040308@elischer.org> Message-ID: <20080618192903.O83875@maildrop.int.zabbadoz.net> On Wed, 18 Jun 2008, Julian Elischer wrote: > Jeremie Le Hen wrote: >> Hi all, >> >> On Sat, Jun 14, 2008 at 12:00:42AM -0700, Julian Elischer wrote: >>> Also some interesting work has turned up that may be relevant >>> at the later stages WRT combining vimage and jails. >> >> Is it possible to follow this work/discussion somewhere? >> >> Besides, I'd like to join the freebsd-virtualization@ mailing-list, but >> I can't see it in Mailman. Is it a private one? >> >> Thanks and regards, > > > I've asked the admins to get the virtualization mailing list archived and > more generally available.. It has always been, I think - as the "private" thing was sorted out before creation.
It's on http://lists.freebsd.org/mailman/listinfo you can also go there directly: http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization http://lists.freebsd.org/pipermail/freebsd-virtualization/ !!!!!!!! be sure to read this though: http://lists.freebsd.org/pipermail/freebsd-virtualization/2008-May/000000.html !!!!!!!! It seems to be missing in the handbook still. Dunno why. /bz -- Bjoern A. Zeeb Stop bit received. Insert coin for new game. From zec at icir.org Wed Jun 18 19:57:01 2008 From: zec at icir.org (Marko Zec) Date: Wed Jun 18 19:57:06 2008 Subject: V_* meta-symbols and locking In-Reply-To: <4859661E.9070502@gritton.org> References: <48588595.7020709@gritton.org> <200806182140.23123.zec@icir.org> <4859661E.9070502@gritton.org> Message-ID: <200806182156.37998.zec@icir.org> On Wednesday 18 June 2008 21:46:38 James Gritton wrote: > Marko Zec wrote: > > The only thing I'd like to have > > as an option is to be able to spawn a new process in the target VM > > _without_ making it chrooted... > > If you mean creating a jail that's not chrooted, that's no problem. > If you mean creating a jail that *is* chrooted, and then placing a > process into that jail without chrooting it, that would be a breakage > of the jail paradigm. Hopefully you mean the former? No, I want the latter, as an option. Given that the parent environment / jail completely controls the child anyhow, I don't think such an (optional) behavior would be too big a security issue.
Marko From jamie at gritton.org Wed Jun 18 20:08:04 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 20:08:08 2008 Subject: V_* meta-symbols and locking In-Reply-To: <200806182156.37998.zec@icir.org> References: <48588595.7020709@gritton.org> <200806182140.23123.zec@icir.org> <4859661E.9070502@gritton.org> <200806182156.37998.zec@icir.org> Message-ID: <48596B1D.8000801@gritton.org> Marko Zec wrote: >>> The only thing I'd like to have >>> as an option is to be able to spawn a new process in the target VM >>> _without_ making it chrooted... >> >> If you mean creating a jail that's not chrooted, that's no problem. >> If you mean creating a jail that *is* chrooted, and then placing a >> process into that jail without chrooting it, that would be a breakage >> of the jail paradigm. Hopefully you mean the former? > > No, I want the latter, as an option. Given that the parent environment / > jail completely controls the child anyhow, I don't think such an > (optional) behavior would be too big a security issue. I'm not thinking of the security implications, but of the mess. The existing jail_attach() wouldn't be sufficient, as it only passes a jid. You'd need a separate jail_attach2() system call. Would this be exactly the same as jail_attach() except that it doesn't do the chroot? That sounds like a one-off to me. So would you instead have a way of specifying which parts of the jail environment you do and don't want? Then you'd have to not only know what virtualizations a jail does and doesn't do (the problem I'm working on), but also what virtualizations every process in a jail may or may not have enabled. It might be that only the chroot can reasonably support this kind of split anyway. Why do you want this? If you want to be part of the jail, you attach to it. If you want to do administration of the jail environment, it should be sufficient to do it from outside.
- Jamie From julian at elischer.org Wed Jun 18 20:18:01 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 18 20:18:07 2008 Subject: Vimage headsup.. revised. In-Reply-To: <20080618192903.O83875@maildrop.int.zabbadoz.net> References: <484CC690.9020303@elischer.org> <48536C9A.8020801@elischer.org> <20080618151911.GB46885@obiwan.tataz.chchile.org> <485935FF.8040308@elischer.org> <20080618192903.O83875@maildrop.int.zabbadoz.net> Message-ID: <48596D7F.2060203@elischer.org> Bjoern A. Zeeb wrote: > On Wed, 18 Jun 2008, Julian Elischer wrote: > >> Jeremie Le Hen wrote: >>> Hi all, >>> >>> On Sat, Jun 14, 2008 at 12:00:42AM -0700, Julian Elischer wrote: >>>> Also some interesting work has turned up that may be relevant >>>> at the later stages WRT combining vimage and jails. >>> >>> Is it possible to follow this work/discussion somewhere? >>> >>> Besides, I'd like to join the freebsd-virtualization@ mailing-list, but >>> I can't see it in Mailman. Is it a private one? >>> >>> Thanks and regards, >> >> >> I've asked the admins to get the virtualization mailing list archived >> and more generally available.. > > It has always been, I think - as the "private" thing was sorted out > before creation. > > It's on > http://lists.freebsd.org/mailman/listinfo > > you can also go there directly: > > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > http://lists.freebsd.org/pipermail/freebsd-virtualization/ but you can't get to it via http://www.freebsd.org/search/search.html#mailinglists > > !!!!!!!! > be sure to read this though: > > http://lists.freebsd.org/pipermail/freebsd-virtualization/2008-May/000000.html > > !!!!!!!! > > It seems to be missing in the handbook still. Dunno why.
> > /bz > From julian at elischer.org Wed Jun 18 20:22:29 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jun 18 20:22:34 2008 Subject: V_* meta-symbols and locking In-Reply-To: <200806182140.23123.zec@icir.org> References: <48588595.7020709@gritton.org> <48593586.9040600@elischer.org> <48595DB2.3030005@gritton.org> <200806182140.23123.zec@icir.org> Message-ID: <48596E8B.6010906@elischer.org> Marko Zec wrote: > On Wednesday 18 June 2008 21:10:42 James Gritton wrote: >> Julian Elischer wrote: >> >>> the man page for vimage(8) says for the chroot parameter: >> >>> >> >>> chroot >> >>> Set the chroot directory for the virtual image. All new >> >>> processes spawned into the target virtual image using the vimage >> >>> command will be initially chrooted to that directory. This >> >>> parameter can be changed only when no processes are running >> >>> within the target virtual image. Note that it is not required to >> >>> have a chrooted environment for a virtual image to operate, which >> >>> is also the default behavior. >> >>> >> >>> so the chroot is fixed unless there is no-one using it. >> > >> > Note that one could also read "or children images" I think in some >> > of >> >> these checks.. >> >> >> The situation with setting the chroot path becomes more complicated >> the more I look at it. If I replicate the vimage behavior of being >> able to set jails more than one level below the current jail >> (i.e. create "foo.bar.baz" which would be placed under the current >> "foo.bar"), then there's not necessarily a connection between place >> in the prison hierarchy and the file hierarchy. I could create jail >> "foo.bar" rooted at /home/foo/bar and then create "foo.bar.baz" >> rooted at /home/baz. That's kind of nonintuitive. > > True. Chrooting "foo.bar.baz" at absolute path > of /home/foo/bar/home/baz could be a more logical action. > >> Or perhaps I >> could restrict the chroot pathname lookup not to the caller's root, >> but to the parent jail's root.
But pathnames that are looked up with >> something other than the root of the process doing the looking are >> also rather counterintuitive. >> >> And then there's the possibility of changing the root path. Suppose >> I have "foo" at /home/foo and "foo.bar" at /home/foo/bar. If I then >> change foo's home to /jail/foo, does foo.bar's jail likewise change >> to /jail/foo/bar? What if /jail/foo/bar doesn't exist? Should the >> whole thing fail, or would I have foo now at /jail/foo and foo.bar at >> /home/foo/bar? I could just not recursively re-root child jails when >> I change a chroot path - except I still should if foo.bar isn't >> separately chrooted and also lives at /home/foo. >> >> Making things even worse, jail allows relative chroot paths. Those >> saved pathnames (used for prison_canseemount and >> prison_enforce_statfs) are totally useless when the erstwhile current >> directory is unknown. I'd just not allow them, but the current >> behavior of rendering all mount points essentially invisible under >> such circumstances seems reasonable. But there'd certainly be no way >> to relate a relative chroot pathname to its place in any parent >> jails. >> >> The upshot of all this is that for now, I'm sticking with only >> allowing the path to be set when a jail is created. >> >> The vimage implementation of all this seems to consist entirely of >> the quoted man page, so I can't just go there for answers. > > vimage(8) simply invokes chroot(2) to the target directory (stored as a > plaintext string in the kernel) before spawning a new process inside > the target VM. > > Obviously such an approach has served only as a proof of concept hack > (though there's some anecdotal evidence some ISPs have been making > money using precisely such vimage implementation on FreeBSD 4.11). > Hence, don't feel constrained by the legacy of such kludges, and feel > free to propose better alternatives.
The only thing I'd like to have > as an option is to be able to spawn a new process in the target VM > _without_ making it chrooted... I'd say that there is no choice: it has to be chrooted to something. But that chroot could be either the chroot of the parent or a subdirectory of it; it cannot be a directory back toward the root from the parent's chroot. > > Marko From jeremie at le-hen.org Wed Jun 18 20:26:27 2008 From: jeremie at le-hen.org (Jeremie Le Hen) Date: Wed Jun 18 20:26:34 2008 Subject: Vimage headsup.. revised. In-Reply-To: <20080618192903.O83875@maildrop.int.zabbadoz.net> References: <484CC690.9020303@elischer.org> <48536C9A.8020801@elischer.org> <20080618151911.GB46885@obiwan.tataz.chchile.org> <485935FF.8040308@elischer.org> <20080618192903.O83875@maildrop.int.zabbadoz.net> Message-ID: <20080618202212.GF46885@obiwan.tataz.chchile.org> On Wed, Jun 18, 2008 at 07:32:03PM +0000, Bjoern A. Zeeb wrote: > !!!!!!!! > be sure to read this though: > > http://lists.freebsd.org/pipermail/freebsd-virtualization/2008-May/000000.html > !!!!!!!! You may want to ask postmaster@ to put this message in the mailing-list description for now. (See "About freebsd-virtualization" on http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization) Regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From jamie at gritton.org Wed Jun 18 20:59:32 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 20:59:35 2008 Subject: How rootvnode is used Message-ID: <4859772D.5030208@gritton.org> A closer look at rootvnode shows it's not as heavily used as I thought it was. Almost everything uses the current process's own root directory (fd_rdir) and occasionally its jail directory (fd_jdir). Among the direct users of rootvnode, many explicitly want the system root directory:
    ZFS, for kernel threads that don't yet have a root.
    ZFS, for opening device files, where comments say "root of the global zone".
    dounmount, if forcefully unmounting.
    vfs_mountroot.
    check_root, if chroot_allow_open_directories == 1.
    nfs_namei, with pubflag set.
    audit_canon_path, checking if the path came from the system root.
vfs_mountroot is special, as after it sets rootvnode, I'd want to set prison0's vnode as well (the current strategy is to keep the rootvnode global and prison0.pr_root as "separate but equal"). The case of dounmount forcefully unmounting the root filesystem is only a precursor to system shutdown. One other rootvnode user is mountcheckdirs, which will change rootvnode if it's mounted on top of. This also checks the cdir, rdir, and jdir of every process in the system, and should be augmented to also check the pr_root of every prison (including prison0). Aside from the initial vfs_mountroot call, this is the only way rootvnode's value is ever changed. All other uses of rootvnode involve walking up the file tree:
    lookup will stop at fd_rdir or fd_jdir or rootvnode when looking up "..". It should also stop at the current prison's pr_root.
    vn_fullpath1 (kern___getcwd and vn_fullpath) stops at fd_rdir or rootvnode. In addition to stopping at the current prison's pr_root, it should check fd_jdir as well.
    linux_getcwd only uses rootvnode when fd_rdir is null (which as far as I can tell never happens in user threads, and kernel threads don't go here). Thus it really only stops at fd_rdir. It returns an error if it gets to real root before the process root (i.e. if an open directory was fchdir'd to), so there's no particular need for checking prison root to just end up giving the same error.
That leaves only four places in the kernel where prison root directories would be checked (in addition to jail_attach of course). I should be able to handle that, even with decent locking.
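A toy user-space model of the ".." rule described above may make the proposal easier to picture. This is only a sketch under stated assumptions: struct node, struct boundaries, and lookup_dotdot() are hypothetical names invented for illustration and are not kernel code. The real lookup() works on vnodes and namei state, with locking and mount-point crossing that are left out here. The model just shows ".." refusing to climb past any of the process root (like fd_rdir), the jail directory (like fd_jdir), the system root (rootvnode), or the proposed prison root (pr_root).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a vnode: only the parent link matters here. */
struct node {
	struct node *parent;	/* NULL at the top of the tree */
};

/* Directories a ".." lookup must not climb above (hypothetical model). */
struct boundaries {
	const struct node *rdir;	/* process root, modelled on fd_rdir */
	const struct node *jdir;	/* jail directory, modelled on fd_jdir */
	const struct node *pr_root;	/* prison root, the proposed extra check */
	const struct node *rootvnode;	/* system root */
};

/* Return nonzero if dp is one of the directories ".." must stop at. */
static int
is_boundary(const struct node *dp, const struct boundaries *b)
{
	return (dp == b->rdir || dp == b->jdir ||
	    dp == b->pr_root || dp == b->rootvnode);
}

/*
 * Resolve ".." from dp.  At a boundary (or at the top of the tree)
 * ".." resolves to dp itself; otherwise it resolves to the parent.
 */
static const struct node *
lookup_dotdot(const struct node *dp, const struct boundaries *b)
{
	assert(dp != NULL && b != NULL);
	if (is_boundary(dp, b) || dp->parent == NULL)
		return (dp);
	return (dp->parent);
}
```

In this model, a process whose prison root is some directory below the system root can walk ".." upward only until it reaches that directory, even if its own rdir and jdir point elsewhere. An unused boundary can simply be left NULL, since no real node compares equal to it.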
- Jamie From jamie at gritton.org Wed Jun 18 21:02:42 2008 From: jamie at gritton.org (James Gritton) Date: Wed Jun 18 21:02:45 2008 Subject: V_* meta-symbols and locking In-Reply-To: <200806182156.37998.zec@icir.org> References: <48588595.7020709@gritton.org> <200806182140.23123.zec@icir.org> <4859661E.9070502@gritton.org> <200806182156.37998.zec@icir.org> Message-ID: <485977EB.90504@gritton.org> Marko Zec wrote: >>> The only thing I'd like to have >>> as an option is to be able to spawn a new process in the target VM >>> _without_ making it chrooted... >> >> If you mean creating a jail that's not chrooted, that's no problem. >> If you mean creating a jail that *is* chrooted, and then placing a >> process into that jail without chrooting it, that would be a breakage >> of the jail paradigm. Hopefully you mean the former? > > No, I want the latter, as an option. Given that the parent environment / > jail completely controls the child anyhow, I don't think such an > (optional) behavior would be too big a security issue. One thing you could do is keep a file descriptor open to the real root directory, and call jail_attach(). As long as the system is in its default state of chroot_allow_open_directories == 1, you can then fchdir() or openat() from the saved descriptor. That could easily be made an option to jexec(8). - Jamie From zec at icir.org Wed Jun 18 21:04:21 2008 From: zec at icir.org (Marko Zec) Date: Wed Jun 18 21:04:24 2008 Subject: V_* meta-symbols and locking In-Reply-To: <48596B1D.8000801@gritton.org> References: <48588595.7020709@gritton.org> <200806182156.37998.zec@icir.org> <48596B1D.8000801@gritton.org> Message-ID: <200806182303.57184.zec@icir.org> On Wednesday 18 June 2008 22:07:57 James Gritton wrote: > Marko Zec wrote: > >>> The only thing I'd like to have > >>> as an option is to be able to spawn a new process in the target > >>> VM _without_ making it chrooted... > >> > >> If you mean creating a jail that's not chrooted, that's no > >> problem.
If you mean creating a jail that *is* chrooted, and then > >> placing a process into that jail without chrooting it, that would > >> be a breakage of the jail paradigm. Hopefully you mean the > >> former? > > > > No, I want the latter, as an option. Given that the parent > > environment / jail completely controls the child anyhow, I don't > > think such an (optional) behavior would be too big a security > > issue. > > I'm not thinking of the security implications, but of the mess. The > existing jail_attach() wouldn't be sufficient, as it only passes a > jid. You'd need a separate jail_attach2() system call. Would this > be exactly the same as jail_attach() except that it doesn't do the > chroot? That sounds like a one-off to me. So would you instead have > a way of specifying which parts of the jail environment you do and > don't want? Then you'd have to not only know what virtualizations a > jail does and doesn't do (the problem I'm working on), but also what > virtualizations every process in a jail may or may not have enabled. > It might be that only the chroot can reasonably support this kind of > split anyway. > > Why do you want this? If you want to be part of the jail, you attach > to it. If you want to do administration of the jail environment, it > should be sufficient to do it from outside. Consider a situation where the directory tree of a jail would get corrupted or compromised (by accident or maliciously), so that you couldn't or wouldn't wish to exec any binaries from that part of the filesystem tree, but you'd like to check the network state / setup there (TCP connections, routes, firewall...), perhaps do a tcpdump capture and store it in a file inaccessible to processes already running in that jail...
Marko From julian at elischer.org Thu Jun 19 19:04:10 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jun 19 19:04:15 2008 Subject: vimage (and vimage-devel) branches Message-ID: <485AADB0.8080006@elischer.org> Status: Currently they are compiling GENERIC fine, and VIMAGE ok, but failing to compile LINT. The current points I know we need to work on are: sctp: I'm working with randall on this. pf: we need a pf maintainer to take a look and decide the right things to do for this. The work that is being done on the vimage/jail merge (something I didn't expect to happen so quickly) needs to be looked at to see how it affects our schedule. The vimage-commit2 branch is looking fine but I want to do some additions for SCTP in it. From julian at elischer.org Thu Jun 19 19:49:08 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jun 19 19:49:12 2008 Subject: vimage (and vimage-devel) branches In-Reply-To: <200806192147.22285.zec@freebsd.org> References: <485AADB0.8080006@elischer.org> <200806192147.22285.zec@freebsd.org> Message-ID: <485AB83A.5020006@elischer.org> Marko Zec wrote: > On Thursday 19 June 2008 21:04:16 Julian Elischer wrote: >> Status: >> >> Currently they are compiling GENERIC fine, and VIMAGE >> ok, but failing to compile LINT > > LINT compiles fine on i386 / vimage-commit2, but not on other branches. yes > > Marko > >> the current points I know we need to work on are: >> >> sctp: I'm working with randall on this >> pf: we need a pf maintainer to take a look and decide >> the right things to do for this. >> >> the work that is being done on the vimage/jail >> merge (something I didn't expect to happen so quickly) >> needs to be looked at to see how it affects our schedule. >> >> >> The vimage-commit2 branch is looking fine >> but I want to do some additions for SCTP in it.
> From zec at freebsd.org Thu Jun 19 19:47:46 2008 From: zec at freebsd.org (Marko Zec) Date: Thu Jun 19 20:06:14 2008 Subject: vimage (and vimage-devel) branches In-Reply-To: <485AADB0.8080006@elischer.org> References: <485AADB0.8080006@elischer.org> Message-ID: <200806192147.22285.zec@freebsd.org> On Thursday 19 June 2008 21:04:16 Julian Elischer wrote: > Status: > > Currently they are compiling GENERIC fine, and VIMAGE > ok, but failing to compile LINT LINT compiles fine on i386 / vimage-commit2, but not on other branches. Marko > the current points I know we need to work on are: > > sctp: I'm working with randall on this > pf: we need a pf maintainer to take a look and decide > the right things to do for this. > > the work that is being done on the vimage/jail > merge (something I didn't expect to happen so quickly) > needs to be looked at to see how it affects our schedule. > > > The vimage-commit2 branch is looking fine > but I want to do some additions for SCTP in it. From bzeeb-lists at lists.zabbadoz.net Mon Jun 30 08:35:08 2008 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Mon Jun 30 08:35:12 2008 Subject: vimage status? Message-ID: <20080630082647.H83875@maildrop.int.zabbadoz.net> Hi, I think a lot of us lost track of the current state. So I have a few questions: 1) there are almost half a dozen p4 branches now, can you (maybe on the wiki) put up a description for which stage each branch is for and the current state (with a date) and keep this updated? 2) What is the current state of stage 1? Is there a script+patch for (non p4 people) to download and point at/review? What is the plan for committing? 1 day, 2 days, 1 week, 2 weeks? Just roughly. I know it's sooner than later but just to give people an idea as I haven't seen any review requests so far. 3) A few of us had started a review of the entire thing and had been told to `wait' for some cleanup to happen? Has this happened?
Is there a new patch on the entire work that would be ready for review? Which branch to use to commit changes to (or rather send patches)? 4) Some of us had tried to get vimage to boot and failed. Should we give it a try again? Which patch/branch? Bjoern -- Bjoern A. Zeeb Stop bit received. Insert coin for new game. From julian at elischer.org Mon Jun 30 16:48:19 2008 From: julian at elischer.org (Julian Elischer) Date: Mon Jun 30 16:48:24 2008 Subject: vimage status? In-Reply-To: <20080630082647.H83875@maildrop.int.zabbadoz.net> References: <20080630082647.H83875@maildrop.int.zabbadoz.net> Message-ID: <48690E63.3060309@elischer.org> Bjoern A. Zeeb wrote: > Hi, > > I think a lot of us lost track of the current state. So I have a few > questions: > > 1) there are almost half a dozen p4 branches now, can you (maybe on > the wiki) put up a description for which stage each branch is for and > the current state (with a date) and keep this updated? I have a doc explaining this that I sent to silby. I'll put it on the wiki if I can. > > 2) What is the current state of stage 1? Is there a script+patch for > (non p4 people) to download and point at/review? What is the plan for > committing? 1 day, 2 days, 1 week, 2 weeks? Just roughly. I know it's > sooner than later but just to give people an idea as I haven't seen > any review requests so far. Stage 1 is currently as you see it in the vimage-commit2 branch. As of Friday it was good, but an MFC over the weekend broke a few things (the pain of following -current in a branch). The branch "vimage-commit3" is an alternate commit candidate that contains more stuff that also goes to NULL (but more than was agreed to in the meeting). The reason that we might include the extra stuff is that it reduces the size of the remaining diff considerably and makes it easier to see the 'real' diffs that are to follow. > > 3) A few of us had started a review of the entire thing and had been told > to `wait' for some cleanup to happen?
Has this happened? Is there a > new patch on the entire work that would be ready for review? Which > branch to use to commit changes to (or rather send patches)? I am working a couple of days a week on trying to separate out the diffs to the stage where we can do what we were asked to do. In doing so I have come across some issues that may require some clarification and work. So I'm pushing back the commits "indefinitely" until we can get enough time to clean/address these issues. It's still probably worth reading the base vimage files, i.e. kern_vimage.c and sys/vimage.h, regardless of whether we are cleaning. > > 4) Some of us had tried to get vimage to boot and failed. Should we > give it a try again? Which patch/branch? The 'vimage' branch is the one to try as it is the 'final product'; however, since my MFC on Sunday it is probably broken (the price of following -current). > > > Bjoern > From rwatson at FreeBSD.org Mon Jun 30 16:08:52 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jun 30 16:54:47 2008 Subject: vimage status? In-Reply-To: <20080630082647.H83875@maildrop.int.zabbadoz.net> References: <20080630082647.H83875@maildrop.int.zabbadoz.net> Message-ID: <20080630165016.B77620@fledge.watson.org> On Mon, 30 Jun 2008, Bjoern A. Zeeb wrote: > 1) there are almost half a dozen p4 branches now, can you (maybe on the > wiki) put up a description for which stage each branch is for and the > current state (with a date) and keep this updated? > > 2) What is the current state of stage 1? Is there a script+patch for (non p4 > people) to download and point at/review? What is the plan for committing? 1 > day, 2 days, 1 week, 2 weeks? Just roughly. I know it's sooner than later > but just to give people an idea as I haven't seen any review requests so > far. Now that I'm back from travel, I'm happy to spend a bit of time reviewing candidate patches, so if there is a final-looking patch pending commit somewhere that I can go through, please let me know.
Ideally one without lots of known issues, or at least a list of known issues, so that I can focus the review time on things that aren't destined to be immediately changed. Robert N M Watson Computer Laboratory University of Cambridge