From pcc at gmx.net Sat May 2 13:39:43 2009
From: pcc at gmx.net (Peter Cornelius)
Date: Sat May 2 13:39:49 2009
Subject: VIMAGE (was: Multiple default routes / Force external routing)
In-Reply-To: <20090425133006.311010@gmx.net>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
Message-ID: <20090502131259.31160@gmx.net>

Hi,

Are vimage and jail mutually exclusive?

Regards,

Peter.

-------- Original Message --------
> Date: Sat, 25 Apr 2009 15:30:06 +0200
> From: "Peter Cornelius"
> To: Marko Zec, freebsd-net@freebsd.org
> CC: sfourman@gmail.com, steve@ibctech.ca, sthaug@nethelp.no, julian@elischer.org
> Subject: Re: VIMAGE (was: Multiple default routes / Force external routing)

> Thanks, Marko,
>
> > > > > is VIMAGE fully integrated into FreeBSD 8 CURRENT? (I believe this
> > > > > answer is no)
> > > > > also is VIMAGE expected to make it into FreeBSD 8?
> > > >
> > > > not fully but a lot of it is under way
> > >
> > > Thanks for the pointer, I currently don't get it [1] to build on RELENG_7
> > > which I naively hoped, so the "lot" is probably not sufficient for me yet.
> > > So, w/o patience for August, I probably need to find another way.
> >
> > Hmm...
> >
> > tpx32% uname -a
> > FreeBSD tpx32.icir.org 7.1-STABLE FreeBSD 7.1-STABLE #0: Thu Feb 5 22:36:40 CET 2009
> > marko@tpx32.icir.org:/u/marko/p4/zec/vimage_7/src/sys/i386/compile/VIMAGE i386
> > tpx32% pwd
> > /u/marko/tmp
> > tpx32% tar -xzf vimage_7_20090401.tgz
> > tpx32% cd src/sys/i386/conf/
> > tpx32% config VIMAGE
> > tpx32% cd ../compile/VIMAGE/
> > tpx32% make depend; make
> > tpx32% sudo make install
> > tpx32% cd ~/tmp/src/usr.sbin/vimage/
> > tpx32% make clean; make
> > tpx32% sudo make install
> >
> > Let me know if that doesn't work...
>
> In fact, it *does* work, thank you. I mistook the tar to be a patch to
> copy over an existing tree which obviously did not work out as I expected.
> So, how's that:
>
> Copyright (c) 1992-2009 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.2-PRERELEASE #0: Sat Apr 25 08:22:26 UTC 2009
> root@netserv.ka.cornelius:/usr/src.VIMAGE_20090401/sys/i386/compile/VNETSERV
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel Pentium III (1004.52-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0x686 Stepping = 6
> Features=0x383fbff
> real memory = 1610596352 (1535 MB)
> avail memory = 1568624640 (1495 MB)
> ACPI APIC Table:
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID: 3
> cpu1 (AP): APIC ID: 0
> (...)
>
> So, I suppose it's further reading time and then I'll go and set up a
> couple of vimages and see what it does... :)
>
> Thanks again,
>
> Peter.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From clcchu at hotmail.com Sat May 2 15:00:06 2009
From: clcchu at hotmail.com (Clarence Chu)
Date: Sat May 2 15:00:12 2009
Subject: VIMAGE (was: Multiple default routes / Force external routing)
In-Reply-To: <20090502131259.31160@gmx.net>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net>
Message-ID:

> Are vimage and jail mutually exclusive?

man vimage, in it:

SEE ALSO
jail(8)

Clarence CHU

From julian at elischer.org Sat May 2 17:00:48 2009
From: julian at elischer.org (Julian Elischer)
Date: Sat May 2 17:00:53 2009
Subject: VIMAGE
In-Reply-To: <20090502131259.31160@gmx.net>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net>
Message-ID: <49FC78DA.2010201@elischer.org>

Peter Cornelius wrote:
> Hi,
>
> Are vimage and jail mutually exclusive?

no

The situation is that right now jail and vimage are
orthogonal (ish); however, in the future,
vimage will become a set of options on jail.

>
> Regards,
>
> Peter.

From olivier at gid0.org Sat May 2 18:52:47 2009
From: olivier at gid0.org (Olivier SMEDTS)
Date: Sat May 2 18:52:55 2009
Subject: VIMAGE status
In-Reply-To: <49FC812B.2070305@elischer.org>
References: <49FC812B.2070305@elischer.org>
Message-ID: <367b2c980905021130i76012f91i7cce93edd55cacad@mail.gmail.com>

2009/5/2 Julian Elischer:
> The VIMAGE code is nearly all in the kernel.
>
> One is now able to make VIMAGE kernels (add options VIMAGE)
> though they don't actually allow you to make multiple
> vimage instances yet..
>
> The VIMAGE option enables all the low level changes needed
> throughout the kernel.
>
> The VIMAGE_GLOBALS option basically sets things back to how they were
> before.
>
> Having neither (the default) gives a kernel that is a kind of hybrid.
>
> The Hybrid state is what will go forward as 'NON-VIMAGE' mode
> and the VIMAGE_GLOBALS mode will probably go away in time as
> it complicates the code.
>
> The aim of this mail is to ask people to try adding the VIMAGE option
> to their regular kernels and try using them as you would normally.
> You will not yet be able to use any new VIMAGE features but we
> should be fully compatible with previous kernels.

Here is a warning I have when building kernel with options VIMAGE and INET6 :

cc -c -O2 -pipe -march=native -fno-strict-aliasing -std=c99 -g -Wall
-Wredundant-decls -Wnested-externs -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
-Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/work/src/sys
-I/work/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS
-include opt_global.h -fno-common -finline-limit=8000 --param
inline-unit-growth=100 --param large-function-growth=1000
-mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2
-mno-sse3 -mno-mmx -mno-3dnow -msoft-float
-fno-asynchronous-unwind-tables -ffreestanding -fstack-protector
-Werror /work/src/sys/netinet6/mld6.c
cc1: warnings being treated as errors
/work/src/sys/netinet6/mld6.c: In function 'vnet_mld_idetach':
/work/src/sys/netinet6/mld6.c:3145: warning: unused variable 'vnet_inet6'
*** Error code 1

>
> Please report any concerns to the freebsd-virtualization@ mailing list.
>
> THEORETICALLY you should not see any changes in behaviour, however we have
> the following issues:
>
> * SCTP is not fully converted yet. add 'nooptions SCTP' for now if you
>   are not using it yet.
>
> * An NFS (crash) issue was reported. This MAY have been fixed...
>
>
> Theory tells us that all three kernel options should behave about the same
> but if you do try this, and have any benchmarking facilities,
> it would be incredibly useful if you could let us know if you see any
> performance changes between the three.
>
>
> thanks,
>
> Julian (currently running a VIMAGE kernel myself)
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>

--
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."

From julian at elischer.org Sat May 2 19:03:28 2009
From: julian at elischer.org (Julian Elischer)
Date: Sat May 2 19:03:34 2009
Subject: VIMAGE status
In-Reply-To: <367b2c980905021130i76012f91i7cce93edd55cacad@mail.gmail.com>
References: <49FC812B.2070305@elischer.org>
	<367b2c980905021130i76012f91i7cce93edd55cacad@mail.gmail.com>
Message-ID: <49FC9902.0@elischer.org>

Olivier SMEDTS wrote:
> 2009/5/2 Julian Elischer:
>> The VIMAGE code is nearly all in the kernel.
>>
>> One is now able to make VIMAGE kernels (add options VIMAGE)
>> though they don't actually allow you to make multiple
>> vimage instances yet..
>>
>> The VIMAGE option enables all the low level changes needed
>> throughout the kernel.
>>

> Here is a warning I have when building kernel with options VIMAGE and INET6 :
>
> cc -c -O2 -pipe -march=native -fno-strict-aliasing -std=c99 -g -Wall
> -Wredundant-decls -Wnested-externs -Wstrict-prototypes
> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
> -Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/work/src/sys
> -I/work/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS
> -include opt_global.h -fno-common -finline-limit=8000 --param
> inline-unit-growth=100 --param large-function-growth=1000
> -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2
> -mno-sse3 -mno-mmx -mno-3dnow -msoft-float
> -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector
> -Werror /work/src/sys/netinet6/mld6.c
> cc1: warnings being treated as errors
> /work/src/sys/netinet6/mld6.c: In function 'vnet_mld_idetach':
> /work/src/sys/netinet6/mld6.c:3145: warning: unused variable 'vnet_inet6'
> *** Error code 1
>

I assume you do not have INVARIANTS..

can you just put #ifdef INVARIANTS around that line and do the compile
again?

From olivier at gid0.org Sat May 2 19:57:49 2009
From: olivier at gid0.org (Olivier SMEDTS)
Date: Sat May 2 19:57:56 2009
Subject: VIMAGE status
In-Reply-To: <49FC9902.0@elischer.org>
References: <49FC812B.2070305@elischer.org>
	<367b2c980905021130i76012f91i7cce93edd55cacad@mail.gmail.com>
	<49FC9902.0@elischer.org>
Message-ID: <367b2c980905021257v17e2484fo9d44811b190d256c@mail.gmail.com>

2009/5/2 Julian Elischer:
> Olivier SMEDTS wrote:
>>
>> 2009/5/2 Julian Elischer:
>>>
>>> The VIMAGE code is nearly all in the kernel.
>>>
>>> One is now able to make VIMAGE kernels (add options VIMAGE)
>>> though they don't actually allow you to make multiple
>>> vimage instances yet..
>>>
>>> The VIMAGE option enables all the low level changes needed
>>> throughout the kernel.
>>>
>
>> Here is a warning I have when building kernel with options VIMAGE and
>> INET6 :
>>
>> cc -c -O2 -pipe -march=native -fno-strict-aliasing -std=c99 -g -Wall
>> -Wredundant-decls -Wnested-externs -Wstrict-prototypes
>> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
>> -Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/work/src/sys
>> -I/work/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS
>> -include opt_global.h -fno-common -finline-limit=8000 --param
>> inline-unit-growth=100 --param large-function-growth=1000
>> -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2
>> -mno-sse3 -mno-mmx -mno-3dnow -msoft-float
>> -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector
>> -Werror /work/src/sys/netinet6/mld6.c
>> cc1: warnings being treated as errors
>> /work/src/sys/netinet6/mld6.c: In function 'vnet_mld_idetach':
>> /work/src/sys/netinet6/mld6.c:3145: warning: unused variable 'vnet_inet6'
>> *** Error code 1
>>
>
> I assume you do not have INVARIANTS..

Right, here is my kernel config file's content (amd64) :

cpu HAMMER
ident QUAD
makeoptions DEBUG=-g
options SCHED_ULE
options PREEMPTION
options IPI_PREEMPTION
options INET
options INET6
options FFS
options SOFTUPDATES
options UFS_DIRHASH
options COMPAT_IA32
options KTRACE
options STACK
options SYSVSHM
options SYSVMSG
options SYSVSEM
options _KPOSIX_PRIORITY_SCHEDULING
options KBD_INSTALL_CDEV
options STOP_NMI
options AUDIT
options VIMAGE
options PRINTF_BUFR_SIZE=128
options SMP
device acpi
device pci
device atkbdc
device atkbd
device vga
device sc
device loop
device ether
device pty
device bpf

>
> can you just put #ifdef INVARIANTS around that line and do the compile
> again?
>

It now compiles without errors.

--
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."
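
[Editorial sketch, not part of the archived thread] For readers who want to try what the status mail above asks for, the two settings it names could be collected into a small kernel config file. This is only a sketch under assumptions: the `include GENERIC` line and the `VIMAGE_TEST` ident are illustrative choices that nobody in the thread posted, while `options VIMAGE` and `nooptions SCTP` are the two lines the status mail itself suggests.

```
# Hypothetical test kernel config; "include GENERIC" and the ident name
# are assumptions made for illustration, not taken from the thread.
include		GENERIC
ident		VIMAGE_TEST

options 	VIMAGE		# low-level VIMAGE changes, as requested in the status mail
nooptions	SCTP		# status mail: SCTP is not fully converted yet
```

Leaving out both VIMAGE and VIMAGE_GLOBALS would, according to the same mail, give the default hybrid kernel, so building one kernel with and one without the `options VIMAGE` line is probably the simplest way to get the comparison the mail asks for.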

From julian at elischer.org Sat May 2 20:15:35 2009
From: julian at elischer.org (Julian Elischer)
Date: Sat May 2 20:15:41 2009
Subject: VIMAGE status
In-Reply-To: <367b2c980905021257v17e2484fo9d44811b190d256c@mail.gmail.com>
References: <49FC812B.2070305@elischer.org>
	<367b2c980905021130i76012f91i7cce93edd55cacad@mail.gmail.com>
	<49FC9902.0@elischer.org>
	<367b2c980905021257v17e2484fo9d44811b190d256c@mail.gmail.com>
Message-ID: <49FCA9E8.5060006@elischer.org>

Olivier SMEDTS wrote:
> 2009/5/2 Julian Elischer:
>
>> can you just put #ifdef INVARIANTS around that line and do the compile
>> again?
>>
>
> It now compiles without errors.

yeah my svn machine went back to Cisco when I left there, so I don't have
an svn tree at the moment. otherwise I'd just check it in..

Now you have a VIMAGE system, just use it as normal and let us know if
you see any unusual behaviour.

If you have anything you can benchmark you might try both kernels
and see if there are any performance differences.

thanks

From pcc at gmx.net Sun May 3 10:32:50 2009
From: pcc at gmx.net (Peter Cornelius)
Date: Sun May 3 10:32:57 2009
Subject: VIMAGE
In-Reply-To: <49FC78DA.2010201@elischer.org>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org>
Message-ID: <20090503103244.44760@gmx.net>

Re...

> The situation is that right now jail and vimage are
> orthogonal (ish) however in the future,
> vimage will become a set of options on jail.

Ah. So it probably is kinda useless to try and stick a couple of jails
'inside' a vimage.

Rgds.,

Peter.

From julian at elischer.org Sun May 3 17:51:52 2009
From: julian at elischer.org (Julian Elischer)
Date: Sun May 3 17:52:28 2009
Subject: VIMAGE
In-Reply-To: <20090503103244.44760@gmx.net>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org>
	<20090503103244.44760@gmx.net>
Message-ID: <49FDD9B9.7090403@elischer.org>

Peter Cornelius wrote:
> Re...
>
>> The situation is that right now jail and vimage are
>> orthogonal (ish) however in the future,
>> vimage will become a set of options on jail.
>
> Ah. So it probably is kinda useless to try and stick a couple of jails
> 'inside' a vimage.

no you will be able to nest jails.
some of them may have the vimage options and some may not.

>
> Rgds.,
>
> Peter.

From nvass9573 at gmx.com Sun May 3 18:06:03 2009
From: nvass9573 at gmx.com (Nikos Vassiliadis)
Date: Sun May 3 18:06:10 2009
Subject: VIMAGE
In-Reply-To: <49FDD9B9.7090403@elischer.org>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org>
	<20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org>
Message-ID: <49FDDD02.3090803@gmx.com>

Julian Elischer wrote:
> Peter Cornelius wrote:
>> Re...
>>
>>> The situation is that right now jail and vimage are
>>> orthogonal (ish) however in the future,
>>> vimage will become a set of options on jail.
>>
>> Ah. So it probably is kinda useless to try and stick a couple of jails
>> 'inside' a vimage.
>
> no you will be able to nest jails.
> some of them may have the vimage options and some may not.

What about vimages without jails?
I can imagine some applications of VIMAGE which completely
lack user-space processing. If I recall correctly a jail
exists as long as there is at least one process associated with
it. Would that be feasible?
Having a vimage with no processes?

From jamie at FreeBSD.org Mon May 4 02:55:55 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Mon May 4 03:59:53 2009
Subject: VIMAGE
In-Reply-To: <49FDDD02.3090803@gmx.com>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org>
	<20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org>
	<49FDDD02.3090803@gmx.com>
Message-ID: <49FE5937.3000606@FreeBSD.org>

Nikos Vassiliadis wrote:
> Julian Elischer wrote:
>> Peter Cornelius wrote:
>>> Re...
>>>
>>>> The situation is that right now jail and vimage are
>>>> orthogonal (ish) however in the future,
>>>> vimage will become a set of options on jail.
>>>
>>> Ah. So it probably is kinda useless to try and stick a couple of
>>> jails 'inside' a vimage.
>>
>> no you will be able to nest jails.
>> some of them may have the vimage options and some may not.
>
> What about vimages without jails?
> I can imagine some applications of VIMAGE which completely
> lack user-space processing. If I recall correctly a jail
> exists as long as there is at least one process associated with
> it. Would that be feasible?
> Having a vimage with no processes?

Jails will be able to exist without processes, and in fact with nothing
more than a vimage attached. But much of vimage only makes sense in
conjunction with processes - a process attached to a vimage can see that
vimage's network interfaces. There are still things like routing that
work independent of processes I suppose, but it seems to me much of what
a vimage does is provide the network stack to the processes it's tied to.

- Jamie

From julian at elischer.org Mon May 4 05:12:59 2009
From: julian at elischer.org (Julian Elischer)
Date: Mon May 4 05:13:06 2009
Subject: VIMAGE
In-Reply-To: <49FDDD02.3090803@gmx.com>
References: <20090413.220932.74699777.sthaug@nethelp.no>
	<49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
	<200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
	<20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org>
	<20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org>
	<49FDDD02.3090803@gmx.com>
Message-ID: <49FE795E.9040902@elischer.org>

Nikos Vassiliadis wrote:
> Julian Elischer wrote:
>> Peter Cornelius wrote:
>>> Re...
>>>
>>>> The situation is that right now jail and vimage are
>>>> orthogonal (ish) however in the future,
>>>> vimage will become a set of options on jail.
>>>
>>> Ah. So it probably is kinda useless to try and stick a couple of
>>> jails 'inside' a vimage.
>>
>> no you will be able to nest jails.
>> some of them may have the vimage options and some may not.
>
> What about vimages without jails?
> I can imagine some applications of VIMAGE which completely
> lack user-space processing. If I recall correctly a jail
> exists as long as there is at least one process associated with
> it. Would that be feasible?
> Having a vimage with no processes?

at this time yes

From jamie at FreeBSD.org Mon May 4 02:50:37 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Mon May 4 06:47:17 2009
Subject: New jail framework - the userland side
Message-ID: <49FE5387.3020503@FreeBSD.org>

Hi all.

I recently added some new jail-related system calls to extend the
current jail system with an nmount-inspired name=value interface. This
not only adds a new interface to the jail system, but allows for future
extensions. For the first step, I've just added new system calls to set
and read jail parameters.

This is step 2: altering jail(8) and jls(8) to work with the new jails.
With the included patch, the old "jail path hostname ip-number
command..." command line turns to a more general "jail foo=bar
baz=bletch ...". There's a set of core parameters to set the things
jails can already do, plus the ability to set any parameters that other
subsystems may want to tie to jails - work in progress includes the
Linux MIB parameters, future ideas include separate namespaces for
things like SYSV/Posix IPC. And of course, the plan is to use these new
jails to tie in to the Vimage project.

This patch is for the jail admin programs, and uses the current kernel
as of r191673. You won't yet be able to do anything jails don't do
already, but the interface is how I plan for things to look in the
future. I'd appreciate comments from anyone who's interested in the
future of lightweight virtualization.

As a bonus, there are man pages included :-).

- Jamie
-------------- next part --------------
Index: usr.bin/killall/killall.1
===================================================================
--- usr.bin/killall/killall.1	(revision 191694)
+++ usr.bin/killall/killall.1	(working copy)
@@ -24,7 +24,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 9, 2007
+.Dd April 30, 2009
 .Os
 .Dt KILLALL 1
 .Sh NAME
@@ -34,7 +34,7 @@
 .Nm
 .Op Fl delmsvz
 .Op Fl help
-.Op Fl j Ar jid
+.Op Fl j Ar jail
 .Op Fl u Ar user
 .Op Fl t Ar tty
 .Op Fl c Ar procname
@@ -91,9 +91,9 @@
 (with or without a leading
 .Dq Li SIG ) ,
 or numerically.
-.It Fl j Ar jid
-Kill processes in the jail specified by
-.Ar jid .
+.It Fl j Ar jail
+Kill processes in the specified
+.Ar jail .
 .It Fl u Ar user
 Limit potentially matching processes to those
 belonging to the specified
Index: usr.bin/killall/killall.c
===================================================================
--- usr.bin/killall/killall.c	(revision 191694)
+++ usr.bin/killall/killall.c	(working copy)
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -51,7 +52,7 @@
 usage(void)
 {
-	fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jid]\n");
+	fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jail]\n");
 	fprintf(stderr, " [-u user] [-t tty] [-c cmd] [-SIGNAL] [cmd]...\n");
 	fprintf(stderr, "At least one option or argument to specify processes must be given.\n");
@@ -100,6 +101,7 @@
 int
 main(int ac, char **av)
 {
+	struct iovec jparams[2];
 	struct kinfo_proc *procs = NULL, *newprocs;
 	struct stat sb;
 	struct passwd *pw;
@@ -159,12 +161,21 @@
 				}
 				jflag++;
 				if (*av == NULL)
-					errx(1, "must specify jid");
-				jid = strtol(*av, &ep, 10);
-				if (!*av || *ep)
-					errx(1, "illegal jid: %s", *av);
+					errx(1, "must specify jail");
+				jid = strtoul(*av, &ep, 10);
+				if (!**av || *ep) {
+					*(const void **)&jparams[0].iov_base =
+					    "name";
+					jparams[0].iov_len = sizeof("name");
+					jparams[1].iov_base = *av;
+					jparams[1].iov_len = strlen(*av) + 1;
+					jid = jail_get(jparams, 2, 0);
+					if (jid < 0)
+						errx(1, "unknown jail: %s",
+						    *av);
+				}
 				if (jail_attach(jid) == -1)
-					err(1, "jail_attach(): %d", jid);
+					err(1, "jail_attach(%d)", jid);
 				break;
 			case 'u':
 				++*av;
Index: usr.sbin/jls/jls.c
===================================================================
--- usr.sbin/jls/jls.c	(revision 191694)
+++ usr.sbin/jls/jls.c	(working copy)
@@ -1,6 +1,7 @@
 /*-
  * Copyright (c) 2003 Mike Barcroft
  * Copyright (c) 2008 Bjoern A. Zeeb
+ * Copyright (c) 2009 James Gritton
  * All rights reserved.
* * Redistribution and use in source and binary forms, with or without @@ -23,18 +24,20 @@ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. - * - * $FreeBSD$ */ +#include +__FBSDID("$FreeBSD$"); + #include -#include #include +#include #include +#include -#include +#include #include -#include + #include #include #include @@ -43,215 +46,672 @@ #include #include -#define FLAG_A 0x00001 -#define FLAG_V 0x00002 +#define SJPARAM "security.jail.param" +#define ARRAY_SLOP 5 -#ifdef SUPPORT_OLD_XPRISON -static -char *print_xprison_v1(void *p, char *end, unsigned flags) +#define CTLTYPE_BOOL (CTLTYPE + 1) +#define CTLTYPE_NOBOOL (CTLTYPE + 2) +#define CTLTYPE_IPADDR (CTLTYPE + 3) +#define CTLTYPE_IP6ADDR (CTLTYPE + 4) + +#define PARAM_KEY 0x1 +#define PARAM_USER 0x2 +#define PARAM_ARRAY 0x4 +#define PARAM_OPT 0x8 + +#define PRINT_DEFAULT 0x01 +#define PRINT_VDEFAULT 0x02 +#define PRINT_HEADER 0x04 +#define PRINT_NAMEVAL 0x08 +#define PRINT_QUOTED 0x10 + +struct param { + char *name; + void *value; + size_t size; + int type; + unsigned flags; +}; + +struct iovec2 { + struct iovec name; + struct iovec value; +}; + +static struct param *params; +static int nparams; +static char errmsg[256]; + +static void add_param(const char *name, void *value, unsigned flags); +static int get_param(const char *name, struct param *param); +static int sort_param(const void *a, const void *b); +static char *noname(const char *name); +static char *nononame(const char *name); +static int print_jail(int pflags, int jflags); +static void quoted_print(char *str, int len); + +int +main(int argc, char **argv) { - struct xprison_v1 *xp; - struct in_addr in; + char *ep, *jname; + int c, i, jflags, jid, lastjid, pflags; - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); + jname = NULL; + pflags = jflags = jid = 0; + while ((c = getopt(argc, argv, 
"dj:hnqv")) >= 0) + switch (c) { + case 'd': + jflags |= JAIL_DYING; + break; + case 'j': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) + jname = optarg; + break; + case 'h': + pflags |= PRINT_HEADER; + break; + case 'n': + pflags |= PRINT_NAMEVAL; + break; + case 'q': + pflags |= PRINT_QUOTED; + break; + case 'v': + pflags |= PRINT_VDEFAULT; + break; + default: + errx(1, "usage: jls [-dhnqv] [-j jail] [param ...]"); + } - xp = (struct xprison_v1 *)p; - if (flags & FLAG_V) { - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); - /* We are not printing an empty line here for state and name. */ - /* We are not printing an empty line here for cpusetid. */ - /* IPv4 address. */ - in.s_addr = htonl(xp->pr_ip); - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + /* Add the parameters to print. */ + if (optind == argc) { + if (pflags & PRINT_VDEFAULT) { + add_param("jid", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + add_param("name", NULL, PARAM_USER); + add_param("dying", NULL, PARAM_USER); + add_param("cpuset", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("ip6.addr", NULL, PARAM_USER | PARAM_OPT); + } else { + pflags |= PRINT_DEFAULT; + add_param("jid", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + } + } else + while (optind < argc) + add_param(argv[optind++], NULL, PARAM_USER); + + /* Add the index key and errmsg parameters. */ + if (jid != 0) + add_param("jid", &jid, PARAM_KEY); + else if (jname != NULL) + add_param("name", jname, PARAM_KEY); + else + add_param("lastjid", &lastjid, PARAM_KEY); + add_param("errmsg", errmsg, PARAM_KEY); + + /* Print a header line if requested. 
*/ + if (pflags & PRINT_VDEFAULT) + printf(" JID Hostname Path\n" + " Name State\n" + " CPUSetID\n" + " IP Address(es)\n"); + else if (pflags & PRINT_DEFAULT) + printf(" JID IP Address " + "Hostname Path\n"); + else if (pflags & PRINT_HEADER) { + for (i = 0; i < nparams; i++) + if (params[i].flags & PARAM_USER) { + if (i > 0) + putchar(' '); + fputs(params[i].name, stdout); + } + putchar('\n'); + } + + /* Fetch the jail(s) and print the paramters. */ + if (jid != 0 || jname != NULL) { + if (print_jail(pflags, jflags) < 0) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } else { - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, inet_ntoa(in), xp->pr_host, xp->pr_path); + for (lastjid = 0; + (lastjid = print_jail(pflags, jflags)) >= 0; ) + ; + if (errno != 0 && errno != ENOENT) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } - return ((char *)(xp + 1)); + return (0); } -#endif -static -char *print_xprison_v3(void *p, char *end, unsigned flags) +static void +add_param(const char *name, void *value, unsigned flags) { - struct xprison *xp; - struct in_addr *iap, in; - struct in6_addr *ia6p; - char buf[INET6_ADDRSTRLEN]; - const char *state; - char *q; - uint32_t i; + struct param *param; + char *nname; + size_t mlen1, mlen2, buflen; + int mib1[CTL_MAXNAME], mib2[CTL_MAXNAME - 2]; + int i, tnparams; + char buf[MAXPATHLEN]; - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - xp = (struct xprison *)p; + static int paramlistsize; - if (xp->pr_state < 0 || xp->pr_state >= (int) - ((sizeof(prison_states) / sizeof(struct prison_state)))) - state = "(bogus)"; - else - state = prison_states[xp->pr_state].state_name; + /* The pseudo-parameter "all" scans the list of available parameters. 
*/ + if (!strcmp(name, "all")) { + tnparams = nparams; + mib1[0] = 0; + mib1[1] = 2; + mlen1 = CTL_MAXNAME - 2; + if (sysctlnametomib(SJPARAM, mib1 + 2, &mlen1) < 0) + err(1, "sysctlnametomib(" SJPARAM ")"); + for (;;) { + /* Get the next parameter. */ + mlen2 = sizeof(mib2); + if (sysctl(mib1, mlen1 + 2, mib2, &mlen2, NULL, 0) < 0) + err(1, "sysctl(0.2)"); + if (mib2[0] != mib1[2] || mib2[1] != mib1[3] || + mib2[2] != mib1[4]) + break; + /* Convert it to an ascii name. */ + memcpy(mib1 + 2, mib2, mlen2); + mlen1 = mlen2 / sizeof(int); + mib1[1] = 1; + buflen = sizeof(buf); + if (sysctl(mib1, mlen1 + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.1)"); + add_param(buf + sizeof(SJPARAM), NULL, flags); + /* + * Convert nobool parameters to bool if their + * counterpart is a node, otherwise discard them. + */ + param = &params[nparams - 1]; + if (param->type == CTLTYPE_NOBOOL) { + nname = nononame(param->name); + if (get_param(nname, param) >= 0 && + param->type != CTLTYPE_NODE) { + free(nname); + nparams--; + } else { + free(param->name); + param->name = nname; + param->type = CTLTYPE_BOOL; + param->size = sizeof(int); + param->value = NULL; + } + } + mib1[1] = 2; + } - /* See if we should print non-ACTIVE jails. No? */ - if ((flags & FLAG_A) == 0 && strcmp(state, "ALIVE")) { - q = (char *)(xp + 1); - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - return (q); + qsort(params + tnparams, (size_t)(nparams - tnparams), + sizeof(struct param), sort_param); + return; } - if (flags & FLAG_V) - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); + /* Check for repeat parameters. */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name)) { + params[i].value = value; + params[i].flags |= flags; + return; + } - /* Jail state and name.
*/ - if (flags & FLAG_V) - printf("%6s %-29.29s %.74s\n", - "", (xp->pr_name[0] != '\0') ? xp->pr_name : "", state); + /* Make sure there is room for the new param record. */ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } - /* cpusetid. */ - if (flags & FLAG_V) - printf("%6s %-6d\n", - "", xp->pr_cpusetid); + /* Look up the parameter. */ + param = params + nparams++; + memset(param, 0, sizeof *param); + param->name = strdup(name); + if (param->name == NULL) + err(1, "strdup"); + param->flags = flags; + /* We have to know about pseudo-parameters without asking. */ + if (!strcmp(param->name, "lastjid")) { + param->type = CTLTYPE_INT; + param->size = sizeof(int); + goto got_type; + } + if (!strcmp(param->name, "errmsg")) { + param->type = CTLTYPE_STRING; + param->size = sizeof(errmsg); + goto got_type; + } + if (get_param(name, param) < 0) { + if (errno != ENOENT) + err(1, "sysctl(0.3.%s)", name); + /* See if this the "no" part of an existing boolean. */ + if ((nname = nononame(name))) { + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_BOOL) { + param->type = CTLTYPE_NOBOOL; + goto got_type; + } + } + if (flags & PARAM_OPT) { + nparams--; + return; + } + errx(1, "unknown parameter: %s", name); + } + if (param->type == CTLTYPE_NODE) { + /* + * A node isn't normally a parameter, but may be a boolean + * if its "no" counterpart exists. + */ + nname = noname(name); + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_NOBOOL) { + param->type = CTLTYPE_BOOL; + goto got_type; + } + errx(1, "unknown parameter: %s", name); + } - q = (char *)(xp + 1); - /* IPv4 addresses. 
*/ - iap = (struct in_addr *)(void *)q; - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - in.s_addr = 0; - for (i = 0; i < xp->pr_ip4s; i++) { - if (i == 0 || flags & FLAG_V) - in.s_addr = iap[i].s_addr; - if (flags & FLAG_V) - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + got_type: + param->value = value; +} + +static int +get_param(const char *name, struct param *param) +{ + char *bufi, *p; + size_t buflen, mlen; + int mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; + + /* Look up the MIB. */ + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + return (-1); + /* Get the type and size. */ + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + param->type = *(int *)buf & CTLTYPE; + bufi = buf + sizeof(int); + p = strchr(bufi, '\0'); + if (p - 2 >= bufi && !strcmp(p - 2, ",a")) { + p[-2] = 0; + param->flags |= PARAM_ARRAY; } - /* IPv6 addresses. */ - ia6p = (struct in6_addr *)(void *)q; - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - for (i = 0; i < xp->pr_ip6s; i++) { - if (flags & FLAG_V) { - inet_ntop(AF_INET6, &ia6p[i], buf, sizeof(buf)); - printf("%6s %s\n", "", buf); + switch (param->type) { + case CTLTYPE_INT: + /* An integer parameter might be a boolean. */ + if (bufi[0] == 'B') + param->type = bufi[1] == 'N' + ? 
CTLTYPE_NOBOOL : CTLTYPE_BOOL; + case CTLTYPE_UINT: + param->size = sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->size = sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(bufi, "S,in_addr")) { + param->type = CTLTYPE_IPADDR; + param->size = sizeof(struct in_addr); + } else if (!strcmp(bufi, "S,in6_addr")) { + param->type = CTLTYPE_IP6ADDR; + param->size = sizeof(struct in6_addr); } + break; + case CTLTYPE_STRING: + buf[0] = 0; + sysctl(mib + 2, mlen / sizeof(int), buf, &buflen, NULL, 0); + param->size = strtoul(buf, NULL, 10); + if (param->size == 0) + param->size = BUFSIZ; } + return (0); +} - /* If requested print the old style single line version. */ - if (!(flags & FLAG_V)) - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, (in.s_addr) ? inet_ntoa(in) : "", - xp->pr_host, xp->pr_path); +static int +sort_param(const void *a, const void *b) +{ + const struct param *parama, *paramb; + char *ap, *bp; - return (q); + /* Put top-level parameters first. 
*/ + parama = a; + paramb = b; + ap = strchr(parama->name, '.'); + bp = strchr(paramb->name, '.'); + if (ap && !bp) + return (1); + if (bp && !ap) + return (-1); + return (strcmp(parama->name, paramb->name)); } -static void -usage(void) +static char * +noname(const char *name) { + char *nname, *p; - (void)fprintf(stderr, "usage: jls [-av]\n"); - exit(1); + nname = malloc(strlen(name) + 3); + if (nname == NULL) + err(1, "malloc"); + p = strrchr(name, '.'); + if (p != NULL) + sprintf(nname, "%.*s.no%s", p - name, name, p + 1); + else + sprintf(nname, "no%s", name); + return nname; } -int -main(int argc, char *argv[]) -{ - int ch, version; - unsigned flags; - size_t i, j, len; - void *p, *q; +static char * +nononame(const char *name) +{ + char *nname, *p; - flags = 0; - while ((ch = getopt(argc, argv, "av")) != -1) { - switch (ch) { - case 'a': - flags |= FLAG_A; - break; - case 'v': - flags |= FLAG_V; - break; - default: - usage(); - } - } - argc -= optind; - argv += optind; + p = strrchr(name, '.'); + if (strncmp(p ? p + 1 : name, "no", 2)) + return NULL; + nname = malloc(strlen(name) - 1); + if (nname == NULL) + err(1, "malloc"); + if (p != NULL) + sprintf(nname, "%.*s.%s", p - name, name, p + 3); + else + strcpy(nname, name + 2); + return nname; +} - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); +static int +print_jail(int pflags, int jflags) +{ + char *nname; + int i, ai, jid, count, sanity; + char ipbuf[INET6_ADDRSTRLEN]; - j = len; - for (i = 0; i < 4; i++) { - if (len <= 0) - exit(0); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); + static struct iovec2 *iov, *aiov; + static int narray, nkey; - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; + /* Set up the parameter list(s) the first time around. 
*/ + if (iov == NULL) { + iov = malloc(nparams * sizeof(struct iovec2)); + if (iov == NULL) + err(1, "malloc"); + for (i = narray = 0; i < nparams; i++) { + iov[i].name.iov_base = params[i].name; + iov[i].name.iov_len = strlen(params[i].name) + 1; + iov[i].value.iov_base = params[i].value; + iov[i].value.iov_len = + params[i].type == CTLTYPE_STRING && + params[i].value != NULL && + ((char *)params[i].value)[0] != '\0' + ? strlen(params[i].value) + 1 : params[i].size; + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + narray++; + if (params[i].flags & PARAM_KEY) + nkey++; + } + } + if (narray > nkey) { + aiov = malloc(narray * sizeof(struct iovec2)); + if (aiov == NULL) + err(1, "malloc"); + for (i = ai = 0; i < nparams; i++) + if (params[i].flags & + (PARAM_KEY | PARAM_ARRAY)) + aiov[ai++] = iov[i]; + } + } + /* If there are array parameters, find their sizes. */ + if (aiov != NULL) { + for (ai = 0; ai < narray; ai++) + if (aiov[ai].value.iov_base == NULL) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)aiov, 2 * narray, jflags) < 0) + return (-1); + } + /* Allocate storage for all parameters. */ + for (i = ai = 0; i < nparams; i++) { + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + malloc(iov[i].value.iov_len); + } + ai++; + } else + iov[i].value.iov_base = malloc(params[i].size); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + if (params[i].value == NULL) + memset(iov[i].value.iov_base, 0, iov[i].value.iov_len); + } + /* + * Get the actual prison. If there are array elements, retry a few + * times in case the size changed from under us. 
+ */ + if ((jid = jail_get((struct iovec *)iov, 2 * nparams, jflags)) < 0) { + if (errno != EINVAL || aiov == NULL || errmsg[0]) + return (-1); + for (sanity = 0;; sanity++) { + if (sanity == 10) + return (-1); + for (ai = 0; ai < narray; ai++) + if (params[i].flags & PARAM_ARRAY) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)iov, 2 * narray, jflags) < + 0) + return (-1); + for (i = ai = 0; i < nparams; i++) { + if (!(params[i].flags & + (PARAM_KEY | PARAM_ARRAY))) + continue; + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = + aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + realloc(iov[i].value.iov_base, + iov[i].value.iov_len); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + } + ai++; + } + } + } + if (pflags & PRINT_VDEFAULT) { + printf("%6d %-29.29s %.74s\n" + "%6s %-29.29s %.74s\n" + "%6s %-6d\n", + *(int *)iov[0].value.iov_base, + (char *)iov[1].value.iov_base, + (char *)iov[2].value.iov_base, + "", + (char *)iov[3].value.iov_base, + *(int *)iov[4].value.iov_base ? "DYING" : "ACTIVE", + "", + *(int *)iov[5].value.iov_base); + count = iov[6].value.iov_len / sizeof(struct in_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET, + &((struct in_addr *)iov[6].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + if (!strcmp(params[7].name, "ip6.addr")) { + count = iov[7].value.iov_len / sizeof(struct in6_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET6, &((struct in_addr *) + iov[7].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + } + } else if (pflags & PRINT_DEFAULT) + printf("%6d %-15.15s %-29.29s %.74s\n", + *(int *)iov[0].value.iov_base, + iov[1].value.iov_len == 0 ? 
"-" + : inet_ntoa(*(struct in_addr *)iov[1].value.iov_base), + (char *)iov[2].value.iov_base, + (char *)iov[3].value.iov_base); + else { + for (i = 0; i < nparams; i++) { + if (!(params[i].flags & PARAM_USER)) continue; + if (i > 0) + putchar(' '); + if (pflags & PRINT_NAMEVAL) { + /* + * Generally "name=value", but for booleans + * either "name" or "noname". + */ + switch (params[i].type) { + case CTLTYPE_BOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = noname(params[i].name); + printf("%s", nname); + free(nname); + } + break; + case CTLTYPE_NOBOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = + nononame(params[i].name); + printf("%s", nname); + free(nname); + } + break; + default: + printf("%s=", params[i].name); + } } - err(1, "sysctlbyname(): security.jail.list"); + count = params[i].flags & PARAM_ARRAY + ? iov[i].value.iov_len / params[i].size : 1; + if (count == 0) + putchar('-'); + for (ai = 0; ai < count; ai++) { + if (ai > 0) + putchar(','); + switch (params[i].type) { + case CTLTYPE_INT: + printf("%d", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_UINT: + printf("%u", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_IPADDR: + if (inet_ntop(AF_INET, + &((struct in_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_IP6ADDR: + if (inet_ntop(AF_INET6, + &((struct in6_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_LONG: + printf("%ld", ((long *) + iov[i].value.iov_base)[ai]); + case CTLTYPE_ULONG: + printf("%lu", ((long *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_STRING: + if (pflags & PRINT_QUOTED) + quoted_print((char *) + iov[i].value.iov_base, + params[i].size); + else + printf("%.*s", + params[i].size, (char *) + 
iov[i].value.iov_base); + break; + case CTLTYPE_BOOL: + case CTLTYPE_NOBOOL: + if (!(pflags & PRINT_NAMEVAL)) + printf(((int *) + iov[i].value.iov_base)[ai] + ? "true" : "false"); + } + } } - break; + putchar('\n'); } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); + for (i = 0; i < nparams; i++) + if (params[i].value == NULL) + free(iov[i].value.iov_base); + return (jid); +} - if (flags & FLAG_V) { - printf(" JID Hostname Path\n"); - printf(" Name State\n"); - printf(" CPUSetID\n"); - printf(" IP Address(es)\n"); - } else { - printf(" JID IP Address Hostname" - " Path\n"); +static void +quoted_print(char *str, int len) +{ + int c, qc; + char *p = str; + char *ep = str + len; + + /* An empty string needs quoting. */ + if (!*p) { + fputs("\"\"", stdout); + return; } - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - q = print_xprison_v1(q, (char *)p + len, flags); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = print_xprison_v3(q, (char *)p + len, flags); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } + + /* + * The value will be surrounded by quotes if it contains spaces + * or quotes. + */ + qc = strchr(p, '\'') ? '"' + : strchr(p, '"') ? '\'' + : strchr(p, ' ') || strchr(p, '\t') ? 
'"' + : 0; + if (qc) + putchar(qc); + while (p < ep && (c = *p++)) { + if (c == '\\' || c == qc) + putchar('\\'); + putchar(c); } - - free(p); - exit(0); + if (qc) + putchar(qc); } Index: usr.sbin/jls/Makefile =================================================================== --- usr.sbin/jls/Makefile (revision 191694) +++ usr.sbin/jls/Makefile (working copy) @@ -4,6 +4,4 @@ MAN= jls.8 WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jls/jls.8 =================================================================== --- usr.sbin/jls/jls.8 (revision 191694) +++ usr.sbin/jls/jls.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JLS 8 .Os .Sh NAME @@ -33,38 +33,59 @@ .Nd "list jails" .Sh SYNOPSIS .Nm -.Op Fl av +.Op Fl dhnqv +.Op Fl j Ar jail +.Op Ar parameter ... .Sh DESCRIPTION The .Nm -utility lists all jails. -By default only active jails are listed. +utility lists all active jails, or the specified jail. +Each jail is represented by one row which contains space-separated values of +the listed +.Ar parameters , +including the pseudo-parameter +.Va all +which will show all available jail parameters. +A list of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . .Pp -The options are as follows: -.Bl -tag -width ".Fl a" -.It Fl a -Show jails in all states, not only active ones. +If no +.Ar parameters +are given, the following four columns will be printed: +jail identifier (jid), IP address (ip4.addr), hostname (host.hostname), +and path (path). +.Pp +The following options are available: +.Bl -tag -width indent +.It Fl d +List +.Va dying +as well as active jails. +.It Fl h +Print a header line containing the parameters listed. +If no parameters are given on the command line, the default four-column +output always contains a header. +.It Fl n +Print parameters in +.Dq name=value +format, where each parameter is preceded by its name. 
+This option is ignored for the default four-column output. +.It Fl q +Put quotes around string parameters if they contain spaces or quotes, or are +the empty string. .It Fl v -Show more verbose information. -This also lists cpusets, jail state, multi-IP, etc. instead of the -classic single-IP jail output. +Print a multiple-line summary per jail, with the following parameters: +jail identifier (jid), hostname (host.hostname), path (path), +jail name (name), jail state (dying), cpuset ID (cpuset), +IP address(es) (ip4.addr and ip6.addr). +.It Fl j Ar jail +The jid or name of the +.Ar jail +to list. +Without this option, all active jails will be listed. .El -.Pp -Each jail is represented by rows which, depending on -.Fl v , -contain the following columns: -.Bl -item -offset indent -compact -.It -jail identifier (JID), hostname and path -.It -jail state and name -.It -jail cpuset -.It -followed by one IP adddress per line. -.El .Sh SEE ALSO -.Xr jail 2 , +.Xr jail_get 2 , .Xr jail 8 , .Xr jexec 8 .Sh HISTORY @@ -72,3 +93,5 @@ .Nm utility was added in .Fx 5.1 . +Extensible jail parameters were introduced in +.Fx 8.0 . Index: usr.sbin/jexec/jexec.c =================================================================== --- usr.sbin/jexec/jexec.c (revision 191694) +++ usr.sbin/jexec/jexec.c (working copy) @@ -29,12 +29,16 @@ #include #include +#include #include +#include +#include #include #include #include +#include #include #include #include @@ -43,154 +47,8 @@ #include static void usage(void); +static int addr2jid(const char *addr); -#ifdef SUPPORT_OLD_XPRISON -static -char *lookup_xprison_v1(void *p, char *end, int *id) -{ - struct xprison_v1 *xp; - - if (id == NULL) - errx(1, "Internal error. 
Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison_v1 *)p; - - *id = xp->pr_id; - return ((char *)(xp + 1)); -} -#endif - -static -char *lookup_xprison_v3(void *p, char *end, int *id, char *jailname) -{ - struct xprison *xp; - char *q; - int ok; - - if (id == NULL) - errx(1, "Internal error. Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison *)p; - ok = 1; - - /* Jail state and name. */ - if (xp->pr_state < 0 || xp->pr_state >= - (int)((sizeof(prison_states) / sizeof(struct prison_state)))) - errx(1, "Invalid jail state."); - else if (xp->pr_state != PRISON_STATE_ALIVE) - ok = 0; - if (jailname != NULL) { - if (xp->pr_name[0] == '\0') - ok = 0; - else if (strcmp(jailname, xp->pr_name) != 0) - ok = 0; - } - - q = (char *)(xp + 1); - /* IPv4 addresses. */ - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - /* IPv6 addresses. */ - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - - if (ok) - *id = xp->pr_id; - return (q); -} - -static int -lookup_jail(int jid, char *jailname) -{ - size_t i, j, len; - void *p, *q; - int version, id, xid, count; - - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); - - j = len; - for (i = 0; i < 4; i++) { - if (len == 0) - return (-1); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); - - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; - continue; - } - err(1, "sysctlbyname(): security.jail.list"); - } - break; - } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. 
Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - - count = 0; - xid = -1; - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - id = -1; - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - if (jailname != NULL) - errx(1, "Version 1 prisons did not " - "support jail names."); - q = lookup_xprison_v1(q, (char *)p + len, &id); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = lookup_xprison_v3(q, (char *)p + len, &id, jailname); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } - /* Possible match; see if we have a jail ID to match as well. */ - if (id > 0 && (jid <= 0 || id == jid)) { - xid = id; - count++; - } - } - - free(p); - - if (count == 1) - return (xid); - else if (count > 1) - errx(1, "Could not uniquely identify the jail."); - else - return (-1); -} - #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -210,22 +68,18 @@ int main(int argc, char *argv[]) { + struct iovec params[2]; int jid; login_cap_t *lcap = NULL; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, ngroups, uflag, Uflag; - char *jailname, *username; + int ch, ngroups, uflag, Uflag, hflag; + char *ep, *username; + ch = uflag = Uflag = hflag = 0; + username = NULL; - ch = uflag = Uflag = 0; - jailname = username = NULL; - jid = -1; - - while ((ch = getopt(argc, argv, "i:n:u:U:")) != -1) { + while ((ch = getopt(argc, argv, "u:U:h")) != -1) { switch (ch) { - case 'n': - jailname = optarg; - break; case 'u': username = optarg; uflag = 1; @@ -234,6 +88,9 @@ username = optarg; Uflag = 1; break; + case 'h': + hflag = 1; + 
break; default: usage(); } @@ -242,22 +99,24 @@ argv += optind; if (argc < 2) usage(); - if (strlen(argv[0]) > 0) { - jid = (int)strtol(argv[0], NULL, 10); - if (errno) - err(1, "Unable to parse jail ID."); - } - if (jid <= 0 && jailname == NULL) { - fprintf(stderr, "Neither jail ID nor jail name given.\n"); - usage(); - } if (uflag && Uflag) usage(); if (uflag) GET_USER_INFO; - jid = lookup_jail(jid, jailname); - if (jid <= 0) - errx(1, "Cannot identify jail."); + if (hflag) + jid = addr2jid(argv[0]); + else { + jid = strtoul(argv[0], &ep, 10); + if (!*argv[0] || *ep) { + *(const void **)&params[0].iov_base = "name"; + params[0].iov_len = sizeof("name"); + params[1].iov_base = argv[0]; + params[1].iov_len = strlen(argv[0]) + 1; + jid = jail_get(params, 2, 0); + if (jid < 0) + errx(1, "Unknown jail: %s", argv[0]); + } + } if (jail_attach(jid) == -1) err(1, "jail_attach(): %d", jid); if (chdir("/") == -1) @@ -285,6 +144,108 @@ fprintf(stderr, "%s%s\n", "usage: jexec [-u username | -U username]", - " [-n jailname] jid command ..."); + " [-h hostname | -h ip-number | jail] command ..."); exit(1); } + +static int +addr2jid(const char *addr) +{ + struct iovec params[6]; + struct in_addr ia; + struct in6_addr ia6; + int cnt, doip, foundjid, ii, jid, lastjid, sanity; + char hostbuf[MAXHOSTNAMELEN]; + + if (inet_pton(AF_INET, addr, &ia) > 0) + doip = 4; + else if (inet_pton(AF_INET6, addr, &ia6) > 0) + doip = 6; + else + doip = 0; + + *(const void **)&params[0].iov_base = "lastjid"; + params[0].iov_len = sizeof("lastjid"); + params[1].iov_base = &lastjid; + params[1].iov_len = sizeof(lastjid); + switch (doip) { + case 4: + *(const void **)&params[2].iov_base = "ip4.addr"; + params[2].iov_len = sizeof("ip4.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + case 6: + *(const void **)&params[2].iov_base = "ip6.addr"; + params[2].iov_len =
sizeof("ip6.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + default: + *(const void **)&params[2].iov_base = "host.hostname"; + params[2].iov_len = sizeof("host.hostname"); + params[3].iov_base = hostbuf; + params[3].iov_len = MAXHOSTNAMELEN; + } + + cnt = foundjid = sanity = 0; + for (jid = 0;; jid = lastjid) { + if (doip != 0) { + params[3].iov_base = NULL; + params[3].iov_len = 0; + if (jail_get(params, 4, 0) < 0) + break; + params[3].iov_len += 5 * sizeof(struct in6_addr); + params[3].iov_base = malloc(params[3].iov_len); + jid = jail_get(params, 6, 0); + } else + jid = jail_get(params, 4, 0); + if (jid > 0) { + sanity = 0; + if (!strcmp(hostbuf, addr)) { + cnt++; + foundjid = jid; + } else switch (doip) { + case 4: + for (ii = (params[3].iov_len / + sizeof(struct in_addr)) - 1; ii >= 0; ii--) + if (((struct in_addr *)params[3]. + iov_base)[ii].s_addr == ia.s_addr) { + cnt++; + foundjid = jid; + break; + } + break; + case 6: + for (ii = (params[3].iov_len / + sizeof(struct in6_addr)) - 1; ii >= 0; + ii--) + if (IN6_ARE_ADDR_EQUAL(&ia6, + &((struct in6_addr *) + params[3].iov_base)[ii])) { + cnt++; + foundjid = jid; + break; + } + } + } else if (errno == ENOENT || ++sanity > 10) + break; + else + jid = lastjid; + if (doip != 0) + free(params[3].iov_base); + } + switch (cnt) + { + case 0: + errx(1, "Unknown jail: %s", addr); + case 1: + return foundjid; + default: + errx(1, "Could not uniquely identify the jail: %s", addr); + } +} Index: usr.sbin/jexec/jexec.8 =================================================================== --- usr.sbin/jexec/jexec.8 (revision 191694) +++ usr.sbin/jexec/jexec.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JEXEC 8 .Os .Sh NAME @@ -34,36 +34,22 @@ .Sh SYNOPSIS .Nm .Op Fl u Ar username | Fl U Ar username -.Op Fl n Ar jailname -.Ar jid
command ... +.Op Fl h Ar hostname | Fl h Ar ip | Ar jid | Ar name +.Ar command ... .Sh DESCRIPTION The .Nm utility executes .Ar command -inside the jail identified by either -.Ar jailname +inside the jail identified by +.Ar hostname , +.Ar ip , +.Ar jid , or -.Ar jid -or both. +.Ar name . .Pp -If the jail cannot be identified uniquely by the given parameters, -an error message is printed. -.Nm -will also check the state of the jail (once supported) to be -.Dv ALIVE -and ignore jails in other states. -The mandatory argument -.Ar jid -is the unique jail identifier as given by -.Xr jls 8 . -In case you only want to match on other criteria, give an empty string. -.Pp The following options are available: .Bl -tag -width indent -.It Fl n Ar jailname -The name of the jail, if given upon creation of the jail. -This is not the hostname of the jail. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -73,6 +59,9 @@ .Ar command should run. .El +.Sh "CAUTIONS" +Only a jail's jid or name is guaranteed to uniquely identify the jail. +Hostname or ip only work here if matched to one unique jail. .Sh SEE ALSO .Xr jail_attach 2 , .Xr jail 8 , Index: usr.sbin/jexec/Makefile =================================================================== --- usr.sbin/jexec/Makefile (revision 191694) +++ usr.sbin/jexec/Makefile (working copy) @@ -6,6 +6,4 @@ LDADD= -lutil WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jail/jail.c =================================================================== --- usr.sbin/jail/jail.c (revision 191694) +++ usr.sbin/jail/jail.c (working copy) @@ -1,5 +1,6 @@ /*- * Copyright (c) 1999 Poul-Henning Kamp. + * Copyright (c) 2009 James Gritton * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -29,51 +30,43 @@ #include #include -#include #include #include -#include +#include +#include #include -#include -#include +#include #include #include #include #include +#include #include #include #include #include -#include #include #include -static void usage(void); -static int add_addresses(struct addrinfo *); -static struct in_addr *copy_addr4(void); -#ifdef INET6 -static struct in6_addr *copy_addr6(void); -#endif +#define SJPARAM "security.jail.param" +#define ERRMSG_SIZE 256 -extern char **environ; - -struct addr4entry { - STAILQ_ENTRY(addr4entry) addr4entries; - struct in_addr ip4; - int count; +struct param { + struct iovec name; + struct iovec value; }; -struct addr6entry { - STAILQ_ENTRY(addr6entry) addr6entries; -#ifdef INET6 - struct in6_addr ip6; -#endif - int count; -}; -STAILQ_HEAD(addr4head, addr4entry) addr4 = STAILQ_HEAD_INITIALIZER(addr4); -STAILQ_HEAD(addr6head, addr6entry) addr6 = STAILQ_HEAD_INITIALIZER(addr6); +static struct param *params; +static int nparams; + +static void set_param(const char *name, char *value); +static void set_param_ip_hostname(char *value, int family); +static void usage(void); + +extern char **environ; + #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -94,27 +87,28 @@ main(int argc, char **argv) { login_cap_t *lcap = NULL; - struct jail j; + struct iovec rparams[2]; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, error, i, ngroups, securelevel; - int hflag, iflag, Jflag, lflag, uflag, Uflag; - char path[PATH_MAX], *jailname, *ep, *username, *JidFile, *ip; + int ch, cmdarg, i, jail_set_flags, jid, ngroups, oldargs, securelevel; + int iflag, Jflag, lflag, rflag, uflag, Uflag; + char *ep, *username, *JidFile; + char errmsg[ERRMSG_SIZE]; static char *cleanenv; const char *shell, *p = NULL; long ltmp; FILE *fp; - struct addrinfo hints, *res0; - hflag = iflag = Jflag = lflag = uflag = Uflag = 0; - 
securelevel = -1; - jailname = username = JidFile = cleanenv = NULL; + iflag = Jflag = lflag = rflag = uflag = Uflag = 0; + jail_set_flags = JAIL_CREATE | JAIL_UPDATE; + cmdarg = jid = securelevel = -1; + username = JidFile = cleanenv = NULL; fp = NULL; - while ((ch = getopt(argc, argv, "hiln:s:u:U:J:")) != -1) { + while ((ch = getopt(argc, argv, "cdilor:s:u:U:J:")) != -1) { switch (ch) { - case 'h': - hflag = 1; + case 'd': + jail_set_flags |= JAIL_DYING; break; case 'i': iflag = 1; @@ -123,9 +117,6 @@ JidFile = optarg; Jflag = 1; break; - case 'n': - jailname = optarg; - break; case 's': ltmp = strtol(optarg, &ep, 0); if (*ep || ep == optarg || ltmp > INT_MAX || !ltmp) @@ -143,13 +134,41 @@ case 'l': lflag = 1; break; + case 'c': + jail_set_flags = + (jail_set_flags & ~JAIL_UPDATE) | JAIL_CREATE; + break; + case 'o': + jail_set_flags = + (jail_set_flags & ~JAIL_CREATE) | JAIL_UPDATE; + break; + case 'r': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) { + *(const void **)&rparams[0].iov_base = "name"; + rparams[0].iov_len = sizeof("name"); + rparams[1].iov_base = optarg; + rparams[1].iov_len = strlen(optarg) + 1; + jid = jail_get(rparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", optarg); + } + rflag = 1; + break; default: usage(); } } argc -= optind; argv += optind; - if (argc < 4) + if (rflag) { + if (argc > 0 || iflag || Jflag || lflag || uflag || Uflag) + usage(); + if (jail_remove(jid) < 0) + err(1, "jail_remove"); + exit (0); + } + if (argc == 0) usage(); if (uflag && Uflag) usage(); @@ -157,92 +176,70 @@ usage(); if (uflag) GET_USER_INFO; - if (realpath(argv[0], path) == NULL) - err(1, "realpath: %s", argv[0]); - if (chdir(path) != 0) - err(1, "chdir: %s", path); - /* Initialize struct jail. */ - memset(&j, 0, sizeof(j)); - j.version = JAIL_API_VERSION; - j.path = path; - j.hostname = argv[1]; - if (jailname != NULL) - j.jailname = jailname; - /* Handle IP addresses. If requested resolve hostname too. 
*/ - bzero(&hints, sizeof(struct addrinfo)); - hints.ai_protocol = IPPROTO_TCP; - hints.ai_socktype = SOCK_STREAM; - if (JAIL_API_VERSION < 2) - hints.ai_family = PF_INET; - else - hints.ai_family = PF_UNSPEC; - /* Handle hostname. */ - if (hflag != 0) { - error = getaddrinfo(j.hostname, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle hostname: %s", - gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); + /* + * If the first argument (path) starts with a slash, and the third + * argument (IP address) starts with a digit, it is likely to be + * an old-style fixed-parameter command line. + */ + oldargs = argc >= 4 && argv[0][0] == '/' && isdigit(argv[2][0]); + if (oldargs) { + if ((jail_set_flags & (JAIL_CREATE | JAIL_UPDATE)) != + (JAIL_CREATE | JAIL_UPDATE)) + usage(); + jail_set_flags = JAIL_CREATE | JAIL_ATTACH; + set_param("path", argv[0]); + set_param("host.hostname", argv[1]); + set_param("ip4.addr", argv[2]); + cmdarg = 3; + } else { + for (i = 0; i < argc; i++) + if (!strncmp(argv[i], "command=", 8)) { + cmdarg = i; + argv[cmdarg] += 8; + jail_set_flags |= JAIL_ATTACH; + break; + } else + set_param(NULL, argv[i]); } - /* Handle IP addresses. */ - hints.ai_flags = AI_NUMERICHOST; - ip = strtok(argv[2], ","); - while (ip != NULL) { - error = getaddrinfo(ip, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle ip: %s", gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); - ip = strtok(NULL, ","); - } - /* Count IP addresses and add them to struct jail. 
*/ - if (!STAILQ_EMPTY(&addr4)) { - j.ip4s = STAILQ_FIRST(&addr4)->count; - j.ip4 = copy_addr4(); - if (j.ip4s > 0 && j.ip4 == NULL) - errx(1, "copy_addr4()"); - } -#ifdef INET6 - if (!STAILQ_EMPTY(&addr6)) { - j.ip6s = STAILQ_FIRST(&addr6)->count; - j.ip6 = copy_addr6(); - if (j.ip6s > 0 && j.ip6 == NULL) - errx(1, "copy_addr6()"); - } -#endif + errmsg[0] = 0; + set_param("errmsg", errmsg); if (Jflag) { fp = fopen(JidFile, "w"); if (fp == NULL) errx(1, "Could not create JidFile: %s", JidFile); } - i = jail(&j); - if (i == -1) - err(1, "syscall failed with"); + jid = jail_set(&params->name, 2 * nparams, jail_set_flags); + if (jid < 0) { + if (errmsg[0] != '\0') + errx(1, "%s", errmsg); + err(1, "jail_set"); + } if (iflag) { - printf("%d\n", i); + printf("%d\n", jid); fflush(stdout); } if (Jflag) { - if (fp != NULL) { + if (oldargs) fprintf(fp, "%d\t%s\t%s\t%s\t%s\n", - i, j.path, j.hostname, argv[2], argv[3]); - (void)fclose(fp); - } else { - errx(1, "Could not write JidFile: %s", JidFile); + jid, (char *)params[0].value.iov_base, + argv[1], argv[2], argv[3]); + else { + fprintf(fp, "%d", jid); + for (i = 0; i < argc; i++) + fprintf(fp, "\t%s", argv[i]); + fprintf(fp, "\n"); } + (void)fclose(fp); } if (securelevel > 0) { if (sysctlbyname("kern.securelevel", NULL, 0, &securelevel, sizeof(securelevel))) err(1, "Can not set securelevel to %d", securelevel); } + if (cmdarg < 0) + exit(0); if (username != NULL) { if (Uflag) GET_USER_INFO; @@ -272,158 +269,256 @@ if (p) setenv("TERM", p, 1); } - if (execv(argv[3], argv + 3) != 0) - err(1, "execv: %s", argv[3]); - exit(0); + execvp(argv[cmdarg], argv + cmdarg); + err(1, "execvp: %s", argv[cmdarg]); } static void -usage(void) +set_param(const char *name, char *value) { + struct param *param; + char *ep, *p; + size_t buflen, mlen; + int i, nval, mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; - (void)fprintf(stderr, "%s%s%s\n", - "usage: jail [-hi] [-n jailname] [-J jid_file] ", - "[-s securelevel] [-l -u username | -U username] ", -
"path hostname [ip[,..]] command ..."); - exit(1); -} + static int paramlistsize; -static int -add_addresses(struct addrinfo *res0) -{ - int error; - struct addrinfo *res; - struct addr4entry *a4p; - struct sockaddr_in *sai; + /* Separate the name from the value, if not done already. */ + if (name == NULL) { + name = value; + if ((value = strchr(value, '='))) + *value++ = '\0'; + } + + /* Handle pseudo-parameters separately. */ + if (!strcmp(name, "ip4_hostname")) { + set_param_ip_hostname(value, AF_INET); + return; + } #ifdef INET6 - struct addr6entry *a6p; - struct sockaddr_in6 *sai6; + if (!strcmp(name, "ip6_hostname")) { + set_param_ip_hostname(value, AF_INET6); + return; + } #endif - int count; - error = 0; - for (res = res0; res && error == 0; res = res->ai_next) { - switch (res->ai_family) { - case AF_INET: - sai = (struct sockaddr_in *)(void *)res->ai_addr; - STAILQ_FOREACH(a4p, &addr4, addr4entries) { - if (bcmp(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)) == 0) { - err(1, "Ignoring duplicate IPv4 address."); - break; - } - } - a4p = (struct addr4entry *) malloc( - sizeof(struct addr4entry)); - if (a4p == NULL) { - error = 1; - break; - } - bzero(a4p, sizeof(struct addr4entry)); - bcopy(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)); - if (!STAILQ_EMPTY(&addr4)) - count = STAILQ_FIRST(&addr4)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr4, a4p, addr4entries); - STAILQ_FIRST(&addr4)->count = count + 1; + /* Check for repeat parameters */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name.iov_base)) { + memcpy(params + i, params + i + 1, + (--nparams - i) * sizeof(struct param)); break; + } + + /* Make sure there is room for the new param record. 
*/ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } + + /* Look up the paramter. */ + param = params + nparams++; + *(const void **)&param->name.iov_base = name; + param->name.iov_len = strlen(name) + 1; + /* Trivial values - no value or errmsg. */ + if (value == NULL) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + if (!strcmp(name, "errmsg")) { + param->value.iov_base = value; + param->value.iov_len = ERRMSG_SIZE; + return; + } + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + errx(1, "unknown parameter: %s", name); + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + /* + * See if this is an array type. + * Treat non-arrays as an array of one. + */ + p = strchr(buf + sizeof(int), '\0'); + nval = 1; + if (p - 2 >= buf && !strcmp(p - 2, ",a")) { + if (value[0] == '\0' || + (value[0] == '-' && value[1] == '\0')) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + p[-2] = 0; + for (p = strchr(value, ','); p; p = strchr(p + 1, ',')) { + *p = 0; + nval++; + } + } + + /* Set the values according to the parameter type.
*/ + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + case CTLTYPE_UINT: + param->value.iov_len = nval * sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->value.iov_len = nval * sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) + param->value.iov_len = nval * sizeof(struct in_addr); #ifdef INET6 - case AF_INET6: - sai6 = (struct sockaddr_in6 *)(void *)res->ai_addr; - STAILQ_FOREACH(a6p, &addr6, addr6entries) { - if (bcmp(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)) == 0) { - err(1, "Ignoring duplicate IPv6 address."); - break; - } + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) + param->value.iov_len = nval * sizeof(struct in6_addr); +#endif + else + errx(1, "%s: unknown parameter structure (%s)", + name, buf + sizeof(int)); + break; + case CTLTYPE_STRING: + if (!strcmp(name, "path")) { + param->value.iov_base = malloc(MAXPATHLEN); + if (param->value.iov_base == NULL) + err(1, "malloc"); + if (realpath(value, param->value.iov_base) == NULL) + err(1, "%s: realpath(%s)", name, value); + if (chdir(param->value.iov_base) != 0) + err(1, "chdir: %s", + (char *)param->value.iov_base); + } else + param->value.iov_base = value; + param->value.iov_len = strlen(param->value.iov_base) + 1; + return; + default: + errx(1, "%s: unknown parameter type %d (%s)", + name, *(int *)buf, buf + sizeof(int)); + } + param->value.iov_base = malloc(param->value.iov_len); + for (i = 0; i < nval; i++) { + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + ((int *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_UINT: + ((unsigned *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_LONG: + ((long *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: 
non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_ULONG: + ((unsigned long *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) { + if (inet_pton(AF_INET, value, + &((struct in_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv4 address: %s", + name, value); } - a6p = (struct addr6entry *) malloc( - sizeof(struct addr6entry)); - if (a6p == NULL) { - error = 1; - break; +#ifdef INET6 + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) { + if (inet_pton(AF_INET6, value, + &((struct in6_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv6 address: %s", + name, value); } - bzero(a6p, sizeof(struct addr6entry)); - bcopy(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)); - if (!STAILQ_EMPTY(&addr6)) - count = STAILQ_FIRST(&addr6)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr6, a6p, addr6entries); - STAILQ_FIRST(&addr6)->count = count + 1; - break; #endif - default: - err(1, "Address family %d not supported. Ignoring.\n", - res->ai_family); - break; } + value = strchr(value, '\0') + 1; } - - return (error); } -static struct in_addr * -copy_addr4(void) +static void +set_param_ip_hostname(char *value, int family) { - size_t len; - struct in_addr *ip4s, *p, ia; - struct addr4entry *a4p; + struct addrinfo hints, *ai0, *ai; + char *avalue, *nextav; + socklen_t avlen; + int error; - if (STAILQ_EMPTY(&addr4)) - return NULL; + /* Look up the hostname in the specified address family. 
*/ + memset(&hints, 0, sizeof(hints)); + hints.ai_family = family; + error = getaddrinfo(value, NULL, &hints, &ai0); + if (error != 0) + errx(1, "hostname %s: %s", value, gai_strerror(error)); - len = STAILQ_FIRST(&addr4)->count * sizeof(struct in_addr); - - ip4s = p = (struct in_addr *)malloc(len); - if (ip4s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr4)) { - a4p = STAILQ_FIRST(&addr4); - STAILQ_REMOVE_HEAD(&addr4, addr4entries); - ia.s_addr = a4p->ip4.s_addr; - bcopy(&ia, p, sizeof(struct in_addr)); - p++; - free(a4p); + /* Convert the addresses to ASCII so set_param can convert them back. */ + avlen = 0; + for (ai = ai0; ai; ai = ai->ai_next) + avlen++; + avlen *= +#ifdef INET6 + family == AF_INET6 ? INET6_ADDRSTRLEN : +#endif + INET_ADDRSTRLEN; + avalue = malloc(avlen); + if (avalue == NULL) + err(1, "malloc"); + avalue[0] = 0; + for (nextav = avalue, ai = ai0; ai; ai = ai->ai_next) { + if (inet_ntop(family, +#ifdef INET6 + family == AF_INET6 ? + (void *)&((struct sockaddr_in6 *)&ai->ai_addr)->sin6_addr : +#endif + (void *)&((struct sockaddr_in *)&ai->ai_addr)->sin_addr, + nextav, avlen - (nextav - avalue)) == NULL) + err(1, "inet_ntop"); + if (ai->ai_next) { + nextav = strchr(nextav, '\0'); + *nextav++ = ','; + } } - - return (ip4s); + set_param( +#ifdef INET6 + family == AF_INET6 ? 
"ip6.addr" : +#endif + "ip4.addr", avalue); } -#ifdef INET6 -static struct in6_addr * -copy_addr6(void) +static void +usage(void) { - size_t len; - struct in6_addr *ip6s, *p; - struct addr6entry *a6p; - if (STAILQ_EMPTY(&addr6)) - return NULL; - - len = STAILQ_FIRST(&addr6)->count * sizeof(struct in6_addr); - - ip6s = p = (struct in6_addr *)malloc(len); - if (ip6s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr6)) { - a6p = STAILQ_FIRST(&addr6); - STAILQ_REMOVE_HEAD(&addr6, addr6entries); - bcopy(&a6p->ip6, p, sizeof(struct in6_addr)); - p++; - free(a6p); - } - - return (ip6s); + (void)fprintf(stderr, + "usage: jail [-d] [-i] [-J jid_file] [-s securelevel]\n" + " [-l -u username | -U username]\n" + " [[-c | -o] param=value ... [command=command ...] |\n" + " path hostname ip command ...]\n" + " jail [-r jail]\n"); + exit(1); } -#endif - Index: usr.sbin/jail/jail.8 =================================================================== --- usr.sbin/jail/jail.8 (revision 191694) +++ usr.sbin/jail/jail.8 (working copy) @@ -1,5 +1,6 @@ .\" .\" Copyright (c) 2000, 2003 Robert N. M. Watson +.\" Copyright (c) 2008 James Gritton .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without @@ -33,49 +34,37 @@ .\" .\" $FreeBSD$ .\" -.Dd January 24, 2009 +.Dd April 30, 2009 .Dt JAIL 8 .Os .Sh NAME .Nm jail -.Nd "imprison process and its descendants" +.Nd "create or modify a system jail" .Sh SYNOPSIS .Nm -.Op Fl hi -.Op Fl n Ar jailname +.Op Fl di .Op Fl J Ar jid_file .Op Fl s Ar securelevel .Op Fl l u Ar username | Fl U Ar username -.Ar path hostname [ip[,..]] command ... +.Op Fl c | o +.Op Ar parameter=value ... | path hostname ip command ... +.Br +.Nm +.Op Fl r Ar jail .Sh DESCRIPTION The .Nm -utility imprisons a process and all future descendants. +utility creates a new jail or modifies an existing jail, optionally +imprisoning the current process (and future descendants) inside it. 
.Pp The options are as follows: -.Bl -tag -width ".Fl u Ar username" -.It Fl h -Resolve -.Va hostname -and add all IP addresses returned by the resolver -to the list of -.Va ip-addresses -for this prison. -This may affect default address selection for outgoing IPv4 connections -of prisons. -The address first returned by the resolver for each address family -will be used as primary address. -See -.Va ip-addresses -further down for details. +.Bl -tag -width indent +.It Fl d +Allow making changes to a +.Va +dying jail. .It Fl i Output the jail identifier of the newly created jail. -.It Fl n Ar jailname -Assign and administrative name to the jail that can be used for management -or auditing purposes. -The system will -.Sy not enforce -the name to be unique. .It Fl J Ar jid_file Write a .Ar jid_file @@ -100,7 +89,10 @@ .It Fl s Ar securelevel Sets the .Va kern.securelevel -sysctl variable to the specified value inside the newly created jail. +MIB entry to the specified value inside the newly created jail. +This is equivalent to setting the jail's +.Va securelevel +parameter. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -109,20 +101,156 @@ The user name from jailed environment as whom the .Ar command should run. -.It Ar path +.It Fl c +Create a new jail, but do not modify an existing one. +Default behavior is to allow modification if a +.Va jid +or +.Va name +parameter refers to an existing jail. +.It Fl o +Only modify an existing jail, but do not create one. +One of the +.Va jid +or +.Va name +parameters must exist and refer to an existing jail. +.It Fl r +Remove the +.Ar jail +specified by jid or name. +All jailed processes are killed. +.El +.Pp +.Ar Parameters +are listed in +.Dq name=value +form, following the options. +Some parameters are boolean, and do not have a value but are set by the +name alone with or without a +.Dq no +prefix, e.g. +.Va persist +or +.Va nopersist . 
+Any parameters not set will be given default values, generally based on the +current environment. +.Pp +The pseudo-parameter +.Va command +specifies that the current process should enter the new (or modified) jail, +and run the specified command. +It must be the last parameter specified, because it includes not only +the value following the +.Sq = +sign, but also passes the rest of the arguments to the command. +.Pp +Instead of supplying named +.Ar parameters , +four fixed parameters may be supplied in order on the command line: +.Ar path , +.Ar hostname , +.Ar ip , +and +.Ar command . +As the +.Va jid +and +.Va name +parameters aren't in this list, this mode will always create a new jail, and +the +.Fl c +and +.Fl o +options don't apply. +.Pp +Jails have a set a core parameters, and modules can add their own jail +parameters. +The current set of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . +Some of the notable core parameters include: +.Bl -tag -width indent +.It Va jid +The jail identifier. +This will be assigned automatically to a new jail (or can be explicitly +set), and can be used to identify the jail for later modification, or +for such commands as +.Xr jls 8 +or +.Xr jexec 8 . +.It Va name +The jail name. +This is an arbitrary string that identifies a jail. +Like the +.Va jid , +it can be passed to later +.Nm +commands, or to +.Xr jls 8 +or +.Xr jexec 8 . +If no +.Va name +is supplied, a default is assumed that is the same as the +.Va jid . +.It Va path Directory which is to be the root of the prison. -.It Ar hostname -Hostname of the prison. -.It Ar ip-addresses -None, one or more IPv4 and IPv6 addresses assigned to the prison. -The first address of each address family that was assigned to the jail will -be used as the source address in case source address selection on unbound -sockets cannot find a better match. +The +.Va command +(if any) is run from this directory, as are commands from +.Xr jexec 8 . 
+.It Va ip4.addr +A comma-separated list of IPv4 addresses assigned to the prison. +If this is set, the jail is restricted to using only these address. +Any attempts to use other addresses fail, and attempts to use wildcard +addresses silently use the jailed address instead. +For IPv4 the first address given will be kept used as the source address +in case source address selection on unbound sockets cannot find a better +match. It is only possible to start multiple jails with the same IP address, if none of the jails has more than this single overlapping IP address -assigned to itself for the address family in question. -.It Ar command -Pathname of the program which is to be executed. +assigned to itself. +.Pp +A list of zero elements (an empty string) will stop the jail from using IPv4 +entirely; setting the boolean parameter +.Ar noip4 +will not restrict the jail at all. +.It Va ip6.addr +A list of IPv6 addresses assigned to the prison, the counterpart to +.Ar ip4.addr +above. +.It Va host.hostname +Hostname of the prison. +If not specified, a jail will use the system hostname. +.It Va ip4_hostname +.It Va ip6_hostname +These psuedo-parameters actually set the jail's +.Va ip4 +and +.Va ip6 +parameters, but will get those addresses by resolving the supplied hostname. +.It Va securelevel +The value of the jail's +.Va kern.securelevel +sysctl. +A jail never has a lower securelevel than the default system, but by +setting this parameter it may have a higher one. +If the system securelevel is changed, any jail securelevels will be at +least as secure. +.It Va persist +Setting this boolean parameter allows a jail to exist without any +processes. +Normally, a jail is destroyed as its last process exits. +.It Va command +The command to run after creating or modifying the jail. +This command is run inside the jail, under the +.Va path +directory. +A new jail must have either the +.Va persist +or +.Va command +parameter set. 
.El .Pp Jails are typically set up using one of two philosophies: either to @@ -142,10 +270,6 @@ This manual page documents the configuration steps necessary to support either of these steps, although the configuration steps may be refined based on local requirements. -.Pp -Please see the -.Xr jail 2 -man page for further details. .Sh EXAMPLES .Ss "Setting up a Jail Directory Tree" To set up a jail directory tree containing an entire @@ -605,7 +729,7 @@ a jail. This functionality is disabled by default, but can be enabled by setting this MIB entry to 1. -.It Va security.jail.jail_max_af_ips +.It Va security.jail.max_af_ips This MIB entry determines how may address per address family a prison may have. The default is 255. .El @@ -641,7 +765,7 @@ .Xr ps 1 , .Xr quota 1 , .Xr chroot 2 , -.Xr jail 2 , +.Xr jail_set 2 , .Xr jail_attach 2 , .Xr procfs 5 , .Xr rc.conf 5 , @@ -665,6 +789,8 @@ .Nm utility appeared in .Fx 4.0 . +Extensible jail parameters were introduced in +.Fx 8.0 . .Sh AUTHORS .An -nosplit The jail feature was written by @@ -683,6 +809,9 @@ originally done by .An Pawel Jakub Dawidek for IPv4. +.Pp +.An James Gritton +added the extensible jail parameters. .Sh BUGS Jail currently lacks the ability to allow access to specific jail information via Index: sys/sys/jail.h =================================================================== --- sys/sys/jail.h (revision 191694) +++ sys/sys/jail.h (working copy) @@ -84,19 +84,11 @@ struct in6_addr pr_ip6[]; #endif }; -#define XPRISON_VERSION 3 +#define XPRISON_VERSION 3 -static const struct prison_state { - int pr_state; - const char * state_name; -} prison_states[] = { -#define PRISON_STATE_INVALID 0 - { PRISON_STATE_INVALID, "INVALID" }, -#define PRISON_STATE_ALIVE 1 - { PRISON_STATE_ALIVE, "ALIVE" }, -#define PRISON_STATE_DYING 2 - { PRISON_STATE_DYING, "DYING" }, -}; +#define PRISON_STATE_INVALID 0 +#define PRISON_STATE_ALIVE 1 +#define PRISON_STATE_DYING 2 /* * Flags for jail_set and jail_get. 
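A minimal sketch, not part of the posted patch, of how the name=value arguments described in the jail.8 changes above could be collected into the name/value iovec pairs that jail.c hands to jail_set(). The names kv_param, kv_list and kv_add are hypothetical. Only string values are handled here (the patch's set_param() also converts integers and addresses according to the sysctl type), and no system call is made, so it should build outside FreeBSD as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical types, modelled on "struct param" in the patch. */
struct kv_param {
	struct iovec name;
	struct iovec value;
};

struct kv_list {
	struct kv_param *items;
	size_t count;
	size_t cap;
};

/*
 * Split "name=value" in place and append the pair to the list.
 * A bare "name" (boolean parameter) gets a NULL value of length 0.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int
kv_add(struct kv_list *l, char *arg)
{
	struct kv_param *p;
	char *value;

	/* Grow the array geometrically when it is full. */
	if (l->count == l->cap) {
		size_t ncap = l->cap ? l->cap * 2 : 8;

		p = realloc(l->items, ncap * sizeof(*p));
		if (p == NULL)
			return (-1);
		l->items = p;
		l->cap = ncap;
	}

	/* Separate the name from the value, if there is one. */
	value = strchr(arg, '=');
	if (value != NULL)
		*value++ = '\0';

	/* Lengths include the terminating NUL, as in the patch. */
	p = &l->items[l->count++];
	p->name.iov_base = arg;
	p->name.iov_len = strlen(arg) + 1;
	p->value.iov_base = value;
	p->value.iov_len = value != NULL ? strlen(value) + 1 : 0;
	return (0);
}
```

On a system with the new system calls, such a list could presumably be passed the way the patch passes its own array, as 2 * count iovecs starting at the first name.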
From phk at phk.freebsd.dk Mon May 4 06:42:11 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Mon May 4 06:47:18 2009
Subject: New jail framework - the userland side
In-Reply-To: Your message of "Sun, 03 May 2009 20:31:35 CST." <49FE5387.3020503@FreeBSD.org>
Message-ID: <4424.1241418320@critter.freebsd.dk>

In message <49FE5387.3020503@FreeBSD.org>, Jamie Gritton writes:

>Hi all. I recently added some new jail-related system calls to extend
>the current jail system with an nmount-inspired name=value interface.

I think this is a great move in the right direction, my only concern is
that we should try to share as much of the string-munging code between
the nmount and jail implementations as possible.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From nvass9573 at gmx.com Mon May 4 12:27:33 2009
From: nvass9573 at gmx.com (Nikos Vassiliadis)
Date: Mon May 4 12:27:41 2009
Subject: VIMAGE
In-Reply-To: <49FE5937.3000606@FreeBSD.org>
References: <20090413.220932.74699777.sthaug@nethelp.no>
 <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net>
 <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net>
 <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org>
 <20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org>
 <49FDDD02.3090803@gmx.com> <49FE5937.3000606@FreeBSD.org>
Message-ID: <49FEDF25.9060901@gmx.com>

Jamie Gritton wrote:
> Jails will be able to exist without processes, and in fact with nothing
> more than a vimage attached.

Ah that's what I was looking for.

> But much of vimage only makes sense in
> conjunction with processes - a process attached to a vimage can see that
> vimage's network interfaces.
> There are still things like routing that
> work independent of processes I suppose, but it seems to me much of what a
> vimage does is provide the network stack to the processes it's tied to.

Yet, VIMAGE is very similar in concept to VRF
(http://en.wikipedia.org/wiki/VRF) and I think FreeBSD will look very
promising in router-like applications :)

Maybe there are applications of VIMAGE which haven't been considered
by its developers. Time will tell...

Nikos

From jamie at FreeBSD.org Mon May 4 13:17:59 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Mon May 4 13:18:40 2009
Subject: New jail framework - the userland side
In-Reply-To: <4424.1241418320@critter.freebsd.dk>
References: <4424.1241418320@critter.freebsd.dk>
Message-ID: <49FEEB03.7060908@FreeBSD.org>

Poul-Henning Kamp wrote:
> In message <49FE5387.3020503@FreeBSD.org>, Jamie Gritton writes:
>
>> Hi all. I recently added some new jail-related system calls to extend
>> the current jail system with an nmount-inspired name=value interface.
>
> I think this is a great move in the right direction, my only concern is
> that we should try to share as much of the string-munging code between
> the nmount and jail implementations as possible.

Most of it is shared - jail actually calls vfs_getopt and related calls
from the family. I might want to spin those functions off into their
own subsystem at some point, now that they're officially used outside
of VFS.

I did have to extend things somewhat for jail_get, as nmount is
write-only and only had to deal with one module at a time (the
filesystem type). Those extensions are available for use elsewhere, as
I suspect filesystems and jails aren't the only place where we could
use name-based extensibility.

- Jamie

From Carlos.Paniago at cnptia.embrapa.br Thu May 7 16:01:13 2009
From: Carlos.Paniago at cnptia.embrapa.br (Carlos Fernando Assis Paniago)
Date: Thu May 7 16:08:30 2009
Subject: Port of virtualbox in FreeBSD i386/amd64.
Message-ID: <4A030140.4060509@cnptia.embrapa.br>

People: look at the article below:

http://miwi.bsdcrew.de/2009/05/virtualbox-on-freebsd/

we can get the port in:

svn co http://svn.bluelife.at/projects/packages/blueports/emulators/virtualbox/

I'm compiling on an amd64 machine...

Now we can help with virtualbox, because there is a port for it
(unofficial).

Paniago

From clcchu at hotmail.com Fri May 8 04:35:11 2009
From: clcchu at hotmail.com (Clarence Chu)
Date: Fri May 8 04:35:18 2009
Subject: Port of virtualbox in FreeBSD i386/amd64, emulators/qemu
In-Reply-To: <4A030140.4060509@cnptia.embrapa.br>
References: <4A030140.4060509@cnptia.embrapa.br>
Message-ID:

> Subject: Port of virtualbox in FreeBSD i386/amd64.
>
> People: look at the article below:
>
> http://miwi.bsdcrew.de/2009/05/virtualbox-on-freebsd/
>
> we can get the port in:
>
> svn co
> http://svn.bluelife.at/projects/packages/blueports/emulators/virtualbox/
>
>

FYI, ports/emulators/qemu as of version 0.10.3 may act as VMM for Vista-x86,
MacOSX/Leopard (Hackintosh). not to mention -arm, -x86_64, -etc.
virtualbox can only be x86-only VMM, AFAIK.

It's definitely great news to be able to run VirtualBox under FreeBSD!

Best wishes,
Clarence CHU

From scrappy at hub.org Fri May 8 20:04:09 2009
From: scrappy at hub.org (Marc G. Fournier)
Date: Fri May 8 20:20:16 2009
Subject: Port of virtualbox in FreeBSD i386/amd64, emulators/qemu
In-Reply-To:
References: <4A030140.4060509@cnptia.embrapa.br>
Message-ID: <20090508164247.F3563@hub.org>

On Fri, 8 May 2009, Clarence Chu wrote:

> ports/emulators/qemu as of version 0.10.3 may act as VMM for Vista-x86,
> MacOSX/Leopard (Hackintosh). not to mention -arm, -x86_64, -etc.
I have something like 6 QEMU VPSs running on one physical server that
clients are using to run fully networked Linux environments, and haven't
heard any complaints ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org MSN . scrappy@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664

From jamie at FreeBSD.org Sat May 9 06:08:43 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Sat May 9 06:09:26 2009
Subject: Hierarchical jails
Message-ID: <4A051DE3.30705@FreeBSD.org>

Here's the first round of hierarchical jails under the new framework.
Instead of creds having either a prison or a NULL pointer, they all have
a prison pointer with the default being the global "prison0" that
contains information about the real environment. Jailed root may (if
granted permission) create prisons that would be under its place in the
hierarchy, but may not alter (or even see) prisons at its level or above.

The JID space is flat, i.e. every prison in the system has a unique ID.
The prison name space is hierarchical, with jails having dot-separated
component names.

prison0 contains three fields that were system globals: pr_root,
pr_host, and pr_securelevel. I've kept the globals rootvnode and
hostname, and take care that when one is changed the other changes too
(not yet true for hostname - read on). But I've actually removed the
global securelevel, instead forcing people to use securelevel_gt() and
securelevel_ge() (or in very rare cases to check
prison0.pr_securelevel directly). I chose to do that because while using
the global rootvnode and hostname may be incorrect, using the wrong
securelevel is, well, insecure. Actually it would be insecure to use the
wrong rootvnode too, but I'm not convinced removing that global is worth
the headache.

Other globals are subsumed into prison0, but they were only ever part of
the jail system anyway: the various jail-related permission bits and
such administrative things as prisoncount.
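As an illustration only, the dot-separated name space mentioned earlier in this message can be sketched as a plain string test. The function name_is_descendant is hypothetical and not taken from the patch, and the rule it encodes (a child name is its ancestor's name, a dot, and further components) is an assumption about the naming scheme rather than a description of how the kernel tracks the hierarchy.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical check: is the jail name "child" strictly below "parent"
 * in a dot-separated hierarchy?  For example "foo.bar" is below "foo",
 * but neither "foo" itself nor "foobar" is.
 */
static int
name_is_descendant(const char *parent, const char *child)
{
	size_t plen = strlen(parent);

	/* The child must start with the parent name followed by a dot. */
	return (strncmp(parent, child, plen) == 0 && child[plen] == '.');
}
```

A test of this shape fits the visibility rule described above, under which a jail may act on prisons beneath it but not on prisons at its own level or above.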
The prison hierarchy keeps track of restrictions placed on prisons, and will reflect them downward so a child jail is always at least as restricted as its ancestors. It doesn't go the other way though: if a prison's restrictions are loosened, the children stay as they are. This patch doesn't have anything for userland, and hierarchical jails won't work without that patch (because jails don't have permission to create sub-jails by default, and jail(2) can't grant that permission). A userland patch will follow soon, very similar to the version I posted here recently. - Jamie -------------- next part -------------- Index: lib/libc/sys/jail.2 =================================================================== --- lib/libc/sys/jail.2 (revision 191896) +++ lib/libc/sys/jail.2 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd April 29, 2009 +.Dd May 8, 2009 .Dt JAIL 2 .Os .Sh NAME @@ -283,7 +283,7 @@ It is possible to identify a process as jailed by examining .Dq Li /proc//status : it will show a field near the end of the line, either as -a single hyphen for a process at large, or the hostname currently +a single hyphen for a process at large, or the name currently set for the prison for jailed processes. .Sh ERRORS The @@ -292,7 +292,10 @@ will fail if: .Bl -tag -width Er .It Bq Er EPERM -This process is not allowed to create a jail. +This process is not allowed to create a jail, either because it is not +the super-user, or the +.Va security.jail.allow_jails +sysctl MIB is not set. .It Bq Er EFAULT .Fa jail points to an address outside the allocated address space of the process. @@ -308,7 +311,10 @@ will fail if: .Bl -tag -width Er .It Bq Er EPERM -This process is not allowed to create a jail. +This process is not allowed to create a jail, either because it is not +the super-user, or the +.Va security.jail.allow_jails +sysctl MIB is not set. .It Bq Er EPERM A jail parameter was set to a less restrictive value then the current environment. 
@@ -429,4 +435,4 @@ who contributed it to .Fx . .An James Gritton -added the extensible jail parameters. +added the extensible jail parameters and hierchical jails. Index: sys/ufs/ufs/ufs_vnops.c =================================================================== --- sys/ufs/ufs/ufs_vnops.c (revision 191896) +++ sys/ufs/ufs/ufs_vnops.c (working copy) @@ -61,7 +61,6 @@ #include #include #include -#include #include Index: sys/kern/kern_jail.c =================================================================== --- sys/kern/kern_jail.c (revision 191896) +++ sys/kern/kern_jail.c (working copy) @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -48,7 +49,6 @@ #include #include #include -#include #include #include #include @@ -74,61 +74,43 @@ SYSCTL_NODE(_security, OID_AUTO, jail, CTLFLAG_RW, 0, "Jail rules"); -int jail_set_hostname_allowed = 1; -SYSCTL_INT(_security_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW, - &jail_set_hostname_allowed, 0, - "Processes in jail can set their hostnames"); +/* prison0 describes what is "real" about the system. 
*/ +struct prison prison0 = { + .pr_id = 0, + .pr_name = "0", + .pr_ref = 1, + .pr_uref = 1, + .pr_path = "/", + .pr_securelevel = -1, + .pr_children = LIST_HEAD_INITIALIZER(&prison0.pr_children), + .pr_flags = PR_ALLOW_ALL, + .pr_def_perms = PR_ALLOW_SET_HOSTNAME | + PR_RESTRICT_SOCKET_UNIXIPROUTE, + .pr_def_enforce_statfs = 2, +#if defined(INET) || defined(INET6) + .pr_def_max_af_ips = 255, +#endif +}; +MTX_SYSINIT(prison0, &prison0.pr_mtx, "jail mutex", MTX_DEF); -int jail_socket_unixiproute_only = 1; -SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW, - &jail_socket_unixiproute_only, 0, - "Processes in jail are limited to creating UNIX/IP/route sockets only"); - -int jail_sysvipc_allowed = 0; -SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW, - &jail_sysvipc_allowed, 0, - "Processes in jail can use System V IPC primitives"); - -static int jail_enforce_statfs = 2; -SYSCTL_INT(_security_jail, OID_AUTO, enforce_statfs, CTLFLAG_RW, - &jail_enforce_statfs, 0, - "Processes in jail cannot see all mounted file systems"); - -int jail_allow_raw_sockets = 0; -SYSCTL_INT(_security_jail, OID_AUTO, allow_raw_sockets, CTLFLAG_RW, - &jail_allow_raw_sockets, 0, - "Prison root can create raw sockets"); - -int jail_chflags_allowed = 0; -SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW, - &jail_chflags_allowed, 0, - "Processes in jail can alter system file flags"); - -int jail_mount_allowed = 0; -SYSCTL_INT(_security_jail, OID_AUTO, mount_allowed, CTLFLAG_RW, - &jail_mount_allowed, 0, - "Processes in jail can mount/unmount jail-friendly file systems"); - -int jail_max_af_ips = 255; -SYSCTL_INT(_security_jail, OID_AUTO, jail_max_af_ips, CTLFLAG_RW, - &jail_max_af_ips, 0, - "Number of IP addresses a jail may have at most per address family"); - -/* allprison, lastprid, and prisoncount are protected by allprison_lock. */ +/* allprison and lastprid are protected by allprison_lock. 
*/ struct sx allprison_lock; SX_SYSINIT(allprison_lock, &allprison_lock, "allprison"); struct prisonlist allprison = TAILQ_HEAD_INITIALIZER(allprison); int lastprid = 0; -int prisoncount = 0; static int do_jail_attach(struct thread *td, struct prison *pr); static void prison_complete(void *context, int pending); static void prison_deref(struct prison *pr, int flags); +static char *prison_path(struct prison *pr1, struct prison *pr2); +static void prison_remove1(struct prison *pr); #ifdef INET static int _prison_check_ip4(struct prison *pr, struct in_addr *ia); +static int prison_restrict_ip4(struct prison *pr, struct in_addr *newip4); #endif #ifdef INET6 static int _prison_check_ip6(struct prison *pr, struct in6_addr *ia6); +static int prison_restrict_ip6(struct prison *pr, struct in6_addr *newip6); #endif static int sysctl_jail_list(SYSCTL_HANDLER_ARGS); @@ -139,7 +121,46 @@ #define PD_LIST_SLOCKED 0x08 #define PD_LIST_XLOCKED 0x10 +/* + * Parameter names corresponding to PR_* flag values + */ +static char *pr_flag_names[] = { + [0] = "persist", #ifdef INET + [2] = "ipv4", +#endif +#ifdef INET6 + [3] = "ipv6", +#endif + [16] = "perm.set_hostname_allowed", + "perm.sysvipc_allowed", + "perm.allow_raw_sockets", + "perm.chflags_allowed", + "perm.mount_allowed", + "perm.allow_quotas", + "perm.allow_jails", + "perm.socket_unixiproute_only", +}; + +static char *pr_flag_nonames[] = { + [0] = "nopersist", +#ifdef INET + [2] = "noipv4", +#endif +#ifdef INET6 + [3] = "noipv6", +#endif + [16] = "perm.noset_hostname_allowed", + "perm.nosysvipc_allowed", + "perm.noallow_raw_sockets", + "perm.nochflags_allowed", + "perm.nomount_allowed", + "perm.noallow_quotas", + "perm.noallow_jails", + "perm.nosocket_unixiproute_only", +}; + +#ifdef INET static int qcmp_v4(const void *ip1, const void *ip2) { @@ -277,7 +298,7 @@ return (error); tmplen = MAXPATHLEN + MAXHOSTNAMELEN + MAXHOSTNAMELEN; #ifdef INET - if (j.ip4s > jail_max_af_ips) + if (j.ip4s > td->td_ucred->cr_prison->pr_max_af_ips) 
return (EINVAL); tmplen += j.ip4s * sizeof(struct in_addr); #else @@ -285,7 +306,7 @@ return (EINVAL); #endif #ifdef INET6 - if (j.ip6s > jail_max_af_ips) + if (j.ip6s > td->td_ucred->cr_prison->pr_max_af_ips) return (EINVAL); tmplen += j.ip6s * sizeof(struct in6_addr); #else @@ -420,23 +441,24 @@ #endif struct vfsopt *opt; struct vfsoptlist *opts; - struct prison *pr, *deadpr, *tpr; + struct prison *pr, *deadpr, *mypr, *ppr, *tpr; struct vnode *root; char *errmsg, *host, *name, *p, *path; void *op; - int created, cuflags, error, errmsg_len, errmsg_pos; - int gotslevel, jid, len; + size_t namelen, onamelen; + int created, cuflags, descend, enforce, error, errmsg_len, errmsg_pos; + int gotenforce, gotslevel, fi, jid, len; int slevel, vfslocked; #if defined(INET) || defined(INET6) - int ii; + int ii, ij, gotmaxips, maxips; #endif #ifdef INET - int ip4s; + int ip4s, ip4a, redo_ip4; #endif #ifdef INET6 - int ip6s; + int ip6s, ip6a, redo_ip6; #endif - unsigned pr_flags, ch_flags; + unsigned pr_flags, ch_flags, tflags; char numbuf[12]; error = priv_check(td, PRIV_JAIL_SET); @@ -444,6 +466,9 @@ error = priv_check(td, PRIV_JAIL_ATTACH); if (error) return (error); + mypr = ppr = td->td_ucred->cr_prison; + if ((flags & JAIL_CREATE) && !(mypr->pr_flags & PR_ALLOW_JAILS)) + return (EPERM); if (flags & ~JAIL_SET_MASK) return (EINVAL); @@ -461,12 +486,15 @@ if (error) return (error); #ifdef INET + ip4a = 0; ip4 = NULL; #endif #ifdef INET6 + ip6a = 0; ip6 = NULL; #endif + again: error = vfs_copyopt(opts, "jid", &jid, sizeof(jid)); if (error == ENOENT) jid = 0; @@ -481,9 +509,33 @@ else gotslevel = 1; + error = vfs_copyopt(opts, "perm.enforce_statfs", &enforce, + sizeof(enforce)); + gotenforce = error == 0; + if (gotenforce) { + if (enforce < 0 || enforce > 2) + return (EINVAL); + } else if (error != ENOENT) + goto done_free; + +#if defined(INET) || defined(INET6) + error = vfs_copyopt(opts, "perm.max_af_ips", &maxips, sizeof(maxips)); + gotmaxips = error == 0; + if (gotmaxips) { +
if (maxips < 1) + return (EINVAL); + } else if (error != ENOENT) + goto done_free; +#endif + pr_flags = ch_flags = 0; - vfs_flagopt(opts, "persist", &pr_flags, PR_PERSIST); - vfs_flagopt(opts, "nopersist", &ch_flags, PR_PERSIST); + for (fi = 0; fi < sizeof(pr_flag_names) / sizeof(pr_flag_names[0]); + fi++) { + if (pr_flag_names[fi] == NULL) + continue; + vfs_flagopt(opts, pr_flag_names[fi], &pr_flags, 1 << fi); + vfs_flagopt(opts, pr_flag_nonames[fi], &ch_flags, 1 << fi); + } ch_flags |= pr_flags; if ((flags & (JAIL_CREATE | JAIL_UPDATE | JAIL_ATTACH)) == JAIL_CREATE && !(pr_flags & PR_PERSIST)) { @@ -524,6 +576,7 @@ } } + /* This might be the second time around for this option. */ #ifdef INET error = vfs_getopt(opts, "ip4.addr", &op, &ip4s); if (error == ENOENT) @@ -533,43 +586,54 @@ else if (ip4s & (sizeof(*ip4) - 1)) { error = EINVAL; goto done_free; - } else if (ip4s > 0) { - ip4s /= sizeof(*ip4); - if (ip4s > jail_max_af_ips) { - error = EINVAL; - vfs_opterror(opts, "too many IPv4 addresses"); - goto done_errmsg; - } - ip4 = malloc(ip4s * sizeof(*ip4), M_PRISON, M_WAITOK); - bcopy(op, ip4, ip4s * sizeof(*ip4)); - /* - * IP addresses are all sorted but ip[0] to preserve the - * primary IP address as given from userland. This special IP - * is used for unbound outgoing connections as well for - * "loopback" traffic. - */ - if (ip4s > 1) - qsort(ip4 + 1, ip4s - 1, sizeof(*ip4), qcmp_v4); - /* - * Check for duplicate addresses and do some simple zero and - * broadcast checks. If users give other bogus addresses it is - * their problem. - * - * We do not have to care about byte order for these checks so - * we will do them in NBO. 
- */ - for (ii = 0; ii < ip4s; ii++) { - if (ip4[ii].s_addr == INADDR_ANY || - ip4[ii].s_addr == INADDR_BROADCAST) { + } else { + ch_flags |= PR_IP4_USER; + pr_flags |= PR_IP4_USER; + if (ip4s > 0) { + ip4s /= sizeof(*ip4); + if (gotmaxips && ip4s > maxips) { error = EINVAL; - goto done_free; + vfs_opterror(opts, "too many IPv4 addresses"); + goto done_errmsg; } - if ((ii+1) < ip4s && - (ip4[0].s_addr == ip4[ii+1].s_addr || - ip4[ii].s_addr == ip4[ii+1].s_addr)) { - error = EINVAL; - goto done_free; + if (ip4a < ip4s) { + ip4a = ip4s; + free(ip4, M_PRISON); + ip4 = NULL; } + if (ip4 == NULL) + ip4 = malloc(ip4a * sizeof(*ip4), M_PRISON, + M_WAITOK); + bcopy(op, ip4, ip4s * sizeof(*ip4)); + /* + * IP addresses are all sorted but ip[0] to preserve + * the primary IP address as given from userland. + * This special IP is used for unbound outgoing + * connections as well for "loopback" traffic. + */ + if (ip4s > 1) + qsort(ip4 + 1, ip4s - 1, sizeof(*ip4), qcmp_v4); + /* + * Check for duplicate addresses and do some simple + * zero and broadcast checks. If users give other bogus + * addresses it is their problem. + * + * We do not have to care about byte order for these + * checks so we will do them in NBO. 
+ */ + for (ii = 0; ii < ip4s; ii++) { + if (ip4[ii].s_addr == INADDR_ANY || + ip4[ii].s_addr == INADDR_BROADCAST) { + error = EINVAL; + goto done_free; + } + if ((ii+1) < ip4s && + (ip4[0].s_addr == ip4[ii+1].s_addr || + ip4[ii].s_addr == ip4[ii+1].s_addr)) { + error = EINVAL; + goto done_free; + } + } } } #endif @@ -583,29 +647,40 @@ else if (ip6s & (sizeof(*ip6) - 1)) { error = EINVAL; goto done_free; - } else if (ip6s > 0) { - ip6s /= sizeof(*ip6); - if (ip6s > jail_max_af_ips) { - error = EINVAL; - vfs_opterror(opts, "too many IPv6 addresses"); - goto done_errmsg; - } - ip6 = malloc(ip6s * sizeof(*ip6), M_PRISON, M_WAITOK); - bcopy(op, ip6, ip6s * sizeof(*ip6)); - if (ip6s > 1) - qsort(ip6 + 1, ip6s - 1, sizeof(*ip6), qcmp_v6); - for (ii = 0; ii < ip6s; ii++) { - if (IN6_IS_ADDR_UNSPECIFIED(&ip6[0])) { + } else { + ch_flags |= PR_IP6_USER; + pr_flags |= PR_IP6_USER; + if (ip6s > 0) { + ip6s /= sizeof(*ip6); + if (gotmaxips && ip6s > maxips) { error = EINVAL; - goto done_free; + vfs_opterror(opts, "too many IPv6 addresses"); + goto done_errmsg; } - if ((ii+1) < ip6s && - (IN6_ARE_ADDR_EQUAL(&ip6[0], &ip6[ii+1]) || - IN6_ARE_ADDR_EQUAL(&ip6[ii], &ip6[ii+1]))) - { - error = EINVAL; - goto done_free; + if (ip6a < ip6s) { + ip6a = ip6s; + free(ip6, M_PRISON); + ip6 = NULL; } + if (ip6 == NULL) + ip6 = malloc(ip6a * sizeof(*ip6), M_PRISON, + M_WAITOK); + bcopy(op, ip6, ip6s * sizeof(*ip6)); + if (ip6s > 1) + qsort(ip6 + 1, ip6s - 1, sizeof(*ip6), qcmp_v6); + for (ii = 0; ii < ip6s; ii++) { + if (IN6_IS_ADDR_UNSPECIFIED(&ip6[0])) { + error = EINVAL; + goto done_free; + } + if ((ii+1) < ip6s && + (IN6_ARE_ADDR_EQUAL(&ip6[0], &ip6[ii+1]) || + IN6_ARE_ADDR_EQUAL(&ip6[ii], &ip6[ii+1]))) + { + error = EINVAL; + goto done_free; + } + } } } #endif @@ -627,13 +702,15 @@ error = EINVAL; goto done_free; } - if (len > MAXPATHLEN) { - error = ENAMETOOLONG; - goto done_free; - } if (len < 2 || (len == 2 && path[0] == '/')) path = NULL; else { + /* Leave room for a real-root full 
pathname. */ + if (len + (path[0] == '/' && strcmp(mypr->pr_path, "/") + ? strlen(mypr->pr_path) : 0) > MAXPATHLEN) { + error = ENAMETOOLONG; + goto done_free; + } NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW, UIO_SYSSPACE, path, td); error = namei(&nd); @@ -683,7 +760,13 @@ } pr = NULL; if (jid != 0) { - /* See if a requested jid already exists. */ + /* + * See if a requested jid already exists. There is an + * information leak here if the jid exists but is not within + * the caller's jail hierarchy. Jail creators will get EEXIST + * even though they cannot see the jail, and CREATE | UPDATE + * will return ENOENT which is not normally a valid error. + */ if (jid < 0) { error = EINVAL; vfs_opterror(opts, "negative jid"); @@ -691,6 +774,7 @@ } pr = prison_find(jid); if (pr != NULL) { + ppr = pr->pr_parent; /* Create: jid must not exist. */ if (cuflags == JAIL_CREATE) { mtx_unlock(&pr->pr_mtx); @@ -699,7 +783,10 @@ jid); goto done_unlock_list; } - if (pr->pr_uref == 0) { + if (!prison_ischild(mypr, pr)) { + mtx_unlock(&pr->pr_mtx); + pr = NULL; + } else if (pr->pr_uref == 0) { if (!(flags & JAIL_DYING)) { mtx_unlock(&pr->pr_mtx); error = ENOENT; @@ -717,7 +804,7 @@ * name. */ if (name == NULL) - name = pr->pr_name; + name = prison_name(mypr, pr); } } } @@ -738,12 +825,42 @@ * because that is the jail being updated). */ if (name != NULL) { + p = strrchr(name, '.'); + if (p != NULL) { + /* + * This is a hierarchical name. Split it into the + * parent and child names, and make sure the parent + * exists or matches an already found jail. 
+ */ + *p = '\0'; + if (pr != NULL) { + if (strncmp(name, ppr->pr_name, p - name) || + ppr->pr_name[p - name] != '\0') { + mtx_unlock(&pr->pr_mtx); + error = EINVAL; + vfs_opterror(opts, + "cannot change jail's parent"); + goto done_unlock_list; + } + } else { + ppr = prison_find_name(mypr, name); + if (ppr == NULL) { + error = ENOENT; + vfs_opterror(opts, + "jail \"%s\" not found", name); + goto done_unlock_list; + } + mtx_unlock(&ppr->pr_mtx); + } + name = p + 1; + } if (name[0] != '\0') { + namelen = strlen(ppr->pr_name) + 1; + name_again: deadpr = NULL; - name_again: - TAILQ_FOREACH(tpr, &allprison, pr_list) { + FOREACH_PRISON_CHILD(ppr, tpr) { if (tpr != pr && tpr->pr_ref > 0 && - !strcmp(tpr->pr_name, name)) { + !strcmp(tpr->pr_name + namelen, name)) { if (pr == NULL && cuflags != JAIL_CREATE) { mtx_lock(&tpr->pr_mtx); @@ -763,7 +880,7 @@ /* * Create, or update(jid): * name must not exist in an - * active jail. + * active sibling jail. */ error = EEXIST; if (pr != NULL) @@ -810,6 +927,15 @@ /* If there's no prison to update, create a new one and link it in. */ if (pr == NULL) { created = 1; + mtx_lock(&ppr->pr_mtx); + if (ppr->pr_ref == 0 || (ppr->pr_flags & PR_REMOVE)) { + mtx_unlock(&ppr->pr_mtx); + error = ENOENT; + goto done_unlock_list; + } + ppr->pr_ref++; + ppr->pr_uref++; + mtx_unlock(&ppr->pr_mtx); pr = malloc(sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO); if (jid == 0) { /* Find the next free jid. */ @@ -829,7 +955,9 @@ vfs_opterror(opts, "no available jail IDs"); free(pr, M_PRISON); - goto done_unlock_list; + prison_deref(ppr, PD_DEREF | + PD_DEUREF | PD_LIST_XLOCKED); + goto done_releroot; } jid++; goto findnext; @@ -848,24 +976,56 @@ } if (tpr == NULL) TAILQ_INSERT_TAIL(&allprison, pr, pr_list); - prisoncount++; + LIST_INSERT_HEAD(&ppr->pr_children, pr, pr_sibling); + for (tpr = ppr; tpr != NULL; tpr = tpr->pr_parent) + tpr->pr_prisoncount++; + pr->pr_parent = ppr; pr->pr_id = jid; + + /* Set some default values, and inherit some from the parent. 
*/ if (name == NULL) name = ""; if (path == NULL) { path = "/"; - root = rootvnode; + root = mypr->pr_root; vref(root); } +#ifdef INET + pr->pr_flags |= ppr->pr_flags & PR_IP4; + pr->pr_ip4s = ppr->pr_ip4s; + if (ppr->pr_ip4 != NULL) { + pr->pr_ip4 = malloc(pr->pr_ip4s * + sizeof(struct in_addr), M_PRISON, M_WAITOK); + bcopy(ppr->pr_ip4, pr->pr_ip4, + pr->pr_ip4s * sizeof(*pr->pr_ip4)); + } +#endif +#ifdef INET6 + pr->pr_flags |= ppr->pr_flags & PR_IP6; + pr->pr_ip6s = ppr->pr_ip6s; + if (ppr->pr_ip6 != NULL) { + pr->pr_ip6 = malloc(pr->pr_ip6s * + sizeof(struct in6_addr), M_PRISON, M_WAITOK); + bcopy(ppr->pr_ip6, pr->pr_ip6, + pr->pr_ip6s * sizeof(*pr->pr_ip6)); + } +#endif + pr->pr_securelevel = ppr->pr_securelevel; + pr->pr_flags |= ppr->pr_def_perms; + pr->pr_enforce_statfs = ppr->pr_def_enforce_statfs; +#if defined(INET) || defined(INET6) + pr->pr_max_af_ips = ppr->pr_def_max_af_ips; +#endif - mtx_init(&pr->pr_mtx, "jail mutex", NULL, MTX_DEF); + LIST_INIT(&pr->pr_children); + mtx_init(&pr->pr_mtx, "jail mutex", NULL, MTX_DEF | MTX_DUPOK); /* * Allocate a dedicated cpuset for each jail. * Unlike other initial settings, this may return an erorr. */ - error = cpuset_create_root(td, &pr->pr_cpuset); + error = cpuset_create_root(ppr, &pr->pr_cpuset); if (error) { prison_deref(pr, PD_LIST_XLOCKED); goto done_releroot; @@ -887,103 +1047,425 @@ } /* Do final error checking before setting anything. */ - error = 0; + if (gotslevel) { + if (slevel < ppr->pr_securelevel) { + error = EPERM; + goto done_deref_locked; + } + } + if (gotenforce) { + if (enforce < ppr->pr_enforce_statfs) { + error = EPERM; + goto done_deref_locked; + } + } #if defined(INET) || defined(INET6) - if ( -#ifdef INET - ip4s > 0 -#ifdef INET6 - || + if (gotmaxips) { + if (maxips > ppr->pr_max_af_ips) { + error = EPERM; + goto done_deref_locked; + } + } #endif -#endif -#ifdef INET6 - ip6s > 0 -#endif - ) - /* - * Check for conflicting IP addresses. 
We permit them if there - * is no more than 1 IP on each jail. If there is a duplicate - * on a jail with more than one IP stop checking and return - * error. - */ - TAILQ_FOREACH(tpr, &allprison, pr_list) { - if (tpr == pr || tpr->pr_uref == 0) - continue; #ifdef INET - if ((ip4s > 0 && tpr->pr_ip4s > 1) || - (ip4s > 1 && tpr->pr_ip4s > 0)) - for (ii = 0; ii < ip4s; ii++) + if (ch_flags & PR_IP4_USER) { + if (!gotmaxips && ip4s > pr->pr_max_af_ips) { + error = EINVAL; + vfs_opterror(opts, "too many IPv4 addresses"); + goto done_deref_locked; + } + if (ppr->pr_flags & PR_IP4) { + if (!(pr_flags & PR_IP4_USER)) { + /* + * Silently ignore attempts to make the IP + * addresses unrestricted when the parent is + * restricted; in other words, interpret + * "unrestricted" as "as unrestricted as + * possible". + */ + ip4s = ppr->pr_ip4s; + if (ip4s == 0) { + free(ip4, M_PRISON); + ip4 = NULL; + } else if (ip4s <= ip4a) { + /* Inherit the parent's address(es). */ + bcopy(ppr->pr_ip4, ip4, + ip4s * sizeof(*ip4)); + } else { + /* + * There's no room for the parent's + * address list. Allocate some more. + */ + ip4a = ip4s; + free(ip4, M_PRISON); + ip4 = malloc(ip4a * sizeof(*ip4), + M_PRISON, M_NOWAIT); + if (ip4 != NULL) + bcopy(ppr->pr_ip4, ip4, + ip4s * sizeof(*ip4)); + else { + /* Allocation failed without + * sleeping. Unlocking the + * prison now will invalidate + * some checks and prematurely + * show an unfinished new jail. + * So let go of everything and + * start over. + */ + prison_deref(pr, created + ? PD_LOCKED | + PD_LIST_XLOCKED + : PD_DEREF | PD_LOCKED | + PD_LIST_XLOCKED); + if (root != NULL) { + vfslocked = + VFS_LOCK_GIANT( + root->v_mount); + vrele(root); + VFS_UNLOCK_GIANT( + vfslocked); + } + ip4 = malloc(ip4a * + sizeof(*ip4), M_PRISON, + M_WAITOK); + goto again; + } + } + } else if (ip4s > 0) { + /* + * Make sure the new set of IP addresses is a + * subset of the parent's list. 
Don't worry + * about the parent being unlocked, as any + * setting is done with allprison_lock held. + */ + for (ij = 0; ij < ppr->pr_ip4s; ij++) + if (ip4[0].s_addr == + ppr->pr_ip4[ij].s_addr) + break; + if (ij == ppr->pr_ip4s) { + error = EPERM; + goto done_deref_locked; + } + if (ip4s > 1) { + for (ii = ij = 1; ii < ip4s; ii++) { + if (ip4[ii].s_addr == + ppr->pr_ip4[0]. s_addr) + continue; + for (; ij < ppr->pr_ip4s; ij++) + if (ip4[ii].s_addr == + ppr->pr_ip4[ij].s_addr) + break; + } + if (ij == ppr->pr_ip4s) { + error = EPERM; + goto done_deref_locked; + } + } + } + } + if (ip4s > 0) { + /* + * Check for conflicting IP addresses. We permit them + * if there is no more than one IP on each jail. If + * there is a duplicate on a jail with more than one + * IP stop checking and return error. + */ + FOREACH_PRISON_DESCENDANT(&prison0, tpr, descend) { + if (tpr == pr || tpr->pr_uref == 0) { + descend = 0; + continue; + } + if (!(tpr->pr_flags & PR_IP4_USER)) + continue; + descend = 0; + if (tpr->pr_ip4 == NULL || + (ip4s == 1 && tpr->pr_ip4s == 1)) + continue; + for (ii = 0; ii < ip4s; ii++) { if (_prison_check_ip4(tpr, &ip4[ii]) == 0) { - error = EINVAL; + error = EADDRINUSE; vfs_opterror(opts, "IPv4 addresses clash"); goto done_deref_locked; } + } + } + } + } #endif #ifdef INET6 - if ((ip6s > 0 && tpr->pr_ip6s > 1) || - (ip6s > 1 && tpr->pr_ip6s > 0)) - for (ii = 0; ii < ip6s; ii++) + if (ch_flags & PR_IP6_USER) { + if (!gotmaxips && ip6s > pr->pr_max_af_ips) { + error = EINVAL; + vfs_opterror(opts, "too many IPv6 addresses"); + goto done_deref_locked; + } + if (ppr->pr_flags & PR_IP6) { + if (!(pr_flags & PR_IP6_USER)) { + /* + * Silently ignore attempts to make the IP + * addresses unrestricted when the parent is + * restricted. + */ + ip6s = ppr->pr_ip6s; + if (ip6s == 0) { + free(ip6, M_PRISON); + ip6 = NULL; + } else if (ip6s <= ip6a) { + /* Inherit the parent's address(es). 
*/ + bcopy(ppr->pr_ip6, ip6, + ip6s * sizeof(*ip6)); + } else { + /* + * There's no room for the parent's + * address list. + */ + ip6a = ip6s; + free(ip6, M_PRISON); + ip6 = malloc(ip6a * sizeof(*ip6), + M_PRISON, M_NOWAIT); + if (ip6 != NULL) + bcopy(ppr->pr_ip6, ip6, + ip6s * sizeof(*ip6)); + else { + prison_deref(pr, created + ? PD_LOCKED | + PD_LIST_XLOCKED + : PD_DEREF | PD_LOCKED | + PD_LIST_XLOCKED); + if (root != NULL) { + vfslocked = + VFS_LOCK_GIANT( + root->v_mount); + vrele(root); + VFS_UNLOCK_GIANT( + vfslocked); + } + ip6 = malloc(ip6a * + sizeof(*ip6), M_PRISON, + M_WAITOK); + goto again; + } + } + } else if (ip6s > 0) { + /* + * Make sure the new set of IP addresses is a + * subset of the parent's list. + */ + for (ij = 0; ij < ppr->pr_ip6s; ij++) + if (IN6_ARE_ADDR_EQUAL(&ip6[0], + &ppr->pr_ip6[ij])) + break; + if (ij == ppr->pr_ip6s) { + error = EPERM; + goto done_deref_locked; + } + if (ip6s > 1) { + for (ii = ij = 1; ii < ip6s; ii++) { + if (IN6_ARE_ADDR_EQUAL(&ip6[ii], + &ppr->pr_ip6[0])) + continue; + for (; ij < ppr->pr_ip6s; ij++) + if (IN6_ARE_ADDR_EQUAL( + &ip6[ii], + &ppr->pr_ip6[ij])) + break; + } + if (ij == ppr->pr_ip6s) { + error = EPERM; + goto done_deref_locked; + } + } + } + } + if (ip6s > 0) { + /* Check for conflicting IP addresses. */ + FOREACH_PRISON_DESCENDANT(&prison0, tpr, descend) { + if (tpr == pr || tpr->pr_uref == 0) { + descend = 0; + continue; + } + if (!(tpr->pr_flags & PR_IP6_USER)) + continue; + descend = 0; + if (tpr->pr_ip6 == NULL || + (ip6s == 1 && tpr->pr_ip6s == 1)) + continue; + for (ii = 0; ii < ip6s; ii++) { if (_prison_check_ip6(tpr, &ip6[ii]) == 0) { - error = EINVAL; + error = EADDRINUSE; vfs_opterror(opts, "IPv6 addresses clash"); goto done_deref_locked; } -#endif + } + } } + } #endif - if (error == 0 && name != NULL) { + if (name != NULL) { /* Give a default name of the jid. 
*/ if (name[0] == '\0') snprintf(name = numbuf, sizeof(numbuf), "%d", jid); else if (strtoul(name, &p, 10) != jid && *p == '\0') { error = EINVAL; vfs_opterror(opts, "name cannot be numeric"); + goto done_deref_locked; } + if (strlen(ppr->pr_name) + strlen(name) + 2 > + sizeof(pr->pr_name)) { + error = ENAMETOOLONG; + goto done_deref_locked; + } } - if (error) { - done_deref_locked: - /* - * Some parameter had an error so do not set anything. - * If this is a new jail, it will go away without ever - * having been seen. - */ - prison_deref(pr, created - ? PD_LOCKED | PD_LIST_XLOCKED - : PD_DEREF | PD_LOCKED | PD_LIST_XLOCKED); - goto done_releroot; + if ((PR_ALLOW_ALL & pr_flags & ~ppr->pr_flags) | + (PR_RESTRICT_ALL & ch_flags & ~pr_flags & ppr->pr_flags)) { + error = EPERM; + goto done_deref_locked; } /* Set the parameters of the prison. */ #ifdef INET - if (ip4s >= 0) { - pr->pr_ip4s = ip4s; - free(pr->pr_ip4, M_PRISON); - pr->pr_ip4 = ip4; - ip4 = NULL; + redo_ip4 = 0; + if (ch_flags & PR_IP4_USER) { + if (pr_flags & PR_IP4_USER) { + /* Some restriction set. */ + pr->pr_flags |= PR_IP4; + if (ip4s >= 0) { + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4s = ip4s; + pr->pr_ip4 = ip4; + ip4 = NULL; + } + } else if (ppr->pr_flags & PR_IP4) { + /* This restriction cleared, but keep inherited. */ + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4s = ip4s; + pr->pr_ip4 = ip4; + ip4 = NULL; + } else { + /* Restriction cleared, now unrestricted. */ + pr->pr_flags &= ~PR_IP4; + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4s = 0; + } + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip4(tpr, NULL)) { + redo_ip4 = 1; + descend = 0; + } + } } #endif #ifdef INET6 - if (ip6s >= 0) { - pr->pr_ip6s = ip6s; - free(pr->pr_ip6, M_PRISON); - pr->pr_ip6 = ip6; - ip6 = NULL; + redo_ip6 = 0; + if (ch_flags & PR_IP6_USER) { + if (pr_flags & PR_IP6_USER) { + /* Some restriction set. 
*/ + pr->pr_flags |= PR_IP6; + if (ip6s >= 0) { + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6s = ip6s; + pr->pr_ip6 = ip6; + ip6 = NULL; + } + } else if (ppr->pr_flags & PR_IP6) { + /* This restriction cleared, but keep inherited. */ + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6s = ip6s; + pr->pr_ip6 = ip6; + ip6 = NULL; + } else { + /* Restriction cleared, now unrestricted. */ + pr->pr_flags &= ~PR_IP6; + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6s = 0; + } + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip6(tpr, NULL)) { + redo_ip6 = 1; + descend = 0; + } + } } #endif - if (gotslevel) + if (gotslevel) { pr->pr_securelevel = slevel; - if (name != NULL) - strlcpy(pr->pr_name, name, sizeof(pr->pr_name)); + /* Set all child jails to be at least this level. */ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) + if (tpr->pr_securelevel < slevel) + tpr->pr_securelevel = slevel; + } + if (gotenforce) { + pr->pr_enforce_statfs = enforce; + if (pr->pr_def_enforce_statfs < enforce) + pr->pr_def_enforce_statfs = enforce; + /* Pass this restriction on to the children. */ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) + if (tpr->pr_enforce_statfs < enforce) { + tpr->pr_enforce_statfs = enforce; + if (tpr->pr_def_enforce_statfs < enforce) + tpr->pr_def_enforce_statfs = enforce; + } + } +#if defined(INET) || defined(INET6) + if (gotmaxips) { + pr->pr_max_af_ips = maxips; + if (pr->pr_def_max_af_ips > maxips) + pr->pr_def_max_af_ips = maxips; + /* Pass this restriction on to the children. 
*/ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) + if (tpr->pr_max_af_ips > maxips) { + tpr->pr_max_af_ips = maxips; + if (tpr->pr_def_max_af_ips > maxips) + tpr->pr_def_max_af_ips = maxips; + } + } +#endif + if (name != NULL) { + onamelen = strlen(pr->pr_name); + if (ppr == &prison0) + strlcpy(pr->pr_name, name, sizeof(pr->pr_name)); + else + snprintf(pr->pr_name, sizeof(pr->pr_name), "%s.%s", + ppr->pr_name, name); + namelen = strlen(pr->pr_name); + /* Change this component of child names. */ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + bcopy(tpr->pr_name + onamelen, tpr->pr_name + namelen, + strlen(tpr->pr_name + onamelen) + 1); + bcopy(pr->pr_name, tpr->pr_name, namelen); + } + } if (path != NULL) { - strlcpy(pr->pr_path, path, sizeof(pr->pr_path)); + /* Try to keep a real-rooted full pathname. */ + if (path[0] == '/' && strcmp(mypr->pr_path, "/")) + snprintf(pr->pr_path, sizeof pr->pr_path, "%s%s", + mypr->pr_path, path); + else + strlcpy(pr->pr_path, path, sizeof(pr->pr_path)); pr->pr_root = root; } if (host != NULL) strlcpy(pr->pr_host, host, sizeof(pr->pr_host)); + if ((tflags = PR_ALLOW_ALL & ch_flags & ~pr_flags)) { + /* Clear allow bits on sysctl and all children. */ + pr->pr_def_perms &= ~tflags; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + tpr->pr_flags &= ~tflags; + tpr->pr_def_perms &= ~tflags; + } + } + if ((tflags = PR_RESTRICT_ALL & pr_flags)) { + /* Set restrict bits on sysctl and all children. */ + pr->pr_def_perms |= tflags; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + tpr->pr_flags |= tflags; + tpr->pr_def_perms |= tflags; + } + } /* * Persistent prisons get an extra reference, and prisons losing their * persist flag lose that reference. Only do this for existing prisons @@ -1002,6 +1484,44 @@ pr->pr_flags = (pr->pr_flags & ~ch_flags) | pr_flags; mtx_unlock(&pr->pr_mtx); + /* Locks may have prevented a complete restriction of child IP + * addresses. If so, allocate some more memory and try again. 
+ */ +#ifdef INET + while (redo_ip4) { + ip4s = pr->pr_ip4s; + ip4 = malloc(ip4s * sizeof(*ip4), M_PRISON, M_WAITOK); + mtx_lock(&pr->pr_mtx); + redo_ip4 = 0; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip4(tpr, ip4)) { + if (ip4 != NULL) + ip4 = NULL; + else + redo_ip4 = 1; + } + } + mtx_unlock(&pr->pr_mtx); + } +#endif +#ifdef INET6 + while (redo_ip6) { + ip6s = pr->pr_ip6s; + ip6 = malloc(ip6s * sizeof(*ip6), M_PRISON, M_WAITOK); + mtx_lock(&pr->pr_mtx); + redo_ip6 = 0; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip6(tpr, ip6)) { + if (ip6 != NULL) + ip6 = NULL; + else + redo_ip6 = 1; + } + } + mtx_unlock(&pr->pr_mtx); + } +#endif + /* Let the modules do their work. */ sx_downgrade(&allprison_lock); if (created) { @@ -1054,6 +1574,11 @@ td->td_retval[0] = pr->pr_id; goto done_errmsg; + done_deref_locked: + prison_deref(pr, created + ? PD_LOCKED | PD_LIST_XLOCKED + : PD_DEREF | PD_LOCKED | PD_LIST_XLOCKED); + goto done_releroot; done_unlock_list: sx_xunlock(&allprison_lock); done_releroot: @@ -1131,6 +1656,7 @@ } SYSCTL_JAIL_PARAM(, jid, CTLTYPE_INT | CTLFLAG_RD, "I", "Jail ID"); +SYSCTL_JAIL_PARAM(, parent, CTLTYPE_INT | CTLFLAG_RD, "I", "Jail parent ID"); SYSCTL_JAIL_PARAM_STRING(, name, CTLFLAG_RW, MAXHOSTNAMELEN, "Jail name"); SYSCTL_JAIL_PARAM(, cpuset, CTLTYPE_INT | CTLFLAG_RD, "I", "Jail cpuset ID"); SYSCTL_JAIL_PARAM_STRING(, path, CTLFLAG_RD, MAXPATHLEN, "Jail root path"); @@ -1147,16 +1673,44 @@ #ifdef INET SYSCTL_JAIL_PARAM_NODE(ip4, "Jail IPv4 address virtualization"); +SYSCTL_JAIL_PARAM(, noip4, CTLTYPE_INT | CTLFLAG_RW, + "BN", "Jail w/ no IP address virtualization"); SYSCTL_JAIL_PARAM_STRUCT(_ip4, addr, CTLFLAG_RW, sizeof(struct in_addr), "S,in_addr,a", "Jail IPv4 addresses"); #endif #ifdef INET6 SYSCTL_JAIL_PARAM_NODE(ip6, "Jail IPv6 address virtualization"); +SYSCTL_JAIL_PARAM(, noip6, CTLTYPE_INT | CTLFLAG_RW, + "BN", "Jail w/ no IP address virtualization"); 
SYSCTL_JAIL_PARAM_STRUCT(_ip6, addr, CTLFLAG_RW, sizeof(struct in6_addr), "S,in6_addr,a", "Jail IPv6 addresses"); #endif +SYSCTL_JAIL_PARAM_NODE(perm, "Jail permissions"); +SYSCTL_JAIL_PARAM(_perm, set_hostname_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may set hostname"); +SYSCTL_JAIL_PARAM(_perm, sysvipc_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may use SYSV IPC"); +SYSCTL_JAIL_PARAM(_perm, allow_raw_sockets, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may create raw sockets"); +SYSCTL_JAIL_PARAM(_perm, chflags_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may alter system file flags"); +SYSCTL_JAIL_PARAM(_perm, mount_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may mount/unmount jail-friendly file systems"); +SYSCTL_JAIL_PARAM(_perm, allow_quotas, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may set file quotas"); +SYSCTL_JAIL_PARAM(_perm, allow_jails, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may create child jails"); +SYSCTL_JAIL_PARAM(_perm, socket_unixiproute_only, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail limited to creating UNIX/IPv4/IPv6/route sockets only"); +SYSCTL_JAIL_PARAM(_perm, enforce_statfs, CTLTYPE_INT | CTLFLAG_RW, + "I", "Jail cannot see all mounted file systems"); +#if defined(INET) || defined(INET6) +SYSCTL_JAIL_PARAM(_perm, max_af_ips, CTLTYPE_INT | CTLFLAG_RW, + "I", "Number of IP addresses a jail may have at most per address family"); +#endif + /* * struct jail_get_args { * struct iovec *iovp; @@ -1188,28 +1742,21 @@ int kern_jail_get(struct thread *td, struct uio *optuio, int flags) { - struct prison *pr; + struct prison *pr, *mypr; struct vfsopt *opt; struct vfsoptlist *opts; char *errmsg, *name; - int error, errmsg_len, errmsg_pos, i, jid, len, locked, pos; + int error, errmsg_len, errmsg_pos, fi, i, jid, len, locked, pos; if (flags & ~JAIL_GET_MASK) return (EINVAL); - if (jailed(td->td_ucred)) { - /* - * Don't allow a jailed process to see any jails, - * not even its own. 
- */ - vfs_opterror(opts, "jail not found"); - return (ENOENT); - } /* Get the parameter list. */ error = vfs_buildopts(optuio, &opts); if (error) return (error); errmsg_pos = vfs_getopt_pos(opts, "errmsg"); + mypr = td->td_ucred->cr_prison; /* * Find the prison specified by one of: lastjid, jid, name. @@ -1218,7 +1765,7 @@ error = vfs_copyopt(opts, "lastjid", &jid, sizeof(jid)); if (error == 0) { TAILQ_FOREACH(pr, &allprison, pr_list) { - if (pr->pr_id > jid) { + if (pr->pr_id > jid && prison_ischild(mypr, pr)) { mtx_lock(&pr->pr_mtx); if (pr->pr_ref > 0 && (pr->pr_uref > 0 || (flags & JAIL_DYING))) @@ -1237,7 +1784,7 @@ error = vfs_copyopt(opts, "jid", &jid, sizeof(jid)); if (error == 0) { if (jid != 0) { - pr = prison_find(jid); + pr = prison_find_child(mypr, jid); if (pr != NULL) { if (pr->pr_uref == 0 && !(flags & JAIL_DYING)) { mtx_unlock(&pr->pr_mtx); @@ -1261,7 +1808,7 @@ error = EINVAL; goto done_unlock_list; } - pr = prison_find_name(name); + pr = prison_find_name(mypr, name); if (pr != NULL) { if (pr->pr_uref == 0 && !(flags & JAIL_DYING)) { mtx_unlock(&pr->pr_mtx); @@ -1290,14 +1837,18 @@ error = vfs_setopt(opts, "jid", &pr->pr_id, sizeof(pr->pr_id)); if (error != 0 && error != ENOENT) goto done_deref; - error = vfs_setopts(opts, "name", pr->pr_name); + i = pr->pr_parent == mypr ? 
0 : pr->pr_parent->pr_id; + error = vfs_setopt(opts, "parent", &i, sizeof(i)); if (error != 0 && error != ENOENT) goto done_deref; + error = vfs_setopts(opts, "name", prison_name(mypr, pr)); + if (error != 0 && error != ENOENT) + goto done_deref; error = vfs_setopt(opts, "cpuset", &pr->pr_cpuset->cs_id, sizeof(pr->pr_cpuset->cs_id)); if (error != 0 && error != ENOENT) goto done_deref; - error = vfs_setopts(opts, "path", pr->pr_path); + error = vfs_setopts(opts, "path", prison_path(mypr, pr)); if (error != 0 && error != ENOENT) goto done_deref; #ifdef INET @@ -1319,14 +1870,29 @@ error = vfs_setopts(opts, "host.hostname", pr->pr_host); if (error != 0 && error != ENOENT) goto done_deref; - i = pr->pr_flags & PR_PERSIST ? 1 : 0; - error = vfs_setopt(opts, "persist", &i, sizeof(i)); + error = vfs_setopt(opts, "perm.enforce_statfs", &pr->pr_enforce_statfs, + sizeof(pr->pr_enforce_statfs)); if (error != 0 && error != ENOENT) goto done_deref; - i = !i; - error = vfs_setopt(opts, "nopersist", &i, sizeof(i)); +#if defined(INET) || defined(INET6) + error = vfs_setopt(opts, "perm.max_af_ips", &pr->pr_max_af_ips, + sizeof(pr->pr_max_af_ips)); if (error != 0 && error != ENOENT) goto done_deref; +#endif + for (fi = 0; fi < sizeof(pr_flag_names) / sizeof(pr_flag_names[0]); + fi++) { + if (pr_flag_names[fi] == NULL) + continue; + i = (pr->pr_flags & (1 << fi)) ? 1 : 0; + error = vfs_setopt(opts, pr_flag_names[fi], &i, sizeof(i)); + if (error != 0 && error != ENOENT) + goto done_deref; + i = !i; + error = vfs_setopt(opts, pr_flag_nonames[fi], &i, sizeof(i)); + if (error != 0 && error != ENOENT) + goto done_deref; + } i = (pr->pr_uref == 0); error = vfs_setopt(opts, "dying", &i, sizeof(i)); if (error != 0 && error != ENOENT) @@ -1402,6 +1968,159 @@ } /* + * Jail permission sysctls. These are companions to the jail parameters of + * similar names, and provide the default values for child jails. 
+ */ + +static int +sysctl_jail_perm(SYSCTL_HANDLER_ARGS) +{ + struct prison *pr, *cpr; + int descend, error, i; + + pr = req->td->td_ucred->cr_prison; + + /* Get the current flag value, and convert it to a boolean. */ + i = (pr->pr_def_perms & arg2) ? 1 : 0; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + return (error); + i = i ? arg2 : 0; + /* Do not allow more than the current prison itself can do. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if ((i & PR_ALLOW_ALL & ~pr->pr_flags) | + (arg2 & PR_RESTRICT_ALL & pr->pr_flags & ~i)) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_def_perms = (pr->pr_def_perms & ~arg2) | i; + /* Reflect restrictions to child jails. */ + if ((arg2 & PR_ALLOW_ALL & ~i) | (arg2 & PR_RESTRICT_ALL & i)) + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) { + cpr->pr_flags = (cpr->pr_flags & ~arg2) | i; + cpr->pr_def_perms = (cpr->pr_def_perms & ~arg2) | i; + } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (0); +} + +SYSCTL_PROC(_security_jail, OID_AUTO, set_hostname_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_SET_HOSTNAME, sysctl_jail_perm, "I", + "Processes in jail can set their hostnames"); +SYSCTL_PROC(_security_jail, OID_AUTO, socket_unixiproute_only, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_RESTRICT_SOCKET_UNIXIPROUTE, sysctl_jail_perm, "I", + "Processes in jail are limited to creating UNIX/IP/route sockets only"); +SYSCTL_PROC(_security_jail, OID_AUTO, sysvipc_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_SYSVIPC, sysctl_jail_perm, "I", + "Processes in jail can use System V IPC primitives"); +SYSCTL_PROC(_security_jail, OID_AUTO, allow_raw_sockets, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_RAW_SOCKETS, sysctl_jail_perm, "I", + "Prison root can create raw sockets"); 
+SYSCTL_PROC(_security_jail, OID_AUTO, chflags_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_CHFLAGS, sysctl_jail_perm, "I", + "Processes in jail can alter system file flags"); +SYSCTL_PROC(_security_jail, OID_AUTO, mount_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_MOUNT, sysctl_jail_perm, "I", + "Processes in jail can mount/unmount jail-friendly file systems"); +SYSCTL_PROC(_security_jail, OID_AUTO, allow_quotas, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_QUOTAS, sysctl_jail_perm, "I", + "Processes in jail can set file quotas"); +SYSCTL_PROC(_security_jail, OID_AUTO, allow_jails, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_JAILS, sysctl_jail_perm, "I", + "Processes in jail can create child jails"); + +static int +sysctl_jail_enforce_statfs(SYSCTL_HANDLER_ARGS) +{ + struct prison *pr, *cpr; + int descend, error, i; + + pr = req->td->td_ucred->cr_prison; + + i = pr->pr_def_enforce_statfs; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + return (error); + if (i < 0 || i > 2) + return (EINVAL); + /* Do not allow more than the current prison itself can do. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if (i < pr->pr_enforce_statfs) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_def_enforce_statfs = i; + /* Reflect restrictions to child jails. 
*/ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) + if (cpr->pr_enforce_statfs < i) { + cpr->pr_enforce_statfs = i; + if (cpr->pr_def_enforce_statfs < i) + cpr->pr_def_enforce_statfs = i; + } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (0); +} +SYSCTL_PROC(_security_jail, OID_AUTO, enforce_statfs, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, 0, sysctl_jail_enforce_statfs, "I", + "Processes in jail cannot see all mounted file systems"); + +#if defined(INET) || defined(INET6) +static int +sysctl_jail_max_af_ips(SYSCTL_HANDLER_ARGS) +{ + struct prison *pr, *cpr; + int descend, error, i; + + pr = req->td->td_ucred->cr_prison; + + i = pr->pr_def_max_af_ips; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + return (error); + if (i < 1) + return (EINVAL); + /* Do not allow more than the current prison itself can do. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if (i > pr->pr_max_af_ips) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_def_max_af_ips = i; + /* Reflect restrictions to child jails. 
*/ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) + if (cpr->pr_max_af_ips > i) { + cpr->pr_max_af_ips = i; + if (cpr->pr_def_max_af_ips > i) + cpr->pr_def_max_af_ips = i; + } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (0); +} +SYSCTL_PROC(_security_jail, OID_AUTO, max_af_ips, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, 0, sysctl_jail_max_af_ips, "I", + "Number of IP addresses a jail may have at most per address family"); +#endif + +/* * struct jail_remove_args { * int jid; * }; @@ -1409,21 +2128,61 @@ int jail_remove(struct thread *td, struct jail_remove_args *uap) { - struct prison *pr; - struct proc *p; - int deuref, error; + struct prison *pr, *cpr, *lpr, *tpr; + int descend, error; error = priv_check(td, PRIV_JAIL_REMOVE); if (error) return (error); sx_xlock(&allprison_lock); - pr = prison_find(uap->jid); + pr = prison_find_child(td->td_ucred->cr_prison, uap->jid); if (pr == NULL) { sx_xunlock(&allprison_lock); return (EINVAL); } + /* Remove all descendants of this prison, then remove this prison. */ + pr->pr_ref++; + pr->pr_flags |= PR_REMOVE; + if (!LIST_EMPTY(&pr->pr_children)) { + mtx_unlock(&pr->pr_mtx); + lpr = NULL; + FOREACH_PRISON_DESCENDANT(pr, cpr, descend) { + mtx_lock(&cpr->pr_mtx); + if (cpr->pr_ref > 0) { + tpr = cpr; + cpr->pr_ref++; + cpr->pr_flags |= PR_REMOVE; + } else { + /* Already removed - do not do it again. */ + tpr = NULL; + } + mtx_unlock(&cpr->pr_mtx); + if (lpr != NULL) { + mtx_lock(&lpr->pr_mtx); + prison_remove1(lpr); + sx_xlock(&allprison_lock); + } + lpr = tpr; + } + if (lpr != NULL) { + mtx_lock(&lpr->pr_mtx); + prison_remove1(lpr); + sx_xlock(&allprison_lock); + } + mtx_lock(&pr->pr_mtx); + } + prison_remove1(pr); + return (0); +} + +static void +prison_remove1(struct prison *pr) +{ + struct proc *p; + int deuref; + /* If the prison was persistent, it is not anymore. 
*/ deuref = 0; if (pr->pr_flags & PR_PERSIST) { @@ -1432,17 +2191,18 @@ pr->pr_flags &= ~PR_PERSIST; } - /* If there are no references left, remove the prison now. */ - if (pr->pr_ref == 0) { + /* + * jail_remove added a reference. If that's the only one, remove + * the prison now. + */ + KASSERT(pr->pr_ref > 0, + ("prison_remove1 removing a dead prison (jid=%d)", pr->pr_id)); + if (pr->pr_ref == 1) { prison_deref(pr, deuref | PD_DEREF | PD_LOCKED | PD_LIST_XLOCKED); - return (0); + return; } - /* - * Keep a temporary reference to make sure this prison sticks around. - */ - pr->pr_ref++; mtx_unlock(&pr->pr_mtx); sx_xunlock(&allprison_lock); /* @@ -1457,9 +2217,8 @@ PROC_UNLOCK(p); } sx_sunlock(&allproc_lock); - /* Remove the temporary reference. */ + /* Remove the temporary reference added by jail_remove. */ prison_deref(pr, deuref | PD_DEREF); - return (0); } @@ -1479,7 +2238,7 @@ return (error); sx_slock(&allprison_lock); - pr = prison_find(uap->jid); + pr = prison_find_child(td->td_ucred->cr_prison, uap->jid); if (pr == NULL) { sx_sunlock(&allprison_lock); return (EINVAL); @@ -1501,6 +2260,7 @@ static int do_jail_attach(struct thread *td, struct prison *pr) { + struct prison *ppr; struct proc *p; struct ucred *newcred, *oldcred; int vfslocked, error; @@ -1528,6 +2288,7 @@ /* * Reparent the newly attached process to this jail. */ + ppr = td->td_ucred->cr_prison; p = td->td_proc; error = cpuset_setproc_update_set(p, pr->pr_cpuset); if (error) @@ -1555,6 +2316,7 @@ p->p_ucred = newcred; PROC_UNLOCK(p); crfree(oldcred); + prison_deref(ppr, PD_DEREF | PD_DEUREF); return (0); e_unlock: VOP_UNLOCK(pr->pr_root, 0); @@ -1562,7 +2324,7 @@ VFS_UNLOCK_GIANT(vfslocked); e_revert_osd: /* Tell modules this thread is still in its old jail after all. 
*/ - (void)osd_jail_call(td->td_ucred->cr_prison, PR_METHOD_ATTACH, td); + (void)osd_jail_call(ppr, PR_METHOD_ATTACH, td); prison_deref(pr, PD_DEREF | PD_DEUREF); return (error); } @@ -1588,18 +2350,42 @@ } /* - * Look for the named prison. Returns a locked prison or NULL. + * Find a prison that is a descendant of mypr. Returns a locked prison or NULL. */ struct prison * -prison_find_name(const char *name) +prison_find_child(struct prison *mypr, int prid) { + struct prison *pr; + int descend; + + sx_assert(&allprison_lock, SX_LOCKED); + FOREACH_PRISON_DESCENDANT(mypr, pr, descend) { + if (pr->pr_id == prid) { + mtx_lock(&pr->pr_mtx); + if (pr->pr_ref > 0) + return (pr); + mtx_unlock(&pr->pr_mtx); + } + } + return (NULL); +} + +/* + * Look for the name relative to mypr. Returns a locked prison or NULL. + */ +struct prison * +prison_find_name(struct prison *mypr, const char *name) +{ struct prison *pr, *deadpr; + size_t mylen; + int descend; sx_assert(&allprison_lock, SX_LOCKED); + mylen = mypr == &prison0 ? 0 : strlen(mypr->pr_name) + 1; again: deadpr = NULL; - TAILQ_FOREACH(pr, &allprison, pr_list) { - if (!strcmp(pr->pr_name, name)) { + FOREACH_PRISON_DESCENDANT(mypr, pr, descend) { + if (!strcmp(pr->pr_name + mylen, name)) { mtx_lock(&pr->pr_mtx); if (pr->pr_ref > 0) { if (pr->pr_uref > 0) @@ -1609,7 +2395,7 @@ mtx_unlock(&pr->pr_mtx); } } - /* There was no valid prison - perhaps there was a dying one */ + /* There was no valid prison - perhaps there was a dying one. */ if (deadpr != NULL) { mtx_lock(&deadpr->pr_mtx); if (deadpr->pr_ref == 0) { @@ -1663,66 +2449,87 @@ static void prison_deref(struct prison *pr, int flags) { + struct prison *ppr, *tpr; int vfslocked; if (!(flags & PD_LOCKED)) mtx_lock(&pr->pr_mtx); + /* Decrement the user references in a separate loop. 
*/ if (flags & PD_DEUREF) { - pr->pr_uref--; + for (tpr = pr;; tpr = tpr->pr_parent) { + if (tpr != pr) + mtx_lock(&tpr->pr_mtx); + if (--tpr->pr_uref > 0) + break; + KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); + mtx_unlock(&tpr->pr_mtx); + } /* Done if there were only user references to remove. */ if (!(flags & PD_DEREF)) { - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&tpr->pr_mtx); if (flags & PD_LIST_SLOCKED) sx_sunlock(&allprison_lock); else if (flags & PD_LIST_XLOCKED) sx_xunlock(&allprison_lock); return; } + if (tpr != pr) { + mtx_unlock(&tpr->pr_mtx); + mtx_lock(&pr->pr_mtx); + } } - if (flags & PD_DEREF) - pr->pr_ref--; - /* If the prison still has references, nothing else to do. */ - if (pr->pr_ref > 0) { - mtx_unlock(&pr->pr_mtx); - if (flags & PD_LIST_SLOCKED) - sx_sunlock(&allprison_lock); - else if (flags & PD_LIST_XLOCKED) - sx_xunlock(&allprison_lock); - return; - } - KASSERT(pr->pr_uref == 0, - ("%s: Trying to remove an active prison (jid=%d).", __func__, - pr->pr_id)); - mtx_unlock(&pr->pr_mtx); - if (flags & PD_LIST_SLOCKED) { - if (!sx_try_upgrade(&allprison_lock)) { - sx_sunlock(&allprison_lock); - sx_xlock(&allprison_lock); + for (;;) { + if (flags & PD_DEREF) + pr->pr_ref--; + /* If the prison still has references, nothing else to do. 
*/ + if (pr->pr_ref > 0) { + mtx_unlock(&pr->pr_mtx); + if (flags & PD_LIST_SLOCKED) + sx_sunlock(&allprison_lock); + else if (flags & PD_LIST_XLOCKED) + sx_xunlock(&allprison_lock); + return; } - } else if (!(flags & PD_LIST_XLOCKED)) - sx_xlock(&allprison_lock); - TAILQ_REMOVE(&allprison, pr, pr_list); - prisoncount--; - sx_xunlock(&allprison_lock); + mtx_unlock(&pr->pr_mtx); + if (flags & PD_LIST_SLOCKED) { + if (!sx_try_upgrade(&allprison_lock)) { + sx_sunlock(&allprison_lock); + sx_xlock(&allprison_lock); + } + } else if (!(flags & PD_LIST_XLOCKED)) + sx_xlock(&allprison_lock); - if (pr->pr_root != NULL) { - vfslocked = VFS_LOCK_GIANT(pr->pr_root->v_mount); - vrele(pr->pr_root); - VFS_UNLOCK_GIANT(vfslocked); - } - mtx_destroy(&pr->pr_mtx); + TAILQ_REMOVE(&allprison, pr, pr_list); + LIST_REMOVE(pr, pr_sibling); + ppr = pr->pr_parent; + for (tpr = ppr; tpr != NULL; tpr = tpr->pr_parent) + tpr->pr_prisoncount--; + sx_downgrade(&allprison_lock); + + if (pr->pr_root != NULL) { + vfslocked = VFS_LOCK_GIANT(pr->pr_root->v_mount); + vrele(pr->pr_root); + VFS_UNLOCK_GIANT(vfslocked); + } + mtx_destroy(&pr->pr_mtx); #ifdef INET - free(pr->pr_ip4, M_PRISON); + free(pr->pr_ip4, M_PRISON); #endif #ifdef INET6 - free(pr->pr_ip6, M_PRISON); + free(pr->pr_ip6, M_PRISON); #endif - if (pr->pr_cpuset != NULL) - cpuset_rel(pr->pr_cpuset); - osd_jail_exit(pr); - free(pr, M_PRISON); + if (pr->pr_cpuset != NULL) + cpuset_rel(pr->pr_cpuset); + osd_jail_exit(pr); + free(pr, M_PRISON); + + /* Removing a prison frees a reference on its parent. */ + pr = ppr; + mtx_lock(&pr->pr_mtx); + flags = PD_DEREF | PD_LIST_SLOCKED; + } } void @@ -1768,10 +2575,94 @@ #ifdef INET /* + * Restrict a prison's IP address list with its parent's, possibly replacing + * it. Return true if the replacement buffer was used (or would have been). 
+ */ +static int +prison_restrict_ip4(struct prison *pr, struct in_addr *newip4) +{ + int ii, ij, used; + struct prison *ppr; + + ppr = pr->pr_parent; + if (!(pr->pr_flags & PR_IP4_USER)) { + /* This has no user settings, so just copy the parent's list. */ + if (pr->pr_ip4s < ppr->pr_ip4s) { + /* + * There's no room for the parent's list. Use the + * new list buffer, which is assumed to be big enough + * (if it was passed). If there's no buffer, try to + * allocate one. + */ + used = 1; + if (newip4 == NULL) { + newip4 = malloc(ppr->pr_ip4s * sizeof(*newip4), + M_PRISON, M_NOWAIT); + if (newip4 != NULL) + used = 0; + } + if (newip4 != NULL) { + pr->pr_ip4s = ppr->pr_ip4s; + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4 = newip4; + bcopy(ppr->pr_ip4, newip4, + pr->pr_ip4s * sizeof(*newip4)); + pr->pr_flags |= PR_IP4; + } + return (used); + } + pr->pr_ip4s = ppr->pr_ip4s; + if (pr->pr_ip4s > 0) + bcopy(ppr->pr_ip4, pr->pr_ip4, + pr->pr_ip4s * sizeof(*newip4)); + else if (pr->pr_ip4 != NULL) { + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4 = NULL; + } + pr->pr_flags = + (pr->pr_flags & ~PR_IP4) | (ppr->pr_flags & PR_IP4); + } else if (pr->pr_ip4s > 0 && (ppr->pr_flags & PR_IP4)) { + /* Remove addresses that aren't in the parent. 
*/ + for (ij = 0; ij < ppr->pr_ip4s; ij++) + if (pr->pr_ip4[0].s_addr == ppr->pr_ip4[ij].s_addr) + break; + if (ij == ppr->pr_ip4s) + bcopy(pr->pr_ip4 + 1, pr->pr_ip4, + --pr->pr_ip4s * sizeof(*pr->pr_ip4)); + for (ii = ij = 1; ii < pr->pr_ip4s; ii++) { + if (pr->pr_ip4[ii].s_addr == ppr->pr_ip4[0].s_addr) + continue; + for (; ij < ppr->pr_ip4s; ij++) { + if (qcmp_v4(&pr->pr_ip4[ii], + &ppr->pr_ip4[ij].s_addr) <= 0) + break; + } + if (ij == ppr->pr_ip4s) { + pr->pr_ip4s = ii; + break; + } + if (qcmp_v4(&pr->pr_ip4[ii], &ppr->pr_ip4[ij]) > 0) { + if (ii < --pr->pr_ip4s) + bcopy(pr->pr_ip4 + ii + 1, + pr->pr_ip4 + ii, + (pr->pr_ip4s - ii) * + sizeof(*pr->pr_ip4)); + ii--; + } + } + if (pr->pr_ip4s == 0) { + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4 = NULL; + } + } + return (0); +} + +/* * Pass back primary IPv4 address of this jail. * - * If not jailed return success but do not alter the address. Caller has to - * make sure to initialize it correctly (e.g. INADDR_ANY). + * If not restricted return success but do not alter the address. Caller has + * to make sure to initialize it correctly (e.g. INADDR_ANY). * * Returns 0 on success, EAFNOSUPPORT if the jail doesn't allow IPv4. * Address returned in NBO. @@ -1784,10 +2675,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1799,12 +2694,36 @@ } /* + * Return true if pr1 and pr2 have the same IPv4 address restrictions. + */ +int +prison_equal_ip4(struct prison *pr1, struct prison *pr2) +{ + if (pr1 == pr2) + return (1); + + /* + * jail_set maintains an exclusive hold on allprison_lock while it + * changes the IP addresses, so only a shared hold is needed. 
This is + * easier than locking the two prisons which would require finding the + * proper locking order and end up needing allprison_lock anyway. + */ + sx_slock(&allprison_lock); + while (pr1 != &prison0 && !(pr1->pr_flags & PR_IP4_USER)) + pr1 = pr1->pr_parent; + while (pr2 != &prison0 && !(pr2->pr_flags & PR_IP4_USER)) + pr2 = pr2->pr_parent; + sx_sunlock(&allprison_lock); + return (pr1 == pr2); +} + +/* * Make sure our (source) address is set to something meaningful to this * jail. * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv4. - * Address passed in in NBO and returned in NBO. + * Returns 0 if jail doesn't restrict IPv4 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv4. Address passed in in NBO and returned in NBO. */ int prison_local_ip4(struct ucred *cred, struct in_addr *ia) @@ -1816,10 +2735,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1861,10 +2784,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1886,9 +2813,9 @@ /* * Check if given address belongs to the jail referenced by cred/prison. 
* - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv4. - * Address passed in in NBO. + * Returns 0 if jail doesn't restrict IPv4 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv4. Address passed in in NBO. */ static int _prison_check_ip4(struct prison *pr, struct in_addr *ia) @@ -1929,10 +2856,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1945,11 +2876,93 @@ #endif #ifdef INET6 +static int +prison_restrict_ip6(struct prison *pr, struct in6_addr *newip6) +{ + int ii, ij, used; + struct prison *ppr; + + ppr = pr->pr_parent; + if (!(pr->pr_flags & PR_IP6_USER)) { + /* This has no user settings, so just copy the parent's list. */ + if (pr->pr_ip6s < ppr->pr_ip6s) { + /* + * There's no room for the parent's list. Use the + * new list buffer, which is assumed to be big enough + * (if it was passed). If there's no buffer, try to + * allocate one. 
+ */ + used = 1; + if (newip6 == NULL) { + newip6 = malloc(ppr->pr_ip6s * sizeof(*newip6), + M_PRISON, M_NOWAIT); + if (newip6 != NULL) + used = 0; + } + if (newip6 != NULL) { + pr->pr_ip6s = ppr->pr_ip6s; + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6 = newip6; + bcopy(ppr->pr_ip6, newip6, + ppr->pr_ip6s * sizeof(*newip6)); + pr->pr_flags |= PR_IP6; + } + return (used); + } + pr->pr_ip6s = ppr->pr_ip6s; + if (pr->pr_ip6s > 0) + bcopy(ppr->pr_ip6, pr->pr_ip6, + pr->pr_ip6s * sizeof(*newip6)); + else if (pr->pr_ip6 != NULL) { + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6 = NULL; + } + pr->pr_flags = + (pr->pr_flags & ~PR_IP6) | (ppr->pr_flags & PR_IP6); + } else if (pr->pr_ip6s > 0 && (ppr->pr_flags & PR_IP6)) { + /* Remove addresses that aren't in the parent. */ + for (ij = 0; ij < ppr->pr_ip6s; ij++) + if (IN6_ARE_ADDR_EQUAL(&pr->pr_ip6[0], + &ppr->pr_ip6[ij])) + break; + if (ij == ppr->pr_ip6s) + bcopy(pr->pr_ip6 + 1, pr->pr_ip6, + --pr->pr_ip6s * sizeof(*pr->pr_ip6)); + for (ii = ij = 1; ii < pr->pr_ip6s; ii++) { + if (IN6_ARE_ADDR_EQUAL(&pr->pr_ip6[ii], + &ppr->pr_ip6[0])) + continue; + for (; ij < ppr->pr_ip6s; ij++) { + if (qcmp_v6(&pr->pr_ip6[ii], + &ppr->pr_ip6[ij]) <= 0) + break; + } + if (ij == ppr->pr_ip6s) { + pr->pr_ip6s = ii; + break; + } + if (qcmp_v6(&pr->pr_ip6[ii], &ppr->pr_ip6[ij]) > 0) { + if (ii < --pr->pr_ip6s) + bcopy(pr->pr_ip6 + ii + 1, + pr->pr_ip6 + ii, + (pr->pr_ip6s - ii) * + sizeof(*pr->pr_ip6)); + ii--; + } + } + if (pr->pr_ip6s == 0) { + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6 = NULL; + } + } + return 0; +} + /* * Pass back primary IPv6 address for this jail. * - * If not jailed return success but do not alter the address. Caller has to - * make sure to initialize it correctly (e.g. IN6ADDR_ANY_INIT). + * If not restricted return success but do not alter the address. Caller has + * to make sure to initialize it correctly (e.g. IN6ADDR_ANY_INIT). * * Returns 0 on success, EAFNOSUPPORT if the jail doesn't allow IPv6. 
 */ @@ -1961,10 +2974,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1976,13 +2993,32 @@ } /* + * Return true if pr1 and pr2 have the same IPv6 address restrictions. + */ +int +prison_equal_ip6(struct prison *pr1, struct prison *pr2) +{ + if (pr1 == pr2) + return (1); + + sx_slock(&allprison_lock); + while (pr1 != &prison0 && !(pr1->pr_flags & PR_IP6_USER)) + pr1 = pr1->pr_parent; + while (pr2 != &prison0 && !(pr2->pr_flags & PR_IP6_USER)) + pr2 = pr2->pr_parent; + sx_sunlock(&allprison_lock); + return (pr1 == pr2); +} + +/* * Make sure our (source) address is set to something meaningful to this jail. * * v6only should be set based on (inp->inp_flags & IN6P_IPV6_V6ONLY != 0) * when needed while binding. * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv6. + * Returns 0 if jail doesn't restrict IPv6 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv6. 
*/ int prison_local_ip6(struct ucred *cred, struct in6_addr *ia6, int v6only) @@ -1993,10 +3029,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -2037,10 +3077,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -2062,8 +3106,9 @@ /* * Check if given address belongs to the jail referenced by cred/prison. * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv6. + * Returns 0 if jail doesn't restrict IPv6 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv6. 
*/ static int _prison_check_ip6(struct prison *pr, struct in6_addr *ia6) @@ -2104,10 +3149,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -2128,34 +3177,42 @@ int prison_check_af(struct ucred *cred, int af) { + struct prison *pr; int error; KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); - - if (!jailed(cred)) - return (0); - + pr = cred->cr_prison; error = 0; switch (af) { #ifdef INET case AF_INET: - if (cred->cr_prison->pr_ip4 == NULL) - error = EAFNOSUPPORT; + if (pr->pr_flags & PR_IP4) + { + mtx_lock(&pr->pr_mtx); + if ((pr->pr_flags & PR_IP4) && pr->pr_ip4 == NULL) + error = EAFNOSUPPORT; + mtx_unlock(&pr->pr_mtx); + } break; #endif #ifdef INET6 case AF_INET6: - if (cred->cr_prison->pr_ip6 == NULL) - error = EAFNOSUPPORT; + if (pr->pr_flags & PR_IP6) + { + mtx_lock(&pr->pr_mtx); + if ((pr->pr_flags & PR_IP6) && pr->pr_ip6 == NULL) + error = EAFNOSUPPORT; + mtx_unlock(&pr->pr_mtx); + } break; #endif case AF_LOCAL: case AF_ROUTE: break; default: - if (jail_socket_unixiproute_only) + if (pr->pr_flags & PR_RESTRICT_SOCKET_UNIXIPROUTE) error = EAFNOSUPPORT; } return (error); @@ -2165,9 +3222,9 @@ * Check if given address belongs to the jail referenced by cred (wrapper to * prison_check_ip[46]). * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow - * the address family. IPv4 Address passed in in NBO. + * Returns 0 if jail doesn't restrict the address family or if address belongs + * to jail, EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if + * the jail doesn't allow the address family. 
IPv4 Address passed in in NBO. */ int prison_if(struct ucred *cred, struct sockaddr *sa) @@ -2199,7 +3256,7 @@ break; #endif default: - if (jailed(cred) && jail_socket_unixiproute_only) + if (cred->cr_prison->pr_flags & PR_RESTRICT_SOCKET_UNIXIPROUTE) error = EAFNOSUPPORT; } return (error); @@ -2212,13 +3269,20 @@ prison_check(struct ucred *cred1, struct ucred *cred2) { - if (jailed(cred1)) { - if (!jailed(cred2)) - return (ESRCH); - if (cred2->cr_prison != cred1->cr_prison) - return (ESRCH); - } + return (cred1->cr_prison == cred2->cr_prison || + prison_ischild(cred1->cr_prison, cred2->cr_prison) ? 0 : ESRCH); +} +/* + * Return 1 if pr2 is a descendant of pr1, otherwise 0. + */ +int +prison_ischild(struct prison *pr1, struct prison *pr2) +{ + + for (pr2 = pr2->pr_parent; pr2 != NULL; pr2 = pr2->pr_parent) + if (pr1 == pr2) + return (1); return (0); } @@ -2229,7 +3293,7 @@ jailed(struct ucred *cred) { - return (cred->cr_prison != NULL); + return (cred->cr_prison != &prison0); } /* @@ -2265,12 +3329,12 @@ struct statfs *sp; size_t len; - if (!jailed(cred) || jail_enforce_statfs == 0) + pr = cred->cr_prison; + if (pr->pr_enforce_statfs == 0) return (0); - pr = cred->cr_prison; if (pr->pr_root->v_mount == mp) return (0); - if (jail_enforce_statfs == 2) + if (pr->pr_enforce_statfs == 2) return (ENOENT); /* * If jail's chroot directory is set to "/" we should be able to see @@ -2300,9 +3364,9 @@ struct prison *pr; size_t len; - if (!jailed(cred) || jail_enforce_statfs == 0) + pr = cred->cr_prison; + if (pr->pr_enforce_statfs == 0) return; - pr = cred->cr_prison; if (prison_canseemount(cred, mp) != 0) { bzero(sp->f_mntonname, sizeof(sp->f_mntonname)); strlcpy(sp->f_mntonname, "[restricted]", @@ -2416,6 +3480,13 @@ case PRIV_MQ_ADMIN: /* + * Jail operations within a jail work on child jails. 
+ */ + case PRIV_JAIL_ATTACH: + case PRIV_JAIL_SET: + case PRIV_JAIL_REMOVE: + + /* * Jail implements its own inter-process limits, so allow * root processes in jail to change scheduling on other * processes in the same jail. Likewise for signalling. @@ -2467,7 +3538,7 @@ * setting system flags. */ case PRIV_VFS_SYSFLAGS: - if (jail_chflags_allowed) + if (cred->cr_prison->pr_flags & PR_ALLOW_CHFLAGS) return (0); else return (EPERM); @@ -2480,7 +3551,7 @@ case PRIV_VFS_UNMOUNT: case PRIV_VFS_MOUNT_NONUSER: case PRIV_VFS_MOUNT_OWNER: - if (jail_mount_allowed) + if (cred->cr_prison->pr_flags & PR_ALLOW_MOUNT) return (0); else return (EPERM); @@ -2503,7 +3574,7 @@ * Conditionally allow creating raw sockets in jail. */ case PRIV_NETINET_RAW: - if (jail_allow_raw_sockets) + if (cred->cr_prison->pr_flags & PR_ALLOW_RAW_SOCKETS) return (0); else return (EPERM); @@ -2526,11 +3597,61 @@ } } +/* + * Return the part of pr2's name that is relative to pr1, or the whole name + * if it does not directly follow. + */ + +char * +prison_name(struct prison *pr1, struct prison *pr2) +{ + char *name; + + /* Jails see themselves as "0" (if they see themselves at all). */ + if (pr1 == pr2) + return "0"; + name = pr2->pr_name; + if (prison_ischild(pr1, pr2)) { + /* + * pr1 isn't locked (and allprison_lock may not be either) + * so its length can't be counted on. But the number of dots + * can be counted on - and counted. + */ + for (; pr1 != &prison0; pr1 = pr1->pr_parent) + name = strchr(name, '.') + 1; + } + return (name); +} + +/* + * Return the part of pr2's path that is relative to pr1, or the whole path + * if it does not directly follow. 
+ */ +static char * +prison_path(struct prison *pr1, struct prison *pr2) +{ + char *path1, *path2; + int len1; + + path1 = pr1->pr_path; + path2 = pr2->pr_path; + if (!strcmp(path1, "/")) + return (path2); + len1 = strlen(path1); + if (strncmp(path1, path2, len1)) + return (path2); + if (path2[len1] == '\0') + return "/"; + if (path2[len1] == '/') + return (path2 + len1); + return (path2); +} + static int sysctl_jail_list(SYSCTL_HANDLER_ARGS) { struct xprison *xp; - struct prison *pr; + struct prison *pr, *cpr; #ifdef INET struct in_addr *ip4 = NULL; int ip4s = 0; @@ -2539,62 +3660,60 @@ struct in_addr *ip6 = NULL; int ip6s = 0; #endif - int error; + int descend, error; - if (jailed(req->td->td_ucred)) - return (0); - xp = malloc(sizeof(*xp), M_TEMP, M_WAITOK); + pr = req->td->td_ucred->cr_prison; error = 0; sx_slock(&allprison_lock); - TAILQ_FOREACH(pr, &allprison, pr_list) { + FOREACH_PRISON_DESCENDANT(pr, cpr, descend) { again: - mtx_lock(&pr->pr_mtx); + mtx_lock(&cpr->pr_mtx); #ifdef INET - if (pr->pr_ip4s > 0) { - if (ip4s < pr->pr_ip4s) { - ip4s = pr->pr_ip4s; - mtx_unlock(&pr->pr_mtx); + if (cpr->pr_ip4s > 0) { + if (ip4s < cpr->pr_ip4s) { + ip4s = cpr->pr_ip4s; + mtx_unlock(&cpr->pr_mtx); ip4 = realloc(ip4, ip4s * sizeof(struct in_addr), M_TEMP, M_WAITOK); goto again; } - bcopy(pr->pr_ip4, ip4, - pr->pr_ip4s * sizeof(struct in_addr)); + bcopy(cpr->pr_ip4, ip4, + cpr->pr_ip4s * sizeof(struct in_addr)); } #endif #ifdef INET6 - if (pr->pr_ip6s > 0) { - if (ip6s < pr->pr_ip6s) { - ip6s = pr->pr_ip6s; - mtx_unlock(&pr->pr_mtx); + if (cpr->pr_ip6s > 0) { + if (ip6s < cpr->pr_ip6s) { + ip6s = cpr->pr_ip6s; + mtx_unlock(&cpr->pr_mtx); ip6 = realloc(ip6, ip6s * sizeof(struct in6_addr), M_TEMP, M_WAITOK); goto again; } - bcopy(pr->pr_ip6, ip6, - pr->pr_ip6s * sizeof(struct in6_addr)); + bcopy(cpr->pr_ip6, ip6, + cpr->pr_ip6s * sizeof(struct in6_addr)); } #endif - if (pr->pr_ref == 0) { - mtx_unlock(&pr->pr_mtx); + if (cpr->pr_ref == 0) { + mtx_unlock(&cpr->pr_mtx); 
continue; } bzero(xp, sizeof(*xp)); xp->pr_version = XPRISON_VERSION; - xp->pr_id = pr->pr_id; - xp->pr_state = pr->pr_uref > 0 + xp->pr_id = cpr->pr_id; + xp->pr_state = cpr->pr_uref > 0 ? PRISON_STATE_ALIVE : PRISON_STATE_DYING; - strlcpy(xp->pr_path, pr->pr_path, sizeof(xp->pr_path)); - strlcpy(xp->pr_host, pr->pr_host, sizeof(xp->pr_host)); - strlcpy(xp->pr_name, pr->pr_name, sizeof(xp->pr_name)); + strlcpy(xp->pr_path, prison_path(pr, cpr), sizeof(xp->pr_path)); + strlcpy(xp->pr_host, cpr->pr_host, sizeof(xp->pr_host)); + strlcpy(xp->pr_name, prison_name(pr, cpr), sizeof(xp->pr_name)); #ifdef INET - xp->pr_ip4s = pr->pr_ip4s; + xp->pr_ip4s = cpr->pr_ip4s; #endif #ifdef INET6 - xp->pr_ip6s = pr->pr_ip6s; + xp->pr_ip6s = cpr->pr_ip6s; #endif - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&cpr->pr_mtx); error = SYSCTL_OUT(req, xp, sizeof(*xp)); if (error) break; @@ -2649,6 +3768,7 @@ static void db_show_prison(struct prison *pr) { + int fi; #if defined(INET) || defined(INET6) int ii; #endif @@ -2659,6 +3779,7 @@ db_printf("prison %p:\n", pr); db_printf(" jid = %d\n", pr->pr_id); db_printf(" name = %s\n", pr->pr_name); + db_printf(" parent = %p\n", pr->pr_parent); db_printf(" ref = %d\n", pr->pr_ref); db_printf(" uref = %d\n", pr->pr_uref); db_printf(" path = %s\n", pr->pr_path); @@ -2666,10 +3787,18 @@ ? 
pr->pr_cpuset->cs_id : -1); db_printf(" root = %p\n", pr->pr_root); db_printf(" securelevel = %d\n", pr->pr_securelevel); + db_printf(" child = %p\n", LIST_FIRST(&pr->pr_children)); + db_printf(" sibling = %p\n", LIST_NEXT(pr, pr_sibling)); db_printf(" flags = %x", pr->pr_flags); - if (pr->pr_flags & PR_PERSIST) - db_printf(" persist"); + for (fi = 0; fi < sizeof(pr_flag_names) / sizeof(pr_flag_names[0]); + fi++) + if (pr_flag_names[fi] != NULL && (pr->pr_flags & (1 << fi))) + db_printf(" %s", pr_flag_names[fi]); db_printf("\n"); + db_printf(" enforce_statfs = %d\n", pr->pr_enforce_statfs); +#if defined(INET) || defined(INET6) + db_printf(" max_af_ips = %d\n", pr->pr_max_af_ips); +#endif db_printf(" host.hostname = %s\n", pr->pr_host); #ifdef INET db_printf(" ip4s = %d\n", pr->pr_ip4s); @@ -2692,7 +3821,11 @@ struct prison *pr; if (!have_addr) { - /* Show all prisons in the list. */ + /* + * Show all prisons in the list, and prison0 which is not + * listed. + */ + db_show_prison(&prison0); TAILQ_FOREACH(pr, &allprison, pr_list) { db_show_prison(pr); if (db_pager_quit) @@ -2701,18 +3834,22 @@ return; } - /* Look for a prison with the ID and with references. */ - TAILQ_FOREACH(pr, &allprison, pr_list) - if (pr->pr_id == addr && pr->pr_ref > 0) - break; - if (pr == NULL) - /* Look again, without requiring a reference. */ + if (addr == 0) + pr = &prison0; + else { + /* Look for a prison with the ID and with references. */ TAILQ_FOREACH(pr, &allprison, pr_list) - if (pr->pr_id == addr) + if (pr->pr_id == addr && pr->pr_ref > 0) break; - if (pr == NULL) - /* Assume address points to a valid prison. */ - pr = (struct prison *)addr; + if (pr == NULL) + /* Look again, without requiring a reference. */ + TAILQ_FOREACH(pr, &allprison, pr_list) + if (pr->pr_id == addr) + break; + if (pr == NULL) + /* Assume address points to a valid prison. 
*/ + pr = (struct prison *)addr; + } db_show_prison(pr); } Index: sys/kern/sysv_msg.c =================================================================== --- sys/kern/sysv_msg.c (revision 191896) +++ sys/kern/sysv_msg.c (working copy) @@ -337,7 +337,7 @@ { int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); if (uap->which < 0 || uap->which >= sizeof(msgcalls)/sizeof(msgcalls[0])) @@ -410,7 +410,7 @@ int rval, error, msqix; register struct msqid_kernel *msqkptr; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); msqix = IPCID_TO_IX(msqid); @@ -564,7 +564,7 @@ DPRINTF(("msgget(0x%x, 0%o)\n", key, msgflg)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&msq_mtx); @@ -674,7 +674,7 @@ register struct msg *msghdr; short next; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&msq_mtx); @@ -1012,7 +1012,7 @@ int msqix, error = 0; short next; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); msqix = IPCID_TO_IX(msqid); Index: sys/kern/vfs_syscalls.c =================================================================== --- sys/kern/vfs_syscalls.c (revision 191896) +++ sys/kern/vfs_syscalls.c (working copy) @@ -164,12 +164,6 @@ return (0); } -/* XXX PRISON: could be per prison flag */ -static int prison_quotas; -#if 0 -SYSCTL_INT(_kern_prison, OID_AUTO, quotas, CTLFLAG_RW, &prison_quotas, 0, ""); -#endif - /* * Change filesystem quotas. 
*/ @@ -198,7 +192,7 @@ AUDIT_ARG(cmd, uap->cmd); AUDIT_ARG(uid, uap->uid); - if (jailed(td->td_ucred) && !prison_quotas) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_QUOTAS)) return (EPERM); NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | MPSAFE | AUDITVNODE1, UIO_USERSPACE, uap->path, td); Index: sys/kern/init_main.c =================================================================== --- sys/kern/init_main.c (revision 191896) +++ sys/kern/init_main.c (working copy) @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -436,6 +437,7 @@ td->td_oncpu = 0; td->td_flags = TDF_INMEM|TDP_KTHREAD; td->td_cpuset = cpuset_thread0(); + prison0.pr_cpuset = cpuset_ref(td->td_cpuset); p->p_peers = 0; p->p_leader = p; @@ -452,7 +454,7 @@ p->p_ucred->cr_ngroups = 1; /* group 0 */ p->p_ucred->cr_uidinfo = uifind(0); p->p_ucred->cr_ruidinfo = uifind(0); - p->p_ucred->cr_prison = NULL; /* Don't jail it. */ + p->p_ucred->cr_prison = &prison0; #ifdef VIMAGE p->p_ucred->cr_vnet = LIST_FIRST(&vnet_head); #endif Index: sys/kern/sysv_sem.c =================================================================== --- sys/kern/sysv_sem.c (revision 191896) +++ sys/kern/sysv_sem.c (working copy) @@ -344,7 +344,7 @@ { int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); if (uap->which < 0 || uap->which >= sizeof(semcalls)/sizeof(semcalls[0])) @@ -583,7 +583,7 @@ DPRINTF(("call to semctl(%d, %d, %d, 0x%p)\n", semid, semnum, cmd, arg)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); array = NULL; @@ -855,7 +855,7 @@ struct ucred *cred = td->td_ucred; DPRINTF(("semget(0x%x, %d, 0%o)\n", key, nsems, semflg)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&sem_mtx); @@ -982,7 +982,7 @@ #endif 
DPRINTF(("call to semop(%d, %p, %u)\n", semid, sops, nsops)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); semid = IPCID_TO_IX(semid); /* Convert back to zero origin */ Index: sys/kern/kern_proc.c =================================================================== --- sys/kern/kern_proc.c (revision 191896) +++ sys/kern/kern_proc.c (working copy) @@ -739,8 +739,8 @@ /* If jailed(cred), emulate the old P_JAILED flag. */ if (jailed(cred)) { kp->ki_flag |= P_JAILED; - /* If inside a jail, use 0 as a jail ID. */ - if (!jailed(curthread->td_ucred)) + /* If inside the jail, use 0 as a jail ID. */ + if (cred->cr_prison != curthread->td_ucred->cr_prison) kp->ki_jid = cred->cr_prison->pr_id; } } Index: sys/kern/kern_linker.c =================================================================== --- sys/kern/kern_linker.c (revision 191896) +++ sys/kern/kern_linker.c (working copy) @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -375,7 +376,7 @@ int foundfile, error; /* Refuse to load modules if securelevel raised */ - if (securelevel > 0) + if (prison0.pr_securelevel > 0) return (EPERM); KLD_LOCK_ASSERT(); @@ -580,7 +581,7 @@ int error, i; /* Refuse to unload modules if securelevel raised. 
*/ - if (securelevel > 0) + if (prison0.pr_securelevel > 0) return (EPERM); KLD_LOCK_ASSERT(); Index: sys/kern/sysv_shm.c =================================================================== --- sys/kern/sysv_shm.c (revision 191896) +++ sys/kern/sysv_shm.c (working copy) @@ -303,7 +303,7 @@ int i; int error = 0; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); shmmap_s = p->p_vmspace->vm_shm; @@ -357,7 +357,7 @@ int rv; int error = 0; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); shmmap_s = p->p_vmspace->vm_shm; @@ -480,7 +480,7 @@ struct shmid_kernel *shmseg; struct oshmid_ds outbuf; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); shmseg = shm_find_segment_by_shmid(uap->shmid); @@ -542,7 +542,7 @@ int error = 0; struct shmid_kernel *shmseg; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); @@ -823,7 +823,7 @@ int segnum, mode; int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); mode = uap->shmflg & ACCESSPERMS; @@ -861,7 +861,7 @@ #if defined(__i386__) && (defined(COMPAT_FREEBSD4) || defined(COMPAT_43)) int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); if (uap->which < 0 || uap->which >= sizeof(shmcalls)/sizeof(shmcalls[0])) Index: sys/kern/vfs_mount.c =================================================================== --- sys/kern/vfs_mount.c (revision 191896) +++ sys/kern/vfs_mount.c (working copy) @@ -1421,6 +1421,11 @@ root_mount_done(void) { + /* Keep 
prison0's root in sync with the global rootvnode. */ + mtx_lock(&prison0.pr_mtx); + prison0.pr_root = rootvnode; + vref(prison0.pr_root); + mtx_unlock(&prison0.pr_mtx); /* * Use a mutex to prevent the wakeup being missed and waiting for * an extra 1 second sleep. Index: sys/kern/kern_exit.c =================================================================== --- sys/kern/kern_exit.c (revision 191896) +++ sys/kern/kern_exit.c (working copy) @@ -454,9 +454,8 @@ p->p_xstat = rv; p->p_xthread = td; - /* In case we are jailed tell the prison that we are gone. */ - if (jailed(p->p_ucred)) - prison_proc_free(p->p_ucred->cr_prison); + /* Tell the prison that we are gone. */ + prison_proc_free(p->p_ucred->cr_prison); #ifdef KDTRACE_HOOKS /* Index: sys/kern/kern_prot.c =================================================================== --- sys/kern/kern_prot.c (revision 191896) +++ sys/kern/kern_prot.c (working copy) @@ -1262,33 +1262,25 @@ * (securelevel >= level). Note that the logic is inverted -- these * functions return EPERM on "success" and 0 on "failure". * + * Due to care taken when setting the securelevel, we know that no jail will + * be less secure than its parent (or the physical system), so it is sufficient + * to test the current jail only. + * * XXXRW: Possibly since this has to do with privilege, it should move to * kern_priv.c. */ int securelevel_gt(struct ucred *cr, int level) { - int active_securelevel; - active_securelevel = securelevel; - KASSERT(cr != NULL, ("securelevel_gt: null cr")); - if (cr->cr_prison != NULL) - active_securelevel = imax(cr->cr_prison->pr_securelevel, - active_securelevel); - return (active_securelevel > level ? EPERM : 0); + return (cr->cr_prison->pr_securelevel > level ? 
EPERM : 0); } int securelevel_ge(struct ucred *cr, int level) { - int active_securelevel; - active_securelevel = securelevel; - KASSERT(cr != NULL, ("securelevel_ge: null cr")); - if (cr->cr_prison != NULL) - active_securelevel = imax(cr->cr_prison->pr_securelevel, - active_securelevel); - return (active_securelevel >= level ? EPERM : 0); + return (cr->cr_prison->pr_securelevel >= level ? EPERM : 0); } /* @@ -1822,7 +1814,7 @@ /* * Free a prison, if any. */ - if (jailed(cr)) + if (cr->cr_prison != NULL) prison_free(cr->cr_prison); #ifdef AUDIT audit_cred_destroy(cr); @@ -1857,8 +1849,7 @@ (caddr_t)&src->cr_startcopy)); uihold(dest->cr_uidinfo); uihold(dest->cr_ruidinfo); - if (jailed(dest)) - prison_hold(dest->cr_prison); + prison_hold(dest->cr_prison); #ifdef AUDIT audit_cred_copy(src, dest); #endif Index: sys/kern/kern_descrip.c =================================================================== --- sys/kern/kern_descrip.c (revision 191896) +++ sys/kern/kern_descrip.c (working copy) @@ -2363,24 +2363,25 @@ } /* - * Scan all active processes to see if any of them have a current or root - * directory of `olddp'. If so, replace them with the new mount point. + * Scan all active processes and prisons to see if any of them have a current + * or root directory of `olddp'. If so, replace them with the new mount point. 
*/ void mountcheckdirs(struct vnode *olddp, struct vnode *newdp) { struct filedesc *fdp; + struct prison *pr; struct proc *p; int nrele; if (vrefcnt(olddp) == 1) return; + nrele = 0; sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { fdp = fdhold(p); if (fdp == NULL) continue; - nrele = 0; FILEDESC_XLOCK(fdp); if (fdp->fd_cdir == olddp) { vref(newdp); @@ -2392,17 +2393,40 @@ fdp->fd_rdir = newdp; nrele++; } + if (fdp->fd_jdir == olddp) { + vref(newdp); + fdp->fd_jdir = newdp; + nrele++; + } FILEDESC_XUNLOCK(fdp); fddrop(fdp); - while (nrele--) - vrele(olddp); } sx_sunlock(&allproc_lock); if (rootvnode == olddp) { - vrele(rootvnode); vref(newdp); rootvnode = newdp; + nrele++; } + mtx_lock(&prison0.pr_mtx); + if (prison0.pr_root == olddp) { + vref(newdp); + prison0.pr_root = newdp; + nrele++; + } + mtx_unlock(&prison0.pr_mtx); + sx_slock(&allprison_lock); + TAILQ_FOREACH(pr, &allprison, pr_list) { + mtx_lock(&pr->pr_mtx); + if (pr->pr_root == olddp) { + vref(newdp); + pr->pr_root = newdp; + nrele++; + } + mtx_unlock(&pr->pr_mtx); + } + sx_sunlock(&allprison_lock); + while (nrele--) + vrele(olddp); } struct filedesc_to_leader * Index: sys/kern/kern_fork.c =================================================================== --- sys/kern/kern_fork.c (revision 191896) +++ sys/kern/kern_fork.c (working copy) @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include @@ -54,7 +55,6 @@ #include #include #include -#include #include #include #include @@ -455,9 +455,8 @@ p2->p_ucred = crhold(td->td_ucred); - /* In case we are jailed tell the prison that we exist. */ - if (jailed(p2->p_ucred)) - prison_proc_hold(p2->p_ucred->cr_prison); + /* Tell the prison that we exist. 
*/ + prison_proc_hold(p2->p_ucred->cr_prison); PROC_UNLOCK(p2); Index: sys/kern/kern_cpuset.c =================================================================== --- sys/kern/kern_cpuset.c (revision 191896) +++ sys/kern/kern_cpuset.c (working copy) @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -53,7 +54,6 @@ #include #include #include -#include /* Must come after sys/proc.h */ #include @@ -225,23 +225,16 @@ KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__)); if (set != NULL && jailed(td->td_ucred)) { - struct cpuset *rset, *jset; - struct prison *pr; + struct cpuset *jset, *tset; - rset = cpuset_refroot(set); - - pr = td->td_ucred->cr_prison; - mtx_lock(&pr->pr_mtx); - cpuset_ref(pr->pr_cpuset); - jset = pr->pr_cpuset; - mtx_unlock(&pr->pr_mtx); - - if (jset->cs_id != rset->cs_id) { + jset = td->td_ucred->cr_prison->pr_cpuset; + for (tset = set; tset != NULL; tset = tset->cs_parent) + if (tset == jset) + break; + if (tset == NULL) { cpuset_rel(set); set = NULL; } - cpuset_rel(jset); - cpuset_rel(rset); } return (set); @@ -303,7 +296,7 @@ /* * Recursively check for errors that would occur from applying mask to * the tree of sets starting at 'set'. Checks for sets that would become - * empty as well as RDONLY flags. + * empty as well as RDONLY flags. Do not check jails. */ static int cpuset_testupdate(struct cpuset *set, cpuset_t *mask) @@ -320,14 +313,19 @@ CPU_COPY(&set->cs_mask, &newmask); CPU_AND(&newmask, mask); error = 0; - LIST_FOREACH(nset, &set->cs_children, cs_siblings) + LIST_FOREACH(nset, &set->cs_children, cs_siblings) { + if (set->cs_flags & CPU_SET_ROOT) + continue; if ((error = cpuset_testupdate(nset, &newmask)) != 0) break; + } return (error); } /* - * Applies the mask 'mask' without checking for empty sets or permissions. + * Apply the mask 'mask' to the cpuset and its children. Ignore permission + * errors, and replace any empty sets (which may occur under jails) with their + * parent's mask. 
*/ static void cpuset_update(struct cpuset *set, cpuset_t *mask) @@ -336,6 +334,8 @@ mtx_assert(&cpuset_lock, MA_OWNED); CPU_AND(&set->cs_mask, mask); + if (CPU_EMPTY(&set->cs_mask)) + CPU_COPY(mask, &set->cs_mask); LIST_FOREACH(nset, &set->cs_children, cs_siblings) cpuset_update(nset, &set->cs_mask); @@ -456,25 +456,14 @@ struct prison *pr; sx_slock(&allprison_lock); - pr = prison_find(id); + pr = prison_find_child(curthread->td_ucred->cr_prison, id); sx_sunlock(&allprison_lock); if (pr == NULL) return (ESRCH); - if (jailed(curthread->td_ucred)) { - if (curthread->td_ucred->cr_prison == pr) { - cpuset_ref(pr->pr_cpuset); - set = pr->pr_cpuset; - } - } else { - cpuset_ref(pr->pr_cpuset); - set = pr->pr_cpuset; - } + cpuset_ref(pr->pr_cpuset); + *setp = pr->pr_cpuset; mtx_unlock(&pr->pr_mtx); - if (set) { - *setp = set; - return (0); - } - return (ESRCH); + return (0); } case CPU_WHICH_IRQ: return (0); @@ -731,21 +720,17 @@ * In case of no error, returns the set in *setp locked with a reference. 
*/ int -cpuset_create_root(struct thread *td, struct cpuset **setp) +cpuset_create_root(struct prison *pr, struct cpuset **setp) { struct cpuset *root; struct cpuset *set; int error; - KASSERT(td != NULL, ("[%s:%d] invalid td", __func__, __LINE__)); + KASSERT(pr != NULL, ("[%s:%d] invalid pr", __func__, __LINE__)); KASSERT(setp != NULL, ("[%s:%d] invalid setp", __func__, __LINE__)); - thread_lock(td); - root = cpuset_refroot(td->td_cpuset); - thread_unlock(td); - - error = cpuset_create(setp, td->td_cpuset, &root->cs_mask); - cpuset_rel(root); + root = pr->pr_cpuset; + error = cpuset_create(setp, root, &root->cs_mask); if (error) return (error); Index: sys/kern/vfs_cache.c =================================================================== --- sys/kern/vfs_cache.c (revision 191896) +++ sys/kern/vfs_cache.c (working copy) @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -1078,6 +1079,7 @@ char *bp; int error, i, slash_prefixed; struct namecache *ncp; + struct vnode *pr_root; #ifdef KDTRACE_HOOKS struct vnode *startvp = vp; #endif @@ -1130,7 +1132,8 @@ buflen--; slash_prefixed = 1; } - while (vp != rdir && vp != rootvnode) { + pr_root = td->td_ucred->cr_prison->pr_root; + while (vp != rdir && vp != pr_root && vp != rootvnode) { if (vp->v_vflag & VV_ROOT) { if (vp->v_iflag & VI_DOOMED) { /* forced unmount */ CACHE_RUNLOCK(); Index: sys/kern/kern_mib.c =================================================================== --- sys/kern/kern_mib.c (revision 191896) +++ sys/kern/kern_mib.c (working copy) @@ -52,6 +52,7 @@ #include #include #include +#include #include #include @@ -228,7 +229,7 @@ pr = req->td->td_ucred->cr_prison; if (pr != NULL) { - if (!jail_set_hostname_allowed && req->newptr) + if (!(pr->pr_flags & PR_ALLOW_SET_HOSTNAME) && req->newptr) return (EPERM); /* * Process is in jail, so make a local copy of jail @@ -277,55 +278,43 @@ &regression_securelevel_nonmonotonic, 0, "securelevel may be lowered"); #endif -int securelevel = 
-1; -static struct mtx securelevel_mtx; - -MTX_SYSINIT(securelevel_lock, &securelevel_mtx, "securelevel mutex lock", - MTX_DEF); - static int sysctl_kern_securelvl(SYSCTL_HANDLER_ARGS) { - struct prison *pr; - int error, level; + struct prison *pr, *cpr; + int descend, error, level; pr = req->td->td_ucred->cr_prison; /* - * If the process is in jail, return the maximum of the global and - * local levels; otherwise, return the global level. Perform a - * lockless read since the securelevel is an integer. + * Reading the securelevel is easy, since the current jail's level + * is known to be at least as secure as any higher levels. Perform + * a lockless read since the securelevel is an integer. */ - if (pr != NULL) - level = imax(securelevel, pr->pr_securelevel); - else - level = securelevel; + level = pr->pr_securelevel; error = sysctl_handle_int(oidp, &level, 0, req); if (error || !req->newptr) return (error); + /* Permit update only if the new securelevel exceeds the old. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if (!regression_securelevel_nonmonotonic && + level < pr->pr_securelevel) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_securelevel = level; /* - * Permit update only if the new securelevel exceeds the - * global level, and local level if any. + * Set all child jails to be at least this level, but do not lower + * them (even if regression_securelevel_nonmonotonic). 
*/ - if (pr != NULL) { - mtx_lock(&pr->pr_mtx); - if (!regression_securelevel_nonmonotonic && - (level < imax(securelevel, pr->pr_securelevel))) { - mtx_unlock(&pr->pr_mtx); - return (EPERM); - } - pr->pr_securelevel = level; - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&securelevel_mtx); - if (!regression_securelevel_nonmonotonic && - (level < securelevel)) { - mtx_unlock(&securelevel_mtx); - return (EPERM); - } - securelevel = level; - mtx_unlock(&securelevel_mtx); + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) { + if (cpr->pr_securelevel < level) + cpr->pr_securelevel = level; } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); return (error); } Index: sys/kern/vfs_subr.c =================================================================== --- sys/kern/vfs_subr.c (revision 191896) +++ sys/kern/vfs_subr.c (working copy) @@ -467,22 +467,14 @@ return (EPERM); /* - * If the file system was mounted outside a jail and a jailed thread - * tries to access it, deny immediately. + * If the file system was mounted outside the jail of the calling + * thread, deny immediately. */ - if (!jailed(mp->mnt_cred) && jailed(td->td_ucred)) + if (mp->mnt_cred->cr_prison != td->td_ucred->cr_prison && + !prison_ischild(td->td_ucred->cr_prison, mp->mnt_cred->cr_prison)) return (EPERM); /* - * If the file system was mounted inside different jail that the jail of - * the calling thread, deny immediately. - */ - if (jailed(mp->mnt_cred) && jailed(td->td_ucred) && - mp->mnt_cred->cr_prison != td->td_ucred->cr_prison) { - return (EPERM); - } - - /* * If file system supports delegated administration, we don't check * for the PRIV_VFS_MOUNT_OWNER privilege - it will be better verified * by the file system itself. 
@@ -2900,7 +2892,7 @@ db_printf(" mnt_cred = { uid=%u ruid=%u", (u_int)mp->mnt_cred->cr_uid, (u_int)mp->mnt_cred->cr_ruid); - if (mp->mnt_cred->cr_prison != NULL) + if (jailed(mp->mnt_cred)) db_printf(", jail=%d", mp->mnt_cred->cr_prison->pr_id); db_printf(" }\n"); db_printf(" mnt_ref = %d\n", mp->mnt_ref); Index: sys/netinet/in_pcb.c =================================================================== --- sys/netinet/in_pcb.c (revision 191896) +++ sys/netinet/in_pcb.c (working copy) @@ -600,7 +600,7 @@ goto done; } - if (cred == NULL || !jailed(cred)) { + if (cred == NULL || !(cred->cr_prison->pr_flags & PR_IP4)) { laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } @@ -644,7 +644,7 @@ struct ifnet *ifp; /* If not jailed, use the default returned. */ - if (cred == NULL || !jailed(cred)) { + if (cred == NULL || !(cred->cr_prison->pr_flags & PR_IP4)) { ia = (struct in_ifaddr *)sro.ro_rt->rt_ifa; laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; @@ -709,7 +709,7 @@ if (ia == NULL) ia = ifatoia(ifa_ifwithnet(sintosa(&sain))); - if (cred == NULL || !jailed(cred)) { + if (cred == NULL || !(cred->cr_prison->pr_flags & PR_IP4)) { #if __FreeBSD_version < 800000 if (ia == NULL) ia = (struct in_ifaddr *)sro.ro_rt->rt_ifa; @@ -1220,7 +1220,8 @@ * Found? */ if (cred == NULL || - inp->inp_cred->cr_prison == cred->cr_prison) + prison_equal_ip4(cred->cr_prison, + inp->inp_cred->cr_prison)) return (inp); } } @@ -1252,7 +1253,8 @@ LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) { wildcard = 0; if (cred != NULL && - inp->inp_cred->cr_prison != cred->cr_prison) + !prison_equal_ip4(inp->inp_cred->cr_prison, + cred->cr_prison)) continue; #ifdef INET6 /* XXX inp locking */ @@ -1333,7 +1335,7 @@ * the inp here, without any checks. * Well unless both bound with SO_REUSEPORT? 
*/ - if (jailed(inp->inp_cred)) + if (inp->inp_cred->cr_prison->pr_flags & PR_IP4) return (inp); if (tmpinp == NULL) tmpinp = inp; @@ -1378,7 +1380,7 @@ (inp->inp_flags & INP_FAITH) == 0) continue; - injail = jailed(inp->inp_cred); + injail = inp->inp_cred->cr_prison->pr_flags & PR_IP4; if (injail) { if (prison_check_ip4(inp->inp_cred, &laddr) != 0) Index: sys/netinet/udp_usrreq.c =================================================================== --- sys/netinet/udp_usrreq.c (revision 191896) +++ sys/netinet/udp_usrreq.c (working copy) @@ -988,7 +988,7 @@ * Remember addr if jailed, to prevent * rebinding. */ - if (jailed(td->td_ucred)) + if (td->td_ucred->cr_prison->pr_flags & PR_IP4) inp->inp_laddr = laddr; inp->inp_lport = lport; if (in_pcbinshash(inp) != 0) { Index: sys/fs/procfs/procfs_status.c =================================================================== --- sys/fs/procfs/procfs_status.c (revision 191896) +++ sys/fs/procfs/procfs_status.c (working copy) @@ -151,10 +151,11 @@ sbuf_printf(sb, ",%lu", (u_long)cr->cr_groups[i]); } - if (jailed(p->p_ucred)) { - mtx_lock(&p->p_ucred->cr_prison->pr_mtx); - sbuf_printf(sb, " %s", p->p_ucred->cr_prison->pr_host); - mtx_unlock(&p->p_ucred->cr_prison->pr_mtx); + if (jailed(cr)) { + mtx_lock(&cr->cr_prison->pr_mtx); + sbuf_printf(sb, " %s", + prison_name(td->td_ucred->cr_prison, cr->cr_prison)); + mtx_unlock(&cr->cr_prison->pr_mtx); } else { sbuf_printf(sb, " -"); } Index: sys/nfsserver/nfs_srvsock.c =================================================================== --- sys/nfsserver/nfs_srvsock.c (revision 191896) +++ sys/nfsserver/nfs_srvsock.c (working copy) @@ -43,6 +43,7 @@ #include #include +#include #include #include #include @@ -699,6 +700,8 @@ nd = malloc(sizeof (struct nfsrv_descript), M_NFSRVDESC, M_WAITOK); nd->nd_cr = crget(); + nd->nd_cr->cr_prison = &prison0; + prison_hold(&prison0); NFSD_LOCK(); nd->nd_md = nd->nd_mrep = m; nd->nd_nam2 = nam; Index: sys/compat/freebsd32/freebsd32_misc.c 
=================================================================== --- sys/compat/freebsd32/freebsd32_misc.c (revision 191896) +++ sys/compat/freebsd32/freebsd32_misc.c (working copy) @@ -112,8 +112,6 @@ CTASSERT(sizeof(struct stat32) == 96); CTASSERT(sizeof(struct sigaction32) == 24); -extern int jail_max_af_ips; - static int freebsd32_kevent_copyout(void *arg, struct kevent *kevp, int count); static int freebsd32_kevent_copyin(void *arg, struct kevent *kevp, int count); @@ -2126,7 +2124,7 @@ return (error); tmplen = MAXPATHLEN + MAXHOSTNAMELEN + MAXHOSTNAMELEN; #ifdef INET - if (j32.ip4s > jail_max_af_ips) + if (j32.ip4s > td->td_ucred->cr_prison->pr_max_af_ips) return (EINVAL); tmplen += j32.ip4s * sizeof(struct in_addr); #else @@ -2134,7 +2132,7 @@ return (EINVAL); #endif #ifdef INET6 - if (j32.ip6s > jail_max_af_ips) + if (j32.ip6s > td->td_ucred->cr_prison->pr_max_af_ips) return (EINVAL); tmplen += j32.ip6s * sizeof(struct in6_addr); #else Index: sys/compat/linux/linux_mib.c =================================================================== --- sys/compat/linux/linux_mib.c (revision 191896) +++ sys/compat/linux/linux_mib.c (working copy) @@ -57,16 +57,18 @@ int pr_use_linux26; /* flag to determine whether to use 2.6 emulation */ }; +static struct linux_prison lprison0 = { + .pr_osname = "Linux", + .pr_osrelease = "2.6.16", + .pr_oss_version = 0x030600, + .pr_use_linux26 = 1, +}; + static unsigned linux_osd_jail_slot; SYSCTL_NODE(_compat, OID_AUTO, linux, CTLFLAG_RW, 0, "Linux mode"); -static struct mtx osname_lock; -MTX_SYSINIT(linux_osname, &osname_lock, "linux osname", MTX_DEF); - -static char linux_osname[LINUX_MAX_UTSNAME] = "Linux"; - static int linux_sysctl_osname(SYSCTL_HANDLER_ARGS) { @@ -86,9 +88,6 @@ 0, 0, linux_sysctl_osname, "A", "Linux kernel OS name"); -static char linux_osrelease[LINUX_MAX_UTSNAME] = "2.6.16"; -static int linux_use_linux26 = 1; - static int linux_sysctl_osrelease(SYSCTL_HANDLER_ARGS) { @@ -108,8 +107,6 @@ 0, 0, 
linux_sysctl_osrelease, "A", "Linux kernel OS release"); -static int linux_oss_version = 0x030600; - static int linux_sysctl_oss_version(SYSCTL_HANDLER_ARGS) { @@ -130,69 +127,74 @@ "Linux OSS version"); /* - * Returns holding the prison mutex if return non-NULL. + * Find a prison with Linux info. + * Return the Linux info and the (locked) prison. */ static struct linux_prison * -linux_get_prison(struct thread *td, struct prison **prp) +linux_find_prison(struct prison *spr, struct prison **prp) { struct prison *pr; struct linux_prison *lpr; - KASSERT(td == curthread, ("linux_get_prison() called on !curthread")); - *prp = pr = td->td_ucred->cr_prison; - if (pr == NULL || !linux_osd_jail_slot) - return (NULL); - mtx_lock(&pr->pr_mtx); - lpr = osd_jail_get(pr, linux_osd_jail_slot); - if (lpr == NULL) + if (!linux_osd_jail_slot) + /* In case osd_register failed. */ + spr = &prison0; + for (pr = spr;; pr = pr->pr_parent) { + mtx_lock(&pr->pr_mtx); + lpr = (pr == &prison0) + ? &lprison0 + : osd_jail_get(pr, linux_osd_jail_slot); + if (lpr != NULL) + break; mtx_unlock(&pr->pr_mtx); + } + *prp = pr; return (lpr); } /* - * Ensure a prison has its own Linux info. The prison should be locked on - * entrance and will be locked on exit (though it may get unlocked in the - * interrim). + * Ensure a prison has its own Linux info. If lprp is non-null, point it to + * the Linux info and lock the prison. */ static int linux_alloc_prison(struct prison *pr, struct linux_prison **lprp) { + struct prison *ppr; struct linux_prison *lpr, *nlpr; int error; /* If this prison already has Linux info, return that. */ error = 0; - mtx_assert(&pr->pr_mtx, MA_OWNED); - lpr = osd_jail_get(pr, linux_osd_jail_slot); - if (lpr != NULL) + lpr = linux_find_prison(pr, &ppr); + if (ppr == pr) goto done; /* * Allocate a new info record. Then check again, in case something * changed during the allocation. 
*/ - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&ppr->pr_mtx); nlpr = malloc(sizeof(struct linux_prison), M_PRISON, M_WAITOK); - mtx_lock(&pr->pr_mtx); - lpr = osd_jail_get(pr, linux_osd_jail_slot); - if (lpr != NULL) { + lpr = linux_find_prison(pr, &ppr); + if (ppr == pr) { free(nlpr, M_PRISON); goto done; } + /* Inherit the initial values from the ancestor. */ + mtx_lock(&pr->pr_mtx); error = osd_jail_set(pr, linux_osd_jail_slot, nlpr); - if (error) - free(nlpr, M_PRISON); - else { + if (error == 0) { + bcopy(lpr, nlpr, sizeof(*lpr)); lpr = nlpr; - mtx_lock(&osname_lock); - strncpy(lpr->pr_osname, linux_osname, LINUX_MAX_UTSNAME); - strncpy(lpr->pr_osrelease, linux_osrelease, LINUX_MAX_UTSNAME); - lpr->pr_oss_version = linux_oss_version; - lpr->pr_use_linux26 = linux_use_linux26; - mtx_unlock(&osname_lock); + } else { + free(nlpr, M_PRISON); + lpr = NULL; } -done: + mtx_unlock(&ppr->pr_mtx); + done: if (lprp != NULL) *lprp = lpr; + else + mtx_unlock(&pr->pr_mtx); return (error); } @@ -202,7 +204,6 @@ static int linux_prison_create(void *obj, void *data) { - int error; struct prison *pr = obj; struct vfsoptlist *opts = data; @@ -212,10 +213,7 @@ * Inherit a prison's initial values from its parent * (different from NULL which also inherits changes). */ - mtx_lock(&pr->pr_mtx); - error = linux_alloc_prison(pr, NULL); - mtx_unlock(&pr->pr_mtx); - return (error); + return linux_alloc_prison(pr, NULL); } static int @@ -223,8 +221,7 @@ { struct vfsoptlist *opts = data; char *osname, *osrelease; - size_t len; - int error, oss_version; + int error, len, oss_version; /* Check that the parameters are correct. */ (void)vfs_flagopt(opts, "linux", NULL, 0); @@ -263,8 +260,7 @@ struct prison *pr = obj; struct vfsoptlist *opts = data; char *osname, *osrelease; - size_t len; - int error, gotversion, nolinux, oss_version, yeslinux; + int error, gotversion, len, nolinux, oss_version, yeslinux; /* Set the parameters, which should be correct. 
*/ yeslinux = vfs_flagopt(opts, "linux", NULL, 0); @@ -281,7 +277,7 @@ yeslinux = 1; error = vfs_copyopt(opts, "linux.oss_version", &oss_version, sizeof(oss_version)); - gotversion = error == 0; + gotversion = (error == 0); yeslinux |= gotversion; if (nolinux) { /* "nolinux": inherit the parent's Linux info. */ @@ -293,7 +289,6 @@ * "linux" or "linux.*": * the prison gets its own Linux info. */ - mtx_lock(&pr->pr_mtx); error = linux_alloc_prison(pr, &lpr); if (error) { mtx_unlock(&pr->pr_mtx); @@ -328,14 +323,18 @@ linux_prison_get(void *obj, void *data) { struct linux_prison *lpr; + struct prison *ppr; struct prison *pr = obj; struct vfsoptlist *opts = data; int error, i; - mtx_lock(&pr->pr_mtx); - /* Tell whether this prison has its own Linux info. */ - lpr = osd_jail_get(pr, linux_osd_jail_slot); - i = lpr != NULL; + /* + * Report on the prison that actually has the Linux info. It's + * kind of bogus to give an ancestor's info, but leave it to the + * caller to check the flag set below. + */ + lpr = linux_find_prison(pr, &ppr); + i = (ppr == pr); error = vfs_setopt(opts, "linux", &i, sizeof(i)); if (error != 0 && error != ENOENT) goto done; @@ -343,39 +342,20 @@ error = vfs_setopt(opts, "nolinux", &i, sizeof(i)); if (error != 0 && error != ENOENT) goto done; - /* - * It's kind of bogus to give the root info, but leave it to the caller - * to check the above flag. 
- */ - if (lpr != NULL) { - error = vfs_setopts(opts, "linux.osname", lpr->pr_osname); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopts(opts, "linux.osrelease", lpr->pr_osrelease); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopt(opts, "linux.oss_version", - &lpr->pr_oss_version, sizeof(lpr->pr_oss_version)); - if (error != 0 && error != ENOENT) - goto done; - } else { - mtx_lock(&osname_lock); - error = vfs_setopts(opts, "linux.osname", linux_osname); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopts(opts, "linux.osrelease", linux_osrelease); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopt(opts, "linux.oss_version", - &linux_oss_version, sizeof(linux_oss_version)); - if (error != 0 && error != ENOENT) - goto done; - mtx_unlock(&osname_lock); - } + error = vfs_setopts(opts, "linux.osname", lpr->pr_osname); + if (error != 0 && error != ENOENT) + goto done; + error = vfs_setopts(opts, "linux.osrelease", lpr->pr_osrelease); + if (error != 0 && error != ENOENT) + goto done; + error = vfs_setopt(opts, "linux.oss_version", &lpr->pr_oss_version, + sizeof(lpr->pr_oss_version)); + if (error != 0 && error != ENOENT) + goto done; error = 0; done: - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&ppr->pr_mtx); return (error); } @@ -402,11 +382,8 @@ if (linux_osd_jail_slot > 0) { /* Copy the system linux info to any current prisons. 
*/ sx_xlock(&allprison_lock); - TAILQ_FOREACH(pr, &allprison, pr_list) { - mtx_lock(&pr->pr_mtx); + TAILQ_FOREACH(pr, &allprison, pr_list) (void)linux_alloc_prison(pr, NULL); - mtx_unlock(&pr->pr_mtx); - } sx_xunlock(&allprison_lock); } } @@ -425,15 +402,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - bcopy(lpr->pr_osname, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - bcopy(linux_osname, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&osname_lock); - } + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + bcopy(lpr->pr_osname, dst, LINUX_MAX_UTSNAME); + mtx_unlock(&pr->pr_mtx); } int @@ -442,16 +413,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - strlcpy(lpr->pr_osname, osname, LINUX_MAX_UTSNAME); - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - strcpy(linux_osname, osname); - mtx_unlock(&osname_lock); - } - + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + strlcpy(lpr->pr_osname, osname, LINUX_MAX_UTSNAME); + mtx_unlock(&pr->pr_mtx); return (0); } @@ -461,15 +425,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - bcopy(lpr->pr_osrelease, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - bcopy(linux_osrelease, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&osname_lock); - } + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + bcopy(lpr->pr_osrelease, dst, LINUX_MAX_UTSNAME); + mtx_unlock(&pr->pr_mtx); } int @@ -479,12 +437,9 @@ struct linux_prison *lpr; int use26; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - use26 = lpr->pr_use_linux26; - mtx_unlock(&pr->pr_mtx); - } else - use26 = linux_use_linux26; + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + use26 = lpr->pr_use_linux26; + mtx_unlock(&pr->pr_mtx); return (use26); } @@ -494,20 +449,10 @@ struct prison *pr; 
struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - strlcpy(lpr->pr_osrelease, osrelease, LINUX_MAX_UTSNAME); - lpr->pr_use_linux26 = - strlen(osrelease) >= 3 && osrelease[2] == '6'; - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - strcpy(linux_osrelease, osrelease); - linux_use_linux26 = - strlen(osrelease) >= 3 && osrelease[2] == '6'; - mtx_unlock(&osname_lock); - } - + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + strlcpy(lpr->pr_osrelease, osrelease, LINUX_MAX_UTSNAME); + lpr->pr_use_linux26 = strlen(osrelease) >= 3 && osrelease[2] == '6'; + mtx_unlock(&pr->pr_mtx); return (0); } @@ -518,12 +463,9 @@ struct linux_prison *lpr; int version; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - version = lpr->pr_oss_version; - mtx_unlock(&pr->pr_mtx); - } else - version = linux_oss_version; + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + version = lpr->pr_oss_version; + mtx_unlock(&pr->pr_mtx); return (version); } @@ -533,16 +475,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - lpr->pr_oss_version = oss_version; - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - linux_oss_version = oss_version; - mtx_unlock(&osname_lock); - } - + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + lpr->pr_oss_version = oss_version; + mtx_unlock(&pr->pr_mtx); return (0); } Index: sys/net/rtsock.c =================================================================== --- sys/net/rtsock.c (revision 191896) +++ sys/net/rtsock.c (working copy) @@ -373,6 +373,8 @@ /* * As a last resort return the 'default' jail address. */ + ia = ((struct sockaddr_in *)rt->rt_ifa->ifa_addr)-> + sin_addr; if (prison_get_ip4(cred, &ia) != 0) return (ESRCH); } @@ -414,6 +416,8 @@ /* * As a last resort return the 'default' jail address. 
*/ + ia6 = ((struct sockaddr_in6 *)rt->rt_ifa->ifa_addr)-> + sin6_addr; if (prison_get_ip6(cred, &ia6) != 0) return (ESRCH); } Index: sys/netinet6/in6_pcb.c =================================================================== --- sys/netinet6/in6_pcb.c (revision 191896) +++ sys/netinet6/in6_pcb.c (working copy) @@ -666,7 +666,8 @@ inp->inp_lport == lport) { /* Found. */ if (cred == NULL || - inp->inp_cred->cr_prison == cred->cr_prison) + prison_equal_ip6(cred->cr_prison, + inp->inp_cred->cr_prison)) return (inp); } } @@ -698,7 +699,8 @@ LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) { wildcard = 0; if (cred != NULL && - inp->inp_cred->cr_prison != cred->cr_prison) + !prison_equal_ip6(cred->cr_prison, + inp->inp_cred->cr_prison)) continue; /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV6) == 0) @@ -838,7 +840,7 @@ * the inp here, without any checks. * Well unless both bound with SO_REUSEPORT? */ - if (jailed(inp->inp_cred)) + if (inp->inp_cred->cr_prison->pr_flags & PR_IP6) return (inp); if (tmpinp == NULL) tmpinp = inp; @@ -878,7 +880,7 @@ if (faith && (inp->inp_flags & INP_FAITH) == 0) continue; - injail = jailed(inp->inp_cred); + injail = inp->inp_cred->cr_prison->pr_flags & PR_IP6; if (injail) { if (prison_check_ip6(inp->inp_cred, laddr) != 0) Index: sys/contrib/ipfilter/netinet/ip_nat.c =================================================================== --- sys/contrib/ipfilter/netinet/ip_nat.c (revision 191896) +++ sys/contrib/ipfilter/netinet/ip_nat.c (working copy) @@ -662,7 +662,11 @@ return EPERM; } # else +# if defined(__FreeBSD_version) && (__FreeBSD_version >= 500034) + if (securelevel_ge(curthread->td_ucred, 3) && (mode & FWRITE)) { +# else if ((securelevel >= 3) && (mode & FWRITE)) { +# endif return EPERM; } # endif Index: sys/contrib/ipfilter/netinet/ip_fil_freebsd.c =================================================================== --- sys/contrib/ipfilter/netinet/ip_fil_freebsd.c (revision 191896) +++ 
sys/contrib/ipfilter/netinet/ip_fil_freebsd.c (working copy) @@ -318,8 +318,10 @@ # if (__FreeBSD_version >= 500024) struct thread *p; # if (__FreeBSD_version >= 500043) +# define p_cred td_ucred # define p_uid td_ucred->cr_ruid # else +# define p_cred t_proc->p_cred # define p_uid t_proc->p_cred->p_ruid # endif # else @@ -342,7 +344,11 @@ SPL_INT(s); #if (BSD >= 199306) && defined(_KERNEL) +# if (__FreeBSD_version >= 500034) + if (securelevel_ge(p->p_cred, 3) && (mode & FWRITE)) +# else if ((securelevel >= 3) && (mode & FWRITE)) +# endif return EPERM; #endif Index: sys/security/mac_bsdextended/mac_bsdextended.c =================================================================== --- sys/security/mac_bsdextended/mac_bsdextended.c (revision 191896) +++ sys/security/mac_bsdextended/mac_bsdextended.c (working copy) @@ -271,8 +271,8 @@ } if (rule->mbr_subject.mbs_flags & MBS_PRISON_DEFINED) { - match = (cred->cr_prison != NULL && - cred->cr_prison->pr_id == rule->mbr_subject.mbs_prison); + match = + (cred->cr_prison->pr_id == rule->mbr_subject.mbs_prison); if (rule->mbr_subject.mbs_neg & MBS_PRISON_DEFINED) match = !match; if (!match) Index: sys/sys/cpuset.h =================================================================== --- sys/sys/cpuset.h (revision 191896) +++ sys/sys/cpuset.h (working copy) @@ -169,6 +169,7 @@ #define CPU_SET_RDONLY 0x0002 /* No modification allowed. 
*/ extern cpuset_t *cpuset_root; +struct prison; struct proc; struct thread; @@ -176,7 +177,7 @@ struct cpuset *cpuset_ref(struct cpuset *); void cpuset_rel(struct cpuset *); int cpuset_setthread(lwpid_t id, cpuset_t *); -int cpuset_create_root(struct thread *, struct cpuset **); +int cpuset_create_root(struct prison *, struct cpuset **); int cpuset_setproc_update_set(struct proc *, struct cpuset *); #else Index: sys/sys/jail.h =================================================================== --- sys/sys/jail.h (revision 191896) +++ sys/sys/jail.h (working copy) @@ -122,8 +122,8 @@ #include #include -#include -#include +#include +#include #include #define JAIL_MAX 999999 @@ -137,8 +137,6 @@ #include -struct cpuset; - /* * This structure describes a prison. It is pointed to by all struct * ucreds's of the inmates. pr_ref keeps track of them and is used to @@ -162,7 +160,7 @@ struct vnode *pr_root; /* (c) vnode to rdir */ char pr_host[MAXHOSTNAMELEN]; /* (p) jail hostname */ char pr_name[MAXHOSTNAMELEN]; /* (p) admin jail name */ - void *pr_spare; /* was pr_linux */ + struct prison *pr_parent; /* (c) containing jail */ int pr_securelevel; /* (p) securelevel */ struct task pr_task; /* (d) destroy task */ struct mtx pr_mtx; @@ -171,6 +169,14 @@ struct in_addr *pr_ip4; /* (p) v4 IPs of jail */ int pr_ip6s; /* (p) number of v6 IPs */ struct in6_addr *pr_ip6; /* (p) v6 IPs of jail */ + LIST_HEAD(, prison) pr_children; /* (a) list of child jails */ + LIST_ENTRY(prison) pr_sibling; /* (a) next in parent's list */ + int pr_prisoncount; /* (a) number of child jails */ + int pr_enforce_statfs; /* (p) statfs permission */ + int pr_max_af_ips; /* (p) IP address limit */ + unsigned pr_def_perms; /* (p) child PR_PERM_* flags */ + int pr_def_enforce_statfs; /* (p) child statfs */ + int pr_def_max_af_ips; /* (p) child IP limit */ }; #endif /* _KERNEL || _WANT_PRISON */ @@ -179,7 +185,24 @@ * Flag bits set via options or internally */ #define PR_PERSIST 0x00000001 /* Can exist 
without processes */ +#define PR_IP4_USER 0x00000004 /* Virtualize IPv4 addresses */ +#define PR_IP6_USER 0x00000008 /* Virtualize IPv6 addresses */ + +#define PR_ALLOW_SET_HOSTNAME 0x00010000 +#define PR_ALLOW_SYSVIPC 0x00020000 +#define PR_ALLOW_RAW_SOCKETS 0x00040000 +#define PR_ALLOW_CHFLAGS 0x00080000 +#define PR_ALLOW_MOUNT 0x00100000 +#define PR_ALLOW_QUOTAS 0x00200000 +#define PR_ALLOW_JAILS 0x00400000 +#define PR_RESTRICT_SOCKET_UNIXIPROUTE 0x00800000 + +#define PR_ALLOW_ALL 0x007f0000 +#define PR_RESTRICT_ALL 0x00800000 + #define PR_REMOVE 0x01000000 /* In process of being removed */ +#define PR_IP4 0x02000000 /* Virtualize IPv4 (maybe inherited) */ +#define PR_IP6 0x04000000 /* Virtualize IPv6 (maybe inherited) */ /* * OSD methods @@ -192,17 +215,67 @@ #define PR_MAXMETHOD 5 /* - * Sysctl-set variables that determine global jail policy - * - * XXX MIB entries will need to be protected by a mutex. + * Lock/unlock a prison. + * XXX These exist not so much for general convenience, but to be useable in + * the FOREACH_PRISON_DESCENDANT_LOCKED macro which can't handle them in + * non-function form as currently defined. */ -extern int jail_set_hostname_allowed; -extern int jail_socket_unixiproute_only; -extern int jail_sysvipc_allowed; -extern int jail_getfsstat_jailrootonly; -extern int jail_allow_raw_sockets; -extern int jail_chflags_allowed; +static __inline void +prison_lock(struct prison *pr) +{ + mtx_lock(&pr->pr_mtx); +} +static __inline void +prison_unlock(struct prison *pr) +{ + mtx_unlock(&pr->pr_mtx); +} + +/* Traverse a prison's immediate children */ +#define FOREACH_PRISON_CHILD(ppr, cpr) \ + LIST_FOREACH(cpr, &(ppr)->pr_children, pr_sibling) + +/* + * Preorder traversal of all of a prison's descendants. + * This ugly loop allows the macro to be followed by a single block + * as expected in a looping primitive. 
+ */ +#define FOREACH_PRISON_DESCENDANT(ppr, cpr, descend) \ + for ((cpr) = (ppr), (descend) = 1; \ + ((cpr) = ((descend) && !LIST_EMPTY(&(cpr)->pr_children)) \ + ? LIST_FIRST(&(cpr)->pr_children) \ + : (cpr) == (ppr) \ + ? NULL \ + : ((descend) = LIST_NEXT(cpr, pr_sibling) != NULL) \ + ? LIST_NEXT(cpr, pr_sibling) \ + : (cpr)->pr_parent);) \ + if (!(descend)) \ + ; \ + else + +/* + * As above, but lock descendants on the way down and unlock on the way up. + */ +#define FOREACH_PRISON_DESCENDANT_LOCKED(ppr, cpr, descend) \ + for ((cpr) = (ppr), (descend) = 1; \ + ((cpr) = ((descend) && !LIST_EMPTY(&(cpr)->pr_children)) \ + ? LIST_FIRST(&(cpr)->pr_children) \ + : (cpr) == (ppr) \ + ? NULL \ + : (prison_unlock(cpr), \ + (descend) = LIST_NEXT(cpr, pr_sibling) != NULL) \ + ? LIST_NEXT(cpr, pr_sibling) \ + : (cpr)->pr_parent);) \ + if ((descend) ? (prison_lock(cpr), 0) : 1) \ + ; \ + else + +/* + * Attributes of the physical system, and the root of the jail tree. + */ +extern struct prison prison0; + TAILQ_HEAD(prisonlist, prison); extern struct prisonlist allprison; extern struct sx allprison_lock; @@ -240,18 +313,22 @@ void prison_enforce_statfs(struct ucred *cred, struct mount *mp, struct statfs *sp); struct prison *prison_find(int prid); -struct prison *prison_find_name(const char *name); +struct prison *prison_find_child(struct prison *mypr, int prid); +struct prison *prison_find_name(struct prison *mypr, const char *name); void prison_free(struct prison *pr); void prison_free_locked(struct prison *pr); void prison_hold(struct prison *pr); void prison_hold_locked(struct prison *pr); void prison_proc_hold(struct prison *); void prison_proc_free(struct prison *); +int prison_ischild(struct prison *pr1, struct prison *pr2); +int prison_equal_ip4(struct prison *, struct prison *); int prison_get_ip4(struct ucred *cred, struct in_addr *ia); int prison_local_ip4(struct ucred *cred, struct in_addr *ia); int prison_remote_ip4(struct ucred *cred, struct in_addr *ia); int 
prison_check_ip4(struct ucred *cred, struct in_addr *ia);
 #ifdef INET6
+int prison_equal_ip6(struct prison *, struct prison *);
 int prison_get_ip6(struct ucred *, struct in6_addr *);
 int prison_local_ip6(struct ucred *, struct in6_addr *, int);
 int prison_remote_ip6(struct ucred *, struct in6_addr *);
@@ -259,6 +336,7 @@
 #endif
 int prison_check_af(struct ucred *cred, int af);
 int prison_if(struct ucred *cred, struct sockaddr *sa);
+char *prison_name(struct prison *pr1, struct prison *pr2);
 int prison_priv_check(struct ucred *cred, int priv);
 int sysctl_jail_param(struct sysctl_oid *, void *, int , struct sysctl_req *);
 
Index: sys/sys/systm.h
===================================================================
--- sys/sys/systm.h (revision 191896)
+++ sys/sys/systm.h (working copy)
@@ -45,8 +45,6 @@
 #include
 #include /* for people using printf mainly */
 
-extern int securelevel; /* system security level (see init(8)) */
-
 extern int cold; /* nonzero if we are doing a cold boot */
 extern int rebooting; /* boot() has been called. */
 extern const char *panicstr; /* panic message */

From julian at elischer.org Sat May 9 06:46:45 2009
From: julian at elischer.org (Julian Elischer)
Date: Sat May 9 06:46:51 2009
Subject: Hierarchical jails
In-Reply-To: <4A051DE3.30705@FreeBSD.org>
References: <4A051DE3.30705@FreeBSD.org>
Message-ID: <4A0526D7.7090000@elischer.org>

Jamie Gritton wrote:
> Here's the first round of hierarchical jails under the new framework.
>
> Instead of creds having either a prison or a NULL pointer, they all have
> a prison pointer with the default being the global "prison0" that
> contains information about the real environment. Jailed root may (if
> granted permission) create prisons that would be under its place in the
> hierarchy, but may not alter (or even see) prisons at its level or
> above.

agree

>
> The JID space is flat, i.e. every prison in the system has a unique ID.
> The prison name space is hierarchical, with jails having dot-separated
> component names.

this matches vimage, and I agree.

>
> prison0 contains three fields that were system globals: pr_root,
> pr_host, and pr_securelevel. I've kept the globals rootvnode and
> hostname, and take care that when one is changed the other changes too
> (not yet true for hostname - read on). But I've actually removed the
> global securelevel, instead forcing people to use securelevel_gt() and
> securelevel_ge() (or in very rare cases to check prison0.pr_securelevel
> directly). I chose to do that because while using the global rootvnode
> and hostname may be incorrect, using the wrong securelevel is, well,
> insecure. Actually it would be insecure to use the wrong rootvnode too,
> but I'm not convinced removing that global is worth the headache.

fair enough at this time.

>
> Other globals are subsumed into prison0, but they were only ever part of
> the jail system anyway: the various jail-related permission bits and
> such administrative things as prisoncount.
>
> The prison hierarchy keeps track of restrictions placed on prisons, and
> will reflect them downward so a child jail is always at least as
> restricted as its ancestors. It doesn't go the other way though: if a
> prison's restrictions are loosened, the children stay as they are.

yes. I agree.

>
> This patch doesn't have anything for userland, and hierarchical jails
> won't work without that patch (because jails don't have permission to
> create sub-jails by default, and jail(2) can't grant that permission).
> A userland patch will follow soon, very similar to the version I posted
> here recently.
>
> - Jamie

patch removed by mailing list... (but I saw it in the privately received version...)
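[Editorial illustration, not part of the thread.] The downward-only propagation of restrictions described in the quoted text can be sketched in a few lines of userland C. This is only a toy model under stated assumptions: `toy_prison`, `toy_init`, `toy_set_allow` and the `TOY_ALLOW_*` bits are hypothetical names invented for this example, and none of it is the kernel code from the patch. It shows one way to keep a child at least as restricted as its ancestors while leaving children alone when a parent is loosened.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical permission bits; NOT the PR_ALLOW_* values from the patch. */
#define TOY_ALLOW_MOUNT		0x1u
#define TOY_ALLOW_SYSVIPC	0x2u
#define TOY_ALLOW_JAILS		0x4u

#define TOY_MAX_CHILDREN	4

/* A toy stand-in for struct prison: a parent link, children, and a mask. */
struct toy_prison {
	struct toy_prison *parent;
	struct toy_prison *children[TOY_MAX_CHILDREN];
	int nchildren;
	unsigned allow;
};

/* Create a prison under "parent" (NULL for the root of the tree). */
static void
toy_init(struct toy_prison *pr, struct toy_prison *parent, unsigned allow)
{
	pr->parent = parent;
	pr->nchildren = 0;
	/* A child never starts out with more than its parent allows. */
	pr->allow = (parent != NULL) ? (allow & parent->allow) : allow;
	if (parent != NULL) {
		assert(parent->nchildren < TOY_MAX_CHILDREN);
		parent->children[parent->nchildren++] = pr;
	}
}

/*
 * Change a prison's permissions.  Tightening is reflected downward to
 * all descendants; loosening leaves the descendants as they were.
 */
static void
toy_set_allow(struct toy_prison *pr, unsigned allow)
{
	int i;

	if (pr->parent != NULL)
		allow &= pr->parent->allow;	/* cannot exceed the parent */
	pr->allow = allow;
	for (i = 0; i < pr->nchildren; i++)
		toy_set_allow(pr->children[i], pr->children[i]->allow & allow);
}
```

With a three-level tree (root, child, grandchild), removing a bit from the child also removes it from the grandchild, while granting the bit back to the child leaves the grandchild restricted until it is loosened explicitly.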
From jamie at FreeBSD.org Sat May 9 06:53:31 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Sat May 9 06:53:40 2009
Subject: Hierarchical jails (user side)
Message-ID: <4A052867.2090806@FreeBSD.org>

These are the extended versions of jail and jls to handle the arbitrary
name-value pairs, as well as small changes to jexec and killall. There's
actually nothing hierarchical about these programs, they just allow
setting the parameters necessary to allow hierarchical prisons. They
also include a bit of text in jail(8) about hierarchical jails and
what's necessary to set them up.

This might look familiar - it's exactly the same patches I posted a few
days back, with the exception of the jail(8) man page.

I'd appreciate it if anyone who's interested in hierarchical jails, or in
the new jail subsystem, or in vimage (which will soon merge with this)
could try out these patches and give a bit of feedback.

- Jamie
-------------- next part --------------
Index: usr.bin/killall/killall.1
===================================================================
--- usr.bin/killall/killall.1 (revision 191896)
+++ usr.bin/killall/killall.1 (working copy)
@@ -24,7 +24,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 9, 2007
+.Dd April 30, 2009
 .Os
 .Dt KILLALL 1
 .Sh NAME
@@ -34,7 +34,7 @@
 .Nm
 .Op Fl delmsvz
 .Op Fl help
-.Op Fl j Ar jid
+.Op Fl j Ar jail
 .Op Fl u Ar user
 .Op Fl t Ar tty
 .Op Fl c Ar procname
@@ -91,9 +91,9 @@
 (with or without a leading
 .Dq Li SIG ) ,
 or numerically.
-.It Fl j Ar jid
-Kill processes in the jail specified by
-.Ar jid .
+.It Fl j Ar jail
+Kill processes in the specified
+.Ar jail .
.It Fl u Ar user Limit potentially matching processes to those belonging to the specified Index: usr.bin/killall/killall.c =================================================================== --- usr.bin/killall/killall.c (revision 191896) +++ usr.bin/killall/killall.c (working copy) @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -51,7 +52,7 @@ usage(void) { - fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jid]\n"); + fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jail]\n"); fprintf(stderr, " [-u user] [-t tty] [-c cmd] [-SIGNAL] [cmd]...\n"); fprintf(stderr, "At least one option or argument to specify processes must be given.\n"); @@ -100,6 +101,7 @@ int main(int ac, char **av) { + struct iovec jparams[2]; struct kinfo_proc *procs = NULL, *newprocs; struct stat sb; struct passwd *pw; @@ -159,12 +161,21 @@ } jflag++; if (*av == NULL) - errx(1, "must specify jid"); - jid = strtol(*av, &ep, 10); - if (!*av || *ep) - errx(1, "illegal jid: %s", *av); + errx(1, "must specify jail"); + jid = strtoul(*av, &ep, 10); + if (!**av || *ep) { + *(const void **)&jparams[0].iov_base = + "name"; + jparams[0].iov_len = sizeof("name"); + jparams[1].iov_base = *av; + jparams[1].iov_len = strlen(*av) + 1; + jid = jail_get(jparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", + *av); + } if (jail_attach(jid) == -1) - err(1, "jail_attach(): %d", jid); + err(1, "jail_attach(%d)", jid); break; case 'u': ++*av; Index: usr.sbin/jls/jls.c =================================================================== --- usr.sbin/jls/jls.c (revision 191896) +++ usr.sbin/jls/jls.c (working copy) @@ -1,6 +1,7 @@ /*- * Copyright (c) 2003 Mike Barcroft * Copyright (c) 2008 Bjoern A. Zeeb + * Copyright (c) 2009 James Gritton * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -23,18 +24,20 @@ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. - * - * $FreeBSD$ */ +#include +__FBSDID("$FreeBSD$"); + #include -#include #include +#include #include +#include -#include +#include #include -#include + #include #include #include @@ -43,215 +46,672 @@ #include #include -#define FLAG_A 0x00001 -#define FLAG_V 0x00002 +#define SJPARAM "security.jail.param" +#define ARRAY_SLOP 5 -#ifdef SUPPORT_OLD_XPRISON -static -char *print_xprison_v1(void *p, char *end, unsigned flags) +#define CTLTYPE_BOOL (CTLTYPE + 1) +#define CTLTYPE_NOBOOL (CTLTYPE + 2) +#define CTLTYPE_IPADDR (CTLTYPE + 3) +#define CTLTYPE_IP6ADDR (CTLTYPE + 4) + +#define PARAM_KEY 0x1 +#define PARAM_USER 0x2 +#define PARAM_ARRAY 0x4 +#define PARAM_OPT 0x8 + +#define PRINT_DEFAULT 0x01 +#define PRINT_VDEFAULT 0x02 +#define PRINT_HEADER 0x04 +#define PRINT_NAMEVAL 0x08 +#define PRINT_QUOTED 0x10 + +struct param { + char *name; + void *value; + size_t size; + int type; + unsigned flags; +}; + +struct iovec2 { + struct iovec name; + struct iovec value; +}; + +static struct param *params; +static int nparams; +static char errmsg[256]; + +static void add_param(const char *name, void *value, unsigned flags); +static int get_param(const char *name, struct param *param); +static int sort_param(const void *a, const void *b); +static char *noname(const char *name); +static char *nononame(const char *name); +static int print_jail(int pflags, int jflags); +static void quoted_print(char *str, int len); + +int +main(int argc, char **argv) { - struct xprison_v1 *xp; - struct in_addr in; + char *ep, *jname; + int c, i, jflags, jid, lastjid, pflags; - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); + jname = NULL; + pflags = jflags = jid = 0; + while ((c = getopt(argc, argv, 
"dj:hnqv")) >= 0) + switch (c) { + case 'd': + jflags |= JAIL_DYING; + break; + case 'j': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) + jname = optarg; + break; + case 'h': + pflags |= PRINT_HEADER; + break; + case 'n': + pflags |= PRINT_NAMEVAL; + break; + case 'q': + pflags |= PRINT_QUOTED; + break; + case 'v': + pflags |= PRINT_VDEFAULT; + break; + default: + errx(1, "usage: jls [-dhnqv] [-j jail] [param ...]"); + } - xp = (struct xprison_v1 *)p; - if (flags & FLAG_V) { - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); - /* We are not printing an empty line here for state and name. */ - /* We are not printing an empty line here for cpusetid. */ - /* IPv4 address. */ - in.s_addr = htonl(xp->pr_ip); - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + /* Add the parameters to print. */ + if (optind == argc) { + if (pflags & PRINT_VDEFAULT) { + add_param("jid", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + add_param("name", NULL, PARAM_USER); + add_param("dying", NULL, PARAM_USER); + add_param("cpuset", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("ip6.addr", NULL, PARAM_USER | PARAM_OPT); + } else { + pflags |= PRINT_DEFAULT; + add_param("jid", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + } + } else + while (optind < argc) + add_param(argv[optind++], NULL, PARAM_USER); + + /* Add the index key and errmsg parameters. */ + if (jid != 0) + add_param("jid", &jid, PARAM_KEY); + else if (jname != NULL) + add_param("name", jname, PARAM_KEY); + else + add_param("lastjid", &lastjid, PARAM_KEY); + add_param("errmsg", errmsg, PARAM_KEY); + + /* Print a header line if requested. 
*/ + if (pflags & PRINT_VDEFAULT) + printf(" JID Hostname Path\n" + " Name State\n" + " CPUSetID\n" + " IP Address(es)\n"); + else if (pflags & PRINT_DEFAULT) + printf(" JID IP Address " + "Hostname Path\n"); + else if (pflags & PRINT_HEADER) { + for (i = 0; i < nparams; i++) + if (params[i].flags & PARAM_USER) { + if (i > 0) + putchar(' '); + fputs(params[i].name, stdout); + } + putchar('\n'); + } + + /* Fetch the jail(s) and print the parameters. */ + if (jid != 0 || jname != NULL) { + if (print_jail(pflags, jflags) < 0) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } else { - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, inet_ntoa(in), xp->pr_host, xp->pr_path); + for (lastjid = 0; + (lastjid = print_jail(pflags, jflags)) >= 0; ) + ; + if (errno != 0 && errno != ENOENT) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } - return ((char *)(xp + 1)); + return (0); } -#endif -static -char *print_xprison_v3(void *p, char *end, unsigned flags) +static void +add_param(const char *name, void *value, unsigned flags) { - struct xprison *xp; - struct in_addr *iap, in; - struct in6_addr *ia6p; - char buf[INET6_ADDRSTRLEN]; - const char *state; - char *q; - uint32_t i; + struct param *param; + char *nname; + size_t mlen1, mlen2, buflen; + int mib1[CTL_MAXNAME], mib2[CTL_MAXNAME - 2]; + int i, tnparams; + char buf[MAXPATHLEN]; - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - xp = (struct xprison *)p; + static int paramlistsize; - if (xp->pr_state < 0 || xp->pr_state >= (int) - ((sizeof(prison_states) / sizeof(struct prison_state)))) - state = "(bogus)"; - else - state = prison_states[xp->pr_state].state_name; + /* The pseudo-parameter "all" scans the list of available parameters. 
*/ + if (!strcmp(name, "all")) { + tnparams = nparams; + mib1[0] = 0; + mib1[1] = 2; + mlen1 = CTL_MAXNAME - 2; + if (sysctlnametomib(SJPARAM, mib1 + 2, &mlen1) < 0) + err(1, "sysctlnametomib(" SJPARAM ")"); + for (;;) { + /* Get the next parameter. */ + mlen2 = sizeof(mib2); + if (sysctl(mib1, mlen1 + 2, mib2, &mlen2, NULL, 0) < 0) + err(1, "sysctl(0.2)"); + if (mib2[0] != mib1[2] || mib2[1] != mib1[3] || + mib2[2] != mib1[4]) + break; + /* Convert it to an ascii name. */ + memcpy(mib1 + 2, mib2, mlen2); + mlen1 = mlen2 / sizeof(int); + mib1[1] = 1; + buflen = sizeof(buf); + if (sysctl(mib1, mlen1 + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.1)"); + add_param(buf + sizeof(SJPARAM), NULL, flags); + /* + * Convert nobool parameters to bool if their + * counterpart is a node, otherwise discard them. + */ + param = &params[nparams - 1]; + if (param->type == CTLTYPE_NOBOOL) { + nname = nononame(param->name); + if (get_param(nname, param) >= 0 && + param->type != CTLTYPE_NODE) { + free(nname); + nparams--; + } else { + free(param->name); + param->name = nname; + param->type = CTLTYPE_BOOL; + param->size = sizeof(int); + param->value = NULL; + } + } + mib1[1] = 2; + } - /* See if we should print non-ACTIVE jails. No? */ - if ((flags & FLAG_A) == 0 && strcmp(state, "ALIVE")) { - q = (char *)(xp + 1); - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - return (q); + qsort(params + tnparams, (size_t)(nparams - tnparams), + sizeof(struct param), sort_param); + return; } - if (flags & FLAG_V) - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); + /* Check for repeat parameters. */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name)) { + params[i].value = value; + params[i].flags |= flags; + return; + } - /* Jail state and name. 
*/ - if (flags & FLAG_V) - printf("%6s %-29.29s %.74s\n", - "", (xp->pr_name[0] != '\0') ? xp->pr_name : "", state); + /* Make sure there is room for the new param record. */ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } - /* cpusetid. */ - if (flags & FLAG_V) - printf("%6s %-6d\n", - "", xp->pr_cpusetid); + /* Look up the parameter. */ + param = params + nparams++; + memset(param, 0, sizeof *param); + param->name = strdup(name); + if (param->name == NULL) + err(1, "strdup"); + param->flags = flags; + /* We have to know about pseudo-parameters without asking. */ + if (!strcmp(param->name, "lastjid")) { + param->type = CTLTYPE_INT; + param->size = sizeof(int); + goto got_type; + } + if (!strcmp(param->name, "errmsg")) { + param->type = CTLTYPE_STRING; + param->size = sizeof(errmsg); + goto got_type; + } + if (get_param(name, param) < 0) { + if (errno != ENOENT) + err(1, "sysctl(0.3.%s)", name); + /* See if this the "no" part of an existing boolean. */ + if ((nname = nononame(name))) { + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_BOOL) { + param->type = CTLTYPE_NOBOOL; + goto got_type; + } + } + if (flags & PARAM_OPT) { + nparams--; + return; + } + errx(1, "unknown parameter: %s", name); + } + if (param->type == CTLTYPE_NODE) { + /* + * A node isn't normally a parameter, but may be a boolean + * if its "no" counterpart exists. + */ + nname = noname(name); + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_NOBOOL) { + param->type = CTLTYPE_BOOL; + goto got_type; + } + errx(1, "unknown parameter: %s", name); + } - q = (char *)(xp + 1); - /* IPv4 addresses. 
*/ - iap = (struct in_addr *)(void *)q; - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - in.s_addr = 0; - for (i = 0; i < xp->pr_ip4s; i++) { - if (i == 0 || flags & FLAG_V) - in.s_addr = iap[i].s_addr; - if (flags & FLAG_V) - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + got_type: + param->value = value; +} + +static int +get_param(const char *name, struct param *param) +{ + char *bufi, *p; + size_t buflen, mlen; + int mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; + + /* Look up the MIB. */ + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + return (-1); + /* Get the type and size. */ + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + param->type = *(int *)buf & CTLTYPE; + bufi = buf + sizeof(int); + p = strchr(bufi, '\0'); + if (p - 2 >= bufi && !strcmp(p - 2, ",a")) { + p[-2] = 0; + param->flags |= PARAM_ARRAY; } - /* IPv6 addresses. */ - ia6p = (struct in6_addr *)(void *)q; - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - for (i = 0; i < xp->pr_ip6s; i++) { - if (flags & FLAG_V) { - inet_ntop(AF_INET6, &ia6p[i], buf, sizeof(buf)); - printf("%6s %s\n", "", buf); + switch (param->type) { + case CTLTYPE_INT: + /* An integer parameter might be a boolean. */ + if (bufi[0] == 'B') + param->type = bufi[1] == 'N' + ? 
CTLTYPE_NOBOOL : CTLTYPE_BOOL; + case CTLTYPE_UINT: + param->size = sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->size = sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(bufi, "S,in_addr")) { + param->type = CTLTYPE_IPADDR; + param->size = sizeof(struct in_addr); + } else if (!strcmp(bufi, "S,in6_addr")) { + param->type = CTLTYPE_IP6ADDR; + param->size = sizeof(struct in6_addr); } + break; + case CTLTYPE_STRING: + buf[0] = 0; + sysctl(mib + 2, mlen / sizeof(int), buf, &buflen, NULL, 0); + param->size = strtoul(buf, NULL, 10); + if (param->size == 0) + param->size = BUFSIZ; } + return (0); +} - /* If requested print the old style single line version. */ - if (!(flags & FLAG_V)) - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, (in.s_addr) ? inet_ntoa(in) : "", - xp->pr_host, xp->pr_path); +static int +sort_param(const void *a, const void *b) +{ + const struct param *parama, *paramb; + char *ap, *bp; - return (q); + /* Put top-level parameters first. 
*/ + parama = a; + paramb = b; + ap = strchr(parama->name, '.'); + bp = strchr(paramb->name, '.'); + if (ap && !bp) + return (1); + if (bp && !ap) + return (-1); + return (strcmp(parama->name, paramb->name)); } -static void -usage(void) +static char * +noname(const char *name) { + char *nname, *p; - (void)fprintf(stderr, "usage: jls [-av]\n"); - exit(1); + nname = malloc(strlen(name) + 3); + if (nname == NULL) + err(1, "malloc"); + p = strrchr(name, '.'); + if (p != NULL) + sprintf(nname, "%.*s.no%s", p - name, name, p + 1); + else + sprintf(nname, "no%s", name); + return nname; } -int -main(int argc, char *argv[]) -{ - int ch, version; - unsigned flags; - size_t i, j, len; - void *p, *q; +static char * +nononame(const char *name) +{ + char *nname, *p; - flags = 0; - while ((ch = getopt(argc, argv, "av")) != -1) { - switch (ch) { - case 'a': - flags |= FLAG_A; - break; - case 'v': - flags |= FLAG_V; - break; - default: - usage(); - } - } - argc -= optind; - argv += optind; + p = strrchr(name, '.'); + if (strncmp(p ? p + 1 : name, "no", 2)) + return NULL; + nname = malloc(strlen(name) - 1); + if (nname == NULL) + err(1, "malloc"); + if (p != NULL) + sprintf(nname, "%.*s.%s", p - name, name, p + 3); + else + strcpy(nname, name + 2); + return nname; +} - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); +static int +print_jail(int pflags, int jflags) +{ + char *nname; + int i, ai, jid, count, sanity; + char ipbuf[INET6_ADDRSTRLEN]; - j = len; - for (i = 0; i < 4; i++) { - if (len <= 0) - exit(0); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); + static struct iovec2 *iov, *aiov; + static int narray, nkey; - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; + /* Set up the parameter list(s) the first time around. 
*/ + if (iov == NULL) { + iov = malloc(nparams * sizeof(struct iovec2)); + if (iov == NULL) + err(1, "malloc"); + for (i = narray = 0; i < nparams; i++) { + iov[i].name.iov_base = params[i].name; + iov[i].name.iov_len = strlen(params[i].name) + 1; + iov[i].value.iov_base = params[i].value; + iov[i].value.iov_len = + params[i].type == CTLTYPE_STRING && + params[i].value != NULL && + ((char *)params[i].value)[0] != '\0' + ? strlen(params[i].value) + 1 : params[i].size; + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + narray++; + if (params[i].flags & PARAM_KEY) + nkey++; + } + } + if (narray > nkey) { + aiov = malloc(narray * sizeof(struct iovec2)); + if (aiov == NULL) + err(1, "malloc"); + for (i = ai = 0; i < nparams; i++) + if (params[i].flags & + (PARAM_KEY | PARAM_ARRAY)) + aiov[ai++] = iov[i]; + } + } + /* If there are array parameters, find their sizes. */ + if (aiov != NULL) { + for (ai = 0; ai < narray; ai++) + if (aiov[ai].value.iov_base == NULL) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)aiov, 2 * narray, jflags) < 0) + return (-1); + } + /* Allocate storage for all parameters. */ + for (i = ai = 0; i < nparams; i++) { + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + malloc(iov[i].value.iov_len); + } + ai++; + } else + iov[i].value.iov_base = malloc(params[i].size); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + if (params[i].value == NULL) + memset(iov[i].value.iov_base, 0, iov[i].value.iov_len); + } + /* + * Get the actual prison. If there are array elements, retry a few + * times in case the size changed from under us. 
+ */ + if ((jid = jail_get((struct iovec *)iov, 2 * nparams, jflags)) < 0) { + if (errno != EINVAL || aiov == NULL || errmsg[0]) + return (-1); + for (sanity = 0;; sanity++) { + if (sanity == 10) + return (-1); + for (ai = 0; ai < narray; ai++) + if (params[i].flags & PARAM_ARRAY) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)iov, 2 * narray, jflags) < + 0) + return (-1); + for (i = ai = 0; i < nparams; i++) { + if (!(params[i].flags & + (PARAM_KEY | PARAM_ARRAY))) + continue; + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = + aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + realloc(iov[i].value.iov_base, + iov[i].value.iov_len); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + } + ai++; + } + } + } + if (pflags & PRINT_VDEFAULT) { + printf("%6d %-29.29s %.74s\n" + "%6s %-29.29s %.74s\n" + "%6s %-6d\n", + *(int *)iov[0].value.iov_base, + (char *)iov[1].value.iov_base, + (char *)iov[2].value.iov_base, + "", + (char *)iov[3].value.iov_base, + *(int *)iov[4].value.iov_base ? "DYING" : "ACTIVE", + "", + *(int *)iov[5].value.iov_base); + count = iov[6].value.iov_len / sizeof(struct in_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET, + &((struct in_addr *)iov[6].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + if (!strcmp(params[7].name, "ip6.addr")) { + count = iov[7].value.iov_len / sizeof(struct in6_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET6, &((struct in_addr *) + iov[7].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + } + } else if (pflags & PRINT_DEFAULT) + printf("%6d %-15.15s %-29.29s %.74s\n", + *(int *)iov[0].value.iov_base, + iov[1].value.iov_len == 0 ? 
"-" + : inet_ntoa(*(struct in_addr *)iov[1].value.iov_base), + (char *)iov[2].value.iov_base, + (char *)iov[3].value.iov_base); + else { + for (i = 0; i < nparams; i++) { + if (!(params[i].flags & PARAM_USER)) continue; + if (i > 0) + putchar(' '); + if (pflags & PRINT_NAMEVAL) { + /* + * Generally "name=value", but for booleans + * either "name" or "noname". + */ + switch (params[i].type) { + case CTLTYPE_BOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = noname(params[i].name); + printf("%s", nname); + free(nname); + } + break; + case CTLTYPE_NOBOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = + nononame(params[i].name); + printf("%s", nname); + free(nname); + } + break; + default: + printf("%s=", params[i].name); + } } - err(1, "sysctlbyname(): security.jail.list"); + count = params[i].flags & PARAM_ARRAY + ? iov[i].value.iov_len / params[i].size : 1; + if (count == 0) + putchar('-'); + for (ai = 0; ai < count; ai++) { + if (ai > 0) + putchar(','); + switch (params[i].type) { + case CTLTYPE_INT: + printf("%d", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_UINT: + printf("%u", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_IPADDR: + if (inet_ntop(AF_INET, + &((struct in_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_IP6ADDR: + if (inet_ntop(AF_INET6, + &((struct in6_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_LONG: + printf("%ld", ((long *) + iov[i].value.iov_base)[ai]); + case CTLTYPE_ULONG: + printf("%lu", ((long *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_STRING: + if (pflags & PRINT_QUOTED) + quoted_print((char *) + iov[i].value.iov_base, + params[i].size); + else + printf("%.*s", + params[i].size, (char *) + 
iov[i].value.iov_base); + break; + case CTLTYPE_BOOL: + case CTLTYPE_NOBOOL: + if (!(pflags & PRINT_NAMEVAL)) + printf(((int *) + iov[i].value.iov_base)[ai] + ? "true" : "false"); + } + } } - break; + putchar('\n'); } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); + for (i = 0; i < nparams; i++) + if (params[i].value == NULL) + free(iov[i].value.iov_base); + return (jid); +} - if (flags & FLAG_V) { - printf(" JID Hostname Path\n"); - printf(" Name State\n"); - printf(" CPUSetID\n"); - printf(" IP Address(es)\n"); - } else { - printf(" JID IP Address Hostname" - " Path\n"); +static void +quoted_print(char *str, int len) +{ + int c, qc; + char *p = str; + char *ep = str + len; + + /* An empty string needs quoting. */ + if (!*p) { + fputs("\"\"", stdout); + return; } - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - q = print_xprison_v1(q, (char *)p + len, flags); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = print_xprison_v3(q, (char *)p + len, flags); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } + + /* + * The value will be surrounded by quotes if it contains spaces + * or quotes. + */ + qc = strchr(p, '\'') ? '"' + : strchr(p, '"') ? '\'' + : strchr(p, ' ') || strchr(p, '\t') ? 
'"' + : 0; + if (qc) + putchar(qc); + while (p < ep && (c = *p++)) { + if (c == '\\' || c == qc) + putchar('\\'); + putchar(c); } - - free(p); - exit(0); + if (qc) + putchar(qc); } Index: usr.sbin/jls/Makefile =================================================================== --- usr.sbin/jls/Makefile (revision 191896) +++ usr.sbin/jls/Makefile (working copy) @@ -4,6 +4,4 @@ MAN= jls.8 WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jls/jls.8 =================================================================== --- usr.sbin/jls/jls.8 (revision 191896) +++ usr.sbin/jls/jls.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JLS 8 .Os .Sh NAME @@ -33,38 +33,59 @@ .Nd "list jails" .Sh SYNOPSIS .Nm -.Op Fl av +.Op Fl dhnqv +.Op Fl j Ar jail +.Op Ar parameter ... .Sh DESCRIPTION The .Nm -utility lists all jails. -By default only active jails are listed. +utility lists all active jails, or the specified jail. +Each jail is represented by one row which contains space-separated values of +the listed +.Ar parameters , +including the pseudo-parameter +.Va all +which will show all available jail parameters. +A list of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . .Pp -The options are as follows: -.Bl -tag -width ".Fl a" -.It Fl a -Show jails in all states, not only active ones. +If no +.Ar parameters +are given, the following four columns will be printed: +jail identifier (jid), IP address (ip4.addr), hostname (host.hostname), +and path (path). +.Pp +The following options are available: +.Bl -tag -width indent +.It Fl d +List +.Va dying +as well as active jails. +.It Fl h +Print a header line containing the parameters listed. +If no parameters are given on the command line, the default four-column +output always contains a header. +.It Fl n +Print parameters in +.Dq name=value +format, where each parameter is preceded by its name. 
+This option is ignored for the default four-column output. +.It Fl q +Put quotes around string parameters if they contain spaces or quotes, or are +the empty string. .It Fl v -Show more verbose information. -This also lists cpusets, jail state, multi-IP, etc. instead of the -classic single-IP jail output. +Print a multiple-line summary per jail, with the following parameters: +jail identifier (jid), hostname (host.hostname), path (path), +jail name (name), jail state (dying), cpuset ID (cpuset), +IP address(es) (ip4.addr and ip6.addr). +.It Fl j Ar jail +The jid or name of the +.Ar jail +to list. +Without this option, all active jails will be listed. .El -.Pp -Each jail is represented by rows which, depending on -.Fl v , -contain the following columns: -.Bl -item -offset indent -compact -.It -jail identifier (JID), hostname and path -.It -jail state and name -.It -jail cpuset -.It -followed by one IP adddress per line. -.El .Sh SEE ALSO -.Xr jail 2 , +.Xr jail_get 2 , .Xr jail 8 , .Xr jexec 8 .Sh HISTORY @@ -72,3 +93,5 @@ .Nm utility was added in .Fx 5.1 . +Extensible jail parameters were introduced in +.Fx 8.0 . Index: usr.sbin/jexec/jexec.c =================================================================== --- usr.sbin/jexec/jexec.c (revision 191896) +++ usr.sbin/jexec/jexec.c (working copy) @@ -29,12 +29,16 @@ #include #include +#include #include +#include +#include #include #include #include +#include #include #include #include @@ -43,154 +47,8 @@ #include static void usage(void); +static int addr2jid(const char *addr); -#ifdef SUPPORT_OLD_XPRISON -static -char *lookup_xprison_v1(void *p, char *end, int *id) -{ - struct xprison_v1 *xp; - - if (id == NULL) - errx(1, "Internal error. 
Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison_v1 *)p; - - *id = xp->pr_id; - return ((char *)(xp + 1)); -} -#endif - -static -char *lookup_xprison_v3(void *p, char *end, int *id, char *jailname) -{ - struct xprison *xp; - char *q; - int ok; - - if (id == NULL) - errx(1, "Internal error. Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison *)p; - ok = 1; - - /* Jail state and name. */ - if (xp->pr_state < 0 || xp->pr_state >= - (int)((sizeof(prison_states) / sizeof(struct prison_state)))) - errx(1, "Invalid jail state."); - else if (xp->pr_state != PRISON_STATE_ALIVE) - ok = 0; - if (jailname != NULL) { - if (xp->pr_name[0] == '\0') - ok = 0; - else if (strcmp(jailname, xp->pr_name) != 0) - ok = 0; - } - - q = (char *)(xp + 1); - /* IPv4 addresses. */ - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - /* IPv6 addresses. */ - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - - if (ok) - *id = xp->pr_id; - return (q); -} - -static int -lookup_jail(int jid, char *jailname) -{ - size_t i, j, len; - void *p, *q; - int version, id, xid, count; - - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); - - j = len; - for (i = 0; i < 4; i++) { - if (len == 0) - return (-1); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); - - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; - continue; - } - err(1, "sysctlbyname(): security.jail.list"); - } - break; - } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. 
Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - - count = 0; - xid = -1; - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - id = -1; - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - if (jailname != NULL) - errx(1, "Version 1 prisons did not " - "support jail names."); - q = lookup_xprison_v1(q, (char *)p + len, &id); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = lookup_xprison_v3(q, (char *)p + len, &id, jailname); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } - /* Possible match; see if we have a jail ID to match as well. */ - if (id > 0 && (jid <= 0 || id == jid)) { - xid = id; - count++; - } - } - - free(p); - - if (count == 1) - return (xid); - else if (count > 1) - errx(1, "Could not uniquely identify the jail."); - else - return (-1); -} - #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -210,22 +68,18 @@ int main(int argc, char *argv[]) { + struct iovec params[2]; int jid; login_cap_t *lcap = NULL; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, ngroups, uflag, Uflag; - char *jailname, *username; + int ch, ngroups, uflag, Uflag, hflag; + char *ep, *username; + ch = uflag = Uflag = hflag = 0; + username = NULL; - ch = uflag = Uflag = 0; - jailname = username = NULL; - jid = -1; - - while ((ch = getopt(argc, argv, "i:n:u:U:")) != -1) { + while ((ch = getopt(argc, argv, "u:U:h")) != -1) { switch (ch) { - case 'n': - jailname = optarg; - break; case 'u': username = optarg; uflag = 1; @@ -234,6 +88,9 @@ username = optarg; Uflag = 1; break; + case 'h': + hflag = 1; + 
break; default: usage(); } @@ -242,22 +99,24 @@ argv += optind; if (argc < 2) usage(); - if (strlen(argv[0]) > 0) { - jid = (int)strtol(argv[0], NULL, 10); - if (errno) - err(1, "Unable to parse jail ID."); - } - if (jid <= 0 && jailname == NULL) { - fprintf(stderr, "Neither jail ID nor jail name given.\n"); - usage(); - } if (uflag && Uflag) usage(); if (uflag) GET_USER_INFO; - jid = lookup_jail(jid, jailname); - if (jid <= 0) - errx(1, "Cannot identify jail."); + if (hflag) + jid = addr2jid(argv[0]); + else { + jid = strtoul(argv[0], &ep, 10); + if (!*argv[0] || *ep) { + *(const void **)&params[0].iov_base = "name"; + params[0].iov_len = sizeof("name"); + params[1].iov_base = argv[0]; + params[1].iov_len = strlen(argv[0]) + 1; + jid = jail_get(params, 2, 0); + if (jid < 0) + errx(1, "Unknown jail: %s", argv[0]); + } + } if (jail_attach(jid) == -1) err(1, "jail_attach(): %d", jid); if (chdir("/") == -1) @@ -285,6 +144,108 @@ fprintf(stderr, "%s%s\n", "usage: jexec [-u username | -U username]", - " [-n jailname] jid command ..."); + " [-h hostname | -h ip-number | jail] command ..."); exit(1); } + +static int +addr2jid(const char *addr) +{ + struct iovec params[6]; + struct in_addr ia; + struct in6_addr ia6; + int cnt, doip, foundjid, ii, jid, lastjid, sanity; + char hostbuf[MAXHOSTNAMELEN]; + + if (inet_pton(AF_INET, addr, &ia) > 0) + doip = 4; + else if (inet_pton(AF_INET6, addr, &ia6) > 0) + doip = 6; + else + doip = 0; + + *(const void **)&params[0].iov_base = "lastjid"; + params[0].iov_len = sizeof("lastjid"); + params[1].iov_base = &lastjid; + params[1].iov_len = sizeof(lastjid); + switch (doip) { + case 4: + *(const void **)&params[2].iov_base = "ip4.addr"; + params[2].iov_len = sizeof("ip4.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + case 6: + *(const void **)&params[2].iov_base = "ip6.addr"; + params[2].iov_len = 
sizeof("ip6.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + default: + *(const void **)&params[2].iov_base = "host.hostname"; + params[2].iov_len = sizeof("host.hostname"); + params[3].iov_base = hostbuf; + params[3].iov_len = MAXHOSTNAMELEN; + } + + cnt = foundjid = sanity = 0; + for (jid = 0;; jid = lastjid) { + if (doip != 0) { + params[3].iov_base = NULL; + params[3].iov_len = 0; + if (jail_get(params, 4, 0) < 0) + break; + params[3].iov_len += 5 * sizeof(struct in6_addr); + params[3].iov_base = malloc(params[3].iov_len); + jid = jail_get(params, 6, 0); + } else + jid = jail_get(params, 4, 0); + if (jid > 0) { + sanity = 0; + if (!strcmp(hostbuf, addr)) { + cnt++; + foundjid = jid; + } else switch (doip) { + case 4: + for (ii = (params[3].iov_len / + sizeof(struct in_addr)) - 1; ii >= 0; ii--) + if (((struct in_addr *)params[3]. + iov_base)[ii].s_addr == ia.s_addr) { + cnt++; + foundjid = jid; + break; + } + break; + case 6: + for (ii = (params[3].iov_len / + sizeof(struct in6_addr)) - 1; ii >= 0; + ii--) + if (IN6_ARE_ADDR_EQUAL(&ia6, + &((struct in6_addr *) + params[3].iov_base)[ii])) { + cnt++; + foundjid = jid; + break; + } + } + } else if (errno == ENOENT || ++sanity > 10) + break; + else + jid = lastjid; + if (doip != 0) + free(params[3].iov_base); + } + switch (cnt) + { + case 0: + errx(1, "Unknown jail: %s", addr); + case 1: + return foundjid; + default: + errx(1, "Could not uniquely identify the jail: %s", addr); + } +} Index: usr.sbin/jexec/jexec.8 =================================================================== --- usr.sbin/jexec/jexec.8 (revision 191896) +++ usr.sbin/jexec/jexec.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JEXEC 8 .Os .Sh NAME @@ -34,36 +34,22 @@ .Sh SYNOPSIS .Nm .Op Fl u Ar username | Fl U Ar username -.Op Fl n Ar jailname -.Ar jid 
command ... +.Op Fl h Ar hostname | Fl h Ar ip | Ar jid | Ar name +.Ar command ... .Sh DESCRIPTION The .Nm utility executes .Ar command -inside the jail identified by either -.Ar jailname +inside the jail identified by +.Ar hostname , +.Ar ip , +.Ar jid , or -.Ar jid -or both. +.Ar name . .Pp -If the jail cannot be identified uniquely by the given parameters, -an error message is printed. -.Nm -will also check the state of the jail (once supported) to be -.Dv ALIVE -and ignore jails in other states. -The mandatory argument -.Ar jid -is the unique jail identifier as given by -.Xr jls 8 . -In case you only want to match on other criteria, give an empty string. -.Pp The following options are available: .Bl -tag -width indent -.It Fl n Ar jailname -The name of the jail, if given upon creation of the jail. -This is not the hostname of the jail. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -73,6 +59,9 @@ .Ar command should run. .El +.Sh "CAUTIONS" +Only a jail's jid or name is guaranteed to uniquely identify the jail. +Hostname or ip only work here if matched to one unique jail. .Sh SEE ALSO .Xr jail_attach 2 , .Xr jail 8 , Index: usr.sbin/jexec/Makefile =================================================================== --- usr.sbin/jexec/Makefile (revision 191896) +++ usr.sbin/jexec/Makefile (working copy) @@ -6,6 +6,4 @@ LDADD= -lutil WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jail/jail.c =================================================================== --- usr.sbin/jail/jail.c (revision 191896) +++ usr.sbin/jail/jail.c (working copy) @@ -1,5 +1,6 @@ /*- * Copyright (c) 1999 Poul-Henning Kamp. + * Copyright (c) 2009 James Gritton * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -29,51 +30,43 @@ #include #include -#include #include #include -#include +#include +#include #include -#include -#include +#include #include #include #include #include +#include #include #include #include #include -#include #include #include -static void usage(void); -static int add_addresses(struct addrinfo *); -static struct in_addr *copy_addr4(void); -#ifdef INET6 -static struct in6_addr *copy_addr6(void); -#endif +#define SJPARAM "security.jail.param" +#define ERRMSG_SIZE 256 -extern char **environ; - -struct addr4entry { - STAILQ_ENTRY(addr4entry) addr4entries; - struct in_addr ip4; - int count; +struct param { + struct iovec name; + struct iovec value; }; -struct addr6entry { - STAILQ_ENTRY(addr6entry) addr6entries; -#ifdef INET6 - struct in6_addr ip6; -#endif - int count; -}; -STAILQ_HEAD(addr4head, addr4entry) addr4 = STAILQ_HEAD_INITIALIZER(addr4); -STAILQ_HEAD(addr6head, addr6entry) addr6 = STAILQ_HEAD_INITIALIZER(addr6); +static struct param *params; +static int nparams; + +static void set_param(const char *name, char *value); +static void set_param_ip_hostname(char *value, int family); +static void usage(void); + +extern char **environ; + #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -94,27 +87,28 @@ main(int argc, char **argv) { login_cap_t *lcap = NULL; - struct jail j; + struct iovec rparams[2]; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, error, i, ngroups, securelevel; - int hflag, iflag, Jflag, lflag, uflag, Uflag; - char path[PATH_MAX], *jailname, *ep, *username, *JidFile, *ip; + int ch, cmdarg, i, jail_set_flags, jid, ngroups, oldargs, securelevel; + int iflag, Jflag, lflag, rflag, uflag, Uflag; + char *ep, *username, *JidFile; + char errmsg[ERRMSG_SIZE]; static char *cleanenv; const char *shell, *p = NULL; long ltmp; FILE *fp; - struct addrinfo hints, *res0; - hflag = iflag = Jflag = lflag = uflag = Uflag = 0; - 
securelevel = -1; - jailname = username = JidFile = cleanenv = NULL; + iflag = Jflag = lflag = rflag = uflag = Uflag = 0; + jail_set_flags = JAIL_CREATE | JAIL_UPDATE; + cmdarg = jid = securelevel = -1; + username = JidFile = cleanenv = NULL; fp = NULL; - while ((ch = getopt(argc, argv, "hiln:s:u:U:J:")) != -1) { + while ((ch = getopt(argc, argv, "cdilor:s:u:U:J:")) != -1) { switch (ch) { - case 'h': - hflag = 1; + case 'd': + jail_set_flags |= JAIL_DYING; break; case 'i': iflag = 1; @@ -123,9 +117,6 @@ JidFile = optarg; Jflag = 1; break; - case 'n': - jailname = optarg; - break; case 's': ltmp = strtol(optarg, &ep, 0); if (*ep || ep == optarg || ltmp > INT_MAX || !ltmp) @@ -143,13 +134,41 @@ case 'l': lflag = 1; break; + case 'c': + jail_set_flags = + (jail_set_flags & ~JAIL_UPDATE) | JAIL_CREATE; + break; + case 'o': + jail_set_flags = + (jail_set_flags & ~JAIL_CREATE) | JAIL_UPDATE; + break; + case 'r': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) { + *(const void **)&rparams[0].iov_base = "name"; + rparams[0].iov_len = sizeof("name"); + rparams[1].iov_base = optarg; + rparams[1].iov_len = strlen(optarg) + 1; + jid = jail_get(rparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", optarg); + } + rflag = 1; + break; default: usage(); } } argc -= optind; argv += optind; - if (argc < 4) + if (rflag) { + if (argc > 0 || iflag || Jflag || lflag || uflag || Uflag) + usage(); + if (jail_remove(jid) < 0) + err(1, "jail_remove"); + exit (0); + } + if (argc == 0) usage(); if (uflag && Uflag) usage(); @@ -157,92 +176,70 @@ usage(); if (uflag) GET_USER_INFO; - if (realpath(argv[0], path) == NULL) - err(1, "realpath: %s", argv[0]); - if (chdir(path) != 0) - err(1, "chdir: %s", path); - /* Initialize struct jail. */ - memset(&j, 0, sizeof(j)); - j.version = JAIL_API_VERSION; - j.path = path; - j.hostname = argv[1]; - if (jailname != NULL) - j.jailname = jailname; - /* Handle IP addresses. If requested resolve hostname too. 
*/ - bzero(&hints, sizeof(struct addrinfo)); - hints.ai_protocol = IPPROTO_TCP; - hints.ai_socktype = SOCK_STREAM; - if (JAIL_API_VERSION < 2) - hints.ai_family = PF_INET; - else - hints.ai_family = PF_UNSPEC; - /* Handle hostname. */ - if (hflag != 0) { - error = getaddrinfo(j.hostname, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle hostname: %s", - gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); + /* + * If the first argument (path) starts with a slash, and the third + * argument (IP address) starts with a digit, it is likely to be + * an old-style fixed-parameter command line. + */ + oldargs = argc >= 4 && argv[0][0] == '/' && isdigit(argv[2][0]); + if (oldargs) { + if ((jail_set_flags & (JAIL_CREATE | JAIL_UPDATE)) != + (JAIL_CREATE | JAIL_UPDATE)) + usage(); + jail_set_flags = JAIL_CREATE | JAIL_ATTACH; + set_param("path", argv[0]); + set_param("host.hostname", argv[1]); + set_param("ip4.addr", argv[2]); + cmdarg = 3; + } else { + for (i = 0; i < argc; i++) + if (!strncmp(argv[i], "command=", 8)) { + cmdarg = i; + argv[cmdarg] += 8; + jail_set_flags |= JAIL_ATTACH; + break; + } else + set_param(NULL, argv[i]); } - /* Handle IP addresses. */ - hints.ai_flags = AI_NUMERICHOST; - ip = strtok(argv[2], ","); - while (ip != NULL) { - error = getaddrinfo(ip, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle ip: %s", gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); - ip = strtok(NULL, ","); - } - /* Count IP addresses and add them to struct jail. 
*/ - if (!STAILQ_EMPTY(&addr4)) { - j.ip4s = STAILQ_FIRST(&addr4)->count; - j.ip4 = copy_addr4(); - if (j.ip4s > 0 && j.ip4 == NULL) - errx(1, "copy_addr4()"); - } -#ifdef INET6 - if (!STAILQ_EMPTY(&addr6)) { - j.ip6s = STAILQ_FIRST(&addr6)->count; - j.ip6 = copy_addr6(); - if (j.ip6s > 0 && j.ip6 == NULL) - errx(1, "copy_addr6()"); - } -#endif + errmsg[0] = 0; + set_param("errmsg", errmsg); if (Jflag) { fp = fopen(JidFile, "w"); if (fp == NULL) errx(1, "Could not create JidFile: %s", JidFile); } - i = jail(&j); - if (i == -1) - err(1, "syscall failed with"); + jid = jail_set(&params->name, 2 * nparams, jail_set_flags); + if (jid < 0) { + if (errmsg[0] != '\0') + errx(1, "%s", errmsg); + err(1, "jail_set"); + } if (iflag) { - printf("%d\n", i); + printf("%d\n", jid); fflush(stdout); } if (Jflag) { - if (fp != NULL) { + if (oldargs) fprintf(fp, "%d\t%s\t%s\t%s\t%s\n", - i, j.path, j.hostname, argv[2], argv[3]); - (void)fclose(fp); - } else { - errx(1, "Could not write JidFile: %s", JidFile); + jid, (char *)params[0].value.iov_base, + argv[1], argv[2], argv[3]); + else { + fprintf(fp, "%d", jid); + for (i = 0; i < argc; i++) + fprintf(fp, "\t%s", argv[i]); + fprintf(fp, "\n"); } + (void)fclose(fp); } if (securelevel > 0) { if (sysctlbyname("kern.securelevel", NULL, 0, &securelevel, sizeof(securelevel))) err(1, "Can not set securelevel to %d", securelevel); } + if (cmdarg < 0) + exit(0); if (username != NULL) { if (Uflag) GET_USER_INFO; @@ -272,158 +269,256 @@ if (p) setenv("TERM", p, 1); } - if (execv(argv[3], argv + 3) != 0) - err(1, "execv: %s", argv[3]); - exit(0); + execvp(argv[cmdarg], argv + cmdarg); + err(1, "execvp: %s", argv[cmdarg]); } static void -usage(void) +set_param(const char *name, char *value) { + struct param *param; + char *ep, *p; + size_t buflen, mlen; + int i, nval, mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; - (void)fprintf(stderr, "%s%s%s\n", - "usage: jail [-hi] [-n jailname] [-J jid_file] ", - "[-s securelevel] [-l -u username | -U username] ", - 
"path hostname [ip[,..]] command ..."); - exit(1); -} + static int paramlistsize; -static int -add_addresses(struct addrinfo *res0) -{ - int error; - struct addrinfo *res; - struct addr4entry *a4p; - struct sockaddr_in *sai; + /* Separate the name from the value, if not done already. */ + if (name == NULL) { + name = value; + if ((value = strchr(value, '='))) + *value++ = '\0'; + } + + /* Handle pseudo-parameters separately. */ + if (!strcmp(name, "ip4_hostname")) { + set_param_ip_hostname(value, AF_INET); + return; + } #ifdef INET6 - struct addr6entry *a6p; - struct sockaddr_in6 *sai6; + if (!strcmp(name, "ip6_hostname")) { + set_param_ip_hostname(value, AF_INET6); + return; + } #endif - int count; - error = 0; - for (res = res0; res && error == 0; res = res->ai_next) { - switch (res->ai_family) { - case AF_INET: - sai = (struct sockaddr_in *)(void *)res->ai_addr; - STAILQ_FOREACH(a4p, &addr4, addr4entries) { - if (bcmp(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)) == 0) { - err(1, "Ignoring duplicate IPv4 address."); - break; - } - } - a4p = (struct addr4entry *) malloc( - sizeof(struct addr4entry)); - if (a4p == NULL) { - error = 1; - break; - } - bzero(a4p, sizeof(struct addr4entry)); - bcopy(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)); - if (!STAILQ_EMPTY(&addr4)) - count = STAILQ_FIRST(&addr4)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr4, a4p, addr4entries); - STAILQ_FIRST(&addr4)->count = count + 1; + /* Check for repeat parameters */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name.iov_base)) { + memcpy(params + i, params + i + 1, + (--nparams - i) * sizeof(struct param)); break; + } + + /* Make sure there is room for the new param record. 
*/ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } + + /* Look up the paramter. */ + param = params + nparams++; + *(const void **)&param->name.iov_base = name; + param->name.iov_len = strlen(name) + 1; + /* Trivial values - no value or errmsg. */ + if (value == NULL) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + if (!strcmp(name, "errmsg")) { + param->value.iov_base = value; + param->value.iov_len = ERRMSG_SIZE; + return; + } + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + errx(1, "unknown parameter: %s", name); + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + /* + * See if this is an array type. + * Treat non-arrays as an array of one. + */ + p = strchr(buf + sizeof(int), '\0'); + nval = 1; + if (p - 2 >= buf && !strcmp(p - 2, ",a")) { + if (value[0] == '\0' || + (value[0] == '-' && value[1] == '\0')) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + p[-2] = 0; + for (p = strchr(value, ','); p; p = strchr(p + 1, ',')) { + *p = 0; + nval++; + } + } + + /* Set the values according to the parameter type. 
*/ + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + case CTLTYPE_UINT: + param->value.iov_len = nval * sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->value.iov_len = nval * sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) + param->value.iov_len = nval * sizeof(struct in_addr); #ifdef INET6 - case AF_INET6: - sai6 = (struct sockaddr_in6 *)(void *)res->ai_addr; - STAILQ_FOREACH(a6p, &addr6, addr6entries) { - if (bcmp(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)) == 0) { - err(1, "Ignoring duplicate IPv6 address."); - break; - } + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) + param->value.iov_len = nval * sizeof(struct in6_addr); +#endif + else + errx(1, "%s: unknown parameter structure (%s)", + name, buf + sizeof(int)); + break; + case CTLTYPE_STRING: + if (!strcmp(name, "path")) { + param->value.iov_base = malloc(MAXPATHLEN); + if (param->value.iov_base == NULL) + err(1, "malloc"); + if (realpath(value, param->value.iov_base) == NULL) + err(1, "%s: realpath(%s)", name, value); + if (chdir(param->value.iov_base) != 0) + err(1, "chdir: %s", + (char *)param->value.iov_base); + } else + param->value.iov_base = value; + param->value.iov_len = strlen(param->value.iov_base) + 1; + return; + default: + errx(1, "%s: unknown parameter type %d (%s)", + name, *(int *)buf, buf + sizeof(int)); + } + param->value.iov_base = malloc(param->value.iov_len); + for (i = 0; i < nval; i++) { + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + ((int *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_UINT: + ((unsigned *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_LONG: + ((long *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: 
non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_ULONG: + ((unsigned long *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) { + if (inet_pton(AF_INET, value, + &((struct in_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv4 address: %s", + name, value); } - a6p = (struct addr6entry *) malloc( - sizeof(struct addr6entry)); - if (a6p == NULL) { - error = 1; - break; +#ifdef INET6 + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) { + if (inet_pton(AF_INET6, value, + &((struct in6_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv6 address: %s", + name, value); } - bzero(a6p, sizeof(struct addr6entry)); - bcopy(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)); - if (!STAILQ_EMPTY(&addr6)) - count = STAILQ_FIRST(&addr6)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr6, a6p, addr6entries); - STAILQ_FIRST(&addr6)->count = count + 1; - break; #endif - default: - err(1, "Address family %d not supported. Ignoring.\n", - res->ai_family); - break; } + value = strchr(value, '\0') + 1; } - - return (error); } -static struct in_addr * -copy_addr4(void) +static void +set_param_ip_hostname(char *value, int family) { - size_t len; - struct in_addr *ip4s, *p, ia; - struct addr4entry *a4p; + struct addrinfo hints, *ai0, *ai; + char *avalue, *nextav; + socklen_t avlen; + int error; - if (STAILQ_EMPTY(&addr4)) - return NULL; + /* Look up the hostname in the specified address family. 
*/ + memset(&hints, 0, sizeof(hints)); + hints.ai_family = family; + error = getaddrinfo(value, NULL, &hints, &ai0); + if (error != 0) + errx(1, "hostname %s: %s", value, gai_strerror(error)); - len = STAILQ_FIRST(&addr4)->count * sizeof(struct in_addr); - - ip4s = p = (struct in_addr *)malloc(len); - if (ip4s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr4)) { - a4p = STAILQ_FIRST(&addr4); - STAILQ_REMOVE_HEAD(&addr4, addr4entries); - ia.s_addr = a4p->ip4.s_addr; - bcopy(&ia, p, sizeof(struct in_addr)); - p++; - free(a4p); + /* Convert the addresses to ASCII so set_param can convert them back. */ + avlen = 0; + for (ai = ai0; ai; ai = ai->ai_next) + avlen++; + avlen *= +#ifdef INET6 + family == AF_INET6 ? INET6_ADDRSTRLEN : +#endif + INET_ADDRSTRLEN; + avalue = malloc(avlen); + if (avalue == NULL) + err(1, "malloc"); + avalue[0] = 0; + for (nextav = avalue, ai = ai0; ai; ai = ai->ai_next) { + if (inet_ntop(family, +#ifdef INET6 + family == AF_INET6 ? + (void *)&((struct sockaddr_in6 *)&ai->ai_addr)->sin6_addr : +#endif + (void *)&((struct sockaddr_in *)&ai->ai_addr)->sin_addr, + nextav, avlen - (nextav - avalue)) == NULL) + err(1, "inet_ntop"); + if (ai->ai_next) { + nextav = strchr(nextav, '\0'); + *nextav++ = ','; + } } - - return (ip4s); + set_param( +#ifdef INET6 + family == AF_INET6 ? 
"ip6.addr" : +#endif + "ip4.addr", avalue); } -#ifdef INET6 -static struct in6_addr * -copy_addr6(void) +static void +usage(void) { - size_t len; - struct in6_addr *ip6s, *p; - struct addr6entry *a6p; - if (STAILQ_EMPTY(&addr6)) - return NULL; - - len = STAILQ_FIRST(&addr6)->count * sizeof(struct in6_addr); - - ip6s = p = (struct in6_addr *)malloc(len); - if (ip6s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr6)) { - a6p = STAILQ_FIRST(&addr6); - STAILQ_REMOVE_HEAD(&addr6, addr6entries); - bcopy(&a6p->ip6, p, sizeof(struct in6_addr)); - p++; - free(a6p); - } - - return (ip6s); + (void)fprintf(stderr, + "usage: jail [-d] [-i] [-J jid_file] [-s securelevel]\n" + " [-l -u username | -U username]\n" + " [[-c | -o] param=value ... [command=command ...] |\n" + " path hostname ip command ...]\n" + " jail [-r jail]\n"); + exit(1); } -#endif - Index: usr.sbin/jail/jail.8 =================================================================== --- usr.sbin/jail/jail.8 (revision 191896) +++ usr.sbin/jail/jail.8 (working copy) @@ -1,5 +1,6 @@ .\" .\" Copyright (c) 2000, 2003 Robert N. M. Watson +.\" Copyright (c) 2008 James Gritton .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without @@ -33,49 +34,37 @@ .\" .\" $FreeBSD$ .\" -.Dd January 24, 2009 +.Dd May 9, 2009 .Dt JAIL 8 .Os .Sh NAME .Nm jail -.Nd "imprison process and its descendants" +.Nd "create or modify a system jail" .Sh SYNOPSIS .Nm -.Op Fl hi -.Op Fl n Ar jailname +.Op Fl di .Op Fl J Ar jid_file .Op Fl s Ar securelevel .Op Fl l u Ar username | Fl U Ar username -.Ar path hostname [ip[,..]] command ... +.Op Fl c | o +.Op Ar parameter=value ... | path hostname ip command ... +.Br +.Nm +.Op Fl r Ar jail .Sh DESCRIPTION The .Nm -utility imprisons a process and all future descendants. +utility creates a new jail or modifies an existing jail, optionally +imprisoning the current process (and future descendants) inside it. 
.Pp The options are as follows: -.Bl -tag -width ".Fl u Ar username" -.It Fl h -Resolve -.Va hostname -and add all IP addresses returned by the resolver -to the list of -.Va ip-addresses -for this prison. -This may affect default address selection for outgoing IPv4 connections -of prisons. -The address first returned by the resolver for each address family -will be used as primary address. -See -.Va ip-addresses -further down for details. +.Bl -tag -width indent +.It Fl d +Allow making changes to a +.Va +dying jail. .It Fl i Output the jail identifier of the newly created jail. -.It Fl n Ar jailname -Assign and administrative name to the jail that can be used for management -or auditing purposes. -The system will -.Sy not enforce -the name to be unique. .It Fl J Ar jid_file Write a .Ar jid_file @@ -100,7 +89,10 @@ .It Fl s Ar securelevel Sets the .Va kern.securelevel -sysctl variable to the specified value inside the newly created jail. +MIB entry to the specified value inside the newly created jail. +This is equivalent to setting the jail's +.Va securelevel +parameter. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -109,20 +101,159 @@ The user name from jailed environment as whom the .Ar command should run. -.It Ar path +.It Fl c +Create a new jail, but do not modify an existing one. +Default behavior is to allow modification if a +.Va jid +or +.Va name +parameter refers to an existing jail. +.It Fl o +Only modify an existing jail, but do not create one. +One of the +.Va jid +or +.Va name +parameters must exist and refer to an existing jail. +.It Fl r +Remove the +.Ar jail +specified by jid or name. +All jailed processes are killed, and all children of this jail are also +removed. +.El +.Pp +.Ar Parameters +are listed in +.Dq name=value +form, following the options. +Some parameters are boolean, and do not have a value but are set by the +name alone with or without a +.Dq no +prefix, e.g. +.Va persist +or +.Va nopersist . 
+Any parameters not set will be given default values, generally based on the +current environment. +.Pp +The pseudo-parameter +.Va command +specifies that the current process should enter the new (or modified) jail, +and run the specified command. +It must be the last parameter specified, because it includes not only +the value following the +.Sq = +sign, but also passes the rest of the arguments to the command. +.Pp +Instead of supplying named +.Ar parameters , +four fixed parameters may be supplied in order on the command line: +.Ar path , +.Ar hostname , +.Ar ip , +and +.Ar command . +As the +.Va jid +and +.Va name +parameters aren't in this list, this mode will always create a new jail, and +the +.Fl c +and +.Fl o +options don't apply. +.Pp +Jails have a set a core parameters, and modules can add their own jail +parameters. +The current set of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . +Some of the notable core parameters include: +.Bl -tag -width indent +.It Va jid +The jail identifier. +This will be assigned automatically to a new jail (or can be explicitly +set), and can be used to identify the jail for later modification, or +for such commands as +.Xr jls 8 +or +.Xr jexec 8 . +.It Va name +The jail name. +This is an arbitrary string that identifies a jail (except it may not +contain a +.Sq \&. ) . +Like the +.Va jid , +it can be passed to later +.Nm +commands, or to +.Xr jls 8 +or +.Xr jexec 8 . +If no +.Va name +is supplied, a default is assumed that is the same as the +.Va jid . +.It Va path Directory which is to be the root of the prison. -.It Ar hostname -Hostname of the prison. -.It Ar ip-addresses -None, one or more IPv4 and IPv6 addresses assigned to the prison. -The first address of each address family that was assigned to the jail will -be used as the source address in case source address selection on unbound -sockets cannot find a better match. 
+The +.Va command +(if any) is run from this directory, as are commands from +.Xr jexec 8 . +.It Va ip4.addr +A comma-separated list of IPv4 addresses assigned to the prison. +If this is set, the jail is restricted to using only these address. +Any attempts to use other addresses fail, and attempts to use wildcard +addresses silently use the jailed address instead. +For IPv4 the first address given will be kept used as the source address +in case source address selection on unbound sockets cannot find a better +match. It is only possible to start multiple jails with the same IP address, if none of the jails has more than this single overlapping IP address -assigned to itself for the address family in question. -.It Ar command -Pathname of the program which is to be executed. +assigned to itself. +.Pp +A list of zero elements (an empty string) will stop the jail from using IPv4 +entirely; setting the boolean parameter +.Ar noip4 +will not restrict the jail at all. +.It Va ip6.addr +A list of IPv6 addresses assigned to the prison, the counterpart to +.Ar ip4.addr +above. +.It Va host.hostname +Hostname of the prison. +If not specified, a jail will use the system hostname. +.It Va ip4_hostname +.It Va ip6_hostname +These psuedo-parameters actually set the jail's +.Va ip4 +and +.Va ip6 +parameters, but will get those addresses by resolving the supplied hostname. +.It Va securelevel +The value of the jail's +.Va kern.securelevel +sysctl. +A jail never has a lower securelevel than the default system, but by +setting this parameter it may have a higher one. +If the system securelevel is changed, any jail securelevels will be at +least as secure. +.It Va persist +Setting this boolean parameter allows a jail to exist without any +processes. +Normally, a jail is destroyed as its last process exits. +.It Va command +The command to run after creating or modifying the jail. +This command is run inside the jail, under the +.Va path +directory. 
+A new jail must have either the +.Va persist +or +.Va command +parameter set. .El .Pp Jails are typically set up using one of two philosophies: either to @@ -142,10 +273,6 @@ This manual page documents the configuration steps necessary to support either of these steps, although the configuration steps may be refined based on local requirements. -.Pp -Please see the -.Xr jail 2 -man page for further details. .Sh EXAMPLES .Ss "Setting up a Jail Directory Tree" To set up a jail directory tree containing an entire @@ -359,15 +486,6 @@ virtual host interface, and then start the jail's .Pa /etc/rc script from within the jail. -.Pp -NOTE: If you plan to allow untrusted users to have root access inside the -jail, you may wish to consider setting the -.Va security.jail.set_hostname_allowed -sysctl variable to 0. -Please see the management discussion later in this document as to why this -may be a good idea. -If you do decide to set this variable, -it must be set before starting any jails, and once each boot. .Bd -literal -offset indent ifconfig ed0 inet alias 192.0.2.100/32 mount -t procfs proc /data/jail/192.0.2.100/proc @@ -445,7 +563,7 @@ .Pp The .Pa /proc/ Ns Ar pid Ns Pa /status -file contains, as its last field, the hostname of the jail in which the +file contains, as its last field, the name of the jail in which the process runs, or .Dq Li - to indicate that the process is not running within a jail. @@ -454,21 +572,7 @@ command also shows a .Ql J flag for processes in a jail. -However, the hostname for a jail may be, by -default, modified from within the jail, so the -.Pa /proc -status entry is unreliable by default. -To disable the setting of the hostname -from within a jail, set the -.Va security.jail.set_hostname_allowed -sysctl variable in the host environment to 0, which will affect all jails. -You can have this sysctl set on each boot using -.Xr sysctl.conf 5 . 
-Just add the following line to -.Pa /etc/sysctl.conf : .Pp -.Dl security.jail.set_hostname_allowed=0 -.Pp You can also list/kill processes based on their jail ID. To show processes and their jail ID, use the following command: .Pp @@ -510,8 +614,6 @@ the host environment using .Xr sysctl 8 MIB variables. -Currently, these variables affect all jails on the system, although in -the future this functionality may be finer grained. .Bl -tag -width XXX .It Va security.jail.allow_raw_sockets This MIB entry determines whether or not prison root is allowed to @@ -555,12 +657,6 @@ .Xr hostname 1 or .Xr sethostname 3 . -In the current jail implementation, the ability to set the hostname from -within the jail can impact management tools relying on the accuracy of jail -information in -.Pa /proc . -As such, this should be disabled in environments where privileged access to -jails is given out to untrusted parties. .It Va security.jail.socket_unixiproute_only The jail functionality binds an IPv4 address to each jail, and limits access to other network addresses in the IPv4 space that may be available @@ -605,12 +701,30 @@ a jail. This functionality is disabled by default, but can be enabled by setting this MIB entry to 1. -.It Va security.jail.jail_max_af_ips +.It Va security.jail.allow_jails +This MIB entry determines if a privileged user inside a jail can create +sub-jails under that jail. It is disabled by default, but can be enabled by +setting this MIB entry to 1. See the section below for more information on +hierarchical jails. +.It Va security.jail.max_af_ips This MIB entry determines how may address per address family a prison may have. The default is 255. .El .Pp -The read-only sysctl variable +These variables affect all jails on the system. Finer grained control is +available via per-jail boolean parameters in the +.Va perm +group. 
For example, to globally allow raw socket creation, you can set the +.Va security.jail.allow_raw_sockets +MIB entry; to allow a single jail to create raw sockets, set its +.Va perm.allow_raw_sockets +parameter. Or to disallow a single jail from setting its hostname, set +.Va perm.noset_hostname_allowed . +These per-jail permission parameters default to the current value of the +associated sysctls at the time of jail creation, but changing the sysctls +won't change the behavior of existing jails. +.Pp +The read-only MIB entry .Va security.jail.jailed can be used to determine if a process is running inside a jail (value is one) or not (value is zero). @@ -632,6 +746,68 @@ .Va kern.securelevel and .Va kern.hostname . +.Ss "Hierarchical Jails" +By setting the +.Va security.jail.allow_jails +MIB entry or a jail's +.Va perm.allow_jails +parameter, processes within a jail may be able to create jails of their own. +These child jails are kept in a hierarchy, with jails only able to see and/or +modify their own jails (or those jails' children). +Each jail has a read-only +.Va parent +parameter, containing the +.Va jid +of the jail that created it; a +.Va jid +of 0 indicated the jail is a child of the current jail (or is a top-level +jail if the current process isn't jailed). +Jail parameters that are normally inherited from the base system, are in +the hierarchical case inherited from the jail that created them. +.Pp +The global sysctl MIB entries listed above (with the exception of +.Va security.jail.jailed_sockets_first ) +are per-jail, and can be used to define the default permissions of child +jails. +Jailed processes are not allowed to confer greater permissions than they +themselves are given, e.g. if a jail is created with +.Va perm.noset_hostname_allowed , +it is not able to set its +.Va security.jail.set_hostname_allowed +sysctl. +Similarly, such restrictions as +.Va ip4 +and +.Va securelevel +may not be bypassed in child jails. 
+.Pp +A child jail may in turn create its own child jails, unless its own +.Va perm.noallow_jails +parameter is set (remember, it defaults to the parent jail's value). +These jails are visible to and can be modified by their parent and all +ancestors. +.Pp +Jail names reflect this hierarchy, with a full name being an MIB-type string +separated by dots. +For example, if a base system process creates a jail +.Dq foo , +and a process under that jail creates another jail +.Dq bar , +then the second jail will be seen as +.Dq foo.bar +in the base system (though it is only seen as +.Dq bar +to any processes inside jail +.Dq foo ) . +Jids on the other hand exist in a single space, and each jail must have a +unique jid. +.Pp +Like the names, a child jail's +.Va path +is relative to its creator's own +.Va path . +This is by virtue of the child jail being created in the chrooted +environment of the first jail. .Sh SEE ALSO .Xr killall 1 , .Xr lsvfs 1 , @@ -641,7 +817,7 @@ .Xr ps 1 , .Xr quota 1 , .Xr chroot 2 , -.Xr jail 2 , +.Xr jail_set 2 , .Xr jail_attach 2 , .Xr procfs 5 , .Xr rc.conf 5 , @@ -665,6 +841,8 @@ .Nm utility appeared in .Fx 4.0 . +Extensible jail parameters were introduced in +.Fx 8.0 . .Sh AUTHORS .An -nosplit The jail feature was written by @@ -683,6 +861,9 @@ originally done by .An Pawel Jakub Dawidek for IPv4. +.Pp +.An James Gritton +added the extensible jail parameters and hierchical jails. .Sh BUGS Jail currently lacks the ability to allow access to specific jail information via From 000.fbsd at quip.cz Sat May 9 09:57:48 2009 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Sat May 9 09:57:54 2009 Subject: Hierarchical jails In-Reply-To: <4A051DE3.30705@FreeBSD.org> References: <4A051DE3.30705@FreeBSD.org> Message-ID: <4A054F24.5030206@quip.cz> Jamie Gritton wrote: > Here's the first round of hierarchical jails under the new framework. 
> > Instead of creds having either a prison or a NULL pointer, they all have > a prison pointer with the default being the global "prison0" that > contains information about the real environment. Jailed root may (if > granted permission) create prisons that would be under its place in the > hierarchy, but may not alter (or even see) prisons at its level or > above. > > The JID space is flat, i.e. every prison in the system has a unique ID. > The prison name space is hierarchical, with jails having dot-separated > component names. [...] I am glad that you are working on this feature! I added info + links to this patches on wiki http://wiki.freebsd.org/Jails I hope I will have some free time to test it soon. Miroslav Lachman From venture37 at gmail.com Sun May 10 15:06:54 2009 From: venture37 at gmail.com (Sevan / Venture37) Date: Sun May 10 15:07:00 2009 Subject: Kernel Compiled with options VIMAGE panics on boot Message-ID: <4A06E7B0.10600@gmail.com> Hi I've installed a fresh copy of this months snapshot, updated src via cvsup, made a copy of the GENERIC config file, added 'options VIMAGE' & 'nooptions SCTP' & compiled & installed it, on boot the system panics with the error: panic: in /usr/src/sys/net/if.c:485 if_alloc() vnet=0 curvnet=0 cpuid = 0 photo of panic: http://img18.imageshack.us/img18/3297/img1057e.jpg Any ideas?? /usr/src/sys/net/if.c is v1.328 if that helps. 
Sevan / Venture37 From zec at icir.org Mon May 11 02:35:16 2009 From: zec at icir.org (Marko Zec) Date: Mon May 11 02:35:23 2009 Subject: Kernel Compiled with options VIMAGE panics on boot In-Reply-To: <4A06E7B0.10600@gmail.com> References: <4A06E7B0.10600@gmail.com> Message-ID: <200905110409.04612.zec@icir.org> On Sunday 10 May 2009 16:41:52 Sevan / Venture37 wrote: > Hi > I've installed a fresh copy of this months snapshot, updated src via > cvsup, made a copy of the GENERIC config file, added 'options VIMAGE' & > 'nooptions SCTP' & compiled & installed it, on boot the system panics > with the error: > panic: in /usr/src/sys/net/if.c:485 if_alloc() > vnet=0 curvnet=0 > cpuid = 0 > > photo of panic: > http://img18.imageshack.us/img18/3297/img1057e.jpg > > Any ideas?? > > /usr/src/sys/net/if.c is v1.328 if that helps. It seems that the USB code should set the curvnet context when attaching and detaching ifnets (rum0 in your case), which it currently does not. I'll look into this in the next few days - thanks for the report! Marko From 000.fbsd at quip.cz Tue May 12 12:17:48 2009 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Tue May 12 12:17:55 2009 Subject: problem with time and cronjobs in Qemu guest Message-ID: <4A096517.7020104@quip.cz> Hi, I have Qemu 0.10.1 installed on my old Windows 2000 PC and I am running FreeBSD 7.2-RC1 i386 in it for some testing purposes. Today I realized that some cron task was not run at specified time nor some commands in endless loop. 
This is a simple example of the strange behavior (written in tcsh shell) root@firstbsd ~/# while 1 while?date while?sleep 10 while?end Tue May 12 13:43:20 CEST 2009 Tue May 12 13:43:58 CEST 2009 Tue May 12 13:44:35 CEST 2009 Tue May 12 13:45:12 CEST 2009 Tue May 12 13:45:50 CEST 2009 Tue May 12 13:46:27 CEST 2009 Tue May 12 13:47:04 CEST 2009 Tue May 12 13:47:42 CEST 2009 Tue May 12 13:48:20 CEST 2009 Tue May 12 13:48:57 CEST 2009 Tue May 12 13:49:37 CEST 2009 Tue May 12 13:50:19 CEST 2009 Tue May 12 13:50:58 CEST 2009 Tue May 12 13:51:37 CEST 2009 As you can see - the date command is not executed every 10 seconds. Are there some settings (in FreeBSD guest or in Qemu itself) to fix this? Miroslav Lachman From julian at elischer.org Tue May 12 21:50:53 2009 From: julian at elischer.org (Julian Elischer) Date: Tue May 12 21:50:59 2009 Subject: PERFORCE change 161987 for review In-Reply-To: <200905121848.n4CImKQt036691@repoman.freebsd.org> References: <200905121848.n4CImKQt036691@repoman.freebsd.org> Message-ID: <4A09EF38.6060407@elischer.org> Marko Zec wrote: > http://perforce.freebsd.org/chv.cgi?CH=161987 > > Change 161987 by zec@zec_tpx32 on 2009/05/12 18:47:49 > > Back out O(n**2) ad-hoc hack for searching for available > ifunits in cloning ifnets, and restore the standard O(n) > bitmapped searching / ifunit allocation method for both > default and options VIMAGE builds. > > HOWEVER, hereby we also introduce per-vnet if_clone driver > registration and ifunit allocation. As a (necessary) example, > if_loop is modified to attach itself as an independent > cloner instance to each vnet. > > This approach has a neat byproduct: if_clone drivers that > do not explicitly declare themselves as multi-vnet, by > exporting an iattach() method and registering to the vnet > framework, continue to work with unmodified semantics in > the default vnet. However, they will NOT be available > in other vnets. Ah I didn't read this right the first time.. generally, good but... 
So we cannot have tun drivers in vimages? tun needs it's /dev entres, so can not be 'renumbered' (in the base sense) until we somehow add vimage support to devfs. however having tun3 in one vimage and tun4 in another would still be pretty ok I think. So I think the modes wanted would be: "Unvirtualised" appears in base vimage only "Scattered" one namespace, but in different vimages. "Virtualised" separate namespaces. p.s excuse my unamerican way of spelling 'ised' (not ized) my fingers refuse to co-operate. > > This brings us a step closer to being able to selectively > attach subsystems to particular vnets, instead of having > all subsystems unconditionally available to all vnets by > default. > From zec at freebsd.org Tue May 12 23:18:38 2009 From: zec at freebsd.org (Marko Zec) Date: Tue May 12 23:18:43 2009 Subject: PERFORCE change 161987 for review In-Reply-To: <4A09EF38.6060407@elischer.org> References: <200905121848.n4CImKQt036691@repoman.freebsd.org> <4A09EF38.6060407@elischer.org> Message-ID: <200905130100.32678.zec@freebsd.org> On Tuesday 12 May 2009 23:50:48 Julian Elischer wrote: > Marko Zec wrote: > > http://perforce.freebsd.org/chv.cgi?CH=161987 > > > > Change 161987 by zec@zec_tpx32 on 2009/05/12 18:47:49 > > > > Back out O(n**2) ad-hoc hack for searching for available > > ifunits in cloning ifnets, and restore the standard O(n) > > bitmapped searching / ifunit allocation method for both > > default and options VIMAGE builds. > > > > HOWEVER, hereby we also introduce per-vnet if_clone driver > > registration and ifunit allocation. As a (necessary) example, > > if_loop is modified to attach itself as an independent > > cloner instance to each vnet. > > > > This approach has a neat byproduct: if_clone drivers that > > do not explicitly declare themselves as multi-vnet, by > > exporting an iattach() method and registering to the vnet > > framework, continue to work with unmodified semantics in > > the default vnet. 
However, they will NOT be available > > in other vnets. > > Ah I didn't read this right the first time.. > generally, good but... > > So we cannot have tun drivers in vimages? > > tun needs it's /dev entres, so can not be 'renumbered' (in the > base sense) until we somehow add vimage support to devfs. > however having tun3 in one vimage and tun4 in another would still > be pretty ok I think. Hmm but how would such an approach help with say /dev/pf, which also has to be functional in all vnets? Wouldn't it be useful if a single /dev entry could provide access to appropriate subsystem instances in different vnets, depending in which vnet the process which opens the special file operates? I think this is how the virtualized pf did work, and there's anegdotal evidence that it did work well, at least until this got ripped off the vimage branch with the next pf import from OpenBSD :) Marko > So I think the modes wanted would be: > > "Unvirtualised" appears in base vimage only > "Scattered" one namespace, but in different vimages. > "Virtualised" separate namespaces. > > p.s excuse my unamerican way of spelling 'ised' (not ized) > my fingers refuse to co-operate. > > > This brings us a step closer to being able to selectively > > attach subsystems to particular vnets, instead of having > > all subsystems unconditionally available to all vnets by > > default. From marco.borsatino at poste.it Thu May 14 06:06:25 2009 From: marco.borsatino at poste.it (marco.borsatino@poste.it) Date: Thu May 14 06:06:37 2009 Subject: virtual network with qemu Message-ID: Hi to all. I'd like to implement a little virtual network using QEMU 0.10.2, but, until now, I have failed. This is the situation. 

Host: AMD 64 running FreeBSD 7.2

#ifconfig
nfe0: flags=8843 metric 0 mtu 1500
        options=10b
        ether 00:15:f2:44:2d:f9
        inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.0.255
        media: Ethernet autoselect (100baseTX )
        status: active
plip0: flags=108810 metric 0 mtu 1500
lo0: flags=8049 metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000

I've created an image using the FreeBSD 7.2 DVD:

#qemu-img create -f qcow2 hda fbsd72.img 10G

The image has been created.

#qemu -L /usr/local/share/qemu/ -cdrom /dev/acd0 -m 512 -boot d fbsd72.img

After a long time, the installation of the guest system was completed.
When the installation program asked for information about network
configuration, as a first step I chose DHCP configuration and, as usual,
the network was set up like this:

IP 10.0.2.15/255.255.255.0
gateway 10.0.2.2
nameserver 10.0.2.3

When the installation of the guest PC was finished, I copied the image
to pc01.img, to keep the original untouched. After that I started qemu
like this:

#qemu -L /usr/local/share/qemu -localtime -net nic,macaddr=00:15:f2:44:2d:01 -net socket,mcast=230.0.0.1:1234 -hda pc01.img -cdrom /dev/acd0 &

but the network in the guest system does not work. ifconfig in the guest
system reports:

#ifconfig -a
ed0: flags=8843 metric 0 mtu 1500
        ether 00:15:f2:44:2d:01
        media: Ethernet 10baseT/UTP
plip0: ...
lo0: ...

If I try:

#ping 10.0.2.2 (the gateway)

all packets are lost. For this reason, I tried a static IP
configuration like this:

IP 10.0.2.4/255.255.255.0
gateway 10.0.2.2
nameserver 10.0.2.3

but the gateway does not respond. So it is useless to try with a second
guest system.

Please help. Sorry for my bad English.
Marco

From jamie at FreeBSD.org Thu May 14 17:12:58 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Thu May 14 17:13:09 2009
Subject: Hierarchical jails
In-Reply-To: <4A051DE3.30705@FreeBSD.org>
References: <4A051DE3.30705@FreeBSD.org>
Message-ID: <4A0C5112.9010103@FreeBSD.org>

There's still a chance to offer your input on the new jails before they
go in! OK, given the lack of response so far, it's less "still a
chance" than "please?". Current plans are to have this in place for
8.0, with connections to the ongoing Vimage work. Hopefully the silence
is approval, and commits will likely be appearing soon.

I wrote:
> Here's the first round of hierarchical jails under the new framework.
>
> Instead of creds having either a prison or a NULL pointer, they all have
> a prison pointer with the default being the global "prison0" that
> contains information about the real environment. Jailed root may (if
> granted permission) create prisons that would be under its place in the
> hierarchy, but may not alter (or even see) prisons at its level or
> above.
>
> The JID space is flat, i.e. every prison in the system has a unique ID.
> The prison name space is hierarchical, with jails having dot-separated
> component names.
>
> prison0 contains three fields that were system globals: pr_root,
> pr_host, and pr_securelevel. I've kept the globals rootvnode and
> hostname, and take care that when one is changed the other changes too
> (not yet true for hostname - read on). But I've actually removed the
> global securelevel, instead forcing people to use securelevel_gt() and
> securelevel_ge() (or in very rare cases to check prison0.pr_securelevel
> directly). I chose to do that because while using the global rootvnode
> and hostname may be incorrect, using the wrong securelevel is, well,
> insecure. Actually it would be insecure to use the wrong rootvnode too,
> but I'm not convinced removing that global is worth the headache.
> > Other globals are subsumed into prison0, but they were only ever part of > the jail system anyway: the various jail-related permission bits and > such administrative things as prisoncount. > > The prison hierarchy keeps track of restrictions placed on prisons, and > will reflect them downward so a child jail is always at least as > restricted as its ancestors. It doesn't go the other way though: if a > prison's restrictions are loosened, the children stay as they are. > > This patch doesn't have anything for userland, and hierarchical jails > won't work without that patch (because jails don't have permission to > create sub-jails by default, and jail(2) can't grant that permission). > A userland patch will follow soon, very similar to the version I posted > here recently. > > - Jamie From julian at elischer.org Thu May 14 17:33:05 2009 From: julian at elischer.org (Julian Elischer) Date: Thu May 14 17:33:17 2009 Subject: Hierarchical jails In-Reply-To: <4A0C5112.9010103@FreeBSD.org> References: <4A051DE3.30705@FreeBSD.org> <4A0C5112.9010103@FreeBSD.org> Message-ID: <4A0C55CF.70706@elischer.org> Jamie Gritton wrote: > There's still a change to offer your input on the new jails before they > go in! OK, given the lack of response so far, it's less "still a > chance" than "please?". Current plans are to have this in place for > 8.0, with connections to the ongoing Vimage work. Hopefully the silence > is approval, and commits will likely be appearing soon. > I think I may have replied before but it all looks pretty good to me.. > > I wrote: >> Here's the first round of hierarchical jails under the new framework. >> >> Instead of creds having either a prison or a NULL pointer, they all have >> a prison pointer with the default being the global "prison0" that >> contains information about the real environment. 
>> Jailed root may (if
>> granted permission) create prisons that would be under its place in the
>> hierarchy, but may not alter (or even see) prisons at its level or
>> above.

agreed

>>
>> The JID space is flat, i.e. every prison in the system has a unique ID.
>> The prison name space is hierarchical, with jails having dot-separated
>> component names.

agreed

>>
>> prison0 contains three fields that were system globals: pr_root,
>> pr_host, and pr_securelevel. I've kept the globals rootvnode and
>> hostname, and take care that when one is changed the other changes too
>> (not yet true for hostname - read on). But I've actually removed the
>> global securelevel, instead forcing people to use securelevel_gt() and
>> securelevel_ge() (or in very rare cases to check prison0.pr_securelevel
>> directly). I chose to do that because while using the global rootvnode
>> and hostname may be incorrect, using the wrong securelevel is, well,
>> insecure. Actually it would be insecure to use the wrong rootvnode too,
>> but I'm not convinced removing that global is worth the headache.

Not sure why you want to keep hostname a true global.
It seems to me that it is an eminently virtualizable property,
though possibly a special hostname might exist for the base system
for error messages etc.
kind of like V_hostname and G_hostname :)

otherwise I agree.

>>
>> Other globals are subsumed into prison0, but they were only ever part of
>> the jail system anyway: the various jail-related permission bits and
>> such administrative things as prisoncount.
>>
>> The prison hierarchy keeps track of restrictions placed on prisons, and
>> will reflect them downward so a child jail is always at least as
>> restricted as its ancestors. It doesn't go the other way though: if a
>> prison's restrictions are loosened, the children stay as they are.

I agree with this in principle and we'll see how it works out in practice.
>> >> This patch doesn't have anything for userland, and hierarchical jails >> won't work without that patch (because jails don't have permission to >> create sub-jails by default, and jail(2) can't grant that permission). >> A userland patch will follow soon, very similar to the version I posted >> here recently. I looked at that too. All in all, I like it. >> >> - Jamie > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to > "freebsd-virtualization-unsubscribe@freebsd.org" From jamie at FreeBSD.org Thu May 14 17:44:25 2009 From: jamie at FreeBSD.org (Jamie Gritton) Date: Thu May 14 17:44:42 2009 Subject: Hierarchical jails In-Reply-To: <4A0C55CF.70706@elischer.org> References: <4A051DE3.30705@FreeBSD.org> <4A0C5112.9010103@FreeBSD.org> <4A0C55CF.70706@elischer.org> Message-ID: <4A0C5871.1080407@FreeBSD.org> Julian Elischer wrote: > Jamie Gritton wrote: >>> prison0 contains three fields that were system globals: pr_root, >>> pr_host, and pr_securelevel. I've kept the globals rootvnode and >>> hostname, and take care that when one is changed the other changes too >>> (not yet true for hostname - read on). But I've actually removed the >>> global securelevel, instead forcing people to use securelevel_gt() and >>> securelevel_ge() (or in very rare cases to check prison0.pr_securelevel >>> directly). I chose to do that because while using the global rootvnode >>> and hostname may be incorrect, using the wrong securelevel is, well, >>> insecure. Actually it would be insecure to use the wrong rootvnode too, >>> but I'm not convinced removing that global is worth the headache. > > not sure why you want to keep hostname a true global > It seems to me that it is an eminently virtalizable property. > though possible a special hostname might exist for the base system > for error messages etc. 
> kind of like V_hostname and G_hostname :)

It was mostly for the number of times I saw that global being used -
didn't want to upset the order of things too much. I didn't see nearly
as much use of securelevel with the advent of securelevel_ge() and
securelevel_gt(). But I suppose the G/V_hostname thing has already
gotten that ball rolling.

There is at least one place that uses the global securelevel directly
(i.e. prison0.pr_securelevel). The same could be done for hostnames,
which does a pretty good job of pointing out that this is the global
hostname being used. Because you're right - the hostname is at the
center of what it means to have a jail identity.

Then there's rootvnode, the third global that's superseded by
hierarchical jails. I could also remove that, allowing the use of
prison0.pr_root for those who need the real root.

From jamie at FreeBSD.org Thu May 14 19:12:48 2009
From: jamie at FreeBSD.org (Jamie Gritton)
Date: Thu May 14 19:13:21 2009
Subject: Hierarchical jails
In-Reply-To: <20090514181446.GA42264@stack.nl>
References: <4A051DE3.30705@FreeBSD.org> <4A0C5112.9010103@FreeBSD.org>
 <20090514181446.GA42264@stack.nl>
Message-ID: <4A0C6D29.7020606@FreeBSD.org>

Jilles Tjoelker wrote:
> On Thu, May 14, 2009 at 11:12:50AM -0600, Jamie Gritton wrote:
>> There's still a chance to offer your input on the new jails before they
>> go in! OK, given the lack of response so far, it's less "still a
>> chance" than "please?". Current plans are to have this in place for
>> 8.0, with connections to the ongoing Vimage work. Hopefully the silence
>> is approval, and commits will likely be appearing soon.
>
> I have not tried this, but I think this patch may allow jailed roots to
> escape. The problem is that there is only one fd_jdir. The escape would
> go like: jailed root creates a new jail in a subdirectory, opens its /
> and sends the fd to a process in the new jail via a unix domain socket.
> When the process calls fchdir on the fd, it will be able to access ..
> normally. > > With nested chroot, or chroot in jail, this is not possible, because > fd_jdir always contains the first jail or chroot done and will not allow > escaping from it; however, root in a level 2 chroot can escape back to > level 1 using chroot. Indeed - considering how that was a major design point of jails, I'm not sure how I missed it. ".." processing will need to run up the jail tree. No big deal on performance and easily done, but embarrassing not have had that in place already.
From ed at 80386.nl Wed May 13 17:52:26 2009 From: ed at 80386.nl (Ed Schouten) Date: Thu May 14 22:11:59 2009 Subject: PERFORCE change 161987 for review In-Reply-To: <4A09EF38.6060407@elischer.org> References: <200905121848.n4CImKQt036691@repoman.freebsd.org> <4A09EF38.6060407@elischer.org> Message-ID: <20090513175223.GG58540@hoeg.nl> * Julian Elischer wrote: > tun needs it's /dev entres, so can not be 'renumbered' (in the > base sense) until we somehow add vimage support to devfs. > however having tun3 in one vimage and tun4 in another would still > be pretty ok I think. So I think the modes wanted would be: It's the same with pts(4) right now. Be sure to prevent tun entries from being opened from a different jail, though. -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-virtualization/attachments/20090513/001f16a2/attachment.pgp From jilles at stack.nl Thu May 14 18:15:04 2009 From: jilles at stack.nl (Jilles Tjoelker) Date: Thu May 14 22:12:00 2009 Subject: Hierarchical jails In-Reply-To: <4A0C5112.9010103@FreeBSD.org> References: <4A051DE3.30705@FreeBSD.org> <4A0C5112.9010103@FreeBSD.org> Message-ID: <20090514181446.GA42264@stack.nl> On Thu, May 14, 2009 at 11:12:50AM -0600, Jamie Gritton wrote: > There's still a change to offer your input on the new jails before they > go in! OK, given the lack of response so far, it's less "still a > chance" than "please?". Current plans are to have this in place for > 8.0, with connections to the ongoing Vimage work. Hopefully the silence > is approval, and commits will likely be appearing soon. I have not tried this, but I think this patch may allow jailed roots to escape. The problem is that there is only one fd_jdir. The escape would go like: jailed root creates a new jail in a subdirectory, opens its / and sends the fd to a process in the new jail via a unix domain socket. When the process calls fchdir on the fd, it will be able to access .. normally. With nested chroot, or chroot in jail, this is not possible, because fd_jdir always contains the first jail or chroot done and will not allow escaping from it; however, root in a level 2 chroot can escape back to level 1 using chroot. 
-- 
Jilles Tjoelker

From julian at elischer.org Fri May 15 07:26:33 2009
From: julian at elischer.org (Julian Elischer)
Date: Fri May 15 07:26:51 2009
Subject: Hierarchical jails
In-Reply-To: <20090514181446.GA42264@stack.nl>
References: <4A051DE3.30705@FreeBSD.org> <4A0C5112.9010103@FreeBSD.org>
 <20090514181446.GA42264@stack.nl>
Message-ID: <4A0D1927.8090303@elischer.org>

Jilles Tjoelker wrote:
> On Thu, May 14, 2009 at 11:12:50AM -0600, Jamie Gritton wrote:
>> There's still a chance to offer your input on the new jails before they
>> go in! OK, given the lack of response so far, it's less "still a
>> chance" than "please?". Current plans are to have this in place for
>> 8.0, with connections to the ongoing Vimage work. Hopefully the silence
>> is approval, and commits will likely be appearing soon.
>
> I have not tried this, but I think this patch may allow jailed roots to
> escape. The problem is that there is only one fd_jdir. The escape would
> go like: jailed root creates a new jail in a subdirectory, opens its /
> and sends the fd to a process in the new jail via a unix domain socket.
> When the process calls fchdir on the fd, it will be able to access ..
> normally.
>
> With nested chroot, or chroot in jail, this is not possible, because
> fd_jdir always contains the first jail or chroot done and will not allow
> escaping from it; however, root in a level 2 chroot can escape back to
> level 1 using chroot.
>

This is the old chroot escape. It is well known and methods exist to stop
it. I cannot say what is done here, but your post does remind me to add
this to the list of things we need to keep in mind.
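
For illustration only, the check discussed in this thread (".." processing
running up the jail tree, rather than comparing against a single fd_jdir)
can be modelled in userland C. This is a minimal sketch under stated
assumptions, not the kernel code: struct dir is a toy stand-in for a
vnode, and pr_parent, is_jail_root() and lookup_dotdot() are hypothetical
names. Only pr_root is a field name taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode: a name plus the directory ".." points to. */
struct dir {
	const char *name;
	struct dir *parent;	/* NULL at the real filesystem root */
};

/* Toy prison; pr_parent is an assumed field name, not from the patch. */
struct prison {
	struct dir *pr_root;		/* root directory of this prison */
	struct prison *pr_parent;	/* enclosing prison, NULL for prison0 */
};

/* True if d is the root of pr or of any prison enclosing pr. */
bool
is_jail_root(const struct prison *pr, const struct dir *d)
{
	for (; pr != NULL; pr = pr->pr_parent)
		if (pr->pr_root == d)
			return (true);
	return (false);
}

/*
 * Resolve ".." from d on behalf of a process in prison pr.  The lookup
 * stays put at the root of pr and at the root of every ancestor prison,
 * so a directory handed in from a parent jail cannot be used to climb
 * above that parent's root.
 */
struct dir *
lookup_dotdot(const struct prison *pr, struct dir *d)
{
	assert(pr != NULL && d != NULL);
	if (d->parent == NULL || is_jail_root(pr, d))
		return (d);
	return (d->parent);
}
```

In this model, take prison A rooted at /jails/a and its child prison B
rooted at /jails/a/sub. A process in B that has fchdir'ed to a passed
descriptor for /jails/a gets /jails/a back from lookup_dotdot(), because
that directory is the root of an ancestor prison. A check against B's
root alone would let the same lookup continue on to /jails.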
From venture37 at gmail.com Sun May 17 14:41:55 2009 From: venture37 at gmail.com (Sevan / Venture37) Date: Sun May 17 14:42:00 2009 Subject: Kernel Compiled with options VIMAGE panics on boot In-Reply-To: <200905110409.04612.zec@icir.org> References: <4A06E7B0.10600@gmail.com> <200905110409.04612.zec@icir.org> Message-ID: <4A10222D.8080100@gmail.com> Marko Zec wrote: > On Sunday 10 May 2009 16:41:52 Sevan / Venture37 wrote: >> Hi >> I've installed a fresh copy of this months snapshot, updated src via >> cvsup, made a copy of the GENERIC config file, added 'options VIMAGE' & >> 'nooptions SCTP' & compiled & installed it, on boot the system panics >> with the error: >> panic: in /usr/src/sys/net/if.c:485 if_alloc() >> vnet=0 curvnet=0 >> cpuid = 0 >> >> photo of panic: >> http://img18.imageshack.us/img18/3297/img1057e.jpg >> >> Any ideas?? >> >> /usr/src/sys/net/if.c is v1.328 if that helps. > > It seems that the USB code should set the curvnet context when attaching and > detaching ifnets (rum0 in your case), which it currently does not. I'll look > into this in the next few days - thanks for the report! > > Marko I've tried to compile a new kernel once again after updating src, this time it bombs out during the build process. http://img33.imageshack.us/img33/6164/img1064.jpg Sevan / Venture37 From bzeeb-lists at lists.zabbadoz.net Sun May 17 14:49:51 2009 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Sun May 17 14:49:58 2009 Subject: Kernel Compiled with options VIMAGE panics on boot In-Reply-To: <4A10222D.8080100@gmail.com> References: <4A06E7B0.10600@gmail.com> <200905110409.04612.zec@icir.org> <4A10222D.8080100@gmail.com> Message-ID: <20090517144647.R72053@maildrop.int.zabbadoz.net> On Sun, 17 May 2009, Sevan / Venture37 wrote: Hi, > I've tried to compile a new kernel once again after updating src, this time > it bombs out during the build process. 
>
> http://img33.imageshack.us/img33/6164/img1064.jpg

Yes, we are aware of that one and the patch is easy and both Marko and
I have it, but the commit that introduced this compile-time regression
for VIMAGE also introduced a regression for the !VIMAGE &&
!VIMAGE_GLOBALS case that we are currently trying to identify.

Here's the patch you want to apply (pasted in) to make things compile
again.


Index: sys/netinet/in.c
===================================================================
--- sys/netinet/in.c (revision 192250)
+++ sys/netinet/in.c (working copy)
@@ -814,6 +814,7 @@
 in_ifinit(struct ifnet *ifp, struct in_ifaddr *ia, struct sockaddr_in *sin,
     int scrub)
 {
+	INIT_VNET_NET(ifp->if_vnet);
 	INIT_VNET_INET(ifp->if_vnet);
 	register u_long i = ntohl(sin->sin_addr.s_addr);
 	struct sockaddr_in oldaddr;
@@ -1007,6 +1008,7 @@
 static int
 in_scrubprefix(struct in_ifaddr *target)
 {
+	INIT_VNET_NET(curvnet);
 	INIT_VNET_INET(curvnet);
 	struct in_ifaddr *ia;
 	struct in_addr prefix, mask, p;


/bz

-- 
Bjoern A. Zeeb The greatest risk is not taking one.

From julian at elischer.org Sun May 17 18:40:57 2009
From: julian at elischer.org (Julian Elischer)
Date: Sun May 17 18:41:03 2009
Subject: Kernel Compiled with options VIMAGE panics on boot
In-Reply-To: <20090517144647.R72053@maildrop.int.zabbadoz.net>
References: <4A06E7B0.10600@gmail.com> <200905110409.04612.zec@icir.org>
 <4A10222D.8080100@gmail.com> <20090517144647.R72053@maildrop.int.zabbadoz.net>
Message-ID: <4A105736.5080505@elischer.org>

Bjoern A. Zeeb wrote:
> On Sun, 17 May 2009, Sevan / Venture37 wrote:
>
> Hi,
>
>> I've tried to compile a new kernel once again after updating src, this
>> time it bombs out during the build process.
>> >> http://img33.imageshack.us/img33/6164/img1064.jpg > > yes, we are aware of that one and the patch is easy and both Marko and > I have it but the commit, that introduced this compile time regression > for VIMAGE, also introduced a regression for the !VIMAGE && > !VIMAGE_GLOBALS case that we are currently trying to indentify. > > Here's the patch you want to apply (pasted in) to make things compile > again. BTW Marko is offline for 3 days. > > > Index: sys/netinet/in.c > =================================================================== > --- sys/netinet/in.c (revision 192250) > +++ sys/netinet/in.c (working copy) > @@ -814,6 +814,7 @@ > in_ifinit(struct ifnet *ifp, struct in_ifaddr *ia, struct sockaddr_in > *sin, > int scrub) > { > + INIT_VNET_NET(ifp->if_vnet); > INIT_VNET_INET(ifp->if_vnet); > register u_long i = ntohl(sin->sin_addr.s_addr); > struct sockaddr_in oldaddr; > @@ -1007,6 +1008,7 @@ > static int > in_scrubprefix(struct in_ifaddr *target) > { > + INIT_VNET_NET(curvnet); > INIT_VNET_INET(curvnet); > struct in_ifaddr *ia; > struct in_addr prefix, mask, p; > > > /bz > From julian at elischer.org Tue May 19 20:31:35 2009 From: julian at elischer.org (Julian Elischer) Date: Tue May 19 20:31:45 2009 Subject: svn commit: r192351 - head/sys/netinet In-Reply-To: <200905191330.54024.jhb@freebsd.org> References: <200905182234.n4IMYifY077079@svn.freebsd.org> <200905190819.12407.jhb@freebsd.org> <4A12E85B.7050107@elischer.org> <200905191330.54024.jhb@freebsd.org> Message-ID: <4A131726.6010003@elischer.org> John Baldwin wrote: > On Tuesday 19 May 2009 1:11:55 pm Julian Elischer wrote: >> John Baldwin wrote: >>> On Monday 18 May 2009 6:34:44 pm Bjoern A. Zeeb wrote: >>>> Author: bz >>>> Date: Mon May 18 22:34:44 2009 >>>> New Revision: 192351 >>>> URL: http://svn.freebsd.org/changeset/base/192351 >>>> >>>> Log: >>>> Revert the logical change of r192341. 
>>>>
>>>> net.inet.ip.fw.one_pass is a classic ip_input.c variable and is used in
>>>> the pfil and bridge code as well. As ipfw is loadable we need to always
>>>> provide it. That is the reason why it lives in struct vnet_inet and
>>>> not in struct vnet_ipfw.
>>> Gah, I had thought I had seen it in vnet_ipfw when adding default_to_accept
>>> (as at first I had looked into making default_to_accept per-image but
>>> tunables + VIMAGE don't mix).
>> we need to look at this.. what does it MEAN to have a tunable and
>> multiple images? my guess is that normal tunables are only valid for
>> the base image, but that one might have a way to set the 'tunables'
>> for one's child images.. possibly by setting them in one's environment?
>
> Do you have a kernel environment per vimage? If not, you could still have
> per-vimage variables that are settable via tunables look at kenv during
> vimage creation to parse any tunables perhaps. However, that is possibly
> tricky since you can sometimes use sysctl.conf to override a setting done via
> loader.conf and in that case, what value should a new vimage get
>

One could make the argument that tunables are set from outside the
base jail (i.e. at boot), and that the equivalent should exist for each
image/jail, where what is outside the jail is the parent jail.

We do not have a kernel environment per jail, but I think that is
because we haven't thought of it until now. I'd suggest that just as
you inherit new environment values from a parent process, you could
inherit a 'changed' kernel environment from a parent image, and in fact
a parent might want to send you a different value of something
(e.g. linux uname value). :-)

The

From freebsd-virtualization at dino.sk Mon May 25 13:16:53 2009
From: freebsd-virtualization at dino.sk (Milan Obuch)
Date: Mon May 25 13:17:01 2009
Subject: Panic in netgraph with VIMAGE
Message-ID: <200905251506.27771.freebsd-virtualization@dino.sk>

Hi,
there is some bug in (most probably) netgraph code.
I did fresh csup and rebuild today. Whenever I try to turn bluetooth on
(equivalent to plugging in the dongle), panic occurs:

ubt0: on usbus3
panic: in /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634
ng_make_node_common()
 vnet=0 curvnet=0
cpuid = 0

This does not occur with kernel from sources three days old.

Part from core.txt file:

#0 doadump () at pcpu.h:246
246	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0 doadump () at pcpu.h:246
#1 0xc0554e0e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420
#2 0xc05550e2 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:576
#3 0xc0b947c1 in ng_make_node_common (type=0xc0b8f9a0, nodepp=0xc416b3a8)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634
#4 0xc0b8bcc4 in ubt_attach (dev=0xc4294280)
    at /usr/src/sys/modules/netgraph/bluetooth/ubt/../../../../dev/usb/bluetooth/ng_ubt.c:443
#5 0xc057dcbf in device_attach (dev=0xc4294280) at device_if.h:178
#6 0xc057e88e in device_probe_and_attach (dev=0xc4294280)
    at /usr/src/sys/kern/subr_bus.c:2473
#7 0xc0b38240 in usb2_probe_and_attach_sub (udev=0xc41fd800, uaa=0xe4116c1c)
    at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_device.c:1131
#8 0xc0b3871a in usb2_probe_and_attach (udev=0xc41fd800, iface_index=255 '?')
    at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_device.c:1288
#9 0xc0b40ff0 in uhub_explore (udev=0xc3f07000)
    at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_hub.c:218
#10 0xc0b31f29 in usb2_bus_explore (pm=0xc3ed0dd4)
    at /usr/src/sys/modules/usb/usb/../../../dev/usb/controller/usb_controller.c:215
#11 0xc0b4343a in usb2_process (arg=0xc3ed0d74)
    at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_process.c:139
#12 0xc0530008 in fork_exit (callout=0xc0b43360 ,.
arg=0xc3ed0d74, frame=0xe4116d38) at /usr/src/sys/kern/kern_fork.c:830 #13 0xc070b550 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 At line 634 in ng_base.c, there is INIT_VNET_NETGRAPH(curvnet); I have options VIMAGE in my kernel config (actually this is first one succesfully compiled with mentioned option, but I did not try it too often, it just failed to compile before). Now I recompiled kernel again, this time without options VIMAGE in config, and panic does not occur. So the original problem is INIT_VNET_NETGRAPH implementation in presence of options VIMAGE in kernel config. If anyone has anything to test, please let me know. Regards, Milan From zec at icir.org Mon May 25 13:48:11 2009 From: zec at icir.org (Marko Zec) Date: Mon May 25 13:48:18 2009 Subject: Panic in netgraph with VIMAGE In-Reply-To: <200905251506.27771.freebsd-virtualization@dino.sk> References: <200905251506.27771.freebsd-virtualization@dino.sk> Message-ID: <200905251547.59432.zec@icir.org> On Monday 25 May 2009 15:06:27 Milan Obuch wrote: > Hi, > there is some bug in (most probably) netgraph code. I did fresh csup and > rebuild today. Whenever I try to turn bluetooth on (equivalent to plugging > in the dongle), panic occurs: > > ubt0: 2.00/31.64, addr 2> on usbus3 > panic: > in /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634 > ng_make_node_common() > vnet=0 curvnet=0 > cpuid = 0 > > This does not occur with kernel from sources three days old. This is a known problem related to curvnet context not being set by the USB device attach code - I have to lurk / shop around for some cheap USB ethernet or bt devices to be able to reproduce & fix this locally, the alternative would be wild guessing and planting context setting macros at random places in the USB code, i.e. without testing, which I'm reluctant to do. Marko > Part from core.txt file: > > #0 doadump () at pcpu.h:246 > 246<--->pcpu.h: No such file or directory. 
> <------>in pcpu.h > (kgdb) #0 doadump () at pcpu.h:246 > #1 0xc0554e0e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420 > #2 0xc05550e2 in panic (fmt=Variable "fmt" is not available. > ) at /usr/src/sys/kern/kern_shutdown.c:576 > #3 0xc0b947c1 in ng_make_node_common (type=0xc0b8f9a0, nodepp=0xc416b3a8) > at > /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634 #4 > 0xc0b8bcc4 in ubt_attach (dev=0xc4294280) > > at > /usr/src/sys/modules/netgraph/bluetooth/ubt/../../../../dev/usb/bluetooth/n >g_ubt.c:443 #5 0xc057dcbf in device_attach (dev=0xc4294280) at > device_if.h:178 #6 0xc057e88e in device_probe_and_attach (dev=0xc4294280) > at /usr/src/sys/kern/subr_bus.c:2473 > #7 0xc0b38240 in usb2_probe_and_attach_sub (udev=0xc41fd800, > uaa=0xe4116c1c) at > /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_device.c:1131 #8 > 0xc0b3871a in usb2_probe_and_attach (udev=0xc41fd800, iface_index=255 '?') > at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_device.c:1288 #9 > 0xc0b40ff0 in uhub_explore (udev=0xc3f07000) > at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_hub.c:218 > #10 0xc0b31f29 in usb2_bus_explore (pm=0xc3ed0dd4) > > at > /usr/src/sys/modules/usb/usb/../../../dev/usb/controller/usb_controller.c:2 >15 #11 0xc0b4343a in usb2_process (arg=0xc3ed0d74) > at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_process.c:139 > #12 0xc0530008 in fork_exit (callout=0xc0b43360 ,. > arg=0xc3ed0d74, frame=0xe4116d38) at /usr/src/sys/kern/kern_fork.c:830 > #13 0xc070b550 in fork_trampoline () at > /usr/src/sys/i386/i386/exception.s:270 > > At line 634 in ng_base.c, there is > > INIT_VNET_NETGRAPH(curvnet); > > I have options VIMAGE in my kernel config (actually this is first one > succesfully compiled with mentioned option, but I did not try it too often, > it just failed to compile before). > > Now I recompiled kernel again, this time without options VIMAGE in config, > and panic does not occur. 
> > So the original problem is INIT_VNET_NETGRAPH implementation in presence of > options VIMAGE in kernel config. If anyone has anything to test, please let > me know. > > Regards, > Milan > _______________________________________________ > freebsd-virtualization@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization > To unsubscribe, send any mail to > "freebsd-virtualization-unsubscribe@freebsd.org" From julian at elischer.org Mon May 25 18:16:48 2009 From: julian at elischer.org (Julian Elischer) Date: Mon May 25 18:16:55 2009 Subject: Panic in netgraph with VIMAGE In-Reply-To: <200905251547.59432.zec@icir.org> References: <200905251506.27771.freebsd-virtualization@dino.sk> <200905251547.59432.zec@icir.org> Message-ID: <4A1AE08E.5010007@elischer.org> Marko Zec wrote: > On Monday 25 May 2009 15:06:27 Milan Obuch wrote: >> Hi, >> there is some bug in (most probably) netgraph code. I did fresh csup and >> rebuild today. Whenever I try to turn bluetooth on (equivalent to plugging >> in the dongle), panic occurs: >> >> ubt0: > 2.00/31.64, addr 2> on usbus3 >> panic: >> in /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634 >> ng_make_node_common() >> vnet=0 curvnet=0 >> cpuid = 0 >> >> This does not occur with kernel from sources three days old. > > This is a known problem related to curvnet context not being set by the USB > device attach code - I have to lurk / shop around for some cheap USB ethernet > or bt devices to be able to reproduce & fix this locally, the alternative > would be wild guessing and planting context setting macros at random places > in the USB code, i.e. without testing, which I'm reluctant to do. > it probably requires someone who knows the bluetooth and usb-ethernet code to decide how this is done. 
It seems to me that the bluetooth stuff should probably just always set itself to the base (default) vimage, as it has many kinds of devices that are not really 'interfaces' so to speak and probably deserve to be in the base virtual machine. It does have SOME interface type devices in theory but I don't know if they are supported. Maksim, in vimage, before you call the netgraph code, the mbuf should have an interface pointer and that in turn should have a pointer to the vimage. Alternatively, the thread coming into netgraph should run code from vimage.h that sets the current image for that thread. Can you suggest places that this may occur? > Marko > > >> Part from core.txt file: >> >> #0 doadump () at pcpu.h:246 >> 246 pcpu.h: No such file or directory. >> in pcpu.h >> (kgdb) #0 doadump () at pcpu.h:246 >> #1 0xc0554e0e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420 >> #2 0xc05550e2 in panic (fmt=Variable "fmt" is not available. >> ) at /usr/src/sys/kern/kern_shutdown.c:576 >> #3 0xc0b947c1 in ng_make_node_common (type=0xc0b8f9a0, nodepp=0xc416b3a8) at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634 >> #4 0xc0b8bcc4 in ubt_attach (dev=0xc4294280) at /usr/src/sys/modules/netgraph/bluetooth/ubt/../../../../dev/usb/bluetooth/ng_ubt.c:443 >> #5 0xc057dcbf in device_attach (dev=0xc4294280) at device_if.h:178 >> #6 0xc057e88e in device_probe_and_attach (dev=0xc4294280) at /usr/src/sys/kern/subr_bus.c:2473 >> #7 0xc0b38240 in usb2_probe_and_attach_sub (udev=0xc41fd800, uaa=0xe4116c1c) at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_device.c:1131 >> #8 0xc0b3871a in usb2_probe_and_attach (udev=0xc41fd800, iface_index=255 '?') at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_device.c:1288 >> #9 0xc0b40ff0 in uhub_explore (udev=0xc3f07000) at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_hub.c:218 >> #10 0xc0b31f29 in usb2_bus_explore (pm=0xc3ed0dd4) at 
/usr/src/sys/modules/usb/usb/../../../dev/usb/controller/usb_controller.c:215 >> #11 0xc0b4343a in usb2_process (arg=0xc3ed0d74) at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_process.c:139 >> #12 0xc0530008 in fork_exit (callout=0xc0b43360, arg=0xc3ed0d74, frame=0xe4116d38) at /usr/src/sys/kern/kern_fork.c:830 >> #13 0xc070b550 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 >> >> At line 634 in ng_base.c, there is >> >> INIT_VNET_NETGRAPH(curvnet); >> >> I have options VIMAGE in my kernel config (actually this is the first one >> successfully compiled with the mentioned option, but I did not try it too often, >> it just failed to compile before). >> >> Now I recompiled kernel again, this time without options VIMAGE in config, >> and panic does not occur. >> >> So the original problem is INIT_VNET_NETGRAPH implementation in presence of >> options VIMAGE in kernel config. If anyone has anything to test, please let >> me know. >> >> Regards, >> Milan From emax at freebsd.org Tue May 26 02:46:21 2009 From: emax at freebsd.org (Maksim Yevmenkin) Date: Tue May 26 06:11:31 2009 Subject: Panic in netgraph with VIMAGE In-Reply-To: <4A1AE08E.5010007@elischer.org> References: <200905251506.27771.freebsd-virtualization@dino.sk> <200905251547.59432.zec@icir.org> <4A1AE08E.5010007@elischer.org> Message-ID: On Mon, May 25, 2009 at 11:16 AM, Julian Elischer wrote: > Marko Zec wrote: >> >> On Monday 25 May 2009 15:06:27 Milan Obuch wrote: >>> >>> Hi, 
>>> there is some bug in (most probably) netgraph code. I did fresh csup and >>> rebuild today. Whenever I try to turn bluetooth on (equivalent to plugging >>> in the dongle), panic occurs: >>> >>> ubt0: >> 2.00/31.64, addr 2> on usbus3 >>> panic: >>> in /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:634 >>> ng_make_node_common() >>> vnet=0 curvnet=0 >>> cpuid = 0 >>> >>> This does not occur with kernel from sources three days old. >> >> This is a known problem related to curvnet context not being set by the >> USB device attach code - I have to lurk / shop around for some cheap USB >> ethernet or bt devices to be able to reproduce & fix this locally, the >> alternative would be wild guessing and planting context setting macros at >> random places in the USB code, i.e. without testing, which I'm reluctant to >> do. >> > > it probably requires someone who knows the bluetooth and usb-ethernet > code to decide how this is done. > > It seems to me that the bluetooth stuff should probably just always set > itself to the base (default) vimage, as it has many kinds of devices that > are not really 'interfaces' so to speak and probably deserve to be in the > base virtual machine. > It does have SOME interface type devices in theory but I don't know if they > are supported. > > Maksim, in vimage, before you call the netgraph code, the mbuf should have an > interface pointer and that in turn should have a pointer to the vimage. > Alternatively, the thread coming into netgraph should run code from vimage.h > that sets the current image for that thread. Can you suggest places that > this may occur? hmm... i do not really know anything about vimage (yet :), but the call to INIT_VNET_NETGRAPH() in ng_make_node_common() seems (to me) out of place. from what i understand, ng_make_node_common() is called on all sorts of nodes. some of those are not even network related. it seems to me that network related netgraph nodes (ng_(e)iface, ng_(k)socket, etc.) 
obviously should set vimage etc. pointer, however for the rest of the nodes some reasonable defaults should be used. as far as setting interface pointer in mbuf, it's going to be tricky. bluetooth devices are not associated with any network interface, so i'm not sure how to do it. i will need to study the code for a little bit before i can make any intelligent suggestions. thanks, max From nvass9573 at gmx.com Thu May 28 11:27:47 2009 From: nvass9573 at gmx.com (Nikos Vassiliadis) Date: Thu May 28 11:27:54 2009 Subject: panic with option VIMAGE + PPPoE Message-ID: <4A1E7501.7090308@gmx.com> Hi, I am seeing the following panic trying to use PPPoE. > (kgdb) bt > #0 doadump () at pcpu.h:246 > #1 0xc085a77e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420 > #2 0xc085aa52 in panic (fmt=Variable "fmt" is not available. > ) at /usr/src/sys/kern/kern_shutdown.c:576 > #3 0xc04ba317 in db_panic (addr=Could not find the frame base for "db_panic". > ) at /usr/src/sys/ddb/db_command.c:478 > #4 0xc04ba941 in db_command (last_cmdp=0xc0cf9cdc, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445 > #5 0xc04baa9a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498 > #6 0xc04bc8fd in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:229 > #7 0xc0889256 in kdb_trap (type=3, code=0, tf=0xc781ab18) at /usr/src/sys/kern/subr_kdb.c:534 > #8 0xc0b182bb in trap (frame=0xc781ab18) at /usr/src/sys/i386/i386/trap.c:685 > #9 0xc0afabeb in calltrap () at /usr/src/sys/i386/i386/exception.s:165 > #10 0xc08893da in kdb_enter (why=0xc0be3714 "panic", msg=0xc0be3714 "panic") at cpufunc.h:71 > #11 0xc085aa36 in panic (fmt=0xc2718513 "in %s:%d %s()\n vnet=%p curvnet=%p") at /usr/src/sys/kern/kern_shutdown.c:559 > #12 0xc27114c8 in ng_ID2noderef (ID=4) at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:816 > #13 0xc2712202 in ng_address_ID (here=0xc2666380, item=0xc27ade00, ID=4, retaddr=0) at 
/usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3547 > #14 0xc27a07b0 in ng_ether_rcvmsg (node=0xc2666380, item=0xc27ade00, lasthook=0x0) at /usr/src/sys/modules/netgraph/ether/../../../netgraph/ng_ether.c:596 > #15 0xc271327d in ng_apply_item (node=0xc2666380, item=0xc27ade00, rw=1) at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2417 > #16 0xc27146f6 in ngthread (arg=0x0) at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3340 > #17 0xc08342c8 in fork_exit (callout=0xc2714570 , arg=0x0, frame=0xc781ad38) at /usr/src/sys/kern/kern_fork.c:830 > #18 0xc0afac60 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 > (kgdb) list ng_base.c:816 > 811 Node ID handling > 812 ************************************************************************/ > 813 static node_p > 814 ng_ID2noderef(ng_ID_t ID) > 815 { > 816 INIT_VNET_NETGRAPH(curvnet); > 817 node_p node; > 818 mtx_lock(&ng_idhash_mtx); > 819 NG_IDHASH_FIND(ID, node); > 820 if(node) > (kgdb) I use the following ppp.conf: > lab# cat /etc/ppp/ppp.conf > pppoe: > set device PPPoE:le1 > set authname "xxxx@xxxx" > set authkey "xxxxxxx" > enable lqr echo > set cd 5 > set dial > set login > set redial 0 0 > enable dns > add default HISADDR > > lab# To reproduce the panic, you have to use the above ppp.conf, start ppp(8), and then in the ppp "console" type "open". It panics very early, that is, you don't need a PPPoE "server" to reproduce it. My kernel is GENERIC + VIMAGE - SCTP from a few days old -CURRENT. Just reporting, Nikos From svein-listmail at stillbilde.net Fri May 29 11:48:12 2009 From: svein-listmail at stillbilde.net (Svein Skogen (listmail accont)) Date: Fri May 29 11:48:19 2009 Subject: Not sure but: VMWare PVSCSI? Message-ID: <4A1FC702.4080100@stillbilde.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've noticed that VMWare ESXi 4 introduces pvscsi, and that the openvm-tools already has the driver module for linux. 
Has anybody started porting this to FreeBSD yet? //Svein - -- - --------+-------------------+------------------------------- /"\ |Svein Skogen | svein@d80.iso100.no \ / |Solberg Østli 9 | PGP Key: 0xE5E76831 X |2020 Skedsmokorset | svein@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | svein@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listmail@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +-------------------+------------------------------- |msn messenger: | Mobile Phone: +47 907 03 575 |svein@jernhuset.no | RIPE handle: SS16503-RIPE - --------+-------------------+------------------------------- Picture Gallery: https://gallery.stillbilde.net/v/svein/ - ------------------------------------------------------------ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkofxwIACgkQODUnwSLUlKSs3gCfTxM52k/PJCis1oF91n/Kmix5 IaAAnA1yCUPPUkPw2AKsew50WtDKhz2j =7oyf -----END PGP SIGNATURE----- From 000.fbsd at quip.cz Sun May 31 13:59:10 2009 From: 000.fbsd at quip.cz (Miroslav Lachman) Date: Sun May 31 13:59:17 2009 Subject: problem with time and cronjobs in Qemu guest In-Reply-To: <4A096517.7020104@quip.cz> References: <4A096517.7020104@quip.cz> Message-ID: <4A228D29.30804@quip.cz> Miroslav Lachman wrote: > Hi, > > I have Qemu 0.10.1 installed on my old Windows 2000 PC and I am running > FreeBSD 7.2-RC1 i386 in it for some testing purposes. Today I realized > that some cron task was not run at specified time nor some commands in > endless loop. 
> > This is a simple example of strange behavior (written in tcsh shell) > root@firstbsd ~/# while 1 > while? date > while? sleep 10 > while? end > Tue May 12 13:43:20 CEST 2009 > Tue May 12 13:43:58 CEST 2009 > Tue May 12 13:44:35 CEST 2009 > Tue May 12 13:45:12 CEST 2009 > Tue May 12 13:45:50 CEST 2009 > Tue May 12 13:46:27 CEST 2009 > Tue May 12 13:47:04 CEST 2009 > Tue May 12 13:47:42 CEST 2009 > Tue May 12 13:48:20 CEST 2009 > Tue May 12 13:48:57 CEST 2009 > Tue May 12 13:49:37 CEST 2009 > Tue May 12 13:50:19 CEST 2009 > Tue May 12 13:50:58 CEST 2009 > Tue May 12 13:51:37 CEST 2009 > > As you can see - the date command is not executed every 10 seconds. > > Are there some settings (in FreeBSD guest or in Qemu itself) to fix this? It is better after adding kern.hz="50" to loader.conf, but still not perfect: root@firstbsd ~/# while 1 while? date while? sleep 10 while? end Sun May 31 15:55:13 CEST 2009 Sun May 31 15:55:23 CEST 2009 Sun May 31 15:55:34 CEST 2009 Sun May 31 15:55:46 CEST 2009 Sun May 31 15:56:06 CEST 2009 Sun May 31 15:56:17 CEST 2009 Sun May 31 15:56:28 CEST 2009 Sun May 31 15:56:39 CEST 2009 Sun May 31 15:56:49 CEST 2009 Sun May 31 15:57:00 CEST 2009 Any other workaround? Miroslav Lachman
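The drift shown in the two runs above can be put into numbers with a small loop. What follows is only a sketch in plain sh (not the tcsh used in the message); the function name measure_sleep is made up for illustration, and it assumes that date +%s is available in the guest, as it is on FreeBSD.

```sh
#!/bin/sh
# measure_sleep INTERVAL COUNT  (hypothetical helper, not from the thread)
# Runs "sleep INTERVAL" COUNT times and prints how many seconds passed
# according to the guest's own clock around each sleep.
measure_sleep() {
    interval=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s)      # guest wall-clock time before the sleep
        sleep "$interval"
        end=$(date +%s)        # guest wall-clock time after the sleep
        echo "requested=${interval}s observed=$((end - start))s"
        i=$((i + 1))
    done
}

# Example: measure_sleep 10 14
```

On a guest that keeps time correctly, the observed value should stay within about a second of the requested one. For the first run quoted above it would have come out at roughly 37 to 42 seconds per 10-second sleep, and for the run with kern.hz="50" at mostly 10 to 12 seconds with one 20-second outlier.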