From ed at 80386.nl Fri Aug 1 11:39:37 2008 From: ed at 80386.nl (Ed Schouten) Date: Fri Aug 1 11:39:49 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday Message-ID: <20080801113935.GM99951@hoeg.nl> Hi all, One month ago I sent a schedule to the lists about the MPSAFE TTY code I'm working on. It contained the following: * Ed Schouten wrote: > August 3 2008: > Disconnect drivers from the build that haven't been patched in > the MPSAFE TTY branch. This means I'm going to disconnect these drivers on Sunday. I posted a list of drivers some time ago. The list of drivers is a little different than what I had posted: - I omitted ppp(4) and sl(4) on purpose, because I expected they would already have been disconnected by this time (IFF_NEEDSGIANT). - It seems I forgot to mention ucycom(4) and ufoma(4). These have not been ported to the new TTY layer. This means the complete list of drivers is: | USB: ubser(4), ucycom(4), ufoma(4) | ISA/PCI: cx(4), cy(4), digi(4), rc(4), rp(4), si(4), sio(4) | Line disciplines: ng_h4(4), ng_tty(4), ppp(4), sl(4), snp(4) There are a couple of important things to mention here: - Some line disciplines (ng_h4(4), ng_tty(4) and snp(4)) will be restored in the future. After the new TTY code has been imported, a hooks interface shall be developed, which will allow these drivers to work once again. - PC98 still uses the sio(4) driver. I've decided not to touch PC98 at this moment. I'll contact the PC98 folks one of these days, to see if we can already perform a partial migration to uart(4). Wrapping up, I'd like to say I really hope we can one day see these drivers reappear in FreeBSD. Fortunately we've still got a long time before 8.0-RELEASE. Yours, -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080801/571e7261/attachment.pgp From ed at 80386.nl Fri Aug 1 11:40:55 2008 From: ed at 80386.nl (Ed Schouten) Date: Fri Aug 1 11:41:02 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <20080801113935.GM99951@hoeg.nl> References: <20080801113935.GM99951@hoeg.nl> Message-ID: <20080801114053.GN99951@hoeg.nl> Skipped content of type multipart/mixed From peterjeremy at optushome.com.au Fri Aug 1 12:49:15 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Fri Aug 1 12:49:40 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <20080801114053.GN99951@hoeg.nl> References: <20080801113935.GM99951@hoeg.nl> <20080801114053.GN99951@hoeg.nl> Message-ID: <20080801124845.GZ1359@server.vk2pj.dyndns.org> On 2008-Aug-01 13:40:53 +0200, Ed Schouten wrote: >One of the most important things I forgot to mention: I've attached the >patch I'm going to commit. Comments on the patch are very welcome! This patch just disconnects the majority of the serial drivers from the build. Whilst I support the aim of making the TTY subsystem MPSAFE, as I've previously stated, IMO, just disconnecting everything is not the way forward. On 2008-Jul-04 11:22:44 +0200, you wrote: >The digi(4) code shouldn't be very hard to port. As I said before, I am >considering making most drivers at least compile before the code hits >the tree, which should make it a lot easier for people to get their >things working again. This doesn't seem to have happened. 
On 2008-Jul-08 16:16:20 +0200, you wrote: >If time permits, I'll fix nmdm(4). I've also received some messages >about si(4) and digi(4), so I'll contact those people to see what we can >do here. I had indicated an interest in digi(4) but haven't heard anything further. On 2008-Jul-20 14:32:56 +0200, you wrote: >As usual, the latest mpsafetty patchset can be found here. I would >really appreciate it if I could get more reviews on the code. Thanks! > > http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/ Looking through the latest patches (20080801), there is still no documentation explaining how to use the new interfaces. It looks like the only way to port a driver is to study the changes made to some other drivers and work out how to apply equivalent changes to the driver you are adapting. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. From phk at phk.freebsd.dk Fri Aug 1 12:53:23 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri Aug 1 12:53:42 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: Your message of "Fri, 01 Aug 2008 22:48:45 +1000." <20080801124845.GZ1359@server.vk2pj.dyndns.org> Message-ID: <64491.1217595198@critter.freebsd.dk> In message <20080801124845.GZ1359@server.vk2pj.dyndns.org>, Peter Jeremy writes : >On 2008-Aug-01 13:40:53 +0200, Ed Schouten wrote: >>One of the most important things I forgot to mention: I've attached the >>patch I'm going to commit. Comments on the patch are very welcome! > >This patch just disconnects the majority of the serial drivers from >the build. 
Whilst I support the aim of making the TTY subsystem >MPSAFE, as I've previously stated, IMO, just disconnecting everything >is not the way forward. I got a syntax error on this email Peter, didn't you mean to write: "Great work Ed, let me send you some patches" A MPSAFE tty subsystem is infinitely more important than any particular non-console tty driver. If FreeBSD should have digi(4) support in the future somebody should spend some quality with the driver, instead of stopping Ed from making much necessary progress. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From jhb at freebsd.org Fri Aug 1 13:13:44 2008 From: jhb at freebsd.org (John Baldwin) Date: Fri Aug 1 13:13:50 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <64491.1217595198@critter.freebsd.dk> References: <64491.1217595198@critter.freebsd.dk> Message-ID: <200808010901.39545.jhb@freebsd.org> On Friday 01 August 2008 08:53:18 am Poul-Henning Kamp wrote: > In message <20080801124845.GZ1359@server.vk2pj.dyndns.org>, Peter Jeremy > writes > > >On 2008-Aug-01 13:40:53 +0200, Ed Schouten wrote: > >>One of the most important things I forgot to mention: I've attached the > >>patch I'm going to commit. Comments on the patch are very welcome! > > > >This patch just disconnects the majority of the serial drivers from > >the build. Whilst I support the aim of making the TTY subsystem > >MPSAFE, as I've previously stated, IMO, just disconnecting everything > >is not the way forward. > > I got a syntax error on this email Peter, didn't you mean to write: > > "Great work Ed, let me send you some patches" > > > A MPSAFE tty subsystem is infinitely more important than any particular > non-console tty driver. 
> > If FreeBSD should have digi(4) support in the future somebody should > spend some quality with the driver, instead of stopping Ed from > making much necessary progress. On the other hand, we didn't throw out half the NIC drivers when we did the MPSAFE network stack locking either, we allowed for a transition that gave time for individual drivers to be locked. -- John Baldwin From jhb at freebsd.org Fri Aug 1 13:13:47 2008 From: jhb at freebsd.org (John Baldwin) Date: Fri Aug 1 13:14:06 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <20080801113935.GM99951@hoeg.nl> References: <20080801113935.GM99951@hoeg.nl> Message-ID: <200808010904.50819.jhb@freebsd.org> On Friday 01 August 2008 07:39:35 am Ed Schouten wrote: > Hi all, > > One month ago I sent a schedule to the lists about the MPSAFE TTY code > I'm working on. It contained the following: > > * Ed Schouten wrote: > > August 3 2008: > > Disconnect drivers from the build that haven't been patched in > > the MPSAFE TTY branch. > > This means I'm going to disconnect these drivers on Sunday. I posted a > list of drivers some time ago. The list of drivers is a little different > than what I had posted: > > - I omitted ppp(4) and sl(4) on purpose, because I expected they would > already have been disconnected by this time (IFF_NEEDSGIANT). > > - It seems I forgot to mention ucycom(4) and ufoma(4). These have not > been ported to the new TTY layer. > > This means the complete list of drivers is: > | USB: ubser(4), ucycom(4), ufoma(4) > | ISA/PCI: cx(4), cy(4), digi(4), rc(4), rp(4), si(4), sio(4) > | Line disciplines: ng_h4(4), ng_tty(4), ppp(4), sl(4), snp(4) > > There are a couple of important things to mention here: > > - Some line disciplines (ng_h4(4), ng_tty(4) and snp(4)) will be > restored in the future. After the new TTY code has been imported, a > hooks interface shall be developed, which will allow these drivers to > work once again. 
> > - PC98 still uses the sio(4) driver. I've decided not to touch PC98 at > this moment. I'll contact the PC98 folks one of these days, to see if > we can already perform a partial migration to uart(4). > > Wrapping up, I'd like to say I really hope we can one day see these > drivers reappear in FreeBSD. Fortunately we've still got a long time > before 8.0-RELEASE. > > Yours, Note that one approach you can take is that even if you can't test patches for some of these drivers due to ENOHARDWARE, other users can. So you can still generate patches for drivers (make sure they compile) and then post them to current and stable to get them tested. I think it is more courteous to our users that way than to require them to be developers. And given my recent and continuing efforts with NIC drivers, I think I can safely say that I've already put my money where my mouth is on this one. However, it is probably far easier to provide patches for testing once the actual subsystem is in the tree rather than prior, so if the plan is to do that then I'm ok with it. There is something to be said, however, for the model used in the network stack where some hack shims were left in place to support non-updated drivers until they could be updated. I know I have an rp(4) card (but in use in a production box running 6.x) and from that I know other people also have rp(4) cards that I've talked with (and RocketPort even provides their own FreeBSD driver) for example. 
-- John Baldwin From peterjeremy at optushome.com.au Fri Aug 1 13:26:59 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Fri Aug 1 13:27:05 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <64491.1217595198@critter.freebsd.dk> References: <20080801124845.GZ1359@server.vk2pj.dyndns.org> <64491.1217595198@critter.freebsd.dk> Message-ID: <20080801132624.GG1359@server.vk2pj.dyndns.org> On 2008-Aug-01 12:53:18 +0000, Poul-Henning Kamp wrote: >I got a syntax error on this email Peter, didn't you mean to write: > >"Great work Ed, let me send you some patches" I would love to be able to send some patches. In order to do so, I need some information about how to interface with the MPSAFE TTY subsystem and how to adapt an existing driver. I am not the only person to have indicated a need for some hand-holding and I was under the impression that Ed would provide this but, to date, all I have is suggestions to look at patched drivers. >A MPSAFE tty subsystem is infinitely more important than any particular >non-console tty driver. OTOH, a piece of middle-ware that doesn't work with the underlying hardware drivers makes that hardware useless. MPSAFE is probably more important than any particular non-console driver but Ed is talking about disconnecting almost every non-console TTY driver. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. 
From ed at 80386.nl Fri Aug 1 13:44:36 2008 From: ed at 80386.nl (Ed Schouten) Date: Fri Aug 1 13:44:49 2008 Subject: Reminder: non-mpsafetty drivers to be *dis*connected on Sunday In-Reply-To: <20080801132624.GG1359@server.vk2pj.dyndns.org> References: <20080801124845.GZ1359@server.vk2pj.dyndns.org> <64491.1217595198@critter.freebsd.dk> <20080801132624.GG1359@server.vk2pj.dyndns.org> Message-ID: <20080801134435.GQ99951@hoeg.nl> Hello Peter, * Peter Jeremy wrote: > On 2008-Aug-01 12:53:18 +0000, Poul-Henning Kamp wrote: > >I got a syntax error on this email Peter, didn't you mean to write: > > > >"Great work Ed, let me send you some patches" > > I would love to be able to send some patches. In order to do so, I > need some information about how to interface with the MPSAFE TTY > subsystem and how to adapt an existing driver. I am not the only > person to have indicated a need for some hand-holding and I was under > the impression that Ed would provide this but, to date, all I have is > suggestions to look at patched drivers. I guess things went wrong, because I probably confused you with Peter Wemm (I know - I'm bad with names), who inquired about si(4). I did send him a message some time ago, to see if we could make an appointment to discuss how we could get si(4) working. Even though I agree with you that we need more documentation on the TTY layer's internals, my opinion is that other people should have shown more interest from the start. When I sent a message a couple of weeks ago, I almost immediately got a response from Alexander Kabaev (kan@). He wanted to help me with the dcons(4) driver, which he did. I just said: take a look at what I did to uart(4) and the console drivers and gave him some random advice. 
He was able to send me an almost flawless diff in a matter of hours, which he committed to the mpsafetty branch himself! Maybe I'm replying to too many messages at the same time, but as John said, it's a lot easier making the remaining drivers work after the code has been integrated. It's not like we're permanently carving things into stone - we've almost got a full year to get it all working again. Yours, -- Ed Schouten WWW: http://80386.nl/ PS: The subject should have read "disconnected", not "connected". From phk at phk.freebsd.dk Fri Aug 1 13:45:28 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri Aug 1 13:45:43 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: Your message of "Fri, 01 Aug 2008 23:26:24 +1000." <20080801132624.GG1359@server.vk2pj.dyndns.org> Message-ID: <64677.1217598326@critter.freebsd.dk> In message <20080801132624.GG1359@server.vk2pj.dyndns.org>, Peter Jeremy writes : >OTOH, a piece of middle-ware that doesn't work with the underlying >hardware drivers makes that hardware useless. MPSAFE is probably more >important than any particular non-console driver but Ed is talking >about disconnecting almost every non-console TTY driver. And I'm right behind him. If digi(4) is important, somebody with hardware will fix it, if it isn't important it will not be fixed, and good riddance. 
I think you are being unreasonable, and I'll point you at an old post I made, rather than rehash the arguments again: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1863154+0+archive/2002/freebsd-current/20021006.freebsd-current Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From phk at phk.freebsd.dk Fri Aug 1 13:46:54 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri Aug 1 13:47:07 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: Your message of "Fri, 01 Aug 2008 09:01:39 -0400." <200808010901.39545.jhb@freebsd.org> Message-ID: <64694.1217598412@critter.freebsd.dk> In message <200808010901.39545.jhb@freebsd.org>, John Baldwin writes: >On Friday 01 August 2008 08:53:18 am Poul-Henning Kamp wrote: >> A MPSAFE tty subsystem is infinitely more important than any particular >> non-console tty driver. >> >> If FreeBSD should have digi(4) support in the future somebody should >> spend some quality with the driver, instead of stopping Ed from >> making much necessary progress. > >On the other hand, we didn't throw out half the NIC drivers when we did the >MPSAFE network stack locking either, we allowed for a transition that gave >time for individual drivers to be locked. First there are close to 10 years of difference in how relevant serial ports and ethernet interfaces are. Second, I think we could have saved if we had done just that :-) Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
From peter at wemm.org Sat Aug 2 06:24:34 2008 From: peter at wemm.org (Peter Wemm) Date: Sat Aug 2 06:24:46 2008 Subject: Reminder: non-mpsafetty drivers to be *dis*connected on Sunday In-Reply-To: <20080801134435.GQ99951@hoeg.nl> References: <20080801124845.GZ1359@server.vk2pj.dyndns.org> <64491.1217595198@critter.freebsd.dk> <20080801132624.GG1359@server.vk2pj.dyndns.org> <20080801134435.GQ99951@hoeg.nl> Message-ID: On Fri, Aug 1, 2008 at 6:44 AM, Ed Schouten wrote: > Hello Peter, > > * Peter Jeremy wrote: >> On 2008-Aug-01 12:53:18 +0000, Poul-Henning Kamp wrote: >> >I got a syntax error on this email Peter, didn't you mean to write: >> > >> >"Great work Ed, let me send you some patches" >> >> I would love to be able to send some patches. In order to do so, I >> need some information about how to interface with the MPSAFE TTY >> subsystem and how to adapt an existing driver. I am not the only >> person to have indicated a need for some hand-holding and I was under >> the impression that Ed would provide this but, to date, all I have is >> suggestions to look at patched drivers. > > I guess things went wrong, because I probably confused you with Peter > Wemm (I know - I'm bad with names), who inquired about si(4). I did send > him a message some time ago, to see if we could make an appointment to > discuss how we could get si(4) working. > > Even though I agree with you that we need more documentation on the TTY > layer's internals, my opinion is that other people should have shown > more interest from the start. When I sent a message a couple of weeks > ago, I almost immediately got a response from Alexander Kabaev (kan@). > He wanted to help me with the dcons(4) driver, which he did. I just > said: take a look at what I did to uart(4) and the console drivers and > gave him some random advice. He was able to send me an almost flawless > diff in a matter of hours, which he committed to the mpsafetty branch > himself! 
> > Maybe I'm replying to too many messages at the same time, but as John > said, it's a lot easier making the remaining drivers work after the code > has been integrated. It's not like we're permanently carving things into > stone - we've almost got a full year to get it all working again. I'll be happy to work on si(4) once the code hits the tree. Don't let si(4) get in the way. I have actual hardware (and use it 24x7), so I've got some incentive. -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV "All of this is for nothing if we don't go to the stars" - JMS/B5 "If Java had true garbage collection, most programs would delete themselves upon execution." -- Robert Sewell From rwatson at FreeBSD.org Sun Aug 3 16:26:50 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Sun Aug 3 16:27:08 2008 Subject: HEAD UP: non-MPSAFE network drivers to be disabled (was: 8.0 network stack MPsafety goals (fwd)) In-Reply-To: <20080630091033.P3968@fledge.watson.org> References: <20080524111715.T64552@fledge.watson.org> <20080629180126.F90836@fledge.watson.org> <20080630091033.P3968@fledge.watson.org> Message-ID: On Mon, 30 Jun 2008, Robert Watson wrote: > On Sun, 29 Jun 2008, Robert Watson wrote: > >> An FYI on the state of things here: in the last month, John has updated a >> number of device drivers to be MPSAFE, and the USB work remains in-flight. >> I'm holding fire a bit on disabling IFF_NEEDSGIANT while things settle and >> I catch up on driver state, and will likely send out an update next week >> regarding which device drivers remain on the kill list, and generally what >> the status of this project is. > > Here's the revised list of drivers that will have their build disabled in > the next week (subject to an appropriate block of time for me): A quick update: I had postponed removing IFF_NEEDSGIANT while awaiting the apparently forthcoming USB stack commit. 
Since it appears slow in coming, I will move ahead and disconnect non-USB drivers that require IFF_NEEDSGIANT in the coming week, but will leave the IFF_NEEDSGIANT infrastructure there, along with the current USB drivers that depend on it, until the USB merge is done. Robert N M Watson Computer Laboratory University of Cambridge From imp at bsdimp.com Sun Aug 3 17:32:14 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Sun Aug 3 17:32:26 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <20080801113935.GM99951@hoeg.nl> References: <20080801113935.GM99951@hoeg.nl> Message-ID: <20080803.112856.35218914.imp@bsdimp.com> In message: <20080801113935.GM99951@hoeg.nl> Ed Schouten writes: : Hi all, : : One month ago I sent a schedule to the lists about the MPSAFE TTY code : I'm working on. It contained the following: : : * Ed Schouten wrote: : > August 3 2008: : > Disconnect drivers from the build that haven't been patched in : > the MPSAFE TTY branch. : : This means I'm going to disconnect these drivers on Sunday. I posted a : list of drivers some time ago. The list of drivers is a little different : than what I had posted: : : - I omitted ppp(4) and sl(4) on purpose, because I expected they would : already have been disconnected by this time (IFF_NEEDSGIANT). : : - It seems I forgot to mention ucycom(4) and ufoma(4). These have not : been ported to the new TTY layer. : : This means the complete list of drivers is: : : | USB: ubser(4), ucycom(4), ufoma(4) : | ISA/PCI: cx(4), cy(4), digi(4), rc(4), rp(4), si(4), sio(4) : | Line disciplines: ng_h4(4), ng_tty(4), ppp(4), sl(4), snp(4) This is a lot of functionality to remove on such short notice. : There are a couple of important things to mention here: : : - Some line disciplines (ng_h4(4), ng_tty(4) and snp(4)) will be : restored in the future. After the new TTY code has been imported, a : hooks interface shall be developed, which will allow these drivers to : work once again. 
Can't you push off the import until these things are done. There have been too many empty promises in the past to accept this at face value. : - PC98 still uses the sio(4) driver. I've decided not to touch PC98 at : this moment. I'll contact the PC98 folks one of these days, to see if : we can already perform a partial migration to uart(4). : : Wrapping up, I'd like to say I really hope we can one day see these : drivers reappear in FreeBSD. Fortunately we've still got a long time : before 8.0-RELEASE. You are moving too fast on this. Please slow down. You promised documentation and such, but that hasn't happened, so a slowdown in your timeline is justified. thank you for your consideration. Warner From imp at bsdimp.com Sun Aug 3 17:32:16 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Sun Aug 3 17:32:37 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <64491.1217595198@critter.freebsd.dk> References: <20080801124845.GZ1359@server.vk2pj.dyndns.org> <64491.1217595198@critter.freebsd.dk> Message-ID: <20080803.113010.-1849554152.imp@bsdimp.com> In message: <64491.1217595198@critter.freebsd.dk> "Poul-Henning Kamp" writes: : In message <20080801124845.GZ1359@server.vk2pj.dyndns.org>, Peter Jeremy writes : : : >On 2008-Aug-01 13:40:53 +0200, Ed Schouten wrote: : >>One of the most important things I forgot to mention: I've attached the : >>patch I'm going to commit. Comments on the patch are very welcome! : > : >This patch just disconnects the majority of the serial drivers from : >the build. Whilst I support the aim of making the TTY subsystem : >MPSAFE, as I've previously stated, IMO, just disconnecting everything : >is not the way forward. : : I got a syntax error on this email Peter, didn't you mean to write: : : "Great work Ed, let me send you some patches" NO HE DID NOT. : A MPSAFE tty subsystem is infinitely more important than any particular : non-console tty driver. 
: : If FreeBSD should have digi(4) support in the future somebody should : spend some quality with the driver, instead of stopping Ed from : making much necessary progress. THINGS ARE GOING IN TOO FAST. PLEASE SLOW DOWN. This is great work, but it is premature at this time. Warner From imp at bsdimp.com Sun Aug 3 17:34:56 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Sun Aug 3 17:35:08 2008 Subject: Reminder: non-mpsafetty drivers to be *dis*connected on Sunday In-Reply-To: <20080801134435.GQ99951@hoeg.nl> References: <64491.1217595198@critter.freebsd.dk> <20080801132624.GG1359@server.vk2pj.dyndns.org> <20080801134435.GQ99951@hoeg.nl> Message-ID: <20080803.113253.-532680110.imp@bsdimp.com> In message: <20080801134435.GQ99951@hoeg.nl> Ed Schouten writes: : Maybe I'm replying to too many messages at the same time, but as John : said, it's a lot easier making the remaining drivers work after the code : has been integrated. It's not like we're permanently carving things into : stone - we've almost got a full year to get it all working again. We've heard this promise before. It rarely has resulted in the promised work. I don't think this is the right way forward. Warner From rwatson at FreeBSD.org Sun Aug 3 17:38:42 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Sun Aug 3 17:38:48 2008 Subject: Reminder: non-mpsafetty drivers to be *dis*connected on Sunday In-Reply-To: <20080803.113253.-532680110.imp@bsdimp.com> References: <64491.1217595198@critter.freebsd.dk> <20080801132624.GG1359@server.vk2pj.dyndns.org> <20080801134435.GQ99951@hoeg.nl> <20080803.113253.-532680110.imp@bsdimp.com> Message-ID: On Sun, 3 Aug 2008, M. Warner Losh wrote: > In message: <20080801134435.GQ99951@hoeg.nl> > Ed Schouten writes: : Maybe I'm replying to too > many messages at the same time, but as John : said, it's a lot easier making > the remaining drivers work after the code : has been integrated. 
It's not > like we're permanently carving things into : stone - we've almost got a full > year to get it all working again. > > We've heard this promise before. It rarely has resulted in the promised > work. I don't think this is the right way forward. I have to admit that I was quite pleasantly surprised that so many network drivers became MPSAFE in the last couple of months, and USB will nicely round that out. It may help if Ed could take whatever notes he sent Peter about sio and drop them on the Wiki as the beginning of a howto on adapting drivers to the new tty layer. One thing I would really like to see is us taking this as an opportunity to get more people interested in maintaining the tty drivers we have -- now is a really good opportunity for people do adopt drivers and update them. I fall down a bit more on the Warner side here also: I think posting the docs in advance of removing the drivers, and having a couple of weeks gap between those events, is a much more productive way to achieve the goal of updated drivers. It also means that we don't have timeline gaps when serial drivers for devices are broken: when people are playing the binary search game to find some other problem, the last thing they want is to hit dates where their system consoles don't work. Robert N M Watson Computer Laboratory University of Cambridge From sam at freebsd.org Sun Aug 3 17:45:15 2008 From: sam at freebsd.org (Sam Leffler) Date: Sun Aug 3 17:45:27 2008 Subject: Reminder: non-mpsafetty drivers to be *dis*connected on Sunday In-Reply-To: <20080803.113253.-532680110.imp@bsdimp.com> References: <64491.1217595198@critter.freebsd.dk> <20080801132624.GG1359@server.vk2pj.dyndns.org> <20080801134435.GQ99951@hoeg.nl> <20080803.113253.-532680110.imp@bsdimp.com> Message-ID: <4895EE96.9070404@freebsd.org> M. 
Warner Losh wrote: > In message: <20080801134435.GQ99951@hoeg.nl> > Ed Schouten writes: > : Maybe I'm replying to too many messages at the same time, but as John > : said, it's a lot easier making the remaining drivers work after the code > : has been integrated. It's not like we're permanently carving things into > : stone - we've almost got a full year to get it all working again. > > We've heard this promise before. It rarely has resulted in the > promised work. I don't think this is the right way forward. I'll point out that my vap work languished for almost 3 years waiting for help to convert drivers. That help never really came; people did pitch in towards the end (not to belittle the help I received) but in the end it took sponsorship to see it into the tree. Even now there are drivers that are unfinished (ipw comes to mind) and much work still remains. I am solidly behind Ed and believe his plan is the correct one. Sam From imp at bsdimp.com Sun Aug 3 17:47:40 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Sun Aug 3 17:47:46 2008 Subject: Reminder: non-mpsafetty drivers to be *dis*connected on Sunday In-Reply-To: References: <20080801134435.GQ99951@hoeg.nl> <20080803.113253.-532680110.imp@bsdimp.com> Message-ID: <20080803.114438.2086213408.imp@bsdimp.com> In message: Robert Watson writes: : On Sun, 3 Aug 2008, M. Warner Losh wrote: : : > In message: <20080801134435.GQ99951@hoeg.nl> : > Ed Schouten writes: : Maybe I'm replying to too : > many messages at the same time, but as John : said, it's a lot easier making : > the remaining drivers work after the code : has been integrated. It's not : > like we're permanently carving things into : stone - we've almost got a full : > year to get it all working again. : > : > We've heard this promise before. It rarely has resulted in the promised : > work. I don't think this is the right way forward. 
: : I have to admit that I was quite pleasantly surprised that so many : network drivers became MPSAFE in the last couple of months, and USB : will nicely round that out. It may help if Ed could take whatever : notes he sent Peter about sio and drop them on the Wiki as the : beginning of a howto on adapting drivers to the new tty layer. One : thing I would really like to see is us taking this as an opportunity : to get more people interested in maintaining the tty drivers we have : -- now is a really good opportunity for people do adopt drivers and : update them. : : I fall down a bit more on the Warner side here also: I think posting : the docs in advance of removing the drivers, and having a couple of : weeks gap between those events, is a much more productive way to : achieve the goal of updated drivers. It also means that we don't : have timeline gaps when serial drivers for devices are broken: when : people are playing the binary search game to find some other : problem, the last thing they want is to hit dates where their system : consoles don't work. Yes. It wouldn't kill us to push this out 4-6 weeks. Having the docs available, even in the wiki form, would help quite a bit. We want to harvest the enthusiasm people like Peter and Peter are showing to convert their drivers. Warner From ed at 80386.nl Sun Aug 3 19:48:48 2008 From: ed at 80386.nl (Ed Schouten) Date: Sun Aug 3 19:48:59 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <20080803.112856.35218914.imp@bsdimp.com> References: <20080801113935.GM99951@hoeg.nl> <20080803.112856.35218914.imp@bsdimp.com> Message-ID: <20080803194844.GA99951@hoeg.nl> * M. Warner Losh wrote: > You are moving too fast on this. Please slow down. You promised > documentation and such, but that hasn't happened, so a slowdown in > your timeline is justified. It's really unfortunate you happen to mention this issue right now. 
After the message by Peter two days ago, I think I sent him enough
documentation to get him started. As discussed with scottl and rwatson,
I extended this documentation to something more usable:

  http://wiki.freebsd.org/TTYRedesign

But indeed, this doesn't justify that I haven't written this earlier on.

--
 Ed Schouten
 WWW: http://80386.nl/

From imp at bsdimp.com Sun Aug 3 20:19:35 2008
From: imp at bsdimp.com (M. Warner Losh)
Date: Sun Aug 3 20:19:41 2008
Subject: Reminder: non-mpsafetty drivers to be connected on Sunday
In-Reply-To: <20080803194844.GA99951@hoeg.nl>
References: <20080801113935.GM99951@hoeg.nl>
 <20080803.112856.35218914.imp@bsdimp.com>
 <20080803194844.GA99951@hoeg.nl>
Message-ID: <20080803.141744.-552483469.imp@bsdimp.com>

In message: <20080803194844.GA99951@hoeg.nl>
            Ed Schouten writes:
: * M. Warner Losh wrote:
: > You are moving too fast on this. Please slow down. You promised
: > documentation and such, but that hasn't happened, so a slowdown in
: > your timeline is justified.
:
: It's really unfortunate you happen to mention this issue right now.
: After the message by Peter two days ago, I think I sent him enough
: documentation to get him started. As discussed with scottl and rwatson,
: I extended this documentation to something more usable:
:
:   http://wiki.freebsd.org/TTYRedesign
:
: But indeed, this doesn't justify that I haven't written this earlier on.

Yes, it is unfortunate that I mention it now. I should have said
something sooner too. However, I'm not saying anything that hasn't been
said by others in the past few days and weeks. I'm just yelling it more
loudly than they chose to.
I saw the earlier discussions, and thought things were going well, but haven't had the time to pay attention in the last couple of weeks. Still, please don't shoot the messenger too much. Warner From rwatson at FreeBSD.org Sun Aug 3 20:36:23 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Sun Aug 3 20:36:29 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: <20080803.141744.-552483469.imp@bsdimp.com> References: <20080801113935.GM99951@hoeg.nl> <20080803.112856.35218914.imp@bsdimp.com> <20080803194844.GA99951@hoeg.nl> <20080803.141744.-552483469.imp@bsdimp.com> Message-ID: On Sun, 3 Aug 2008, M. Warner Losh wrote: > In message: <20080803194844.GA99951@hoeg.nl> > Ed Schouten writes: > : * M. Warner Losh wrote: > : > You are moving too fast on this. Please slow down. You promised > : > documentation and such, but that hasn't happened, so a slowdown in > : > your timeline is justified. > : > : It's really unfortunate you happen to mention this issue right now. > : After the message by Peter two days ago, I think I sent him enough > : documentation to get him started. As discussed with scottl and rwatson, > : I extended this documentation to something more usable: > : > : http://wiki.freebsd.org/TTYRedesign > : > : But indeed, this doesn't justify that I haven't written this earlier on. > > Yes, it is unfortunate that I mention it now. I should have said something > sooner too. However, I'm not saying anything that hasn't been said by > others in the past few days and weeks. I'm just yelling it more loudly than > they chose to. I saw the earlier discussions, and thought things were going > well, but haven't had the time to pay attention in the last couple of weeks. > Still, please don't shoot the messenger too much. I'm a fan of giving it a week or two breather and focusing on updating drivers, documentation, etc, and then merging it all in mid-august. 
I don't think there's any need to delay things a month, but if we have eager hands (Peter Jeremy, Peter Wemm?) waiting to update drivers given a bit of documentation, now is the time for them to start looking at the wiki and figuring out what they're missing. BTW, what do you (Ed) think of uploading the patch for one of the drivers to the wiki and annotating the diffs? Robert N M Watson Computer Laboratory University of Cambridge From ed at 80386.nl Sun Aug 3 20:46:47 2008 From: ed at 80386.nl (Ed Schouten) Date: Sun Aug 3 20:48:32 2008 Subject: Reminder: non-mpsafetty drivers to be connected on Sunday In-Reply-To: References: <20080801113935.GM99951@hoeg.nl> <20080803.112856.35218914.imp@bsdimp.com> <20080803194844.GA99951@hoeg.nl> <20080803.141744.-552483469.imp@bsdimp.com> Message-ID: <20080803204645.GD99951@hoeg.nl> Hello Robert, * Robert Watson wrote: > I'm a fan of giving it a week or two breather and focusing on updating > drivers, documentation, etc, and then merging it all in mid-august. I > don't think there's any need to delay things a month, My thoughts exactly. I also mentioned this in a private message to Warner. I'm sure we'll talk about this at the DevSummit, which is a good thing. As I once mentioned, it would be rather painful for me if we would delay it too long, because now is my summer break and in September it is not. > but if we have eager hands (Peter Jeremy, Peter Wemm?) waiting to > update drivers given a bit of documentation, now is the time for them > to start looking at the wiki and figuring out what they're missing. Yes. Peter Wemm already mentioned he doesn't mind fixing up si(4) after the import, which also removes some of the pressure. As mentioned on the Wiki page: don't be shy to poke me if you have any questions/comments! > BTW, what do you (Ed) think of uploading the patch for one of the > drivers to the wiki and annotating the diffs? That sounds like a good idea. 
I think I'll annotate /sys/ia64/ia64/ssc.c, which has become
really simple after it had been ported to mpsafetty.

--
 Ed Schouten
 WWW: http://80386.nl/

From rwatson at FreeBSD.org Sun Aug 3 21:26:59 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Sun Aug 3 21:27:06 2008
Subject: Reminder: non-mpsafetty drivers to be connected on Sunday
In-Reply-To: <20080803204645.GD99951@hoeg.nl>
References: <20080801113935.GM99951@hoeg.nl>
 <20080803.112856.35218914.imp@bsdimp.com>
 <20080803194844.GA99951@hoeg.nl>
 <20080803.141744.-552483469.imp@bsdimp.com>
 <20080803204645.GD99951@hoeg.nl>
Message-ID:

On Sun, 3 Aug 2008, Ed Schouten wrote:

> * Robert Watson wrote:
>> I'm a fan of giving it a week or two breather and focusing on updating
>> drivers, documentation, etc, and then merging it all in mid-august. I
>> don't think there's any need to delay things a month,
>
> My thoughts exactly. I also mentioned this in a private message to Warner.
> I'm sure we'll talk about this at the DevSummit, which is a good thing. As I
> once mentioned, it would be rather painful for me if we would delay it too
> long, because now is my summer break and in September it is not.

I think this sounds fine. My big concern, btw, is not in any way with the
shape/quality of the work you've done --- rather, it's that I want to avoid,
as much as possible, knocking people off the head of 8.x as developers or
users. Experience suggests that the more rough bumps people get on the
development head, the more likely they are to fall back to some or another
-stable, or try to "wait out" the problem by going away for a month or two.
This has a negative impact on testing, since it means fewer users, and it
has a negative impact on overall development rate. It's not that any
particular breakage is the end of the world, it's just that as people bump
along, they eventually hit a bump where they could spend four more hours
trying to figure out why the box appears not to boot, or they could just
fall back and get work done, and you get a gradual attrition. This is, btw,
one reason why using Perforce has actually significantly accelerated
development: projects are more mature before they are merged, so are less
likely to knock people off.

Which doesn't mean we don't need occasional breakage, it just means we have
to moderate it, give people plenty of warning, etc. This avoids cascading
and cyclic development failures along the lines of "I'll wait until bgfsck
is stable before trying HEAD and fixing KSE", "I'll wait until KSE is stable
before trying HEAD and fixing SMP", etc. :-)

Robert N M Watson
Computer Laboratory
University of Cambridge

From bugmaster at FreeBSD.org Mon Aug 4 11:06:52 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Aug 4 11:07:17 2008
Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org
Message-ID: <200808041106.m74B6qNV082005@freefall.freebsd.org>

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.

From nwhitehorn at freebsd.org Tue Aug 5 09:42:06 2008
From: nwhitehorn at freebsd.org (Nathan Whitehorn)
Date: Tue Aug 5 09:42:36 2008
Subject: UMA MD Small Allocator Runtime Switching
Message-ID: <48981C19.8060009@freebsd.org>

I'm working on the PowerPC G5 port right now, and have run into a
problem with the way the UMA small allocator works.
On G3/G4 systems, there is a direct physical->virtual mapping, and on G5s there isn't. All of the infrastructure is in place to support both types of system with a single kernel image, except that UMA_MD_SMALL_ALLOC must be switched on/off at runtime. One solution is to put if (direct_map) use_nonsmall_case() into the MD small_alloc/free() routines and define UMA_MD_SMALL_ALLOC everywhere. This works well, except that the MI UMA code then sets booted = 1 too early in the boot process, before the kmem_alloc*() routines are available. Basically, I need to find a way have an MD UMA allocator without the MI UMA code assuming anything about how it works internally. Maybe adding a UMA_MD_ALLOC_LATE define to prevent setting booted=1 early on? -Nathan From attilio at freebsd.org Tue Aug 5 14:52:58 2008 From: attilio at freebsd.org (Attilio Rao) Date: Tue Aug 5 14:53:04 2008 Subject: witness performance improvements In-Reply-To: <20080718163231.B954@desktop> References: <20080718163231.B954@desktop> Message-ID: <3bbf2fe10808050752ra9bf259x45627660245d3ad9@mail.gmail.com> 2008/7/19, Jeff Roberson : > Hello, > > I have a patch that improves witness performance available at: > > http://people.freebsd.org/~jeff/witness.diff > > This improvement comes at the cost of some significant space overhead. It > changes the witness graph from a linked tree to a matrix based approach. > Relationships can be quickly resolved with a table lookup. The table size > is WITNESS_COUNT^2, or 1MB with the current count of 1024. > > This patch also makes struct witness objects persistent even after the last > lock using this name has been removed. This is helpful for short lived > objects which may be created frequently. > > To reduce lock contention on SMP witness_checkorder() now runs without the > w_mtx when there are no lock violations. I also cache a lock_list_entry in > each thread as allocating these requires the w_mtx. The entry is disposed > of at thread_exit(). 
>
> There is also a new sysctl that produces dot output which graphs lock order
> relationships with the graphviz program.
>
> Most of this work was done by Ilya Maykov while he was at Isilon systems.
> The locking work and some cleanup/porting/refinement was done by me on
> behalf of Nokia.
>
> The performance improvement can be significant. It is only on the order of
> 10-20% for buildkernel but on a packet forwarding test at nokia it sped
> things up by 5x putting a witness enabled kernel within about 50% of the
> performance of a kernel without. I believe buildworld isn't helped as much
> because forking and exiting a lot would then contend on the witness lock.

Hello,
here is a fixed version of Jeff's patch:
http://community.gufi.org/~rookie/witness_fast.diff

It fixes some bugs, imports the "comma separated" approach for fullgraph
and drops the cyclegraph (which can now be evicted by the fullgraph
through handy scripts).
I'd like people to test this "final" version before it hits the tree and
give feedback.

Thanks,
Attilio

PS: consider this patch not exactly to be set as an example in regard
of "diff reduction against head" :)

--
Peace can only be achieved by understanding - A. Einstein

From jhb at freebsd.org Tue Aug 5 18:58:20 2008
From: jhb at freebsd.org (John Baldwin)
Date: Tue Aug 5 18:58:27 2008
Subject: UMA MD Small Allocator Runtime Switching
In-Reply-To: <48981C19.8060009@freebsd.org>
References: <48981C19.8060009@freebsd.org>
Message-ID: <200808051024.27043.jhb@freebsd.org>

On Tuesday 05 August 2008 05:23:37 am Nathan Whitehorn wrote:
> I'm working on the PowerPC G5 port right now, and have run into a
> problem with the way the UMA small allocator works. On G3/G4 systems,
> there is a direct physical->virtual mapping, and on G5s there isn't. All
> of the infrastructure is in place to support both types of system with a
> single kernel image, except that UMA_MD_SMALL_ALLOC must be switched
> on/off at runtime.
>
> One solution is to put if (direct_map) use_nonsmall_case() into the MD
> small_alloc/free() routines and define UMA_MD_SMALL_ALLOC everywhere.
> This works well, except that the MI UMA code then sets booted = 1 too
> early in the boot process, before the kmem_alloc*() routines are available.
>
> Basically, I need to find a way to have an MD UMA allocator without the MI
> UMA code assuming anything about how it works internally. Maybe adding a
> UMA_MD_ALLOC_LATE define to prevent setting booted=1 early on?
> -Nathan

Have you considered creating an artificial direct map region in the address
space on the G5? Some of the other 64-bit ports (amd64 and sparc64) do this
to gain the benefits of the direct map even though it isn't a mandated part
of the architecture like it is on some other platforms (alpha and mips).

--
John Baldwin

From gonzo at freebsd.org Thu Aug 7 14:59:32 2008
From: gonzo at freebsd.org (Oleksandr Tymoshenko)
Date: Thu Aug 7 14:59:44 2008
Subject: Need a code review
In-Reply-To: <86r69buar0.fsf@ds4.des.no>
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
Message-ID: <489B08F6.8060605@freebsd.org>

Dag-Erling Smørgrav wrote:
> "M. Warner Losh" writes:
>> http://people.freebsd.org/~gonzo/mips2/libpam.diff
>
> This won't work. Your patch unconditionally sets NO_STATIC_MODULES
> which will result in a non-functional libpam.a (the modules will be
> built into the library, but without any of the glue that allows the
> library to find them) not just on mips, but on all other platforms.

openpam detects a static modules build using this cpp(1) condition:
#if defined(__GNUC__) && !defined(__PIC__) && !defined(NO_STATIC_MODULES)
The problem is that the gcc MIPS option -mabicalls assumes -fpic for both
static and dynamic builds. So the question is: would defining
NO_STATIC_MODULES for MIPS be enough or should it be addressed
upstream?
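
A minimal sketch of why that test misfires, assuming nothing beyond the
condition quoted above. Each macro is modelled as a plain flag so the
cases can be compared side by side; the function names are made up for
this illustration and are not part of openpam.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the quoted cpp(1) test, one flag per macro:
 *   gnuc      - __GNUC__ is defined
 *   pic       - __PIC__ is defined
 *   no_static - NO_STATIC_MODULES is defined
 * Returns true when the static-module branch would be compiled.
 * (Hypothetical helper, for illustration only.)
 */
bool
static_glue_selected(bool gnuc, bool pic, bool no_static)
{
	return (gnuc && !pic && !no_static);
}

/* Walk through the cases discussed in the thread. */
void
static_glue_examples(void)
{
	/* Static build, compiler leaves __PIC__ undefined: glue is built. */
	assert(static_glue_selected(true, false, false));

	/* Shared build, __PIC__ defined: glue is skipped, as intended. */
	assert(!static_glue_selected(true, true, false));

	/*
	 * Static build on a target where PIC is implied: the inputs are
	 * identical to the shared case, so the glue is skipped here too.
	 */
	assert(!static_glue_selected(true, true, false));
}
```

The last two calls are deliberately the same: once __PIC__ is always
defined, the test has no way left to tell a static build from a shared
one.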
PS NetBSD stumbled upon it too:
http://mail-index.netbsd.org/port-sgimips/2008/01/29/msg000058.html

--
gonzo

From gonzo at freebsd.org Thu Aug 7 17:36:56 2008
From: gonzo at freebsd.org (Oleksandr Tymoshenko)
Date: Thu Aug 7 17:37:03 2008
Subject: Need a code review
In-Reply-To: <489B08F6.8060605@freebsd.org>
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
 <489B08F6.8060605@freebsd.org>
Message-ID: <489B32A2.1090302@freebsd.org>

Oleksandr Tymoshenko wrote:
> Dag-Erling Smørgrav wrote:
>> "M. Warner Losh" writes:
>>> http://people.freebsd.org/~gonzo/mips2/libpam.diff
>>
>> This won't work. Your patch unconditionally sets NO_STATIC_MODULES
>> which will result in a non-functional libpam.a (the modules will be
>> built into the library, but without any of the glue that allows the
>> library to find them) not just on mips, but on all other platforms.
>
> openpam detects static modules build using cpp(1) condition:
> #if defined(__GNUC__) && !defined(__PIC__) && !defined(NO_STATIC_MODULES)
> The problem is that gcc MIPS option -mabicalls assumes -fpic for both
> static and dynamic builds. So the question is: would defining
> NO_STATIC_MODULES for MIPS be enough or it should be addressed
> upstream?

And the diff in question is *completely* wrong. NO_STATIC_MODULES should
be added to the flags when compiling objects for the shlib, not to
PICFLAGS. The actual "fix" in contrib/openpam passed unnoticed by me,
sorry for misguiding.
--
gonzo

From des at des.no Thu Aug 7 23:06:43 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Thu Aug 7 23:06:51 2008
Subject: Need a code review
In-Reply-To: <489B08F6.8060605@freebsd.org> (Oleksandr Tymoshenko's message of
 "Thu, 07 Aug 2008 17:38:46 +0300")
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
 <489B08F6.8060605@freebsd.org>
Message-ID: <867iasfmrh.fsf@ds4.des.no>

Oleksandr Tymoshenko writes:
> openpam detects static modules build using cpp(1) condition:
> #if defined(__GNUC__) && !defined(__PIC__) && !defined(NO_STATIC_MODULES)
> The problem is that gcc MIPS option -mabicalls assumes -fpic for both
> static and dynamic builds. So the question is: would defining
> NO_STATIC_MODULES for MIPS be enough or it should be addressed
> upstream?

"upstream" in this case means me.

DES
--
Dag-Erling Smørgrav - des@des.no

From ed at 80386.nl Fri Aug 8 10:56:06 2008
From: ed at 80386.nl (Ed Schouten)
Date: Fri Aug 8 10:56:13 2008
Subject: MPSAFE TTY schedule delay: 10 days
In-Reply-To: <20080702190901.GS14567@hoeg.nl>
References: <20080702190901.GS14567@hoeg.nl>
Message-ID: <20080808105605.GN99951@hoeg.nl>

Hello everyone,

Today it's August 8, which means I should have sent you the following:

* Ed Schouten wrote:
> August 8 2008:
>   Send the last heads-up to the lists, to warn people about the
>   big commit.
>
> August 10 2008:
>   Commit the new MPSAFE TTY driver in several commits (first
>   commit the layer itself, then commit changes to drivers one by
>   one).

After some discussion, I decided to delay the import of the MPSAFE TTY
code 10 days. This means I'm going to send the big heads-up on August 18
(last day of the DevSummit).

As usual, I should point people to the following URL:

> Please, make sure we can make this a smooth transition by
> testing/reviewing my code. I tend to generate diffs very often.
> They can be downloaded here:
>
>   http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/

It would be really nice if I could get some more reviews on the MPSAFE
TTY code. Thanks!

--
 Ed Schouten
 WWW: http://80386.nl/

From jhb at freebsd.org Sat Aug 9 21:02:33 2008
From: jhb at freebsd.org (John Baldwin)
Date: Sat Aug 9 21:03:29 2008
Subject: Make MOD_QUIESCE a bit more useful..
Message-ID: <200808091637.33820.jhb@freebsd.org>

So currently the MOD_QUIESCE event is posted to a module when unloading a kld
so it can veto non-forced unloads. However, the current implementation in
the kernel linker is to run through all the modules in a file, posting
MOD_QUIESCE followed by MOD_UNLOAD on each module serially. Thus, if you
have multiple modules in a single kld and one of the modules vetoes an unload
request via MOD_QUIESCE, you don't know as the module author if any of your
modules were unloaded via MOD_UNLOAD or not. I think a better approach would
be to change the kernel linker to invoke MOD_QUIESCE on all modules in a
single pass first. If none of those fail (or it's a forced unload), then it
can do a second pass invoking MOD_UNLOAD on all the modules.

--
John Baldwin

From phk at phk.freebsd.dk Sun Aug 10 07:14:33 2008
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sun Aug 10 07:14:40 2008
Subject: Make MOD_QUIESCE a bit more useful..
In-Reply-To: Your message of "Sat, 09 Aug 2008 16:37:33 -0400."
 <200808091637.33820.jhb@freebsd.org>
Message-ID: <1687.1218352471@critter.freebsd.dk>

In message <200808091637.33820.jhb@freebsd.org>, John Baldwin writes:
>So currently the MOD_QUIESCE event is posted to a module when unloading a kld
>so it can veto non-forced unloads.
>However, the current implementation in
>the kernel linker is to run through all the modules in a file, posting
>MOD_QUIESCE followed by MOD_UNLOAD on each module serially. Thus, if you
>have multiple modules in a single kld and one of the modules vetoes an unload
>request via MOD_QUIESCE, you don't know as the module author if any of your
>modules were unloaded via MOD_UNLOAD or not. I think a better approach would
>be to change the kernel linker to invoke MOD_QUIESCE on all modules in a
>single pass first. If none of those fail (or it's a forced unload), then it
>can do a second pass invoking MOD_UNLOAD on all the modules.

I thought it already worked that way, so no objection.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From gonzo at freebsd.org Sun Aug 10 12:21:09 2008
From: gonzo at freebsd.org (Oleksandr Tymoshenko)
Date: Sun Aug 10 12:21:15 2008
Subject: Need a code review
In-Reply-To: <867iasfmrh.fsf@ds4.des.no>
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
 <489B08F6.8060605@freebsd.org>
 <867iasfmrh.fsf@ds4.des.no>
Message-ID: <489EDD2F.9080302@freebsd.org>

Dag-Erling Smørgrav wrote:
> Oleksandr Tymoshenko writes:
>> openpam detects static modules build using cpp(1) condition:
>> #if defined(__GNUC__) && !defined(__PIC__) && !defined(NO_STATIC_MODULES)
>> The problem is that gcc MIPS option -mabicalls assumes -fpic for both
>> static and dynamic builds. So the question is: would defining
>> NO_STATIC_MODULES for MIPS be enough or it should be addressed
>> upstream?
>
> "upstream" in this case means me.

Here is a new fix: http://people.freebsd.org/~gonzo/mips2/libpam2.diff
The idea is to set the define explicitly for the dynamic case rather than
rely on __PIC__.
--
gonzo

From des at des.no Sun Aug 10 20:51:44 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Sun Aug 10 20:51:56 2008
Subject: Need a code review
In-Reply-To: <489EDD2F.9080302@freebsd.org> (Oleksandr Tymoshenko's message of
 "Sun, 10 Aug 2008 15:21:03 +0300")
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
 <489B08F6.8060605@freebsd.org>
 <867iasfmrh.fsf@ds4.des.no>
 <489EDD2F.9080302@freebsd.org>
Message-ID: <86myjkbnky.fsf@ds4.des.no>

Oleksandr Tymoshenko writes:
> Here is a new fix: http://people.freebsd.org/~gonzo/mips2/libpam2.diff
> The idea is to set the define explicitly for the dynamic case rather than
> rely on __PIC__.

This is completely backwards. Dynamic is the default option, and in
fact the only supported option on many (non-FreeBSD) systems. If it
were up to me, I would remove support for statically linked modules
entirely.

DES
--
Dag-Erling Smørgrav - des@des.no

From imp at bsdimp.com Sun Aug 10 22:55:37 2008
From: imp at bsdimp.com (M. Warner Losh)
Date: Sun Aug 10 22:55:44 2008
Subject: Make MOD_QUIESCE a bit more useful..
In-Reply-To: <200808091637.33820.jhb@freebsd.org>
References: <200808091637.33820.jhb@freebsd.org>
Message-ID: <20080810.165333.232928772.imp@bsdimp.com>

In message: <200808091637.33820.jhb@freebsd.org>
            John Baldwin writes:
: So currently the MOD_QUIESCE event is posted to a module when unloading a kld
: so it can veto non-forced unloads. However, the current implementation in
: the kernel linker is to run through all the modules in a file, posting
: MOD_QUIESCE followed by MOD_UNLOAD on each module serially. Thus, if you
: have multiple modules in a single kld and one of the modules vetoes an unload
: request via MOD_QUIESCE, you don't know as the module author if any of your
: modules were unloaded via MOD_UNLOAD or not. I think a better approach would
: be to change the kernel linker to invoke MOD_QUIESCE on all modules in a
: single pass first.
If none of those fail (or it's a forced unload), then it : can do a second pass invoking MOD_UNLOAD on all the modules. That sounds great to me. I'm a bit surprised it is implemented the way you say... Warner From julian at elischer.org Mon Aug 11 04:58:51 2008 From: julian at elischer.org (Julian Elischer) Date: Mon Aug 11 04:58:58 2008 Subject: Make MOD_QUIESCE a bit more useful.. In-Reply-To: <20080810.165333.232928772.imp@bsdimp.com> References: <200808091637.33820.jhb@freebsd.org> <20080810.165333.232928772.imp@bsdimp.com> Message-ID: <489FC706.7050306@elischer.org> M. Warner Losh wrote: > In message: <200808091637.33820.jhb@freebsd.org> > John Baldwin writes: > : So currently the MOD_QUIESCE event is posted to a module when unloading a kld > : so it can veto non-forced unloads. However, the current implementation in > : the kernel linker is to run through all the modules in a file, posting > : MOD_QUIESCE followed by MOD_UNLOAD on each module serially. Thus, if you > : have multiple modules in a single kld and one of the modules veto's an unload > : request via MOD_QUIESCE, you don't know as the module author if any of your > : modules were unloaded via MOD_UNLOAD or not. I think a better approach would > : be to change the kernel linker to invoke MOD_QUIESCE on all modules in a > : single pass first. If none of those fail (or it's a forced unload), then it > : can do a second pass invoking MOD_UNLOAD on all the modules. > > That sounds great to me. I'm a bit surprised it is implemented the > way you say... 
me++

>
> Warner

From bugmaster at FreeBSD.org Mon Aug 11 11:06:55 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Aug 11 11:07:14 2008
Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org
Message-ID: <200808111106.m7BB6sSD047132@freefall.freebsd.org>

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.

From jhb at freebsd.org Mon Aug 11 15:44:56 2008
From: jhb at freebsd.org (John Baldwin)
Date: Mon Aug 11 15:45:03 2008
Subject: Make MOD_QUIESCE a bit more useful..
In-Reply-To: <20080810.165333.232928772.imp@bsdimp.com>
References: <200808091637.33820.jhb@freebsd.org>
 <20080810.165333.232928772.imp@bsdimp.com>
Message-ID: <200808111143.50023.jhb@freebsd.org>

On Sunday 10 August 2008 06:53:33 pm M. Warner Losh wrote:
> In message: <200808091637.33820.jhb@freebsd.org>
>             John Baldwin writes:
> : So currently the MOD_QUIESCE event is posted to a module when unloading a kld
> : so it can veto non-forced unloads. However, the current implementation in
> : the kernel linker is to run through all the modules in a file, posting
> : MOD_QUIESCE followed by MOD_UNLOAD on each module serially. Thus, if you
> : have multiple modules in a single kld and one of the modules vetoes an unload
> : request via MOD_QUIESCE, you don't know as the module author if any of your
> : modules were unloaded via MOD_UNLOAD or not. I think a better approach would
> : be to change the kernel linker to invoke MOD_QUIESCE on all modules in a
> : single pass first.
> : If none of those fail (or it's a forced unload), then it
> : can do a second pass invoking MOD_UNLOAD on all the modules.
>
> That sounds great to me. I'm a bit surprised it is implemented the
> way you say...

So was I. What happens now is that the kernel linker does a for loop over all
the modules calling 'module_unload()'. module_unload() invokes both
MOD_QUIESCE and MOD_UNLOAD back to back.

Hmm, so fixing this brings up one extra note: Do we need a new module event
(say MOD_UNLOAD_ABORTED, MOD_AWAKEN, or MOD_DEQUIESCE) that would get invoked
when a kldunload is vetoed by a MOD_QUIESCE event, and that would get posted
to all the modules that had successfully QUIESCED so far?

--
John Baldwin

From gonzo at freebsd.org Mon Aug 11 18:28:18 2008
From: gonzo at freebsd.org (Oleksandr Tymoshenko)
Date: Mon Aug 11 18:28:24 2008
Subject: Need a code review
In-Reply-To: <86myjkbnky.fsf@ds4.des.no>
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
 <489B08F6.8060605@freebsd.org>
 <867iasfmrh.fsf@ds4.des.no>
 <489EDD2F.9080302@freebsd.org>
 <86myjkbnky.fsf@ds4.des.no>
Message-ID: <48A084C0.4040105@freebsd.org>

Dag-Erling Smørgrav wrote:
> Oleksandr Tymoshenko writes:
>> Here is a new fix: http://people.freebsd.org/~gonzo/mips2/libpam2.diff
>> The idea is to set the define explicitly for the dynamic case rather than
>> rely on __PIC__.
>
> This is completely backwards. Dynamic is the default option, and in
> fact the only supported option on many (non-FreeBSD) systems. If it
> were up to me, I would remove support for statically linked modules
> entirely.

OK, so the static case is the one to be marked explicitly. Here is the
next patch: http://people.freebsd.org/~gonzo/mips2/libpam3.diff
Drop NO_STATIC_MODULES since it's the default and add PAM_STATIC_MODULES
to request a static modules build.
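
A minimal sketch of the opt-in idea, written only from the description
above and not taken from libpam3.diff. PAM_STATIC_MODULES is the macro
named in the message; the SKETCH_* macro and both functions are
hypothetical names used for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * The static-module branch is compiled only on explicit request.
 * Whether the compiler defines __PIC__ no longer matters, so a
 * target that always generates PIC code takes the same path as
 * every other target.
 */
#if defined(PAM_STATIC_MODULES)
# define SKETCH_MODULE_LINKAGE "static"   /* hypothetical name */
#else
# define SKETCH_MODULE_LINKAGE "dynamic"  /* hypothetical name */
#endif

/* Report which branch the preprocessor selected for this object. */
const char *
sketch_module_linkage(void)
{
	return (SKETCH_MODULE_LINKAGE);
}

/* The result is always one of the two branches above. */
void
sketch_module_linkage_check(void)
{
	const char *mode = sketch_module_linkage();

	assert(strcmp(mode, "static") == 0 || strcmp(mode, "dynamic") == 0);
}
```

Compiled with no extra flags, the sketch takes the dynamic branch;
adding -DPAM_STATIC_MODULES on the compiler command line selects the
static one.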
--
gonzo

From keramida at ceid.upatras.gr Tue Aug 12 01:20:07 2008
From: keramida at ceid.upatras.gr (Giorgos Keramidas)
Date: Tue Aug 12 01:20:14 2008
Subject: Need a code review
In-Reply-To: <48A084C0.4040105@freebsd.org> (Oleksandr Tymoshenko's message of
 "Mon, 11 Aug 2008 21:28:16 +0300")
References: <20080729.161303.709402272.imp@bsdimp.com>
 <86r69buar0.fsf@ds4.des.no>
 <489B08F6.8060605@freebsd.org>
 <867iasfmrh.fsf@ds4.des.no>
 <489EDD2F.9080302@freebsd.org>
 <86myjkbnky.fsf@ds4.des.no>
 <48A084C0.4040105@freebsd.org>
Message-ID: <874p5rt4gz.fsf@kobe.laptop>

On Mon, 11 Aug 2008 21:28:16 +0300, Oleksandr Tymoshenko wrote:
> Dag-Erling Smørgrav wrote:
>> Oleksandr Tymoshenko writes:
>>> Here is a new fix: http://people.freebsd.org/~gonzo/mips2/libpam2.diff
>>> The idea is to set the define explicitly for the dynamic case rather than
>>> rely on __PIC__.
>>
>> This is completely backwards. Dynamic is the default option, and in
>> fact the only supported option on many (non-FreeBSD) systems. If it
>> were up to me, I would remove support for statically linked modules
>> entirely.
>
> OK, so the static case is the one to be marked explicitly. Here is the
> next patch: http://people.freebsd.org/~gonzo/mips2/libpam3.diff Drop
> NO_STATIC_MODULES since it's a default and add PAM_STATIC_MODULES to
> request static modules build.
IMHO, since the #ifdef'ed part in openpam.h mentions "gcc static
linking", it would probably be nice to keep the __GNUC__ part in

-#if defined(__GNUC__) && !defined(__PIC__) && !defined(NO_STATIC_MODULES)
+#if defined(PAM_STATIC_MODULES)
 /* gcc, static linking */

and write it as

-#if defined(__GNUC__) && !defined(__PIC__) && !defined(NO_STATIC_MODULES)
+#if defined(__GNUC__) && defined(PAM_STATIC_MODULES)
 /* gcc, static linking */

From nwhitehorn at freebsd.org Wed Aug 13 14:45:26 2008
From: nwhitehorn at freebsd.org (Nathan Whitehorn)
Date: Wed Aug 13 14:45:33 2008
Subject: UMA MD Small Allocator Runtime Switching
In-Reply-To: <200808051024.27043.jhb@freebsd.org>
References: <48981C19.8060009@freebsd.org>
 <200808051024.27043.jhb@freebsd.org>
Message-ID: <48A2E62A.9060604@freebsd.org>

John Baldwin wrote:
> On Tuesday 05 August 2008 05:23:37 am Nathan Whitehorn wrote:
>> I'm working on the PowerPC G5 port right now, and have run into a
>> problem with the way the UMA small allocator works. On G3/G4 systems,
>> there is a direct physical->virtual mapping, and on G5s there isn't. All
>> of the infrastructure is in place to support both types of system with a
>> single kernel image, except that UMA_MD_SMALL_ALLOC must be switched
>> on/off at runtime.
>>
>> One solution is to put if (direct_map) use_nonsmall_case() into the MD
>> small_alloc/free() routines and define UMA_MD_SMALL_ALLOC everywhere.
>> This works well, except that the MI UMA code then sets booted = 1 too
>> early in the boot process, before the kmem_alloc*() routines are available.
>>
>> Basically, I need to find a way to have an MD UMA allocator without the MI
>> UMA code assuming anything about how it works internally. Maybe adding a
>> UMA_MD_ALLOC_LATE define to prevent setting booted=1 early on?
>> -Nathan
>
> Have you considered creating an artificial direct map region in the address
> space on the G5?
Some of the other 64-bit ports (amd64 and sparc64) do this > to gain the benefits of the direct map even though it isn't a mandated part > of the architecture like it is on some other platforms (alpha and mips). I thought about it, but we can only use 4K pages on the G5 so this would put a large amount of pressure on the page table. IBM removed the block translation mechanism from the G5 and the CPU's superpage support is not available in the 32-bit compatibility mode under which we currently run. -Nathan From rwatson at FreeBSD.org Wed Aug 13 16:59:52 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Wed Aug 13 16:59:58 2008 Subject: Advanced warning: virtualization work will be afoot Message-ID: Dear all: This weekend is the FreeBSD Developer Summit in Cambridge, UK, and one of the focuses of the meetings will be the on-going network stack virtualization. We hope to have several days in a row to focus on known issues, understand (and possible address) various concerns about the approach, and possibly expand the focus to further subsystems. We also hope to explore doing some amount of virtualization below the language level, and hopefully have the right set of people there to talk about that. With all this in mind, it may well be that we begin some of the heavy lifting on virtualization in the base Subversion tree, whereas to date this work has occurred almost entirely in Perforce. The outcome of the meetings will depend a lot on what we find and what progress we make, so this isn't exactly a "HEADS UP: Virtualization going into SVN", but it could well be that there is quite a bit of committing going on. 
Thanks, Robert N M Watson Computer Laboratory University of Cambridge From jhb at freebsd.org Wed Aug 13 17:21:37 2008 From: jhb at freebsd.org (John Baldwin) Date: Wed Aug 13 17:21:46 2008 Subject: UMA MD Small Allocator Runtime Switching In-Reply-To: <48A2E62A.9060604@freebsd.org> References: <48981C19.8060009@freebsd.org> <200808051024.27043.jhb@freebsd.org> <48A2E62A.9060604@freebsd.org> Message-ID: <200808131214.43326.jhb@freebsd.org> On Wednesday 13 August 2008 09:48:26 am Nathan Whitehorn wrote: > John Baldwin wrote: > > On Tuesday 05 August 2008 05:23:37 am Nathan Whitehorn wrote: > >> I'm working on the PowerPC G5 port right now, and have run into a > >> problem with the way the UMA small allocator works. On G3/G4 systems, > >> there is a direct physical->virtual mapping, and on G5s there isn't. All > >> of the infrastructure is in place to support both types of system with a > >> single kernel image, except that UMA_MD_SMALL_ALLOC must be switched > >> on/off at runtime. > >> > >> One solution is to put if (direct_map) use_nonsmall_case() into the MD > >> small_alloc/free() routines and define UMA_MD_SMALL_ALLOC everywhere. > >> This works well, except that the MI UMA code then sets booted = 1 too > >> early in the boot process, before the kmem_alloc*() routines are available. > >> > >> Basically, I need to find a way have an MD UMA allocator without the MI > >> UMA code assuming anything about how it works internally. Maybe adding a > >> UMA_MD_ALLOC_LATE define to prevent setting booted=1 early on? > >> -Nathan > > > > Have you considered creating an artificial direct map region in the address > > space on the G5? Some of the other 64-bit ports (amd64 and sparc64) do this > > to gain the benefits of the direct map even though it isn't a mandated part > > of the architecture like it is on some other platforms (alpha and mips). > > I thought about it, but we can only use 4K pages on the G5 so this would > put a large amount of pressure on the page table. 
IBM removed the block > translation mechanism from the G5 and the CPU's superpage support is not > available in the 32-bit compatibility mode under which we currently run. Hmm, I didn't know you weren't running in full 64-bit mode. Is that a property of the G5 CPU that it only supports the 32-bit compat mode with 64-bit extensions? -- John Baldwin From nwhitehorn at freebsd.org Wed Aug 13 17:36:25 2008 From: nwhitehorn at freebsd.org (Nathan Whitehorn) Date: Wed Aug 13 17:36:31 2008 Subject: UMA MD Small Allocator Runtime Switching In-Reply-To: <200808131214.43326.jhb@freebsd.org> References: <48981C19.8060009@freebsd.org> <200808051024.27043.jhb@freebsd.org> <48A2E62A.9060604@freebsd.org> <200808131214.43326.jhb@freebsd.org> Message-ID: <48A31B9A.6040705@freebsd.org> John Baldwin wrote: > > [snipped bit about faking a direct map] >> I thought about it, but we can only use 4K pages on the G5 so this would >> put a large amount of pressure on the page table. IBM removed the block >> translation mechanism from the G5 and the CPU's superpage support is not >> available in the 32-bit compatibility mode under which we currently run. >> > > Hmm, I didn't know you weren't running in full 64-bit mode. Is that a > property of the G5 CPU that it only supports the 32-bit compat mode with > 64-bit extensions? > No, it supports full 64-bit mode as well, and likes that much better. In fact, you have to do a fair bit of work to keep it in the compatibility mode: it switches to the full 64-bit mode whenever it takes a trap, for instance. The initial porting target is the compatibility mode because (a) I'm lazy and didn't want to simultaneously do a brand new 64-bit port and deal with changes for the G5 and (b) it would be nice to have a single 32-bit PPC install CD that works on all machines with 32-bit operating system support. It's the trying to avoid any #ifdef G5 that creates this problem with the UMA allocator. 
I've gotten this completely working using a bunch of dynamic switching
stuff (I can boot multiuser and build world on both my G3 and G5
machines with the same kernel), but to do it I need to remove the place
where it sets booted = 1 as an optimization when the MI UMA subsystem is
initializing.
-Nathan

From avg at icyb.net.ua Fri Aug 15 13:05:19 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Fri Aug 15 13:05:25 2008
Subject: tilt/horizontal scroll support
In-Reply-To: <20080813162931.GC718@epsilon.local>
References: <48A300B9.5090105@icyb.net.ua> <20080813162931.GC718@epsilon.local>
Message-ID: <48A57B1B.4000903@icyb.net.ua>

on 13/08/2008 19:29 Rui Paulo said the following:
>
> Well, perhaps the best way is to teach sysmouse about horizontal scrolling
> and then add a quirk WRT your mouse ?
>
> sysmouse(4) really needs to grow horizontal scrolling since nowadays every
> mouse has it.

Rui,

I agree, this would be a perfect solution. What scares me is backward
compatibility. I do not think I understand how to do it right, so that
older userland software works with newer kernels and newer userland
works with older kernels.

As I understand it, there are the interfaces of the hardware mouse
drivers, then there is moused, then there is the sysmouse interface, and
then there are user applications like the X server. Knowledge of
horizontal scrolling needs to be added to all components in the chain,
and it had better be done in a backward-compatible fashion. And I really
do not know how to do this properly. Would it be just adding some new
bytes to the protocol, or growing a new protocol (level), or something
else...

P.S. I replaced usb ml with arch@ in cc.
-- Andriy Gapon From darrenr at freebsd.org Fri Aug 15 14:25:51 2008 From: darrenr at freebsd.org (Darren Reed) Date: Fri Aug 15 14:26:03 2008 Subject: Advanced warning: virtualization work will be afoot In-Reply-To: References: Message-ID: <1218809394.10612.1268815905@webmail.messagingengine.com> Robert, Do you have any more information about what the details of this virtualization work will be? e.g. will it be similar to what Solaris has with zones? The reason that I ask is that I've just finished getting the ipfilter code (non-Sun code) converted to being zone aware. What does that mean? Lots of global variables are gone, replaced by soft-context structures that are allocated and free'd when zones come alive/die. For BSD, while all of the code paths are the same, I'm currently only using a single soft context and just pass around a pointer to it. If you're going to be doing similar work for FreeBSD, I will try and get this into the tree sooner, rather than later, so that there's one less component that you need to worry about. Cheers, Darren -- Darren Reed darrenr@fastmail.net From des at des.no Fri Aug 15 15:43:10 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Fri Aug 15 15:43:16 2008 Subject: Advanced warning: virtualization work will be afoot In-Reply-To: <1218809394.10612.1268815905@webmail.messagingengine.com> (Darren Reed's message of "Fri, 15 Aug 2008 16:09:54 +0200") References: <1218809394.10612.1268815905@webmail.messagingengine.com> Message-ID: <86fxp6wagj.fsf@ds4.des.no> "Darren Reed" writes: > Do you have any more information about what the details of this > virtualization work will be? This has been discussed extensively on various mailing lists and at various BSD conferences over the past ~5 years. Search the archives for "vimage". 
DES
--
Dag-Erling Smørgrav - des@des.no

From Alexander at Leidinger.net Fri Aug 15 15:55:36 2008
From: Alexander at Leidinger.net (Alexander Leidinger)
Date: Fri Aug 15 15:55:43 2008
Subject: Advanced warning: virtualization work will be afoot
In-Reply-To: <1218809394.10612.1268815905@webmail.messagingengine.com>
References: <1218809394.10612.1268815905@webmail.messagingengine.com>
Message-ID: <20080815173029.4a9a9f59@deskjail>

Quoting "Darren Reed" (Fri, 15 Aug 2008 16:09:54 +0200):

> Robert,
>
> Do you have any more information about what the details of
> this virtualization work will be? e.g. will it be similar
> to what Solaris has with zones?

It's like the Solaris Crossbow project: virtual network stacks. I don't
know if you refer to this or to the normal operation of zones since
Solaris 10. The latter we will get for jails too, but this is just a
side effect (sort of; there are patches floating around to get multi-IPs
into jails in a different way before the VNET stuff, so that we can have
multi-IPs for jails in 7.x too).

Bye,
Alexander.

--
"Planet Express: our crew is replaceable, your package isn't."
-Advertisement
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137

From attilio at freebsd.org Fri Aug 15 16:10:41 2008
From: attilio at freebsd.org (Attilio Rao)
Date: Fri Aug 15 16:10:47 2008
Subject: [PATCH] Deadlock detection on production environments
Message-ID: <3bbf2fe10808150842mf8ab204u5a5ef13b894e32e2@mail.gmail.com>

Below is a patch which should identify deadlocks (or, better, a class
of them) in systems meant to run in production environments (so with no
additional overhead and no additional debugging options like WITNESS).
The main idea is to check the time spent by threads sleeping on a
sleepqueue or blocking on a turnstile. If this time exceeds an
exaggerated threshold, we can say the threads are deadlocked.
This seems a very effective way of identifying deadlocks, in particular
when considering the following:
- A LOR (lock order reversal) can happen only between locks within the
  same class (for example, sleepqueued locks (sx, lockmgr), turnstiled
  locks (mtx, rwlock) and spinlocks).
- Spinlocks already have a way to identify deadlocks between each other.

As scanning all the threads in the system is an operation which requires
some locking, the operation is deferred into a specific context, the
"deadlock resolver", which periodically (at a 1 second interval) checks
the sleep times and block times of every thread and reports any that
exceed the thresholds.

In order to implement this idea I had to fill some gaps in the basic
support. For example, sleep ticks were saved when a thread entered the
sleepqueue, but blocking ticks were not saved in the turnstiles. To cope
with this, in the patch turnstiles save / clear the block times.
There was no way to tell whether a thread sleeping on a sleepqueue was
linked to a lock or to a waitchannel (msleep, condvar, etc.). To cope
with this, I added sleepq_type(), which returns the type of the
waitchannel.
The implementation was pretty much straightforward, except for a LOR
which was present between the thread lock and sleepq_type(). Now it
should be solved.
Note that currently the default threshold is 30 minutes for sleepqueues
and 15 minutes for turnstiles.

This approach doesn't solve all kinds of deadlocks. For example, it
won't solve the problem of missed wakeups on a waitchannel.
Theoretically, we could check sleepqueues linked to a waitchannel too,
but it would not be simple to differentiate between a voluntary, long
sleep (i.e. waiting for user input) and a deadlock. If you have good
ideas about it, please let me know.

A case where this approach will lead to a false positive is the case of
buggy code.
For example, a thread may own an sx lock and then perform a very long
sleep on a waitchannel (for example, waiting for user input) with the sx
held. In this case, the lock should be dropped before sleeping, and
there is no way to cope with badly written code.

We should also cope with the general ticks counter overflowing. What if
the ticks counter overflows? Should we add a heuristic in order to
identify such a case and work around it? For the moment I left it out.

The final concern is about a possible deadlock of the deadlock resolver
thread itself when acquiring the allproc_lock. The solution for this
should be mainly MD, I guess, and it should involve the NMI (or an
equivalent concept on architectures other than IA32). On ia32 systems,
for example, we could use an NMI handler, linked to a watchdog,
performing appropriate checks on the resolver runtimes, in order to
verify that it didn't get stuck.

The patch has been reviewed and discussed by jeff@ and has been
developed on behalf of Nokia.
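On the ticks overflow question above, one common approach is to compute
the elapsed time with unsigned subtraction, which stays correct across a
single wrap of the counter as long as the real interval is shorter than
the counter's range. The following is only a userland sketch of that
idea and is not part of the posted patch: the helper names
elapsed_ticks() and exceeds_threshold() are made up for illustration,
and the patch itself computes ticks - td->td_blktick directly.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper, not from the patch: ticks elapsed between
 * 'then' and 'now'.  Unsigned subtraction is modulo 2^N, so the
 * result is still right if the counter wrapped once in between.
 */
static unsigned int
elapsed_ticks(unsigned int now, unsigned int then)
{
	return (now - then);
}

/*
 * Hypothetical helper: has a thread been blocked for longer than
 * 'threshold_sec' seconds, given 'hz' ticks per second?
 */
static int
exceeds_threshold(unsigned int now, unsigned int then,
    unsigned int threshold_sec, unsigned int hz)
{
	/* Guard against the threshold itself overflowing in ticks. */
	assert(hz != 0 && threshold_sec <= UINT_MAX / hz);
	return (elapsed_ticks(now, then) > threshold_sec * hz);
}
```

With this shape no extra heuristic is needed for one wrap; an interval
longer than the full counter range would still be misjudged, so whether
this is sufficient depends on hz and the width of the counter.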
Thanks, Attilio ================================== --- /usr/src/sys/kern/kern_clock.c 2008-05-31 23:42:15.000000000 +0200 +++ src/sys/kern/kern_clock.c 2008-08-15 15:34:51.000000000 +0200 @@ -48,14 +48,16 @@ #include #include #include -#include +#include #include +#include #include #include #include #include #include #include +#include #include #include #include @@ -159,6 +161,117 @@ SYSCTL_PROC(_kern, OID_AUTO, cp_times, CTLTYPE_LONG|CTLFLAG_RD, 0,0, sysctl_kern_cp_times, "LU", "per-CPU time statistics"); +static int slptime_threshold = 1800; +static int blktime_threshold = 900; + +static void +deadlkres(void) +{ + struct proc *p; + struct thread *td; + void *wchan; + int blkticks, slpticks, slptype, tticks; + + for (;;) { + blkticks = blktime_threshold * hz; + slpticks = slptime_threshold * hz; + sx_slock(&allproc_lock); + FOREACH_PROC_IN_SYSTEM(p) { + PROC_SLOCK(p); + FOREACH_THREAD_IN_PROC(p, td) { + thread_lock(td); + if (TD_ON_LOCK(td)) { + + /* + * The thread should be blocked on a + * turnstile, so just check if the + * turnstile channel is in good state. + */ + MPASS(td->td_blocked != NULL); + tticks = ticks - td->td_blktick; + if (tticks > blkticks) { + + /* + * A thread stuck for too long + * on a turnstile. Act as + * appropriate. + */ + thread_unlock(td); + PROC_SUNLOCK(p); + sx_sunlock(&allproc_lock); + panic("%s: deadlock detected for %p, blocked for %d ticks\n", + __func__, td, tticks); + } + } else if (TD_ON_SLEEPQ(td)) { + + /* + * Check if the thread is sleeping + * on a lock, otherwise skip the check. + * Drop the thread lock in order to + * avoid a LOR with sq_chain spinlock. + * Note that the proc spinlock will + * prevent the thread from exiting. + */ + wchan = td->td_wchan; + thread_unlock(td); + slptype = sleepq_type(wchan); + thread_lock(td); + + /* + * Check if we lost the race for + * accessing again the thread. 
+ */ + if (!TD_ON_SLEEPQ(td) || + wchan != td->td_wchan || + slptype == -1) { + thread_unlock(td); + continue; + } + tticks = ticks - td->td_slptick; + if ((slptype == SLEEPQ_SX || + slptype == SLEEPQ_LK) && + tticks > slpticks) { + + /* + * A thread stuck for too long + * on a sleepqueue linked to + * a lock. Act as appropriate. + */ + thread_unlock(td); + PROC_SUNLOCK(p); + sx_sunlock(&allproc_lock); + panic("%s: deadlock detected for %p, blocked for %d ticks\n", + __func__, td, tticks); + } + } + thread_unlock(td); + } + PROC_SUNLOCK(p); + } + sx_sunlock(&allproc_lock); + + /* + * Sleep for one second, than try again to find deadlocks. + */ + pause("deadlkres", 1 * hz); + } +} + +static struct kthread_desc deadlkres_kd = { + "deadlkres", + deadlkres, + (struct thread **)NULL +}; +SYSINIT(deadlkres, SI_SUB_CLOCKS, SI_ORDER_ANY, kthread_start, &deadlkres_kd); + +SYSCTL_NODE(_debug, OID_AUTO, deadlock, CTLFLAG_RW, 0, "Deadlock detection"); +SYSCTL_INT(_debug_deadlock, OID_AUTO, slptime_threshold, CTLFLAG_RW, + &slptime_threshold, 0, + "Number of seconds within is valid to sleep on a sleepqueue"); +SYSCTL_INT(_debug_deadlock, OID_AUTO, blktime_threshold, CTLFLAG_RW, + &blktime_threshold, 0, + "Number of seconds within is valid to block on a turnstile"); + void read_cpu_time(long *cp_time) { --- /usr/src/sys/kern/subr_sleepqueue.c 2008-08-13 18:41:11.000000000 +0200 +++ src/sys/kern/subr_sleepqueue.c 2008-08-15 15:30:10.000000000 +0200 @@ -121,8 +121,8 @@ LIST_ENTRY(sleepqueue) sq_hash; /* (c) Chain and free list. */ LIST_HEAD(, sleepqueue) sq_free; /* (c) Free queues. */ void *sq_wchan; /* (c) Wait channel. */ -#ifdef INVARIANTS int sq_type; /* (c) Queue type. */ +#ifdef INVARIANTS struct lock_object *sq_lock; /* (c) Associated lock. 
*/ #endif }; @@ -313,7 +313,6 @@ ("thread's sleep queue has a non-empty free list")); KASSERT(sq->sq_wchan == NULL, ("stale sq_wchan pointer")); sq->sq_lock = lock; - sq->sq_type = flags & SLEEPQ_TYPE; #endif #ifdef SLEEPQUEUE_PROFILING sc->sc_depth++; @@ -326,6 +325,7 @@ sq = td->td_sleepqueue; LIST_INSERT_HEAD(&sc->sc_queues, sq, sq_hash); sq->sq_wchan = wchan; + sq->sq_type = flags & SLEEPQ_TYPE; } else { MPASS(wchan == sq->sq_wchan); MPASS(lock == sq->sq_lock); @@ -644,6 +644,29 @@ } /* + * Returns the type of sleepqueue given a waitchannel. + */ +int +sleepq_type(void *wchan) +{ + struct sleepqueue *sq; + int type; + + MPASS(wchan != NULL); + + sleepq_lock(wchan); + sq = sleepq_lookup(wchan); + if (sq == NULL) { + sleepq_release(wchan); + return (-1); + } + type = sq->sq_type; + sleepq_release(wchan); + + return (type); +} + +/* * Removes a thread from a sleep queue and makes it * runnable. */ @@ -1144,8 +1167,8 @@ return; found: db_printf("Wait channel: %p\n", sq->sq_wchan); -#ifdef INVARIANTS db_printf("Queue type: %d\n", sq->sq_type); +#ifdef INVARIANTS if (sq->sq_lock) { lock = sq->sq_lock; db_printf("Associated Interlock: %p - (%s) %s\n", lock, --- /usr/src/sys/kern/subr_turnstile.c 2008-05-31 23:42:15.000000000 +0200 +++ src/sys/kern/subr_turnstile.c 2008-08-15 15:30:10.000000000 +0200 @@ -732,6 +732,7 @@ td->td_tsqueue = queue; td->td_blocked = ts; td->td_lockname = lock->lo_name; + td->td_blktick = ticks; TD_SET_LOCK(td); mtx_unlock_spin(&tc->tc_lock); propagate_priority(td); @@ -924,6 +925,7 @@ MPASS(TD_CAN_RUN(td)); td->td_blocked = NULL; td->td_lockname = NULL; + td->td_blktick = 0; #ifdef INVARIANTS td->td_tsqueue = 0xff; #endif --- /usr/src/sys/sys/proc.h 2008-08-13 18:42:06.000000000 +0200 +++ src/sys/sys/proc.h 2008-08-15 15:30:10.000000000 +0200 @@ -213,6 +213,7 @@ struct ucred *td_ucred; /* (k) Reference to credentials. */ u_int td_estcpu; /* (t) estimated cpu utilization */ u_int td_slptick; /* (t) Time at sleep. 
*/ + u_int td_blktick; /* (t) Time spent blocked. */ struct rusage td_ru; /* (t) rusage information */ uint64_t td_incruntime; /* (t) Cpu ticks to transfer to proc. */ uint64_t td_runtime; /* (t) How many cpu ticks we've run. */ --- /usr/src/sys/sys/sleepqueue.h 2008-08-13 18:42:06.000000000 +0200 +++ src/sys/sys/sleepqueue.h 2008-08-15 15:30:10.000000000 +0200 @@ -109,6 +109,7 @@ void sleepq_set_timeout(void *wchan, int timo); int sleepq_timedwait(void *wchan, int pri); int sleepq_timedwait_sig(void *wchan, int pri); +int sleepq_type(void *wchan); void sleepq_wait(void *wchan, int pri); int sleepq_wait_sig(void *wchan, int pri); From des at des.no Fri Aug 15 16:50:22 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Fri Aug 15 16:50:34 2008 Subject: Advanced warning: virtualization work will be afoot In-Reply-To: <11167f520808150930h7b1e307eu8b7f5032d177fcbe@mail.gmail.com> (Sam Fourman, Jr.'s message of "Fri, 15 Aug 2008 11:30:28 -0500") References: <1218809394.10612.1268815905@webmail.messagingengine.com> <86fxp6wagj.fsf@ds4.des.no> <11167f520808150930h7b1e307eu8b7f5032d177fcbe@mail.gmail.com> Message-ID: <86y72yuss2.fsf@ds4.des.no> "Sam Fourman Jr." writes: > I am confused, is vimage actually in -CURRENT now? my guess is no > because there is no man page No, this is why Robert was saying we were going to work on it this weekend. DES -- Dag-Erling Sm?rgrav - des@des.no From sfourman at gmail.com Fri Aug 15 16:56:32 2008 From: sfourman at gmail.com (Sam Fourman Jr.) Date: Fri Aug 15 16:56:38 2008 Subject: Advanced warning: virtualization work will be afoot In-Reply-To: <86fxp6wagj.fsf@ds4.des.no> References: <1218809394.10612.1268815905@webmail.messagingengine.com> <86fxp6wagj.fsf@ds4.des.no> Message-ID: <11167f520808150930h7b1e307eu8b7f5032d177fcbe@mail.gmail.com> > This has been discussed extensively on various mailing lists and at > various BSD conferences over the past ~5 years. Search the archives for > "vimage". 
I am confused, is vimage actually in -CURRENT now? my guess is no
because there is no man page

Sam Fourman Jr.

From julian at elischer.org Fri Aug 15 18:33:19 2008
From: julian at elischer.org (Julian Elischer)
Date: Fri Aug 15 18:33:31 2008
Subject: Advanced warning: virtualization work will be afoot
In-Reply-To: <1218809394.10612.1268815905@webmail.messagingengine.com>
References: <1218809394.10612.1268815905@webmail.messagingengine.com>
Message-ID: <48A5CBEE.6090603@elischer.org>

Darren Reed wrote:
> Robert,
>
> Do you have any more information about what the details of
> this virtualization work will be? e.g. will it be similar
> to what Solaris has with zones?
>
> The reason that I ask is that I've just finished getting
> the ipfilter code (non-Sun code) converted to being zone
> aware. What does that mean? Lots of global variables are
> gone, replaced by soft-context structures that are allocated
> and free'd when zones come alive/die. For BSD, while all
> of the code paths are the same, I'm currently only using
> a single soft context and just pass around a pointer to
> it.
>
> If you're going to be doing similar work for FreeBSD, I
> will try and get this into the tree sooner, rather than
> later, so that there's one less component that you need
> to worry about.
>
> Cheers,
> Darren

look at the following document:

http://perforce.freebsd.org/fileLogView.cgi?FSPC=//depot/projects/vimage/porting_to_vimage.txt

sorry if that wraps

there are patches at:

also look at the patches in the ipfilter files in that branch.
If you are doing the work for zones then that will be applicable.

BTW you might look at dropping all the support for FreeBSD 3.x
in your files :-)

the aggregate diff can be found at:

http://www.freebsd.org/~julian/vimage.diff.

If you want to handle ipfilter yourself then we'd be happy to let you
do it.
From ed at 80386.nl Sat Aug 16 11:18:26 2008 From: ed at 80386.nl (Ed Schouten) Date: Sat Aug 16 11:18:33 2008 Subject: [Reviews requested] kern/121073: chroot for non-root users Message-ID: <20080816111824.GL99951@hoeg.nl> Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080816/9e649361/attachment.pgp From kostikbel at gmail.com Sat Aug 16 12:31:58 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Sat Aug 16 12:32:05 2008 Subject: [Reviews requested] kern/121073: chroot for non-root users In-Reply-To: <20080816111824.GL99951@hoeg.nl> References: <20080816111824.GL99951@hoeg.nl> Message-ID: <20080816121049.GU1803@deviant.kiev.zoral.com.ua> On Sat, Aug 16, 2008 at 01:18:24PM +0200, Ed Schouten wrote: > Hello everyone, > > When I visited FOSDEM back in February, I was talking with Jille > Timmermans about the chroot() call. After discussing that the problem > with chroot() is that it cannot be safely be executed by non-root users > w.r.t. setuid binaries*, we wrote this patchset for the kernel to add > something similar to `MNT_NOSUID' to the process flags. The result > being: > > http://bugs.FreeBSD.org/121073 > > The patch even adds a small security improvement to the system. Say, > you'd change the typical chroot() + setuid() order the other way around, > you're guaranteed the chrooted process will never change users > afterwards, because it won't honour set[ug]id binaries anymore. > > Our security officer was wise enough to add the following to the PR: > > +----------------------------------------------------------+ > |UNDER NO CONDITIONS SHOULD THIS PATCH BE COMMITTED WITHOUT| > |EXPLICIT APPROVAL FROM THE FREEBSD SECURITY OFFICER. 
| > +----------------------------------------------------------+ > > After having a discussion with Colin on IRC, there are a couple of > questions we would like to be answered (or discussed) before getting > this in the tree: > > - Are there any comments on the patch itself? > > - Colin was concerned if turned on, would it be possible for the user to > do things which it normally couldn't and shouldn't? > > It would be great to get many reviews on this before we'd land it in the > source tree. I've attached the patch to this email as well. Thanks! > > -- > Ed Schouten > WWW: http://80386.nl/ > > * Hardlink a setuid binary to a directory containing a fake C library > and executing it. I think that the patch gives instant root. FreeBSD provides a rfork(2) system call. This call allows the processes to share filedesc table, that, among other information, contains the root of the filesystem namespace for the process. So, the scenario is to rfork() a process without RFFDG flag, and then for one of the resulting processes to perform a chroot. Now, second one has chrooted root, but no P_NOSUGID flag set. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080816/ac13fe00/attachment.pgp From darrenr at freebsd.org Sat Aug 16 13:05:33 2008 From: darrenr at freebsd.org (Darren Reed) Date: Sat Aug 16 13:05:46 2008 Subject: Advanced warning: virtualization work will be afoot In-Reply-To: <48A5CBEE.6090603@elischer.org> References: <1218809394.10612.1268815905@webmail.messagingengine.com> <48A5CBEE.6090603@elischer.org> Message-ID: <48A6D104.2010704@freebsd.org> Julian Elischer wrote: > Darren Reed wrote: >> Robert, >> >> Do you have any more information about what the details of >> this virtualization work will be? e.g. will it be similar >> to what Solaris has with zones? 
>> >> The reason that I ask is that I've just finished getting >> the ipfilter code (non-Sun code) converted to being zone >> aware. What does that mean? Lots of global variables are >> gone, replaced by soft-context structures that are allocated >> and free'd when zones come alive/die. For BSD, while all >> of the code paths are the same, I'm currently only using >> a single soft context and just pass around a pointer to >> it. >> >> If you're going to be doing similar work for FreeBSD, I >> will try and get this into the tree sooner, rather than >> later, so that there's one less component that you need >> to worry about. >> >> Cheers, >> Darren > > > > look at the following document: > > http://perforce.freebsd.org/fileLogView.cgi?FSPC=//depot/projects/vimage/porting_to_vimage.txt > > > sorry if that wraps > > there are patches at: > > also look at the patches in the ipfilter files in that branch. So the only changes I could see are V_* things for inet global variables that ipfilter abuses. Was there something else that I'm missing? > If you are doingthe work for zones then that will be applicable. It's now close to complete. There are 3 "layers" of data structure initilisation to get it running: - load (initialise all of the globals) - create (create the soft context structure) - init (build tables, register callbacks to get packets, etc.) > ... > the aggregate diff can be found at: > > http://www.freebsd.org/~julian/vimage.diff. > > If you want to handle ipfilter yourself then we'd be happy to let you > do it. At the moment, those diffs for IPFIlter only have some V_* changes for dealing with the global inet variables that it abuses. Is there more hidden somewhere? To get a proper idea of the changes I've been working on you should download this guy: http://coombs.anu.edu.au/~avalon/ip_fil5.0.3.tar.gz Nearly all of the global variables are gone, with bits and pieces hanging off ipf_*_softc_t structures now. 
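To illustrate the kind of conversion described above, here is a minimal
userland sketch of moving global state into a per-instance context that
is created, initialised and destroyed explicitly. It is an assumption
about the general shape only: the names struct demo_softc,
demo_create(), demo_init(), demo_count_packet() and demo_destroy() are
hypothetical and are not taken from the ipfilter sources, and the
one-time "load" layer is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-instance state that used to live in globals. */
struct demo_softc {
	int		ds_running;	/* was: a global "running" flag */
	unsigned long	ds_pkts;	/* was: a global packet counter */
};

/* "create": allocate one context when an instance (zone) comes alive. */
static struct demo_softc *
demo_create(void)
{
	return (calloc(1, sizeof(struct demo_softc)));
}

/* "init": bring an already-created context into service. */
static int
demo_init(struct demo_softc *sc)
{
	if (sc == NULL)
		return (-1);
	sc->ds_running = 1;
	return (0);
}

/* Code paths take the context pointer instead of touching globals. */
static void
demo_count_packet(struct demo_softc *sc)
{
	assert(sc != NULL);
	if (sc->ds_running)
		sc->ds_pkts++;
}

/* Free the context when the instance dies. */
static void
demo_destroy(struct demo_softc *sc)
{
	free(sc);
}
```

A system with only one instance would create a single context at
startup and pass the same pointer everywhere, which matches the "single
soft context" arrangement mentioned for BSD.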
So far as ipfilter/ipfw/pf go, for them to be meaningfully virtualised,
the pfil code that currently supports them needs to be enhanced further
such that they can "listen" for the creation of vimages, etc...that is
if people want to delegate (partial) control of such to vimages.

Darren

From des at des.no Sat Aug 16 14:03:23 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Sat Aug 16 14:03:35 2008
Subject: Advanced warning: virtualization work will be afoot
In-Reply-To: <48A5CBEE.6090603@elischer.org> (Julian Elischer's message of "Fri, 15 Aug 2008 11:33:18 -0700")
References: <1218809394.10612.1268815905@webmail.messagingengine.com> <48A5CBEE.6090603@elischer.org>
Message-ID: <863al583bq.fsf@ds4.des.no>

Julian Elischer writes:
> the aggregate diff can be found at:
>
> http://www.freebsd.org/~julian/vimage.diff.
>
> If you want to handle ipfilter yourself then we'd be happy to let you
> do it.

There are a number of issues with the ipfilter parts of the vimage
patch. I will commit an improved version to p4 shortly.

DES
--
Dag-Erling Smørgrav - des@des.no

From jille at quis.cx Sat Aug 16 15:48:57 2008
From: jille at quis.cx (Jille Timmermans)
Date: Sat Aug 16 15:49:06 2008
Subject: [Reviews requested] kern/121073: chroot for non-root users
In-Reply-To: <20080816121049.GU1803@deviant.kiev.zoral.com.ua>
References: <20080816111824.GL99951@hoeg.nl> <20080816121049.GU1803@deviant.kiev.zoral.com.ua>
Message-ID: <48A6F0A5.7070208@quis.cx>

Confirming Kostik: using rfork(RFPROC) instead of rfork(RFPROC|RFFDG)
and chrooting in the child also chroots the parent, without giving it
the flag.

An option might be to store the P_NOSUGID flag somewhere in the
filedesc table?

The attached program shows the difference between w/ and w/o RFFDG.
jille@elvis:~$ cc -o chroot-rfork chroot-rfork.c
jille@elvis:~$ sudo ./chroot-rfork
/COPYRIGHT does not exist (chrooted)
/COPYRIGHT does not exist (chrooted)
jille@elvis:~$ cc -o chroot-rfork chroot-rfork.c -DWITH_RFFDG_FLAG
jille@elvis:~$ sudo ./chroot-rfork
/COPYRIGHT does not exist (chrooted)
/COPYRIGHT exists (not chrooted)

-- Jille

Kostik Belousov wrote:
> On Sat, Aug 16, 2008 at 01:18:24PM +0200, Ed Schouten wrote:
>
>> Hello everyone,
>>
>> When I visited FOSDEM back in February, I was talking with Jille
>> Timmermans about the chroot() call. After discussing that the problem
>> with chroot() is that it cannot be safely be executed by non-root users
>> w.r.t. setuid binaries*, we wrote this patchset for the kernel to add
>> something similar to `MNT_NOSUID' to the process flags. The result
>> being:
>>
>> http://bugs.FreeBSD.org/121073
>>
>> The patch even adds a small security improvement to the system. Say,
>> you'd change the typical chroot() + setuid() order the other way around,
>> you're guaranteed the chrooted process will never change users
>> afterwards, because it won't honour set[ug]id binaries anymore.
>>
>> Our security officer was wise enough to add the following to the PR:
>>
>> +----------------------------------------------------------+
>> |UNDER NO CONDITIONS SHOULD THIS PATCH BE COMMITTED WITHOUT|
>> |EXPLICIT APPROVAL FROM THE FREEBSD SECURITY OFFICER.      |
>> +----------------------------------------------------------+
>>
>> After having a discussion with Colin on IRC, there are a couple of
>> questions we would like to be answered (or discussed) before getting
>> this in the tree:
>>
>> - Are there any comments on the patch itself?
>>
>> - Colin was concerned if turned on, would it be possible for the user to
>> do things which it normally couldn't and shouldn't?
>>
>> It would be great to get many reviews on this before we'd land it in the
>> source tree. I've attached the patch to this email as well. Thanks!
>> >> -- >> Ed Schouten >> WWW: http://80386.nl/ >> >> * Hardlink a setuid binary to a directory containing a fake C library >> and executing it. >> > > I think that the patch gives instant root. FreeBSD provides a rfork(2) > system call. This call allows the processes to share filedesc table, that, > among other information, contains the root of the filesystem namespace > for the process. > > So, the scenario is to rfork() a process without RFFDG flag, and then > for one of the resulting processes to perform a chroot. Now, second one > has chrooted root, but no P_NOSUGID flag set. > -------------- next part --------------
#include <sys/stat.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

/* Without RFFDG the child shares the parent's filedesc table (and root). */
#ifdef WITH_RFFDG_FLAG
#define RFORK_FLAGS RFPROC|RFFDG
#else
#define RFORK_FLAGS RFPROC
#endif

int main(int argc, char **argv) {
    struct stat sb;

    switch(rfork(RFORK_FLAGS)) {
    case -1:
        err(1, "rfork()");
    case 0:
        /* Child: chroot, then check whether the old root is still visible. */
        if(chroot("/tmp")!=0)
            err(1, "chroot()");
        if(stat("/COPYRIGHT", &sb)==0)
            printf("/COPYRIGHT exists (not chrooted)\n");
        else
            printf("/COPYRIGHT does not exist (chrooted)\n");
        break;
    default:
        /* Parent: give the child time to chroot, then run the same check. */
        sleep(1);
        if(stat("/COPYRIGHT", &sb)==0)
            printf("/COPYRIGHT exists (not chrooted)\n");
        else
            printf("/COPYRIGHT does not exist (chrooted)\n");
    }
}
From ed at 80386.nl Mon Aug 18 09:34:42 2008 From: ed at 80386.nl (Ed Schouten) Date: Mon Aug 18 09:34:54 2008 Subject: HEADS UP: MPSAFE TTY integration on August 20 Message-ID: <20080818093441.GO99951@hoeg.nl> Hello everyone, As I informed everyone a couple of weeks ago: I'm going to integrate the MPSAFE TTY layer on August 20. The original plan was to do this on August 10: * Ed Schouten wrote: > August 10 2008: > Commit the new MPSAFE TTY driver in several commits (first > commit the layer itself, then commit changes to drivers one by > one). We decided to delay the integration 10 more days, to give interested people some more time to work on remaining drivers.
I am going to integrate the following patchset: http://people.FreeBSD.org/~ed/mpsafetty/ The patchset also contains some changes to the Xen console driver and si(4), but I'm not going to commit those, because they are not yet fully ported/tested. Yours, -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080818/10025fa9/attachment.pgp From rwatson at FreeBSD.org Mon Aug 18 10:05:30 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Aug 18 10:05:37 2008 Subject: [Reviews requested] kern/121073: chroot for non-root users In-Reply-To: <20080816121049.GU1803@deviant.kiev.zoral.com.ua> References: <20080816111824.GL99951@hoeg.nl> <20080816121049.GU1803@deviant.kiev.zoral.com.ua> Message-ID: On Sat, 16 Aug 2008, Kostik Belousov wrote: >> It would be great to get many reviews on this before we'd land it in the >> source tree. I've attached the patch to this email as well. Thanks! > > I think that the patch gives instant root. FreeBSD provides a rfork(2) > system call. This call allows the processes to share filedesc table, that, > among other information, contains the root of the filesystem namespace for > the process. > > So, the scenario is to rfork() a process without RFFDG flag, and then for > one of the resulting processes to perform a chroot. Now, second one has > chrooted root, but no P_NOSUGID flag set. There is a long and sordid history of vulnerability associated with the use of the chroot(2) system call in well-meaning attempts to allow users to employ it in order to improve security. Most of the lessons center on the high level of trust placed in the file system name space by UNIX applications *and* the kernel, and the unexpected implications of allowing that namespace to be manipulated by untrusted processes. 
I think I would generally be very conservative about making any change to behavior here, even optional change, simply because it will lead to future security advisories. More generally, I'm a bit worried by the increasing number of minor security policy variations controlled by sysctls and kernel options -- often they serve the function of optionally exposing kernel behavior not reviewed or hardened against untrusted users to use. These minor variations risk coming into conflict with application and kernel assumptions about the security model, so I think we should be very careful about adding too many. Robert N M Watson Computer Laboratory University of Cambridge From bugmaster at FreeBSD.org Mon Aug 18 11:06:47 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 18 11:07:15 2008 Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org Message-ID: <200808181106.m7IB6kxU079746@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/120749 arch [request] Suggest upping the default kern.ps_arg_cache 1 problem total. From andrew-freebsd at areilly.bpc-users.org Tue Aug 19 05:08:01 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Tue Aug 19 05:08:08 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack Message-ID: <20080819025019.GA27997@duncan.reilly.home> Hi all, Let me tell you a story, and perhaps someone can suggest a different course of action than the one that I've taken, which has been to switch back to SCHED_4BSD: I've got an old P3-500 machine that I use for audio processing experiments and also music playback. It's got an M-Audio Delta1010 card in it, which (in its most native mode) has ten channels in and twelve out, all 24/32-bit. 
I use the 4front-tech driver, because the native one doesn't do multi-channel (yet). I recently "upgraded" the OS on that box from 6-stable to 7-stable, since I've had such good experiences with 7 (and SCHED_ULE) on my desktop and server systems (and 4front now supports 7). Unfortunately, for this application, this was a seriously retrograde step, at first: no matter how I fiddled the blocking factors and IO sizes, I couldn't stop the system from glitching (audio buffer underruns). It seemed that any (unrelated) network activity would take priority over the audio, even though I had the audio task set to rtprio=10. Logging in to the box with ssh was a guaranteed sound glitcher. It probably doesn't help that that box has a dodgy old 100baseTX RealTek Ethernet card that I have to use with -r=1024 on my NFS mounts to avoid fragmentation problems. I'm pleased to report that switching back to SCHED_4BSD has retrieved the situation, and my audio task is now rock solid and stable again. I've been thinking about writing up a PR about the issue, but I haven't figured out how to generate a minimally failing example that anyone else would be able to verify. Maybe I'll just go ahead and post this message, to see if anyone has any suggestions. Given the emphasis of _ULE on multi-processor scalability and total system throughput (at which it seems to rock), I suspect that the answer may well be: "use a more suitable operating system". I hope not. I would expect that the same mechanisms that enable good multi-processor scalability would also have good real-time characteristics: the same asynchronous events and preemption are at work in both cases. So, here's the question: can I do something to my code, or the way I set its priority, to get, under _ULE, something equivalent to the reliable real-time scheduling that I get with _4BSD?
Cheers, Andrew From jroberson at jroberson.net Tue Aug 19 08:29:18 2008 From: jroberson at jroberson.net (Jeff Roberson) Date: Tue Aug 19 08:29:24 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080819025019.GA27997@duncan.reilly.home> References: <20080819025019.GA27997@duncan.reilly.home> Message-ID: <20080818215813.H952@desktop> On Tue, 19 Aug 2008, Andrew Reilly wrote: > Hi all, > > Let me tell you a story, and perhaps someone can suggest a > different course of action than the one that I've taken, which > has been to switch back to SCHED_4BSD: > > I've got an old P3-500 machine that I use for audio processing > experiments and also music playback. It's got an M-Audio > Delta1010 card in it, which (in its most native mode) has > ten channels in and twelve out, all 24/32-bit. I use the > 4front-tech driver, because the native one doesn't do > multi-channel (yet). I recently "upgraded" the OS on that box > from 6-stable to 7-stable, since I've had such good experiences > with 7 (and SCHED_ULE) on my desktop and server systems (and > 4front now supports 7). Unfortunatly, for this application, > this was a seriously retrograde step, at first: no matter how > I fiddled the blocking factors and IO sizes, I couldn't stop > the system from glitching (audio buffer underruns). It seemed > that any (unrelated) network activity would take priority over > the audio, even though I had the audio task set to rtprio=10. > Loging in to the box with ssh was a guaranteed sound glitcher. Can you tell me what % cpu the audio application uses while running? Have you tried nice -20 instead of rtprio? Thanks, Jeff > > It probably doesn't help that that box has a dagy old 100baseTX > RealTek ethernet card, that I have to use with -r=1024 on my NFS > mounts to avoid fragmentation problems. > > I'm pleased to report that switching back to SCHED_4BSD has > retrieved the situation, and my audio task is now rock solid and > stable again. 
> > I've been thinking about writing up a PR about the issue, but I > haven't figured out how to generate a minimally failing example > that anyone else would be able to verify. Maybe I'll just > go ahead and post this message, to see if anyone has any > suggestions. > > Given the emphasis of _ULE on multi-processor scalability and > total system throughput (at which it seems to rock), I suspect > that the answer may well be: "use a more suitable operating > system". I hope not. I would expect that the same mechanisms > that enable good multi-processor scalability would also have > good real-time characteristics: the same asynchronous events and > preemption are at work in both cases. > > So, here's the question: can I do something to my code, or the > way I set its priority, to get something equivalent to the > reliable real-time scheduling that I can get in _4BSD under > _ULE? > > Cheers, > > Andrew > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From andrew-freebsd at areilly.bpc-users.org Tue Aug 19 13:40:17 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Tue Aug 19 13:40:45 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080818215813.H952@desktop> References: <20080819025019.GA27997@duncan.reilly.home> <20080818215813.H952@desktop> Message-ID: <20080819134005.GA85664@duncan.reilly.home> On Mon, Aug 18, 2008 at 10:00:12PM -1000, Jeff Roberson wrote: > Can you tell me what % cpu the audio application uses while running? Have > you tried nice -20 instead of rtprio? It's currently using about 10%, maybe a bit more. I expect it to get heavier as I add more to it. I have hopes of it continuing to work even at 60 to 80% of CPU. 
I haven't tried nice -20 because I don't want the priority to drift or change, which is something that I thought the normal levels did. I'll give it a go though, and report back. Cheers, Andrew From freebsd at sopwith.solgatos.com Tue Aug 19 16:38:34 2008 From: freebsd at sopwith.solgatos.com (Dieter) Date: Tue Aug 19 16:38:41 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: Your message of "Tue, 19 Aug 2008 12:50:19 +1000." <20080819025019.GA27997@duncan.reilly.home> Message-ID: <200808191557.PAA05091@sopwith.solgatos.com> > I'm pleased to report that switching back to SCHED_4BSD has > retrieved the situation, and my audio task is now rock solid and > stable again. This is very interesting. Rtprio does not work, but switching schedulers does. Is there something that the scheduler affects that rtprio doesn't? > Given the emphasis of _ULE on multi-processor scalability and > total system throughput (at which it seems to rock), I suspect > that the answer may well be: "use a more suitable operating > system". I hope not. If you've only tried changing the priority and buffer sizes of the audio process, you could try nicing the network down, and changing the network's buffer sizes up/down. But I wouldn't count on that helping much, if any. I suspect that the answer, at least in your case, is use _4BSD. The goal of total system throughput, and the goal of real time are somewhat at odds. For example giving processes longer time slices would reduce the number of context switches and should increase total system throughput, but would hurt real time response. I've been battling similar real time problems. I'm already using _4BSD, so it doesn't solve everything. From what I can figure out, some device drivers simply hog the CPU for long periods of time. Hopefully that can be fixed. And the schedulers are concerned with allocating CPU, but ignore allocating I/O resources fairly. 
Nice/rtprio has very little effect on I/O if the process doesn't use much CPU. We really need a way to nice I/O up/down. From obrien at FreeBSD.org Tue Aug 19 16:41:49 2008 From: obrien at FreeBSD.org (David O'Brien) Date: Tue Aug 19 16:41:56 2008 Subject: HEADS UP: MPSAFE TTY integration on August 20 In-Reply-To: <20080818093441.GO99951@hoeg.nl> References: <20080818093441.GO99951@hoeg.nl> Message-ID: <20080819161150.GA18621@dragon.NUXI.org> On Mon, Aug 18, 2008 at 11:34:41AM +0200, Ed Schouten wrote: > As I informed everyone a couple of weeks ago: I'm going to integrate the > MPSAFE TTY layer on August 20. The original plan was to do this on > August 10: > > * Ed Schouten wrote: > > August 10 2008: > > Commit the new MPSAFE TTY driver in several commits (first > > commit the layer itself, then commit changes to drivers one by > > one). > > We decided to delay the integration 10 more days, to give interested > people some more time to work on remaining drivers. Ed, Before the first checkin, please lay down a tag. Given there are some driver issues at the point of commit, folks might need to hang at the last point before MPSAFE-TTY to wait for late-coming drivers. Now that SVN tags are ultra cheap, we should come up with a convention. Say svn+ssh://svn.freebsd.org/base/tag/_[(PRE|POST)-] $ svn copy --parents svn+ssh://svn.freebsd.org/base/head svn+ssh://svn.freebsd.org/base/tag/HEAD_PRE_MPSAFE_TTY thanks, -- -- David (obrien@FreeBSD.org) From ed at 80386.nl Wed Aug 20 09:16:53 2008 From: ed at 80386.nl (Ed Schouten) Date: Wed Aug 20 09:17:06 2008 Subject: HEADS DOWN: MPSAFE TTY layer integrated In-Reply-To: <20080818093441.GO99951@hoeg.nl> References: <20080818093441.GO99951@hoeg.nl> Message-ID: <20080820091651.GV99951@hoeg.nl> Hello everyone, I'm pleased to announce that the MPSAFE TTY layer has just been integrated into our source tree.
Below is the commit message: ----- Forwarded message from Ed Schouten ----- > Date: Wed, 20 Aug 2008 08:31:58 +0000 (UTC) > From: Ed Schouten > To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org > Subject: cvs commit: src ObsoleteFiles.inc UPDATING src/bin/sh miscbltin.c > src/etc login.conf src/etc/defaults devfs.rules > src/lib/libc/stdlib Makefile.inc Symbol.map grantpt.3 ptsname.3 > ptsname.c src/lib/libc/sys Makefile.inc Symbol.map getrlimit.2 ... > > ed 2008-08-20 08:31:58 UTC > > FreeBSD src repository > > > > Log: > SVN rev 181905 on 2008-08-20 08:31:58Z by ed > > Integrate the new MPSAFE TTY layer to the FreeBSD operating system. > > The last half year I've been working on a replacement TTY layer for the > FreeBSD kernel. The new TTY layer was designed to improve the following: > > - Improved driver model: > > The old TTY layer has a driver model that is not abstract enough to > make it friendly to use. A good example is the output path, where the > device drivers directly access the output buffers. This means that an > in-kernel PPP implementation must always convert network buffers into > TTY buffers. > > If a PPP implementation would be built on top of the new TTY layer > (still needs a hooks layer, though), it would allow the PPP > implementation to directly hand the data to the TTY driver. > > - Improved hotplugging: > > With the old TTY layer, it isn't entirely safe to destroy TTY's from > the system. This implementation has a two-step destructing design, > where the driver first abandons the TTY. After all threads have left > the TTY, the TTY layer calls a routine in the driver, which can be > used to free resources (unit numbers, etc). > > The pts(4) driver also implements this feature, which means > posix_openpt() will now return PTY's that are created on the fly. 
> > - Improved performance: > > One of the major improvements is the per-TTY mutex, which is expected > to improve scalability when compared to the old Giant locking. > Another change is the unbuffered copying to userspace, which is both > used on TTY device nodes and PTY masters. > > Upgrading should be quite straightforward. Unlike previous versions, > existing kernel configuration files do not need to be changed, except > when they reference device drivers that are listed in UPDATING. > > Obtained from: //depot/projects/mpsafetty/... > Approved by: philip (ex-mentor) > Discussed: on the lists, at BSDCan, at the DevSummit > Sponsored by: Snow B.V., the Netherlands > dcons(4) fixed by: kan > > > ----- End forwarded message ----- Some people asked me if I could tag the tree before importing the MPSAFE TTY bits, to make sure people can easily stick to older source revisions when they want to do comparisons with the old TTY implementation, which may be useful when porting drivers, isolating bugs, etc. After discussing with people on IRC, I've decided not to create the tag. If you want to switch to pre-MPSAFE TTY, please check out revision r181904. Well, that's all I've got to say for now, I guess. Be sure to update your systems and give my code some extensive testing. Thanks! -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080820/ac4f8e90/attachment.pgp From keramida at ceid.upatras.gr Wed Aug 20 22:10:39 2008 From: keramida at ceid.upatras.gr (Giorgos Keramidas) Date: Wed Aug 20 22:10:52 2008 Subject: HEADS DOWN: MPSAFE TTY layer integrated In-Reply-To: <20080820091651.GV99951@hoeg.nl> (Ed Schouten's message of "Wed, 20 Aug 2008 11:16:51 +0200") References: <20080818093441.GO99951@hoeg.nl> <20080820091651.GV99951@hoeg.nl> Message-ID: <87pro3pliw.fsf@kobe.laptop> On Wed, 20 Aug 2008 11:16:51 +0200, Ed Schouten wrote: > Hello everyone, > > I'm pleased to announce that the MPSAFE TTY layer has just been > integrated to our source tree. Below is the commit message: Thank you! :-) >> Log: >> SVN rev 181905 on 2008-08-20 08:31:58Z by ed >> >> Integrate the new MPSAFE TTY layer to the FreeBSD operating system. >> [...] > > ----- End forwarded message ----- > > Some people asked me if I could tag the tree before importing the > MPSAFE TTY bits, to make sure people can easily stick to older source > revisions when they want to do comparisons with the old TTY > implementation, which may be useful when porting drivers, isolating > bugs, etc. > > After discussing with people on IRC, I've decided not to create the > tag. If you want to switch to pre-MPSAFE TTY, please check out > revision r181904. Fair enough. If the need arises, it's easy to lay down a tag after the fact, by copying: svn copy svn+ssh://svn.freebsd.org/base/head@181904 \ svn+ssh://svn.freebsd.org/base/tags/pre-mpsafetty But since the initial commit was pushed to svn in one changeset, the need for that is a bit small I guess. Doing it later should also be ok too. 
From nwhitehorn at freebsd.org Wed Aug 20 23:41:54 2008 From: nwhitehorn at freebsd.org (Nathan Whitehorn) Date: Wed Aug 20 23:42:01 2008 Subject: UMA MD Small Allocator Runtime Switching In-Reply-To: <200808131214.43326.jhb@freebsd.org> References: <48981C19.8060009@freebsd.org> <200808051024.27043.jhb@freebsd.org> <48A2E62A.9060604@freebsd.org> <200808131214.43326.jhb@freebsd.org> Message-ID: <48ACABBF.3070705@freebsd.org> John Baldwin wrote: > [snip] >> I thought about it, but we can only use 4K pages on the G5 so this would >> put a large amount of pressure on the page table. IBM removed the block >> translation mechanism from the G5 and the CPU's superpage support is not >> available in the 32-bit compatibility mode under which we currently run. >> > > Hmm, I didn't know you weren't running in full 64-bit mode. Is that a > property of the G5 CPU that it only supports the 32-bit compat mode with > 64-bit extensions? > (see other email about the properties of the G5) I think the following one-line patch provides a reasonable solution to this problem. Is there any reason this is a bad idea? 
-Nathan

Index: uma_core.c
===================================================================
--- uma_core.c (revision 181929)
+++ uma_core.c (working copy)
@@ -1667,7 +1667,7 @@
 	bucket_init();
-#ifdef UMA_MD_SMALL_ALLOC
+#if defined(UMA_MD_SMALL_ALLOC) && !defined(UMA_MD_SMALL_ALLOC_NEEDS_VM)
 	booted = 1;
 #endif

From jroberson at jroberson.net Thu Aug 21 07:47:57 2008 From: jroberson at jroberson.net (Jeff Roberson) Date: Thu Aug 21 07:48:08 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080819134005.GA85664@duncan.reilly.home> References: <20080819025019.GA27997@duncan.reilly.home> <20080818215813.H952@desktop> <20080819134005.GA85664@duncan.reilly.home> Message-ID: <20080820214627.C30593@desktop> On Tue, 19 Aug 2008, Andrew Reilly wrote: > On Mon, Aug 18, 2008 at 10:00:12PM -1000, Jeff Roberson wrote: >> Can you tell me what % cpu the audio application uses while running? Have >> you tried nice -20 instead of rtprio? > > It's currently using about 10%, maybe a bit more. I expect > it to get heavier as I add more to it. I have hopes of it > continuing to work even at 60 to 80% of CPU. > > I haven't tried nice -20 because I don't want the priority to > drift or change, which is something that I thought the normal > levels did. I'll give it a go though, and report back. With such a low cpu utilization I wouldn't expect it's the scheduling algorithm. It may be a difference in preemption settings. Is preemption enabled in both kernels? Jeff > > Cheers, > > Andrew > From ivoras at freebsd.org Fri Aug 22 00:32:30 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Aug 22 00:32:38 2008 Subject: Magic symlinks redux Message-ID: I was reading about new things in NetBSD, and one thing caught my attention: per-user /tmp. See http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080714_0251.html for example.
Google says that a discussion about magic symlinks happens every now and then in FreeBSD but nothing really gets done. I found this implementation which looks like it's for 7.0: http://butcher.heavennet.ru/patches/kernel/magiclinks/ As far as I understand the VFS (which isn't much...) this looks like an trivial patch, and it's compatible with NetBSD. Since I'm interested in this (specifically for the per-user /tmp and maybe similar gadgetry), I'd like to nurse this patch into the tree, if there are no objections (of course, I'll bug anyone I can find who knows VFS to review it :) ). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/03830f7f/signature.pgp From rizzo at iet.unipi.it Fri Aug 22 09:21:59 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 09:22:06 2008 Subject: Magic symlinks redux In-Reply-To: References: Message-ID: <20080822090448.GB57441@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 01:54:29AM +0200, Ivan Voras wrote: > I was reading about new things in NetBSD, and one thing caught my > attention: per-user /tmp. See > http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080714_0251.html for > example. > > Google says that a discussion about magic symlinks happens every now and > then in FreeBSD but nothing really gets done. I found this > implementation which looks like it's for 7.0: > > http://butcher.heavennet.ru/patches/kernel/magiclinks/ interestingly simple. Question - is the process' ENV easily available in this part of the kernel, so one could in principle use environment variables as replacement strings ? Some comments on the code in the above patch: + readability it might be improved a bit: e.g. 
I don't see why uma_{zalloc|zfree} are hidden behind macros, nor why symlink_magic() isn't simply called as

    if (vfs_magiclinks)
        symlink_magic(td, cp, &linklen);

as it cannot fail as implemented; also, the whole MATCH/SUBSTITUTE macros are not particularly readable -- I understand one needs macros to implement sizeof("somestring") correctly, but apart from a wrapper I believe the core of these two blocks should be implemented by functions (possibly inline) with MATCH() returning the match length so one doesn't need to replicate the string parameter in SUBSTITUTE();
+ correctness -- 1. termchar is not reset to '/' if a match is not found 2. what is the intended behaviour when the replacement string overflows the buffer?
+ efficiency of symlink_magic() might be improved too: e.g. the function could do a quick check for the presence of @ and return without allocation/deallocation if not found; getcredhostname() (and similar routines) could be called so that they write directly to tmp, without the need for allocating an in-stack buffer
cheers luigi From ivoras at freebsd.org Fri Aug 22 09:59:17 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Aug 22 09:59:24 2008 Subject: Magic symlinks redux In-Reply-To: <20080822090448.GB57441@onelab2.iet.unipi.it> References: <20080822090448.GB57441@onelab2.iet.unipi.it> Message-ID: Luigi Rizzo wrote: > On Fri, Aug 22, 2008 at 01:54:29AM +0200, Ivan Voras wrote: >> I was reading about new things in NetBSD, and one thing caught my >> attention: per-user /tmp. See >> http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080714_0251.html for >> example. >> >> Google says that a discussion about magic symlinks happens every now and >> then in FreeBSD but nothing really gets done. I found this >> implementation which looks like it's for 7.0: >> >> http://butcher.heavennet.ru/patches/kernel/magiclinks/ > > interestingly simple.
Yes, that's a big part of the attractiveness of this patch :) > Question - is the process' ENV easily available in this part > of the kernel, so one could in principle use environment variables > as replacement strings ? > > Some comments on the code in the above patch: > > + readability it might be improved a bit: > e.g. I don't see why uma_{zalloc|zfree} are hidden behind macros, > nor why symlynk_magic() isn't simply called as > > if (vfs_magiclinks) > symlink_magic(td, cp, &linklen); > > as it cannot fail as implemented; Ok. > also, the whole MATCH/SUBSTITUTE macros are not particularly > readable -- i understand one needs macros to implement sizeof("somestring") > correctly, but apart from a wrapper I believe the core of these two > blocks should be implemented by functions (possibly inline) with > MATCH() returning the match length so one doesn't need to replicate > the string parameter in SUBSTITUTE(); Yes, I intended to remove the code macros into static inline functions, possibly with some macro glue for sizeof. > + correctness -- > 1. termchar is not reset to '/' if a match is not found > 2. what is the intended behaviour when the replacement string overflows > the buffer ? I'll check those later, when I make an updated patch. > + efficiency of symlink_magic() might be improved too: > e.g. the function could do a quick check for the presence of @ and return > without allocation/deallocation if not found; I think it's because the author wanted a single pass over the string (in case of the "extended" @{...} syntax we can't just check if cp[0] == '@'). The first few lines of the symlink_magic loop ("if (cp[i] != '@')") effectively do what strchr() does. > getcredhostname() (and similar routines) could be called so that > they write directly to tmp, without the need for > allocating an in-stack buffer I think this is for consistency in calling SUBSTITUTE. It could be broken into two variants of the code but is it worth it? 
I'd rather modify getcredhostname() in kern/kern_jail.c to return the length of the created string and avoid a strlen(). From bu7cher at yandex.ru Fri Aug 22 10:09:35 2008 From: bu7cher at yandex.ru (Andrey V. Elsukov) Date: Fri Aug 22 10:09:43 2008 Subject: Magic symlinks redux In-Reply-To: <20080822090448.GB57441@onelab2.iet.unipi.it> References: <20080822090448.GB57441@onelab2.iet.unipi.it> Message-ID: <48AE89DC.9080408@yandex.ru> Luigi Rizzo wrote: > interestingly simple. > > Question - is the process' ENV easily available in this part > of the kernel, so one could in principle use environment variables > as replacement strings ? > > Some comments on the code in the above patch: > > + readability it might be improved a bit: > e.g. I don't see why uma_{zalloc|zfree} are hidden behind macros, > nor why symlynk_magic() isn't simply called as > > if (vfs_magiclinks) > symlink_magic(td, cp, &linklen); > > as it cannot fail as implemented; > also, the whole MATCH/SUBSTITUTE macros are not particularly > readable -- i understand one needs macros to implement sizeof("somestring") > correctly, but apart from a wrapper I believe the core of these two > blocks should be implemented by functions (possibly inline) with > MATCH() returning the match length so one doesn't need to replicate > the string parameter in SUBSTITUTE(); > > + correctness -- > 1. termchar is not reset to '/' if a match is not found > 2. what is the intended behaviour when the replacement string overflows > the buffer ? > > + efficiency of symlink_magic() might be improved too: > e.g. the function could do a quick check for the presence of @ and return > without allocation/deallocation if not found; > getcredhostname() (and similar routines) could be called so that > they write directly to tmp, without the need for > allocating an in-stack buffer This was so long ago.. As i remember this patch is a quick port of NetBSD's implementation and uses the same code. 
Also there was another implementation ported from DragonFlyBSD. David Quattlebaum is working on varsyms implementation and he sent fresh patch to me in this April. I attached patch. And sorry, i am not working on this today.. -- WBR, Andrey V. Elsukov -------------- next part -------------- A non-text attachment was scrubbed... Name: varsym.patch Type: application/octet-stream Size: 42181 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/c01a40d7/varsym.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Makefile Type: application/octet-stream Size: 679 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/c01a40d7/Makefile.obj From rizzo at iet.unipi.it Fri Aug 22 10:14:06 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 10:14:15 2008 Subject: Magic symlinks redux In-Reply-To: References: <20080822090448.GB57441@onelab2.iet.unipi.it> Message-ID: <20080822101639.GA58256@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 11:59:07AM +0200, Ivan Voras wrote: ... > >+ efficiency of symlink_magic() might be improved too: > > e.g. the function could do a quick check for the presence of @ and return > > without allocation/deallocation if not found; > > I think it's because the author wanted a single pass over the string (in > case of the "extended" @{...} syntax we can't just check if cp[0] == > '@'). The first few lines of the symlink_magic loop ("if (cp[i] != > '@')") effectively do what strchr() does. right, but doing the check upfront might save the uma_zalloc/zfree call in the common case. 
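As a rough illustration of that upfront check, here is a minimal user-space sketch in C. It is not the code from the patch under discussion: the names magic_expand and struct magic_var are made up for this example, only the bare @name form is handled (not @{name}), and it assumes a keyword matches only when it is followed by '/' or the end of the string. It returns early, before touching any buffer, when the target contains no '@', and it reports overflow instead of truncating.

```c
#include <string.h>

/* Hypothetical table entry: keyword after '@' and its replacement text. */
struct magic_var {
	const char *name;	/* e.g. "uid" */
	const char *value;	/* e.g. "1001" */
};

/*
 * Expand "@name" keywords from `in` into `out` (capacity `outsz`,
 * including the terminating NUL).
 * Returns 0 if `in` has no '@' at all (`out` is left untouched),
 *         1 if `out` now holds the (possibly unchanged) result,
 *        -1 if the result would not fit in `out`.
 */
int
magic_expand(const char *in, char *out, size_t outsz,
    const struct magic_var *vars, size_t nvars)
{
	size_t i = 0, o = 0, v, nlen, vlen;

	/* Fast path: nothing to substitute, so no buffer work is needed. */
	if (strchr(in, '@') == NULL)
		return (0);

	while (in[i] != '\0') {
		int matched = 0;

		if (in[i] == '@') {
			for (v = 0; v < nvars; v++) {
				nlen = strlen(vars[v].name);
				if (strncmp(&in[i + 1], vars[v].name, nlen) != 0)
					continue;
				/* Assumption: keyword must be followed by '/' or NUL. */
				if (in[i + 1 + nlen] != '/' &&
				    in[i + 1 + nlen] != '\0')
					continue;
				vlen = strlen(vars[v].value);
				if (o + vlen >= outsz)
					return (-1);	/* would overflow */
				memcpy(&out[o], vars[v].value, vlen);
				o += vlen;
				i += 1 + nlen;
				matched = 1;
				break;
			}
		}
		if (!matched) {
			/* Ordinary character (or unknown keyword): copy as-is. */
			if (o + 1 >= outsz)
				return (-1);
			out[o++] = in[i++];
		}
	}
	out[o] = '\0';
	return (1);
}
```

A caller would treat a return of 0 as "use the original link target" and skip any allocation entirely, which is the common case the check is meant to optimise; what the kernel should do on -1 is the open overflow question raised earlier in the thread.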
cheers luigi From ivoras at freebsd.org Fri Aug 22 10:16:29 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Aug 22 10:16:36 2008 Subject: Magic symlinks redux In-Reply-To: <20080822101639.GA58256@onelab2.iet.unipi.it> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <20080822101639.GA58256@onelab2.iet.unipi.it> Message-ID: Luigi Rizzo wrote: > On Fri, Aug 22, 2008 at 11:59:07AM +0200, Ivan Voras wrote: > ... >>> + efficiency of symlink_magic() might be improved too: >>> e.g. the function could do a quick check for the presence of @ and return >>> without allocation/deallocation if not found; >> I think it's because the author wanted a single pass over the string (in >> case of the "extended" @{...} syntax we can't just check if cp[0] == >> '@'). The first few lines of the symlink_magic loop ("if (cp[i] != >> '@')") effectively do what strchr() does. > > right, but doing the check upfront might save the uma_zalloc/zfree call > in the common case. Ok. The strings are so short that it's trivial to check them early. From ivoras at freebsd.org Fri Aug 22 10:24:58 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Aug 22 10:25:05 2008 Subject: Magic symlinks redux In-Reply-To: <48AE89DC.9080408@yandex.ru> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> Message-ID: Andrey V. Elsukov wrote: > Luigi Rizzo wrote: >> interestingly simple. >> >> Question - is the process' ENV easily available in this part >> of the kernel, so one could in principle use environment variables >> as replacement strings ? > This was so long ago.. As i remember this patch is a quick port of > NetBSD's implementation and uses the same code. > > Also there was another implementation ported from DragonFlyBSD. > David Quattlebaum is working on varsyms implementation and he sent > fresh patch to me in this April. I attached patch. > And sorry, i am not working on this today.. This patch is huge. 
As far as I can tell DragonflyBSD has a whole framework dedicated to varsyms, spread across a fair part of the kernel and with at least one special userland utility. It allows the operator to define his own variables that can be used in the substitutions, and I don't see that it predefines "special" variables like "uid" and "hostname". It's not necessarily a bad solution but I consider it overkill. Anyway, the syntax of DFBSD's varsyms is similar but sufficiently different from NetBSD's magicsyms implementation that both can coexist. DFBSD uses ${var} and NetBSD uses @var or @{var} so there's no ambiguity between them. Unless a kernel developer is interested in working the DFBSD's implementation in, I'll push the NetBSD's variant. From rizzo at iet.unipi.it Fri Aug 22 11:27:06 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 11:27:13 2008 Subject: Magic symlinks redux In-Reply-To: References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> Message-ID: <20080822112939.GA58579@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 12:24:41PM +0200, Ivan Voras wrote: > Andrey V. Elsukov wrote: ... > > >This was so long ago.. As i remember this patch is a quick port of > >NetBSD's implementation and uses the same code. > > > >Also there was another implementation ported from DragonFlyBSD. ... > This patch is huge. As far as I can tell DragonflyBSD has a whole > framework dedicated to varsyms, spread across a fair part of the kernel > and with at least one special userland utility. It allows the operator > to define his own variables that can be used in the substitutions, and I > don't see that it predefines "special" variables like "uid" and > "hostname". It's not necessarily a bad solution but I consider it overkill. > > Anyway, the syntax of DFBSD's varsyms is similar but sufficiently > different from NetBSD's magicsyms implementation that both can coexist. 
> DFBSD uses ${var} and NetBSD uses @var or @{var} so there's no > ambiguity between them. > > Unless a kernel developer is interested in working the DFBSD's > implementation in, I'll push the NetBSD's variant.

i also believe the simple solution is much more interesting. However i believe a crucial issue (in terms of implementation) is to define exactly the behaviour in error or corner cases, namely:

+ what to do if we try to expand @{nonexistentkeyword} ? i suppose leave the string as-is is the right thing.

+ what to do if, as a result of the expansion, we exceed MAXPATHLEN ? here it is really unclear whether returning the original is ok, or there is a way to report some kind of error.

Also what is the exact syntax for @var ? From the code it seems to be allowed only as the last component of a pathname i.e.

	/foo/@bar is valid
	/foo/@bar/ is not valid

and this makes me wonder why one should support this syntax at all, rather than just using /foo/@{bar} which achieves the same thing, is legal in all contexts, has a lower chance of conflicting with existing pathnames and makes the code simpler!

cheers luigi

From brueffer at FreeBSD.org Fri Aug 22 12:35:29 2008 From: brueffer at FreeBSD.org (Christian Brueffer) Date: Fri Aug 22 12:35:35 2008 Subject: Magic symlinks redux In-Reply-To: References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> Message-ID: <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> On Fri, Aug 22, 2008 at 12:24:41PM +0200, Ivan Voras wrote: > Andrey V. Elsukov wrote: > >Luigi Rizzo wrote: > >>interestingly simple. > >> > >>Question - is the process' ENV easily available in this part > >>of the kernel, so one could in principle use environment variables > >>as replacement strings ? > > >This was so long ago.. As i remember this patch is a quick port of > >NetBSD's implementation and uses the same code. > > > >Also there was another implementation ported from DragonFlyBSD.
> >David Quattlebaum is working on varsyms implementation and he sent > >fresh patch to me in this April. I attached patch. > >And sorry, i am not working on this today.. > > This patch is huge. As far as I can tell DragonflyBSD has a whole > framework dedicated to varsyms, spread across a fair part of the kernel > and with at least one special userland utility. It allows the operator > to define his own variables that can be used in the substitutions, and I > don't see that it predefines "special" variables like "uid" and > "hostname". It's not necessarily a bad solution but I consider it overkill. > > Anyway, the syntax of DFBSD's varsyms is similar but sufficiently > different from NetBSD's magicsyms implementation that both can coexist. > DFBSD uses ${var} and NetBSD uses @var or @{var} so there's no > ambiguity between them. > > Unless a kernel developer is interested in working the DFBSD's > implementation in, I'll push the NetBSD's variant. > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > Brooks has a varsym port in p4, see //depot/user/brooks/varsym/ - Christian -- Christian Brueffer chris@unixpages.org brueffer@FreeBSD.org GPG Key: http://people.freebsd.org/~brueffer/brueffer.key.asc GPG Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/1c0ea094/attachment.pgp From rizzo at iet.unipi.it Fri Aug 22 14:53:42 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 14:53:50 2008 Subject: Magic symlinks redux In-Reply-To: <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> Message-ID: <20080822145616.GA61094@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 02:05:26PM +0200, Christian Brueffer wrote: > On Fri, Aug 22, 2008 at 12:24:41PM +0200, Ivan Voras wrote: ... > > This patch is huge. As far as I can tell DragonflyBSD has a whole > > framework dedicated to varsyms, spread across a fair part of the kernel > > and with at least one special userland utility. It allows the operator > > to define his own variables that can be used in the substitutions, and I > > don't see that it predefines "special" variables like "uid" and > > "hostname". It's not necessarily a bad solution but I consider it overkill. ... > Brooks has a varsym port in p4, see //depot/user/brooks/varsym/ this also seems to be based on Dragonfly's code, quite intrusive. I am playing with a rewrite (attached below) of the original patch, which fixes at least one memory leak and addresses some of the issues that i mentioned in this thread (abuse of macros, performance, behaviour on errors, etc.). (i haven't looked up yet the original copyright but i guess it is from netbsd...) 
cheers luigi

Index: src/sys/kern/vfs_lookup.c
===================================================================
--- src/sys/kern/vfs_lookup.c	(revision 181995)
+++ src/sys/kern/vfs_lookup.c	(working copy)
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include // XXX symlinks
 #include
 #include
 #include
@@ -88,6 +89,123 @@
 }
 SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_SECOND, nameiinit, NULL);
 
+#ifdef MAGICLINKS
+static int vfs_magiclinks = 1;
+#else
+static int vfs_magiclinks = 1;
+#endif
+SYSCTL_INT(_vfs, OID_AUTO, magiclinks, CTLFLAG_RW, &vfs_magiclinks, 0,
+    "Whether \"magic\" symlinks are expanded");
+
+/* looks up a string returns the match len or 0 */
+static int
+s_match(const char *key, int keylen, const char *haystack, const char *end)
+{
+	if (haystack + keylen >= end || haystack[keylen] != '}')
+		return 0;
+	if (strncmp(key, haystack, keylen))
+		return 0;
+	return keylen;
+}
+#define MATCH(str) s_match(str, sizeof(str) - 1, src, end)
+
+static char *
+s_subst(char *dst, const char *max, const char *value, int len)
+{
+	if (value == dst) { /* already copied, locate end of string */
+		while (*dst)
+			dst++;
+		return dst;
+	}
+	/* check size, copy and replace */
+	if (dst + len > max) /* overflow */
+		return NULL;
+	bcopy(value, dst, len);
+	dst += len;
+	return dst;
+}
+
+/*
+ * Substitute replacement text for 'magic' strings in symlinks.
+ * Looks for "@{string}", where <string> is a
+ * recognized 'magic' string. Replaces the original with the
+ * appropriate replacement text. (Note that in some cases the
+ * replacement text may have zero length.)
+ * Assume *len is at least 3.
+ */
+static void
+symlink_magic(struct thread *td, char *cp, int *len)
+{
+	char *src, *dst, *tmp, *end = cp + *len, *max;
+	int change = 0;
+
+	/* quick return if nothing to replace */
+	for (src = cp; src < end - 1; src++) {
+		if (src[0] == '@' && src[1] == '{')
+			break;
+	}
+	if (src == end - 1) /* no replacement */
+		return;
+
+	/* allocate a buffer for the replacement */
+	dst = tmp = uma_zalloc(namei_zone, M_WAITOK);
+	if (dst == NULL) { /* no space for replacement */
+		printf("zalloc fail in %s\n", __FUNCTION__);
+		return;
+	}
+	max = dst + MAXPATHLEN - 1;
+	for (src = cp; src < end - 1 && dst < max - 1;) {
+		int l;
+		if (src[0] != '@' || src[1] != '{') {
+			*dst++ = *src++; /* copy current char */
+			continue;
+		}
+		src += 2; /* skip @{ */
+
+printf("replace magic at %s\n", src);
+		/*
+		 * The following checks should be ordered according
+		 * to frequency of use.
+		 */
+		if ( (l = MATCH("machine_arch")) ) {
+			dst = s_subst(dst, max, MACHINE_ARCH, sizeof(MACHINE_ARCH) - 1);
+		} else if ( (l= MATCH("machine")) ) {
+			dst = s_subst(dst, max, MACHINE_ARCH, sizeof(MACHINE_ARCH) - 1);
+		} else if ( (l= MATCH("hostname")) ) {
+			getcredhostname(td->td_ucred, dst, max - dst);
+			dst = s_subst(dst, max, dst, 0);
+		} else if ( (l= MATCH("osrelease")) ) {
+			dst = s_subst(dst, max, osrelease, strlen(osrelease));
+		} else if ( (l= MATCH("kernel_ident")) ) {
+			dst = s_subst(dst, max, kern_ident, strlen(kern_ident));
+		} else if ( (l= MATCH("domainname")) ) {
+			dst = s_subst(dst, max, domainname, strlen(domainname));
+		} else if ( (l= MATCH("ostype")) ) {
+			dst = s_subst(dst, max, ostype, strlen(ostype));
+		}
+		if (dst == NULL) /* overflow */
+			break;
+		if (l == 0) { /* no match, restore original */
+			*dst++ = '@';
+			*dst++ = '{';
+			continue;
+		}
+		/* otherwise skip original name and } */
+		src += l + 1;
+		change = 1;
+	}
+	if (change && dst) {
+		if (src < end) /* copy last char */
+			*dst++ = *src;
+		*dst = '\0';
+		printf("translating into %s\n", tmp);
+		*len = dst - tmp;
+		bcopy(tmp, cp, *len);
+	}
+	uma_zfree(namei_zone, tmp);
+}
+#undef MATCH
+
 #ifdef LOOKUP_SHARED
 static int lookup_shared = 1;
 #else
@@ -284,6 +402,8 @@
 			error = ENOENT;
 			break;
 		}
+		if (vfs_magiclinks && linklen >3) /* at least @{} in the symlink */
+			symlink_magic(td, cp, &linklen);
 		if (linklen + ndp->ni_pathlen >= MAXPATHLEN) {
 			if (ndp->ni_pathlen > 1)
 				uma_zfree(namei_zone, cp);

From brooks at freebsd.org Fri Aug 22 14:59:44 2008 From: brooks at freebsd.org (Brooks Davis) Date: Fri Aug 22 14:59:51 2008 Subject: Magic symlinks redux In-Reply-To: References: Message-ID: <20080822150020.GA57443@lor.one-eyed-alien.net> On Fri, Aug 22, 2008 at 01:54:29AM +0200, Ivan Voras wrote: > I was reading about new things in NetBSD, and one thing caught my > attention: per-user /tmp. See > http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080714_0251.html for > example. > > Google says that a discussion about magic symlinks happens every now and > then in FreeBSD but nothing really gets done. I found this implementation > which looks like it's for 7.0: > > http://butcher.heavennet.ru/patches/kernel/magiclinks/ > > As far as I understand the VFS (which isn't much...) this looks like an > trivial patch, and it's compatible with NetBSD. Since I'm interested in > this (specifically for the per-user /tmp and maybe similar gadgetry), I'd > like to nurse this patch into the tree, if there are no objections (of > course, I'll bug anyone I can find who knows VFS to review it :) ). I have an implementation derived from Andrey's port of the DragonFly implementation which will be committed in the next month or two. We discussed it in detail at the dev summit and subject to a few more changes and cleanup, it's ready to go. It allows significantly more flexibility than the NetBSD approach while avoiding many of the pitfalls involved in variant symlinks. -- Brooks -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/3913b43d/attachment.pgp From brooks at freebsd.org Fri Aug 22 15:06:52 2008 From: brooks at freebsd.org (Brooks Davis) Date: Fri Aug 22 15:06:59 2008 Subject: Magic symlinks redux In-Reply-To: <20080822145616.GA61094@onelab2.iet.unipi.it> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> <20080822145616.GA61094@onelab2.iet.unipi.it> Message-ID: <20080822150728.GB57443@lor.one-eyed-alien.net> On Fri, Aug 22, 2008 at 04:56:16PM +0200, Luigi Rizzo wrote: > On Fri, Aug 22, 2008 at 02:05:26PM +0200, Christian Brueffer wrote: > > On Fri, Aug 22, 2008 at 12:24:41PM +0200, Ivan Voras wrote: > ... > > > This patch is huge. As far as I can tell DragonflyBSD has a whole > > > framework dedicated to varsyms, spread across a fair part of the kernel > > > and with at least one special userland utility. It allows the operator > > > to define his own variables that can be used in the substitutions, and I > > > don't see that it predefines "special" variables like "uid" and > > > "hostname". It's not necessarily a bad solution but I consider it overkill. > ... > > Brooks has a varsym port in p4, see //depot/user/brooks/varsym/ > > this also seems to be based on Dragonfly's code, quite intrusive. This code adds one global symbol, one function call in the vfs code, and two pointers to struct proc. For that we get a system which is significantly more flexible than the NetBSD code. While the simplicity of the NetBSD code is somewhat attractive, the fact that variables cannot be defined renders it useless for my purposes, which are providing partial file system virtualization for computing job/sessions where I need to key off of externally derived job IDs or job specific temporary paths.
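To make the user-defined-variable case concrete, here is a rough userland sketch, assuming the ${name} syntax mentioned earlier in the thread for the DragonFly varsyms. struct varsym, varsym_expand() and the JOBID variable are hypothetical names chosen for illustration; this is not the code in p4, and it ignores locking and per-process scoping. Unknown references are left as-is and an oversized result is reported as an error, which are two of the possible answers to the corner cases raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one operator-defined variable. */
struct varsym {
	const char *name;
	const char *value;
};

/*
 * Expand every ${name} found in src into dst (capacity dstlen bytes,
 * including the terminating NUL).  Unknown or unterminated references
 * are copied through unchanged.  Returns the expanded length, or -1
 * if the result does not fit.
 */
int
varsym_expand(const struct varsym *tab, size_t ntab,
    const char *src, char *dst, size_t dstlen)
{
	size_t out = 0;

	assert(src != NULL && dst != NULL);
	while (*src != '\0') {
		const char *val = NULL, *close;
		size_t adv = 1, vlen = 1;	/* default: copy one char */

		if (src[0] == '$' && src[1] == '{' &&
		    (close = strchr(src + 2, '}')) != NULL) {
			size_t nlen = (size_t)(close - (src + 2));

			for (size_t i = 0; i < ntab; i++) {
				if (strlen(tab[i].name) == nlen &&
				    strncmp(tab[i].name, src + 2, nlen) == 0) {
					val = tab[i].value;
					vlen = strlen(val);
					adv = nlen + 3;	/* "${" + name + "}" */
					break;
				}
			}
		}
		if (val == NULL)
			val = src;		/* literal character */
		if (out + vlen >= dstlen)
			return (-1);		/* would overflow */
		memcpy(dst + out, val, vlen);
		out += vlen;
		src += adv;
	}
	if (out >= dstlen)
		return (-1);
	dst[out] = '\0';
	return ((int)out);
}
```

With a table holding JOBID=4711, a link target of /scratch/${JOBID}/tmp would expand to /scratch/4711/tmp, which is the kind of job-keyed path a fixed set of built-in variables cannot express.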
-- Brooks -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/0f008445/attachment.pgp From ivoras at freebsd.org Fri Aug 22 15:29:47 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Aug 22 15:29:53 2008 Subject: Magic symlinks redux In-Reply-To: <20080822150020.GA57443@lor.one-eyed-alien.net> References: <20080822150020.GA57443@lor.one-eyed-alien.net> Message-ID: <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> 2008/8/22 Brooks Davis : > I have an implementation derived from Andrey's port of the DragonFly > implementation which will be committed in the next month or two. We > discussed it in detail at the dev summit and subject to a few more > changes and cleanup, it's ready to go. It allows significantly more > flexibility than the NetBSD approach while avoiding many of the pitfalls > involved in variant symlinks. Does it also support special automatic variables like uid, hostname? From brooks at freebsd.org Fri Aug 22 15:39:08 2008 From: brooks at freebsd.org (Brooks Davis) Date: Fri Aug 22 15:39:59 2008 Subject: Magic symlinks redux In-Reply-To: <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> Message-ID: <20080822153945.GC57443@lor.one-eyed-alien.net> On Fri, Aug 22, 2008 at 05:02:31PM +0200, Ivan Voras wrote: > 2008/8/22 Brooks Davis : > > > I have an implementation derived from Andrey's port of the DragonFly > > implementation which will be committed in the next month or two. We > > discussed it in detail at the dev summit and subject to a few more > > changes and cleanup, it's ready to go. It allows significantly more > > flexibility than the NetBSD approach while avoiding many of the pitfalls > > involved in variant symlinks. > > Does it also support special automatic variables like uid, hostname? 
No it does not. There are two reasons for this. First, it's basically pointless since you can set system wide variables for things like hostname and I have login_conf support to set things like uid or uname variables. Second, consider all the implications of @uid in the context of setuid binaries. This is hard to reason about and easy to get wrong. As a result, I feel a model where variables are set per process and follow fork is much less prone to error. -- Brooks -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/72888b1b/attachment.pgp From rizzo at iet.unipi.it Fri Aug 22 15:43:56 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 15:44:02 2008 Subject: Magic symlinks redux In-Reply-To: <20080822150728.GB57443@lor.one-eyed-alien.net> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> <20080822145616.GA61094@onelab2.iet.unipi.it> <20080822150728.GB57443@lor.one-eyed-alien.net> Message-ID: <20080822154631.GA61495@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 10:07:29AM -0500, Brooks Davis wrote: > On Fri, Aug 22, 2008 at 04:56:16PM +0200, Luigi Rizzo wrote: > > On Fri, Aug 22, 2008 at 02:05:26PM +0200, Christian Brueffer wrote: > > > On Fri, Aug 22, 2008 at 12:24:41PM +0200, Ivan Voras wrote: > > ... > > > > This patch is huge. As far as I can tell DragonflyBSD has a whole > > > > framework dedicated to varsyms, spread across a fair part of the kernel > > > > and with at least one special userland utility. It allows the operator > > > > to define his own variables that can be used in the substitutions, and I > > > > don't see that it predefines "special" variables like "uid" and > > > > "hostname". It's not necessarily a bad solution but I consider it overkill. > > ... 
> > > Brooks has a varsym port in p4, see //depot/user/brooks/varsym/ > > > > this also seems to be based on Dragonfly's code, quite intrusive. > > This code adds one global symbol, one function call in the vfs code, > and two pointers to struct proc. For that we get a system which is > significantly more flexible than the NetBSD code. > > While the simplicity of the NetBSD code is somewhat attractive, the > fact that variables can not be defined renders it useless for my > purposes which are providing partial file system virtulization for > computing job/sessions where I need to key off of externally derived job > IDs or job specific temporary paths. understood -- it's just that the difference in code size is impressive. Do you know how much of it is used to implement the "varsym" subsystem (user- or system-wide variables) and how much is the core name translation ? cheers luigi From brooks at freebsd.org Fri Aug 22 15:50:50 2008 From: brooks at freebsd.org (Brooks Davis) Date: Fri Aug 22 15:51:00 2008 Subject: Magic symlinks redux In-Reply-To: <20080822154631.GA61495@onelab2.iet.unipi.it> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> <20080822145616.GA61094@onelab2.iet.unipi.it> <20080822150728.GB57443@lor.one-eyed-alien.net> <20080822154631.GA61495@onelab2.iet.unipi.it> Message-ID: <20080822155126.GD57443@lor.one-eyed-alien.net> On Fri, Aug 22, 2008 at 05:46:31PM +0200, Luigi Rizzo wrote: > On Fri, Aug 22, 2008 at 10:07:29AM -0500, Brooks Davis wrote: > > On Fri, Aug 22, 2008 at 04:56:16PM +0200, Luigi Rizzo wrote: > > > On Fri, Aug 22, 2008 at 02:05:26PM +0200, Christian Brueffer wrote: > > > > On Fri, Aug 22, 2008 at 12:24:41PM +0200, Ivan Voras wrote: > > > ... > > > > > This patch is huge. 
As far as I can tell DragonflyBSD has a whole > > > > > framework dedicated to varsyms, spread across a fair part of the kernel > > > > > and with at least one special userland utility. It allows the operator > > > > > to define his own variables that can be used in the substitutions, and I > > > > > don't see that it predefines "special" variables like "uid" and > > > > > "hostname". It's not necessarily a bad solution but I consider it overkill. > > > ... > > > > Brooks has a varsym port in p4, see //depot/user/brooks/varsym/ > > > > > > this also seems to be based on Dragonfly's code, quite intrusive. > > > > This code adds one global symbol, one function call in the vfs code, > > and two pointers to struct proc. For that we get a system which is > > significantly more flexible than the NetBSD code. > > > > While the simplicity of the NetBSD code is somewhat attractive, the > > fact that variables can not be defined renders it useless for my > > purposes which are providing partial file system virtulization for > > computing job/sessions where I need to key off of externally derived job > > IDs or job specific temporary paths. > > understood -- it's just that the difference in code size is impressive. > > Do you know how much of it is used to implement the "varsym" > subsystem (user- or system-wide variables) and how much is the > core name translation ? Most of it is maintaining the lists of variables, handling the system calls to read and write them, and doing the lookup with the correct locking context. The basic match routine is about the same size, though if you get a hit it's somewhat more expensive since you have to walk up to three lists (two in what's in p4 at the moment, but I'm currently splitting the per-proc code into privileged and un-privileged sets so the administrator can add values to processes that later owners can't modify) to resolve the variable name. -- Brooks -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/06b86e5b/attachment.pgp From ivoras at freebsd.org Fri Aug 22 15:53:59 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Aug 22 15:54:05 2008 Subject: Magic symlinks redux In-Reply-To: <20080822153945.GC57443@lor.one-eyed-alien.net> References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> Message-ID: <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> 2008/8/22 Brooks Davis : > On Fri, Aug 22, 2008 at 05:02:31PM +0200, Ivan Voras wrote: >> 2008/8/22 Brooks Davis : >> >> > I have an implementation derived from Andrey's port of the DragonFly >> > implementation which will be committed in the next month or two. We >> > discussed it in detail at the dev summit and subject to a few more >> > changes and cleanup, it's ready to go. It allows significantly more >> > flexibility than the NetBSD approach while avoiding many of the pitfalls >> > involved in variant symlinks. >> >> Does it also support special automatic variables like uid, hostname? > > No it does not. There are two reasons for this. First, it's basically > pointless since you can set system wide variables for things like > hostname and I have login_conf support to set things like uid or uname > variables. Second, consider all the implications of @uid in the context > of setuid binaries. This is hard to reason about and easy to get wrong. > As a result, I feel a model where variables are set per process and > follow fork is much less prone to error.

Firstly, it might be useless for your purposes, but there are other uses.

If you read NetBSD's documentation about magiclinks, you'll see this set of supported variables:

@domainname    Expands to the machine's domain name, as set by setdomainname(3).
@hostname      Expands to the machine's host name, as set by sethostname(3).
@emul          Expands to the name of the current process's emulation.
@kernel_ident  Expands to the name of the config(1) file used to generate the running kernel.
@machine       Expands to the value of MACHINE for the system (equivalent to the output of ``uname -m'').
@machine_arch  Expands to the value of MACHINE_ARCH for the system (equivalent to the output of ``uname -p'').
@osrelease     Expands to the operating system release of the running kernel (equivalent to the output of ``uname -r'').
@ostype        Expands to the operating system type of the running kernel (equivalent to the output of ``uname -s''). This will always be ``NetBSD'' on NetBSD systems.
@ruid          Expands to the real user-id of the process.
@uid           Expands to the effective user-id of the process.

Many of those are static and can be set on boot, but not all of them - for example machine and machine_arch may be different when running 32-bit processes on 64-bit machines. Domainname and hostname are different in jails.

Your example with uid is solved just like in userland (though the names are messed up): @ruid and @uid reflect getuid() and geteuid().

Anyway, if the DFBSD framework is properly implemented, it shouldn't be hard to add these variables. If you don't want to, I volunteer.

(I don't care about the syntax: @{something} vs ${something}, though I think NetBSD made the better choice since these variables are not accessing the process environment).
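The uid/ruid distinction can be sketched in userland C as follows. expand_id_variable() is a hypothetical helper written for this illustration; it assumes, following the NetBSD documentation quoted above, that "uid" corresponds to geteuid() and "ruid" to getuid(). It is not taken from either implementation under discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write the decimal value of
 * the named id variable into buf.  Returns the number of characters
 * written, or -1 for an unknown name or a buffer that is too small.
 */
int
expand_id_variable(const char *name, char *buf, size_t buflen)
{
	uid_t id;
	int n;

	assert(name != NULL && buf != NULL);
	if (strcmp(name, "uid") == 0)
		id = geteuid();		/* effective user id */
	else if (strcmp(name, "ruid") == 0)
		id = getuid();		/* real user id */
	else
		return (-1);

	n = snprintf(buf, buflen, "%lu", (unsigned long)id);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	return (n);
}
```

In an ordinary process both names give the same value. In a setuid binary they differ, so the same link resolves to different targets depending on which variable it uses, and that is where care is needed.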
From brooks at freebsd.org Fri Aug 22 16:12:37 2008 From: brooks at freebsd.org (Brooks Davis) Date: Fri Aug 22 16:12:44 2008 Subject: Magic symlinks redux In-Reply-To: <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> Message-ID: <20080822161314.GE57443@lor.one-eyed-alien.net> On Fri, Aug 22, 2008 at 05:53:58PM +0200, Ivan Voras wrote: > 2008/8/22 Brooks Davis : > > On Fri, Aug 22, 2008 at 05:02:31PM +0200, Ivan Voras wrote: > >> 2008/8/22 Brooks Davis : > >> > >> > I have an implementation derived from Andrey's port of the DragonFly > >> > implementation which will be committed in the next month or two. We > >> > discussed it in detail at the dev summit and subject to a few more > >> > changes and cleanup, it's ready to go. It allows significantly more > >> > flexibility than the NetBSD approach while avoiding many of the pitfalls > >> > involved in variant symlinks. > >> > >> Does it also support special automatic variables like uid, hostname? > > > > No it does not. There are two reasons for this. First, it's basically > > pointless since you can set system wide variables for things like > > hostname and I have login_conf support to set things like uid or uname > > variables. Second, consider all the implications of @uid in the context > > of setuid binaries. This is hard to reason about and easy to get wrong. > > As a result, I feel a model where variables are set per process and > > follow fork is much less prone to error. > > Firstly, it might be useless for your purpose but there are others. > > If you read the NetBSD's documentation about magiclinks, you'll see > this set of supported variables: > > @domainname Expands to the machine's domain name, as set by > setdomainname(3). 
> > @hostname Expands to the machine's host name, as set by > sethostname(3). > > @emul Expands to the name of the current process's emulation. > > @kernel_ident Expands to the name of the config(1) file used to generate > the running kernel. > > @machine Expands to the value of MACHINE for the system (equivalent > to the output of ``uname -m''). > > @machine_arch Expands to the value of MACHINE_ARCH for the system > (equivalent to the output of ``uname -p''). > > @osrelease Expands to the operating system release of the running > kernel (equivalent to the output of ``uname -r''). > > @ostype Expands to the operating system type of the running kernel > (equivalent to the output of ``uname -s''). This will > always be ``NetBSD'' on NetBSD systems. > > @ruid Exapnds to the real user-id of the process. > > @uid Expands to the effective user-id of the process. > > Many of those are static and can be set on boot, but not all of them - > for example machine and machine_arch may be different when running > 32-bit processes on 64-bit machines. Domainname and hostname are > different in jails. > > Your example with uid is solved just like in userland (though the > names are messed up) and reflect getuid() and geteuid(). Small changes to the file system namespace can easily lead to security issues when applications assume the namespace is static. This is particularly true for setuid binaries. > Anyway, if the DFBSD framework is properly implemented, it shouldn't > be hard to add these variables. If you don't want to, I volunteer. I'm not completely opposed to adding a static namespace for system wide variables. I'm not at all keen on the @ruid and @uid variables because I think they are risky. My current feeling is that I'd like to move ahead with my current implementation and then either add another namespace or add this off to the side mostly as is. 
> (I don't care about the syntax: @{something} vs ${something}, though I > think NetBSD made the better choice since these variables are not > accessing the process environment). This is something I've been debating. I've been leading toward something other than ${something}. Either @{} or %{} or else going all the way to something like %%something%%. I don't like the unanchored components netbsd uses. One other option we discussed at the devsummit was requiring that the first character of a variant symlink be special to reduce parsing overhead. I.e. requiring that variant symlinks start with @ or % or something. -- Brooks -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/6b84dcc1/attachment.pgp From rizzo at iet.unipi.it Fri Aug 22 16:20:25 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 16:20:38 2008 Subject: Magic symlinks redux In-Reply-To: <20080822155126.GD57443@lor.one-eyed-alien.net> References: <20080822090448.GB57441@onelab2.iet.unipi.it> <48AE89DC.9080408@yandex.ru> <20080822120525.GA1366@haakonia.hitnet.RWTH-Aachen.DE> <20080822145616.GA61094@onelab2.iet.unipi.it> <20080822150728.GB57443@lor.one-eyed-alien.net> <20080822154631.GA61495@onelab2.iet.unipi.it> <20080822155126.GD57443@lor.one-eyed-alien.net> Message-ID: <20080822162259.GA61694@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 10:51:27AM -0500, Brooks Davis wrote: > On Fri, Aug 22, 2008 at 05:46:31PM +0200, Luigi Rizzo wrote: ... > > Do you know how much of it is used to implement the "varsym" > > subsystem (user- or system-wide variables) and how much is the > > core name translation ? > > Most of it is maintaining the lists of variables, handling the system > calls to read and write them, and doing the lookup with the correct > locking context. 
The basic match routine is about the same size, though > if you get a hit it's somewhat more expensive since you have to walk up > to three lists (two in what's in p4 at the moment, but I'm currently > splitting the per-proc code into privileged and un-privileged sets to > the administrator can add values to processes that later owners can't > modify) to resolve the variable name. so if i understand it well it could be committed as two separate pieces: one for the lookup and translation itself, and one for the variable management. The split would be very interesting for at least two reasons: 1. ease of porting/backporting/modularization e.g. think of the embedded case where one might want a limited or even no form of symlinks to save memory 2. maybe the code in charge of variable management can be replaced with some simpler instances (again for the embedded case, or e.g. to implement the NetBSD version of variant symlinks), reused for other purposes, or perhaps also integrated with other existing implementations. In fact i wonder about the following: we already have code to deal with kenv and sysctl, this would be a third mechanism that does a very similar thing, isn't there anything we can recycle from the existing ones ? Also, we do have a need to push tables of info in the kernel (e.g. list of PCI/USB ids, quirks tables and the like), maybe we can take this chance to make the varsym subsystem useful also within device drivers ? cheers luigi From imp at bsdimp.com Fri Aug 22 18:41:55 2008 From: imp at bsdimp.com (M. 
Warner Losh) Date: Fri Aug 22 18:42:19 2008 Subject: Magic symlinks redux In-Reply-To: <20080822162259.GA61694@onelab2.iet.unipi.it> References: <20080822154631.GA61495@onelab2.iet.unipi.it> <20080822155126.GD57443@lor.one-eyed-alien.net> <20080822162259.GA61694@onelab2.iet.unipi.it> Message-ID: <20080822.124019.-692152321.imp@bsdimp.com> In message: <20080822162259.GA61694@onelab2.iet.unipi.it> Luigi Rizzo writes: : Also, we do have a need to push tables of info in the kernel : (e.g. list of PCI/USB ids, quirks tables and the like), : maybe we can take this chance to make the varsym subsystem useful : also within device drivers ? No. what problem would this solve? Warner From rizzo at iet.unipi.it Fri Aug 22 19:34:22 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 19:34:29 2008 Subject: Magic symlinks redux In-Reply-To: <20080822.124019.-692152321.imp@bsdimp.com> References: <20080822154631.GA61495@onelab2.iet.unipi.it> <20080822155126.GD57443@lor.one-eyed-alien.net> <20080822162259.GA61694@onelab2.iet.unipi.it> <20080822.124019.-692152321.imp@bsdimp.com> Message-ID: <20080822193657.GB63527@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 12:40:19PM -0600, M. Warner Losh wrote: > In message: <20080822162259.GA61694@onelab2.iet.unipi.it> > Luigi Rizzo writes: > : Also, we do have a need to push tables of info in the kernel > : (e.g. list of PCI/USB ids, quirks tables and the like), > : maybe we can take this chance to make the varsym subsystem useful > : also within device drivers ? > > No. what problem would this solve? take e.g. uscanner (or several other devices, e.g. if_rl) where the only way to tell whether a device is supported or not is looking up a table of usb vendor/id (the same happens for many pci devices). in the simple cases you just need the id - a more complex one would use linux has a way (forget what the command name is) to add entries to the table at runtime, whereas on freebsd we need to patch&rebuild the module. 
if make this 'object store' thing (varsym) able to store arrays we could have device drivers scan the arrays (e.g. uscanner_id or if_rl_pciids etc.) to find out if it has a suitable string. one objection that is frequently raised is that randomly adding ids to a kernel table is a potential source of panics, but in the end, to do this you need root access so you could as well rm -rf / and make a similar if not worse damage. cheers luigi From imp at bsdimp.com Fri Aug 22 20:14:11 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Fri Aug 22 20:14:24 2008 Subject: Magic symlinks redux In-Reply-To: <20080822193657.GB63527@onelab2.iet.unipi.it> References: <20080822162259.GA61694@onelab2.iet.unipi.it> <20080822.124019.-692152321.imp@bsdimp.com> <20080822193657.GB63527@onelab2.iet.unipi.it> Message-ID: <20080822.141312.732640662.imp@bsdimp.com> In message: <20080822193657.GB63527@onelab2.iet.unipi.it> Luigi Rizzo writes: : On Fri, Aug 22, 2008 at 12:40:19PM -0600, M. Warner Losh wrote: : > In message: <20080822162259.GA61694@onelab2.iet.unipi.it> : > Luigi Rizzo writes: : > : Also, we do have a need to push tables of info in the kernel : > : (e.g. list of PCI/USB ids, quirks tables and the like), : > : maybe we can take this chance to make the varsym subsystem useful : > : also within device drivers ? : > : > No. what problem would this solve? : : take e.g. uscanner (or several other devices, e.g. if_rl) : where the only way to tell whether a device : is supported or not is looking up a table of usb vendor/id : (the same happens for many pci devices). in the simple cases : you just need the id - a more complex one would use How is this related? I guess was my question. Also, this problem isn't just replacing a table in the kernel. The problem is "map this ID to that ID" because many drivers do special things for different IDs, and you have to specify the ID that it is compatible with. 
: linux has a way (forget what the command name is) to add entries : to the table at runtime, whereas on freebsd : we need to patch&rebuild the module. : : if make this 'object store' thing (varsym) able to store arrays we : could have device drivers scan the arrays (e.g. uscanner_id or : if_rl_pciids etc.) to find out if it has a suitable string. That's the wrong way to solve the problem. In FreeBSD there's no universal table on any bus except for PC Card. Until we have that, this solution can't happen. Each driver has its own ad-hoc way of doing this. Even in PC Card land, the size of the table isn't exported. The way that drivers are written today in FreeBSD, the bus has to provide this translation layer. There's really no other viable solution. If someone goes through and fixes all the important busses, then maybe this would be a needed service. : one objection that is frequently raised is that randomly adding ids : to a kernel table is a potential source of panics, but in the end, : to do this you need root access so you could as well rm -rf / and : make a similar if not worse damage. Panic! Well, I'm sold. Warner From rizzo at iet.unipi.it Fri Aug 22 21:38:44 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 21:38:51 2008 Subject: Magic symlinks redux In-Reply-To: <20080822.141312.732640662.imp@bsdimp.com> References: <20080822162259.GA61694@onelab2.iet.unipi.it> <20080822.124019.-692152321.imp@bsdimp.com> <20080822193657.GB63527@onelab2.iet.unipi.it> <20080822.141312.732640662.imp@bsdimp.com> Message-ID: <20080822214118.GA64725@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 02:13:12PM -0600, M. Warner Losh wrote: > In message: <20080822193657.GB63527@onelab2.iet.unipi.it> > Luigi Rizzo writes: > : On Fri, Aug 22, 2008 at 12:40:19PM -0600, M. Warner Losh wrote: > : > In message: <20080822162259.GA61694@onelab2.iet.unipi.it> > : > Luigi Rizzo writes: > : > : Also, we do have a need to push tables of info in the kernel > : > : (e.g. 
list of PCI/USB ids, quirks tables and the like), > : > : maybe we can take this chance to make the varsym subsystem useful > : > : also within device drivers ? > : > > : > No. what problem would this solve? > : > : take e.g. uscanner (or several other devices, e.g. if_rl) > : where the only way to tell whether a device > : is supported or not is looking up a table of usb vendor/id > : (the same happens for many pci devices). in the simple cases > : you just need the id - a more complex one would use > > How is this related? I guess was my question. background: the initial topic was variant symlinks, for which is desirable to have system- or user-definable variables that can be used during the translation -- so one could play tricks such as setting /tmp -> /tmpdir/${uid} and have private temp directories, and the like. To implement those variables you need a storage subsystem that is accessible from kernel, outside the process space, and supports multiple instances (per-user or per-process or perhaps both; the 'multiple instance' thing can be easily implemented with something with a hierarchical structure). It probably needs to be support very fast reads as it is accessed every time we need to translate a pathname involving variant symlinks. It doesn't need to be persistent. Dragonfly has implemented this storage subsystem with the 'varsym' command/subsystem. the connection: At least as i understand it, we already have plenty of storage subsystems in the OS that fit most of the above requirements (except perhaps the read cost) including the sysctl tree, the kenv tree, the /proc filesystem -- so i think it would be great to have a unified solution [at least as a backend] for all these purposes. I haven't looked at the internals of /proc or varsym so i don't know how suitable are them for our purposes. > Also, this problem isn't just replacing a table in the kernel. 
The > problem is "map this ID to that ID" because many drivers do special > things for different IDs, and you have to specify the ID that it is > compatible with. what's wrong with something like this driverparms.if_re.devid.0x11864300 = "0x04000000 D-Link DGE-528(T) Gigabit Ethernet Adapter" driverparms.if_re.devid.0x11868139 = "0x74800000 RealTek 8139c..." ... The subtree under if_re only needs to be known to the device, though some standardization would help of course -- e.g. we could have generic code to parse device ids of variable size using wildcards, and for the value field we could use xml encoding (if something more trivial does not fit -- but a barebone xml parser such as the one we need here fits in 5-10k of object code, i did that a few months ago , see http://info.iet.unipi.it/~luigi/FreeBSD/minixmlrpc/ Hooking a write callback on a node, e.g. to driverparms. if_re.devid would allow the driver to update its own internal table of descriptors upon a change, and retain the current match code (and speed) essentially unchanged in all drivers. > : linux has a way (forget what the command name is) to add entries > : to the table at runtime, whereas on freebsd > : we need to patch&rebuild the module. > : > : if make this 'object store' thing (varsym) able to store arrays we > : could have device drivers scan the arrays (e.g. uscanner_id or > : if_rl_pciids etc.) to find out if it has a suitable string. > > That's the wrong way to solve the problem. In FreeBSD there's no > universal table on any bus except for PC Card. Until we have that, > this solution can't happen. Each driver has its own ad-hoc way of > doing this. Even in PC Card land, the size of the table isn't > exported. > > The way that drivers are written today in FreeBSD, the bus has to > provide this translation layer. There's really no other viable > solution. If someone goes through and fixes all the important busses, > then maybe this would be a needed service. 
perhaps you are pointing to an ideal solution, which however would still require significant work on each driver to adapt the current, ad-hoc tables to the solution supplied by the bus. The approach i suggest above allows incremental deployment and i believe it still scales well. cheers luigi From imp at bsdimp.com Fri Aug 22 22:01:47 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Fri Aug 22 22:01:54 2008 Subject: Magic symlinks redux In-Reply-To: <20080822214118.GA64725@onelab2.iet.unipi.it> References: <20080822193657.GB63527@onelab2.iet.unipi.it> <20080822.141312.732640662.imp@bsdimp.com> <20080822214118.GA64725@onelab2.iet.unipi.it> Message-ID: <20080822.160107.511563083.imp@bsdimp.com> In message: <20080822214118.GA64725@onelab2.iet.unipi.it> Luigi Rizzo writes: : > Also, this problem isn't just replacing a table in the kernel. The : > problem is "map this ID to that ID" because many drivers do special : > things for different IDs, and you have to specify the ID that it is : > compatible with. : : : what's wrong with something like this : : driverparms.if_re.devid.0x11864300 = "0x04000000 D-Link DGE-528(T) Gigabit Ethernet Adapter" : driverparms.if_re.devid.0x11868139 = "0x74800000 RealTek 8139c..." : ... : : The subtree under if_re only needs to be known to the device, though : some standardization would help of course -- e.g. we could have generic : code to parse device ids of variable size using wildcards, and for : the value field we could use xml encoding (if something more trivial : does not fit -- but a barebone xml parser such as the one we need here : fits in 5-10k of object code, i did that a few months ago , see : : http://info.iet.unipi.it/~luigi/FreeBSD/minixmlrpc/ : : Hooking a write callback on a node, e.g. to driverparms. : if_re.devid would allow the driver to update its own internal : table of descriptors upon a change, and retain the current match : code (and speed) essentially unchanged in all drivers. 
You're going to have to give me a much more detailed description than this, because the driver getting a callback to update its information is very handwavy. And drivers wishing to do this would need to do a lot of work to make sure that their tables are dynamic. Today, they are static, or in code (eg, switch statements). this sounds like a very complex solution to the problem, without really a clear vision for how it would draw together different devices. : > : linux has a way (forget what the command name is) to add entries : > : to the table at runtime, whereas on freebsd : > : we need to patch&rebuild the module. : > : : > : if make this 'object store' thing (varsym) able to store arrays we : > : could have device drivers scan the arrays (e.g. uscanner_id or : > : if_rl_pciids etc.) to find out if it has a suitable string. : > : > That's the wrong way to solve the problem. In FreeBSD there's no : > universal table on any bus except for PC Card. Until we have that, : > this solution can't happen. Each driver has its own ad-hoc way of : > doing this. Even in PC Card land, the size of the table isn't : > exported. : > : > The way that drivers are written today in FreeBSD, the bus has to : > provide this translation layer. There's really no other viable : > solution. If someone goes through and fixes all the important busses, : > then maybe this would be a needed service. : : perhaps you are pointing to an ideal solution, which however would : still require significant work on each driver to adapt the current, : ad-hoc tables to the solution supplied by the bus. : : The approach i suggest above allows incremental deployment and i believe : it still scales well. Actually, the solution that I propose requires *NO* changes to any leaf drivers. None. It only requires changes in the bus level code to do the lookup and substitution. 
They are totally ignorant of the changes that are going on behind the scenes and can treat a new card just like a card they already support without even knowing that they are doing this. Since I don't understand your suggestion at all, I can't comment about how well it scales. From what I see, it is a disaster in that area as every driver that wants to participate in this would need to change. Warner From peterjeremy at optushome.com.au Fri Aug 22 22:14:45 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Fri Aug 22 22:14:52 2008 Subject: Magic symlinks redux In-Reply-To: <20080822090448.GB57441@onelab2.iet.unipi.it> References: <20080822090448.GB57441@onelab2.iet.unipi.it> Message-ID: <20080822093723.GY32539@server.vk2pj.dyndns.org> On 2008-Aug-22 11:04:49 +0200, Luigi Rizzo wrote: >Question - is the process' ENV easily available in this part >of the kernel, so one could in principle use environment variables >as replacement strings ? No. The original environment is in a known location but a process is free to reorganise its environment to be based in a different location. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080822/6058e6f6/attachment.pgp From rizzo at iet.unipi.it Fri Aug 22 22:48:45 2008 From: rizzo at iet.unipi.it (Luigi Rizzo) Date: Fri Aug 22 22:48:55 2008 Subject: Magic symlinks redux In-Reply-To: <20080822.160107.511563083.imp@bsdimp.com> References: <20080822193657.GB63527@onelab2.iet.unipi.it> <20080822.141312.732640662.imp@bsdimp.com> <20080822214118.GA64725@onelab2.iet.unipi.it> <20080822.160107.511563083.imp@bsdimp.com> Message-ID: <20080822225119.GA65119@onelab2.iet.unipi.it> On Fri, Aug 22, 2008 at 04:01:07PM -0600, M. Warner Losh wrote: > In message: <20080822214118.GA64725@onelab2.iet.unipi.it> > Luigi Rizzo writes: ... > : Hooking a write callback on a node, e.g. to driverparms. > : if_re.devid would allow the driver to update its own internal > : table of descriptors upon a change, and retain the current match > : code (and speed) essentially unchanged in all drivers. > > You're going to have to give me a much more detailed description than > this, because the driver getting a callback to update its information > is very handwavy. take if_re as an example. When if_re loads, it would call object_store_register("driverparms.if_re.devids", if_re_update_devid_table, if_re_devid_table_root); which in turn puts the string-function-arg tuple in a hash table using the string as a search key. Optionally, it could also call a routine to pre-fill the object store subtree with static content supplied by if_re.c or so. When (from the upper part of the kernel, so it can sleep etc.) the subtree is written to, the object store calls if_re_update_devid_table(if_re_devid_table_root) which in turn would scan the subtree using the api supplied by the object store itself, and rebuilds the table, quirks, whatever used by if_re for its own purposes. Clearly, this is specific for if_re. 
umass will likely have a more complex structure with quirks etc, uscanner is just a table of device ids, etc. > lot of work to make sure that their tables are dynamic. Today, they > are static, or in code (eg, switch statements). this sounds like a > very complex solution to the problem, without really a clear vision > for how it would draw together different devices. ... > : perhaps you are pointing to an ideal solution, which however would > : still require significant work on each driver to adapt the current, > : ad-hoc tables to the solution supplied by the bus. > : > : The approach i suggest above allows incremental deployment and i believe > : it still scales well. > > Actually, the solution that I propose requires *NO* changes to any > leaf drivers. None. It only requires changes in the bus level code > to do the lookup and substitution. They are totally ignorant of the > changes that are going on behind the scenes and can treat a new card > just like a card they already support without even knowing that they > are doing this. sorry but now i am the one who doesn't understand how you can move, with *NO* changes to the leaf drivers, from a bunch of drivers using ad-hoc solutions (static tables with variable number of fields, or lookups hardwired in the code, which don't use just the vendor/device fields but also other info e.g. subdevice as in if_re) to one that relies on the bus code for the matching. At the very least you need to replace the part of the *_probe routines with something that uses the bus routines -- and implement something that lets you manipulate the mapping/quirks table at runtime so that if a compatible device with different IDs comes out you don't have to recompile and reload a module. That's the same kind of changes that I expect to be necessary with the way I have in mind. i am sorry i cannot expand this more as i am about to leave for holidays, but will try to come up with some proof-of-concept code when i am back. 
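As a rough illustration of the registration and callback flow described above, here is a small userland sketch in C. Only the names object_store_register and if_re_update_devid_table come from the description; object_store_notify, the os_hook structure and the fixed-size linear array (instead of a hash table) are assumptions made to keep the example short, and the callback merely counts invocations where a real driver would rescan the subtree and rebuild its table.

```c
#include <stddef.h>
#include <string.h>

#define OS_MAX_HOOKS 16

typedef void (*os_update_fn)(void *arg);

/* One registered subtree and the callback to run when it is written. */
struct os_hook {
	const char *prefix;	/* e.g. "driverparms.if_re.devids" */
	os_update_fn fn;
	void *arg;
};

static struct os_hook os_hooks[OS_MAX_HOOKS];
static size_t os_nhooks;

/* Register fn(arg) to run whenever a key at or below prefix is written. */
static int
object_store_register(const char *prefix, os_update_fn fn, void *arg)
{
	if (os_nhooks == OS_MAX_HOOKS)
		return (-1);
	os_hooks[os_nhooks].prefix = prefix;
	os_hooks[os_nhooks].fn = fn;
	os_hooks[os_nhooks].arg = arg;
	os_nhooks++;
	return (0);
}

/*
 * Hypothetical hook the store would call after a write to key.
 * Runs every callback whose prefix covers key; returns how many ran.
 */
static int
object_store_notify(const char *key)
{
	size_t i, n;
	int called = 0;

	for (i = 0; i < os_nhooks; i++) {
		n = strlen(os_hooks[i].prefix);
		if (strncmp(key, os_hooks[i].prefix, n) == 0 &&
		    (key[n] == '\0' || key[n] == '.')) {
			os_hooks[i].fn(os_hooks[i].arg);
			called++;
		}
	}
	return (called);
}

/* Stand-in for the driver's rebuild routine: just counts invocations. */
static void
if_re_update_devid_table(void *arg)
{
	int *generation = arg;

	(*generation)++;
}
```

A driver would register once at load time, and a later write to, say, driverparms.if_re.devids.0x11864300 would trigger the callback, while writes under another driver's subtree would not. Locking and the sleepable context mentioned above are left out entirely.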
cheers
luigi

From imp at bsdimp.com Sat Aug 23 01:54:20 2008
From: imp at bsdimp.com (M. Warner Losh)
Date: Sat Aug 23 01:54:30 2008
Subject: Magic symlinks redux
In-Reply-To: <20080823013912.GA19588@epsilon.local>
References: <20080822.160107.511563083.imp@bsdimp.com> <20080822225119.GA65119@onelab2.iet.unipi.it> <20080823013912.GA19588@epsilon.local>
Message-ID: <20080822.195018.1129789600.imp@bsdimp.com>

In message: <20080823013912.GA19588@epsilon.local> Rui Paulo writes:
: On Sat, Aug 23, 2008 at 12:51:19AM +0200, Luigi Rizzo wrote:
: > sorry but now i am the one who doesn't understand how you can move,
:
: [snip]
:
: I think what Warner was saying was that the BUS code can do a mapping of
: device IDs from something not known to the driver to something known by the
: driver.
:
: Take, for example, if_re. if_re knows how to support devid 0x1234.
: A new device comes out that works exactly the same way as device 0x1234, but
: the device ID is 0x4567. If we change the BUS code to map devid 0x4567 to
: 0x1234, we don't need to change anything in the if_re driver. We just changed
: the BUS code with no change to the leaf driver.
:
: If there is a new device with ID 0x9912 that needs modifications in if_re, we
: are basically busted and need to change if_re itself. In this scenario,
: we don't change anything in the BUS code because that would be pointless.
:
: I hope this is what Warner was trying to say.

Yes. The bus code would have a mapping table. If you have to do more than just say 'treat this like this other thing' you'll need to hack the driver anyway, so you'd save nothing with a fancier solution that might allow this...

In the above example, the pci code would read the device id. It would see it is 0x4567 and lie to the driver saying it is really 0x1234. The driver then matches this, and treats it exactly as if it were a 0x1234, since that's what it thinks the card is.
Most of the drivers in the tree support a variety of cards, so some way of telling it which one to use is needed....

There are some complicated drivers that know all the errata for this or that rev of the chip. I'm not sure how those would work out in practice. However, those drivers almost always need tweaks for new parts...

Warner

From rpaulo at FreeBSD.org Sat Aug 23 02:06:21 2008
From: rpaulo at FreeBSD.org (Rui Paulo)
Date: Sat Aug 23 02:07:03 2008
Subject: Magic symlinks redux
In-Reply-To: <20080822225119.GA65119@onelab2.iet.unipi.it>
References: <20080822193657.GB63527@onelab2.iet.unipi.it> <20080822.141312.732640662.imp@bsdimp.com> <20080822214118.GA64725@onelab2.iet.unipi.it> <20080822.160107.511563083.imp@bsdimp.com> <20080822225119.GA65119@onelab2.iet.unipi.it>
Message-ID: <20080823013912.GA19588@epsilon.local>

On Sat, Aug 23, 2008 at 12:51:19AM +0200, Luigi Rizzo wrote:
> sorry but now i am the one who doesn't understand how you can move,

[snip]

I think what Warner was saying was that the BUS code can do a mapping of device IDs from something not known to the driver to something known by the driver.

Take, for example, if_re. if_re knows how to support devid 0x1234. A new device comes out that works exactly the same way as device 0x1234, but the device ID is 0x4567. If we change the BUS code to map devid 0x4567 to 0x1234, we don't need to change anything in the if_re driver. We just changed the BUS code with no change to the leaf driver.

If there is a new device with ID 0x9912 that needs modifications in if_re, we are basically busted and need to change if_re itself. In this scenario, we don't change anything in the BUS code because that would be pointless.

I hope this is what Warner was trying to say.
Regards,
--
Rui Paulo

From kabaev at gmail.com Sat Aug 23 02:06:39 2008
From: kabaev at gmail.com (Alexander Kabaev)
Date: Sat Aug 23 02:07:03 2008
Subject: Need a code review
In-Reply-To: <20080729.161303.709402272.imp@bsdimp.com>
References: <20080729.161303.709402272.imp@bsdimp.com>
Message-ID: <20080822213505.1993beda@kan.dnsalias.net>

On Tue, 29 Jul 2008 16:13:03 -0600 (MDT) "M. Warner Losh" wrote:
> Greetings,
>
> The FreeBSD/mips efforts are getting close. We're down to 4 patches
> against the main tree, divided up among different programs: cc,
> binutils, libpam and the CDDL stuff for zfs.
>
> http://people.freebsd.org/~gonzo/mips2/binutils.diff
> http://people.freebsd.org/~gonzo/mips2/cc.diff
> http://people.freebsd.org/~gonzo/mips2/cddl.diff
> http://people.freebsd.org/~gonzo/mips2/libpam.diff
>
> If you have an interest in any of these areas, or would like to provide
> feedback on the patches, now would be a good time to do so. :-)
>
> We'd like to commit these patches to the tree by the end of next week,
> if at all possible. If you are a maintainer of this software, we'd
> especially like to get feedback from you on these patches. If we
> don't hear back from you, we'll assume that you are fine with them :-)
>
> Warner

cc.diff part is OK, except that files we copy from vendor intact should be marked as such. Ideally, by putting their pristine versions in /vendor and branching into head/src/contrib/gcc.

Some comments in new FreeBSD files still claim they are for NetBSD/mips.

+#ifdef HANDLE_PRAGMA_PACK_PUSH_POP
+#undef HANDLE_PRAGMA_PACK_PUSH_POP
+#endif
 #define HANDLE_PRAGMA_PACK_PUSH_POP 1

Can this be rewritten as

#ifndef HANDLE_PRAGMA_PACK_PUSH_POP
#define HANDLE_PRAGMA_PACK_PUSH_POP 1
#endif

?

--
Alexander Kabaev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080823/aa7ec215/signature.pgp

From imp at bsdimp.com Sat Aug 23 02:07:43 2008
From: imp at bsdimp.com (M. Warner Losh)
Date: Sat Aug 23 02:07:50 2008
Subject: Magic symlinks redux
In-Reply-To: <20080823013912.GA19588@epsilon.local>
References: <20080822.160107.511563083.imp@bsdimp.com> <20080822225119.GA65119@onelab2.iet.unipi.it> <20080823013912.GA19588@epsilon.local>
Message-ID: <20080822.200511.1137957320.imp@bsdimp.com>

In message: <20080823013912.GA19588@epsilon.local>
: I hope this is what Warner was trying to say.

More or less the following, with a less lame way of getting the table into the kernel, and maybe more fields than vendor/device....

The reason this works is that the pci_get_vendor and pci_get_device read out of the area pointed to by cfg.

Warner

Index: pci.c
===================================================================
--- pci.c (revision 182024)
+++ pci.c (working copy)
@@ -419,6 +419,33 @@
 #undef REG
 }
 
+static struct pci_remap_entry
+{
+	uint16_t vendor;
+	uint16_t device;
+	uint16_t mapped_vendor;
+	uint16_t mapped_device;
+} pci_remap[] =
+{
+	{ 0x1039, 0x0901, 0x1039, 0x0900 } /* Map sis 901 to sis 900 */
+};
+static int pci_remap_entries = 1;
+
+static void
+pci_apply_remap_table(pcicfgregs *cfg)
+{
+	int i;
+
+	for (i = 0; i < pci_remap_entries; i++) {
+		if (cfg->vendor == pci_remap[i].vendor &&
+		    cfg->device == pci_remap[i].device) {
+			cfg->vendor = pci_remap[i].mapped_vendor;
+			cfg->device = pci_remap[i].mapped_device;
+			return;
+		}
+	}
+}
+
 /* read configuration header into pcicfgregs structure */
 struct pci_devinfo *
 pci_read_device(device_t pcib, int d, int b, int s, int f, size_t size)
@@ -465,6 +492,7 @@
 
 	pci_fixancient(cfg);
 	pci_hdrtypedata(pcib, b, s, f, cfg);
+	pci_apply_remap_table(cfg);
 
 	if (REG(PCIR_STATUS, 2) & PCIM_STATUS_CAPPRESENT)
 		pci_read_extcap(pcib, cfg);

From
jhb at freebsd.org Sat Aug 23 03:32:24 2008 From: jhb at freebsd.org (John Baldwin) Date: Sat Aug 23 03:32:30 2008 Subject: Magic symlinks redux In-Reply-To: <20080822.200511.1137957320.imp@bsdimp.com> References: <20080822.160107.511563083.imp@bsdimp.com> <20080823013912.GA19588@epsilon.local> <20080822.200511.1137957320.imp@bsdimp.com> Message-ID: <200808222241.52325.jhb@freebsd.org> On Friday 22 August 2008 10:05:11 pm M. Warner Losh wrote: > In message: <20080823013912.GA19588@epsilon.local> > > : I hope this is what Warner was trying to say. > > More or less the following, with a less lame way of getting the table > into the kernel, and maybe more fields than vendor/device.... > > The reason this works is that the pci_get_vendor and pci_get_device > read out of the area pointed to by cfg. > > Warner > > Index: pci.c > =================================================================== > --- pci.c (revision 182024) > +++ pci.c (working copy) > @@ -419,6 +419,33 @@ > #undef REG > } > > +static struct pci_remap_entry > +{ > + uint16_t vendor; > + uint16_t device; > + uint16_t mapped_vendor; > + uint16_t mapped_device; > +} pci_remap[] = > +{ > + { 0x1039, 0x0901, 0x1039, 0x0900 } /* Map sis 901 to sis 900 */ > +}; > +static int pci_remap_entries = 1; > + > +static void > +pci_apply_remap_table(pcicfgregs *cfg) > +{ > + int i; > + > + for (i = 0; i < pci_remap_entries; i++) { > + if (cfg->vendor == pci_remap[i].vendor && > + cfg->device == pci_remap[i].device) { > + cfg->vendor = pci_remap[i].mapped_vendor; > + cfg->device = pci_remap[i].mapped_device; > + return; > + } > + } > +} > + > /* read configuration header into pcicfgregs structure */ > struct pci_devinfo * > pci_read_device(device_t pcib, int d, int b, int s, int f, size_t size) > @@ -465,6 +492,7 @@ > > pci_fixancient(cfg); > pci_hdrtypedata(pcib, b, s, f, cfg); > + pci_apply_remap_table(cfg); > > if (REG(PCIR_STATUS, 2) & PCIM_STATUS_CAPPRESENT) > pci_read_extcap(pcib, cfg); It might be nice to drive 
it by hints so users can tweak it on the fly. Maybe something like: hint.pci0...vendor=XXXXX Then users can simply add entries to /boot/loader.conf w/o needing any recompiles for new device IDs that the driver can handle using an existing device id. The lookup table you have still requires patching source somewhere which probably defeats the purpose. -- John Baldwin From gad at FreeBSD.org Sat Aug 23 03:36:06 2008 From: gad at FreeBSD.org (Garance A Drosehn) Date: Sat Aug 23 03:36:12 2008 Subject: Magic symlinks redux In-Reply-To: <20080822161314.GE57443@lor.one-eyed-alien.net> References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <20080822161314.GE57443@lor.one-eyed-alien.net> Message-ID: At 11:13 AM -0500 8/22/08, Brooks Davis wrote: >On Fri, Aug 22, 2008 at 05:53:58PM +0200, Ivan Voras wrote: > > > If you read the NetBSD's documentation about magiclinks, you'll > > see this set of supported variables: > > > > @domainname Expands to the machine's domain name, ... > > @hostname Expands to the machine's host name, ... > > @kernel_ident Expands to the name of the config(1) file ... > > @machine Expands to the value of MACHINE > > (equivalent to the output of ``uname -m''). > > @machine_arch Expands to the value of MACHINE_ARCH for > > the system (equivalent to the output of > > ``uname -p''). > > @osrelease Expands to the operating system release > > of the running kernel (equivalent to the > > output of ``uname -r''). > > @ostype Expands to the operating system type of > > the running kernel (equivalent to the > > output of ``uname -s''). This will > > always be ``NetBSD'' on NetBSD systems. > > > Many of those are static and can be set on boot, but not all > > of them - for example machine and machine_arch may be different > > when running 32-bit processes on 64-bit machines. 
Domainname > > and hostname are different in jails. I like the idea of having some of these mostly-static values, although (as you note), we should think about how these might be affected within jails. I have jails (really chroot areas) which have different @osreleases than the running kernel, for instance. FWIW, I'd prefer to see the dragonfly-ish implementation over the netbsd-ish implementation. > > Your example with uid is solved just like in userland (though > > the names are messed up) and reflect getuid() and geteuid(). > >Small changes to the file system namespace can easily lead to security >issues when applications assume the namespace is static. This is >particularly true for setuid binaries. I am extremely uneasy about adding anything related to uid's or gid's, or similar dynamic values. I can't help but think that in any case where this would be useful, it would also be very risky with respect to setuid() binaries. > > (I don't care about the syntax: @{something} vs ${something}, > > though I think NetBSD made the better choice since these > > variables are not accessing the process environment). > >This is something I've been debating. I've been leading toward >something other than ${something}. Either @{} or %{} or else >going all the way to something like %%something%%. I don't like >the unanchored components netbsd uses. This could easily degenerate into a bikeshed issue, but let me at least say that I think we should avoid "@varname". That's the syntax which AFS/OpenAFS/ARLA uses for their equivalent of variant filename paths, and I think it would be good if we avoid any confusion with that. >One other option we discussed at the devsummit was requiring that >the first character of a variant symlink be special to reduce >parsing overhead. I.e. requiring that variant symlinks start >with @ or % or something. I'd like something with both a right and left delimiter, both required. 
Something where the two delimiters are different, and easy to distinguish, and easy to "stack" (if for some reason we wanted to allow that). Stack in the sense of %{name1%{name2%}%}, where we can tell we would have to substitute the value for "name2" before doing anything with the "name1"-ish part. I don't know that we need to actually support that ability, but if we do (someday) want to support it, then I want delimiters where it would be obvious what order the substitutions should occur in. As far as the specific delimiters to pick, well, I'm sure anyone's paint is as good as mine. It will be nice to have this ability. -- Garance Alistair Drosehn = drosehn@rpi.edu Senior Systems Programmer or gad@FreeBSD.org Rensselaer Polytechnic Institute; Troy, NY; USA From imp at bsdimp.com Sat Aug 23 04:47:05 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Sat Aug 23 04:47:12 2008 Subject: Magic symlinks redux In-Reply-To: <200808222241.52325.jhb@freebsd.org> References: <20080823013912.GA19588@epsilon.local> <20080822.200511.1137957320.imp@bsdimp.com> <200808222241.52325.jhb@freebsd.org> Message-ID: <20080822.224404.691670281.imp@bsdimp.com> In message: <200808222241.52325.jhb@freebsd.org> John Baldwin writes: : On Friday 22 August 2008 10:05:11 pm M. Warner Losh wrote: : > In message: <20080823013912.GA19588@epsilon.local> : > : > : I hope this is what Warner was trying to say. : > : > More or less the following, with a less lame way of getting the table : > into the kernel, and maybe more fields than vendor/device.... : > : > The reason this works is that the pci_get_vendor and pci_get_device : > read out of the area pointed to by cfg. ... : It might be nice to drive it by hints so users can tweak it on the fly. Maybe : something like: : : hint.pci0...vendor=XXXXX : : Then users can simply add entries to /boot/loader.conf w/o needing any : recompiles for new device IDs that the driver can handle using an existing : device id. 
: : The lookup table you have still requires patching source somewhere which : probably defeats the purpose. That's the whole "less lame of getting data into the kernel" I was talking about. The above was to show the concept, not an actual implementation of the data. I don't like the hint idea so much, but was looking for some other way to get the data into the kernel. Warner From imp at bsdimp.com Sat Aug 23 06:59:08 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Sat Aug 23 06:59:14 2008 Subject: Need a code review In-Reply-To: <20080822213505.1993beda@kan.dnsalias.net> References: <20080729.161303.709402272.imp@bsdimp.com> <20080822213505.1993beda@kan.dnsalias.net> Message-ID: <20080823.005656.1543769327.imp@bsdimp.com> In message: <20080822213505.1993beda@kan.dnsalias.net> Alexander Kabaev writes: : On Tue, 29 Jul 2008 16:13:03 -0600 (MDT) : "M. Warner Losh" wrote: : : > Greetings, : > : > The FreeBSD/mips efforts are getting close. We're down to 4 patches : > against the main tree, divided up among different programs: cc, : > binutils, libpam and the CDDL stuff for zfs. : > : > http://people.freebsd.org/~gonzo/mips2/binutils.diff : > http://people.freebsd.org/~gonzo/mips2/cc.diff : > http://people.freebsd.org/~gonzo/mips2/cddl.diff : > http://people.freebsd.org/~gonzo/mips2/libpam.diff : > : > If you have an interest in any of these area, or would like to provide : > feedback on the patches, now would be a good time to do so. :-) : > : > We'd like to commit these patches to the tree by the end of next week, : > if at all possible. If you are a maintainer of this software, we'd : > especially like to get feedback from you on these patches. If we : > don't hear back from you, we'll assume that you are fine with them :-) : > : > Warner : : cc.diff part is OK, except that files we copy from vendor intact should : be marked as such. Ideally, by putting their pristine versions : in /vendor and branching into head/src/contrib/gcc. Good idea. 
Now, if only I can figure out how to do that... : Some comments in new FreeBSD files still claim they are for NetBSD/mips. We can fix that :-) Good catch. : +#ifdef HANDLE_PRAGMA_PACK_PUSH_POP : +#undef HANDLE_PRAGMA_PACK_PUSH_POP : +#endif : #define HANDLE_PRAGMA_PACK_PUSH_POP 1 : : Can this be rewritten as : #ifndef HANDLE_PRAGMA_PACK_PUSH_POP : #define HANDLE_PRAGMA_PACK_PUSH_POP 1 : #endif No. The whole reason those changes were introduced was to quiet warnings that HANDLE_PRAGMA_PACK_PUSH_POP was redefined... We can omit them if they will cause problems. I wasn't real concerned about the warnings, but Randal Stewart fixed them since they bugged him. Maybe there's a better way? Any suggestions? Warner From ivoras at freebsd.org Sat Aug 23 08:16:16 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Sat Aug 23 08:16:23 2008 Subject: Magic symlinks redux In-Reply-To: <20080822161314.GE57443@lor.one-eyed-alien.net> References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <20080822161314.GE57443@lor.one-eyed-alien.net> Message-ID: Brooks Davis wrote: > On Fri, Aug 22, 2008 at 05:53:58PM +0200, Ivan Voras wrote: >> Your example with uid is solved just like in userland (though the >> names are messed up) and reflect getuid() and geteuid(). > > Small changes to the file system namespace can easily lead to security > issues when applications assume the namespace is static. This is > particularly true for setuid binaries. > >> Anyway, if the DFBSD framework is properly implemented, it shouldn't >> be hard to add these variables. If you don't want to, I volunteer. > > I'm not completely opposed to adding a static namespace for system > wide variables. I'm not at all keen on the @ruid and @uid variables > because I think they are risky. 
My current feeling is that I'd like to > move ahead with my current implementation and then either add another > namespace or add this off to the side mostly as is. Ok, how about adding another sysctl enabling ruid and uid (perhaps change their name to uid and euid since NetBSD compatibility isn't maintained) which will be off by default? >> (I don't care about the syntax: @{something} vs ${something}, though I >> think NetBSD made the better choice since these variables are not >> accessing the process environment). > > This is something I've been debating. I've been leading toward something other > than ${something}. Either @{} or %{} or else going all the way to something > like %%something%%. Someone mentioned "@" clashes with AFS :( > I don't like the unanchored components netbsd uses. They could have an use case - see below: > One other option we discussed at the devsummit was requiring that the first > character of a variant symlink be special to reduce parsing overhead. I.e. > requiring that variant symlinks start with @ or % or something. I agree with this - it's elegant on the implementation side and performance hit would be minimal. I'd also be happy with abandoning the free form links and mandating that the entire component be one var symlink (i.e. "/path1/@var/path2" is ok but "/path1/@{path2}.@{path3}" isn't). If you'd implement that special starting character, how would the end-result look like? Something like "#path@{var}"? (for various values of "#")? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080823/9daad309/signature.pgp From ivoras at freebsd.org Sat Aug 23 08:22:40 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Sat Aug 23 08:22:47 2008 Subject: Magic symlinks redux In-Reply-To: References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <20080822161314.GE57443@lor.one-eyed-alien.net> Message-ID: Garance A Drosehn wrote: > I like the idea of having some of these mostly-static values, > although (as you note), we should to think about how these might > be effected within jails. I have jails (really chroot areas) > which have different @osreleases than the running kernel, for > instance. This last case would be problematic since symlinks are resolved in kernel and the kernel can't really see the different userland releases. 64-bit call vs 32-bit is ok. > FWIW, I'd prefer to see the dragonfly-ish implementation over > the netbsd-ish implementation. > >> > Your example with uid is solved just like in userland (though >> > the names are messed up) and reflect getuid() and geteuid(). >> >> Small changes to the file system namespace can easily lead to security >> issues when applications assume the namespace is static. This is >> particularly true for setuid binaries. > > I am extremely uneasy about adding anything related to uid's or > gid's, or similar dynamic values. This argument pops up often without explanation. The only thing I can think of is applications using ".." on a dynamic symlink and ending up somewhere where it doesn't want to, but this can also be said for normal symlinks. Can anyone explain more possible security problems with having @uid in varsymlinks? 
> I can't help but think tbat > any case where this would be useful, it would also be very risky > with-respect-to setuid() binaries. I posted a nice use-case at the first post: per-user /tmp like in http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080714_0251.html . Of course it's a "nice to have", not a critical feature. >> > (I don't care about the syntax: @{something} vs ${something}, >> > though I think NetBSD made the better choice since these >> > variables are not accessing the process environment). >> >> This is something I've been debating. I've been leading toward >> something other than ${something}. Either @{} or %{} or else >> going all the way to something like %%something%%. I don't like >> the unanchored components netbsd uses. > > This could easily degenerate into a bikeshed issue, but let me > at least say that I think we should avoid "@varname". That's > the syntax which AFS/OpenAFS/ARLA uses for their equivalent of > variant filename paths, and I think it would be good if we avoid > any confusion with that. How about "@{varname}" ? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080823/7509f241/signature.pgp From des at des.no Sat Aug 23 09:27:41 2008 From: des at des.no (Dag-Erling Smørgrav) Date: Sat Aug 23 09:27:47 2008 Subject: Magic symlinks redux In-Reply-To: <20080822161314.GE57443@lor.one-eyed-alien.net> (Brooks Davis's message of "Fri, 22 Aug 2008 11:13:14 -0500") References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <20080822161314.GE57443@lor.one-eyed-alien.net> Message-ID: <86ljyof5df.fsf@ds4.des.no> Brooks Davis writes: > One other option we discussed at the devsummit was requiring that the first > character of a variant symlink be special to reduce parsing overhead. I.e. > requiring that variant symlinks start with @ or % or something. A much tidier option (which was also discussed) is to set a bit on the cnp if the symlink is found to be variant when it first enters the cache. This way, we don't penalize non-variant symlinks that begin with a @ or %. 
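The cached-flag idea can be sketched roughly as follows. This is only a minimal userland illustration in C, not code from any patch in this thread: the structure symcache_entry, the flag SYMCACHE_VARIANT, the helper names, and the %{name} marker syntax are all assumptions made for the example, and the real name cache and componentname structures look different. The target is scanned once when the entry is created; later lookups test a single bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, invented for this sketch; not a real kernel flag. */
#define SYMCACHE_VARIANT 0x01

/* Hypothetical stand-in for a cached symlink entry. */
struct symcache_entry {
	const char *target;	/* symlink target text */
	unsigned int flags;	/* SYMCACHE_* bits */
};

/* Counts how often a target is actually scanned. */
static int scan_count;

/*
 * Scan the target for a variant marker.  The "%{name}" form is an
 * assumed syntax; the thread had not settled on one.
 */
static bool
target_is_variant(const char *target)
{
	const char *open;

	scan_count++;
	open = strstr(target, "%{");
	return (open != NULL && strchr(open + 2, '}') != NULL);
}

/* Scan once, when the symlink first enters the cache. */
static void
symcache_enter(struct symcache_entry *e, const char *target)
{
	e->target = target;
	e->flags = target_is_variant(target) ? SYMCACHE_VARIANT : 0;
}

/* Later lookups only test the bit; the target is not re-parsed. */
static bool
symcache_needs_expansion(const struct symcache_entry *e)
{
	return ((e->flags & SYMCACHE_VARIANT) != 0);
}
```

With this shape, a target such as "@home/data" merely begins with a special character and is never flagged, so it is treated as an ordinary symlink on every later lookup, while "/usr/lib/%{abi}" is flagged once on entry and expanded on use.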
DES -- Dag-Erling Smørgrav - des@des.no From kostikbel at gmail.com Sat Aug 23 12:25:21 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Sat Aug 23 12:25:31 2008 Subject: Magic symlinks redux In-Reply-To: References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <20080822161314.GE57443@lor.one-eyed-alien.net> Message-ID: <20080823122515.GS1803@deviant.kiev.zoral.com.ua> On Sat, Aug 23, 2008 at 10:22:19AM +0200, Ivan Voras wrote: > Garance A Drosehn wrote: > > >I like the idea of having some of these mostly-static values, > >although (as you note), we should to think about how these might > >be effected within jails. I have jails (really chroot areas) > >which have different @osreleases than the running kernel, for > >instance. > > This last case would be problematic since symlinks are resolved in > kernel and the kernel can't really see the different userland releases. > 64-bit call vs 32-bit is ok. Not exactly true. There is a p_osrel member of struct proc that may be interpreted as the osrelease on which the binary was intended to run. At least, it reflects the crt1.o origin. I am not sure whether such dynamic data naturally maps into the varsyms concept. At least, some fallback shall be provided for "other" values of osrel. The fact that Solaris and Linux do not provide variant symlinks may be explained by keyword support in their rtld. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080823/559bd162/attachment.pgp From rwatson at FreeBSD.org Sat Aug 23 12:27:01 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Sat Aug 23 12:27:07 2008 Subject: Magic symlinks redux In-Reply-To: References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <20080822161314.GE57443@lor.one-eyed-alien.net> Message-ID: On Sat, 23 Aug 2008, Ivan Voras wrote: >> I am extremely uneasy about adding anything related to uid's or gid's, or >> similar dynamic values. > > This argument pops up often without explanation. The only thing I can think > of is applications using ".." on a dynamic symlink and ending up somewhere > where it doesn't want to, but this can also be said for normal symlinks. > > Can anyone explain more possible security problems with having @uid in > varsymlinks? The issues I'm aware of revolve more around usability than security, although frequently security relies on determinism. Consider setuid tools, such as lpr or sudo, which currently behave deterministically when a path is passed, and the effect of having "@uid" present in a symlink evaluated in the lookup to /tmp: lpr /tmp/my.txt sudo mv /tmp/group.tmp /etc/group While I see arguments going many different ways on this, I think POLA reasonably demands that we not significantly disrupt the semantics of /tmp or other situations where, on face value, a uid-based symlink might be used. And, from a general security perspective, maintaining the assumptions of current users, applications, etc, is quite important for avoiding vulnerabilities that stem from changing underlying execution model assumptions. 
I think Brooks's reimplementation of the DFBSD approach addresses most of my concerns with respect to classic name space manipulation attacks, but even then I would advise extreme caution. Robert N M Watson Computer Laboratory University of Cambridge From ivoras at freebsd.org Sun Aug 24 00:05:08 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Sun Aug 24 00:05:25 2008 Subject: FreeBSD and DEP aka "NX bit"? Message-ID: I stumbled upon this Wikipedia page: http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems#Security_features and it mentions NX bit is supported in FreeBSD. Is this true? Is it enabled by default? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 258 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080824/c03d4a57/signature.pgp From ivoras at freebsd.org Sun Aug 24 00:41:03 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Sun Aug 24 00:41:10 2008 Subject: FreeBSD and DEP aka "NX bit"? In-Reply-To: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> Message-ID: <9bbcef730808231741o5e765f3bh546475b28fe51f9b@mail.gmail.com> 2008/8/24 Matthew Macy : > On Sat, Aug 23, 2008 at 5:04 PM, Ivan Voras wrote: >> I stumbled upon this Wikipedia page: >> http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems#Security_features >> and it mentions NX bit is supported in FreeBSD. Is this true? Is it >> enabled by default? > > Yes. However, it is in the upper word so it only works with PAE or > amd64. "jemalloc" maps the heap NX and thread stacks are mapped NX. > The default process stack currently needs to be executable because > sigcode is placed at the start of the stack at the time of process > creation. Thanks! How useful is it without protecting the default stack? 
IIRC wasn't stack protection one of the main (marketed) bonuses for NX? (I'm thinking of the majority of currently popular server software like apache (preforked) and PostgreSQL...) From mat.macy at gmail.com Sun Aug 24 00:43:02 2008 From: mat.macy at gmail.com (Matthew Macy) Date: Sun Aug 24 00:43:08 2008 Subject: FreeBSD and DEP aka "NX bit"? In-Reply-To: References: Message-ID: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> On Sat, Aug 23, 2008 at 5:04 PM, Ivan Voras wrote: > I stumbled upon this Wikipedia page: > http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems#Security_features > and it mentions NX bit is supported in FreeBSD. Is this true? Is it > enabled by default? Yes. However, it is in the upper word so it only works with PAE or amd64. "jemalloc" maps the heap NX and thread stacks are mapped NX. The default process stack currently needs to be executable because sigcode is placed at the start of the stack at the time of process creation. -Kip From kmacy at freebsd.org Sun Aug 24 00:51:08 2008 From: kmacy at freebsd.org (Kip Macy) Date: Sun Aug 24 00:51:15 2008 Subject: FreeBSD and DEP aka "NX bit"? In-Reply-To: <9bbcef730808231741o5e765f3bh546475b28fe51f9b@mail.gmail.com> References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> <9bbcef730808231741o5e765f3bh546475b28fe51f9b@mail.gmail.com> Message-ID: <3c1674c90808231751h3d11d52at2eac1eb21cd8940b@mail.gmail.com> On Sat, Aug 23, 2008 at 5:41 PM, Ivan Voras wrote: > 2008/8/24 Matthew Macy : >> On Sat, Aug 23, 2008 at 5:04 PM, Ivan Voras wrote: >>> I stumbled upon this Wikipedia page: >>> http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems#Security_features >>> and it mentions NX bit is supported in FreeBSD. Is this true? Is it >>> enabled by default? >> >> Yes. However, it is in the upper word so it only works with PAE or >> amd64. "jemalloc" maps the heap NX and thread stacks are mapped NX. 
>> The default process stack currently needs to be executable because >> sigcode is placed at the start of the stack at the time of process >> creation. > > Thanks! > > How useful is it without protecting the default stack? IIRC wasn't > stack protection one of the main (marketed) bonuses for NX? (I'm > thinking of the majority of currently popular server software like > apache (preforked) and PostgreSQL...) FreeBSD could certainly take better advantage of it. It also doesn't help that the default process stack always starts at the same address. However, SSP does mitigate some of the risk. -Kip From bu7cher at yandex.ru Mon Aug 25 04:28:54 2008 From: bu7cher at yandex.ru (Andrey V. Elsukov) Date: Mon Aug 25 04:29:00 2008 Subject: Magic symlinks redux In-Reply-To: <20080822.224404.691670281.imp@bsdimp.com> References: <20080823013912.GA19588@epsilon.local> <20080822.200511.1137957320.imp@bsdimp.com> <200808222241.52325.jhb@freebsd.org> <20080822.224404.691670281.imp@bsdimp.com> Message-ID: <48B234F7.8070407@yandex.ru> M. Warner Losh wrote: > : The lookup table you have still requires patching source somewhere which > : probably defeats the purpose. > > That's the whole "less lame of getting data into the kernel" I was > talking about. The above was to show the concept, not an actual > implementation of the data. I don't like the hint idea so much, but > was looking for some other way to get the data into the kernel. What is about extending pciconf(8) (or making a pcictl(8) alias) for this purposes? -- WBR, Andrey V. Elsukov From imp at bsdimp.com Mon Aug 25 04:43:59 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Mon Aug 25 04:44:05 2008 Subject: Magic symlinks redux In-Reply-To: <48B234F7.8070407@yandex.ru> References: <200808222241.52325.jhb@freebsd.org> <20080822.224404.691670281.imp@bsdimp.com> <48B234F7.8070407@yandex.ru> Message-ID: <20080824.224239.-4055712.imp@bsdimp.com> In message: <48B234F7.8070407@yandex.ru> "Andrey V. Elsukov" writes: : M. 
Warner Losh wrote: : > : The lookup table you have still requires patching source somewhere which : > : probably defeats the purpose. : > : > That's the whole "less lame of getting data into the kernel" I was : > talking about. The above was to show the concept, not an actual : > implementation of the data. I don't like the hint idea so much, but : > was looking for some other way to get the data into the kernel. : : What is about extending pciconf(8) (or making a pcictl(8) alias) for : this purposes? Lots of things could... However, pciconf(8) likely runs too late... Warner From bu7cher at yandex.ru Mon Aug 25 04:50:32 2008 From: bu7cher at yandex.ru (Andrey V. Elsukov) Date: Mon Aug 25 04:50:39 2008 Subject: Magic symlinks redux In-Reply-To: <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> Message-ID: <48B23A0E.1030700@yandex.ru> Ivan Voras wrote: > Firstly, it might be useless for your purpose but there are others. > > If you read the NetBSD's documentation about magiclinks, you'll see > this set of supported variables: Originally I wanted to implement 4 layers: 1. System-wide (jail-wide)(global for all jails etc). Only root can set variables in this layer. 2. Kernel-wide. Variables in this layer can be set from KLD's. Some KPI needed for setting these variables (also all netbsd-like variables can be implemented in this layer). Superuser can override these variables via system-wide layer. 3. Per-user. 4. Per-process. -- WBR, Andrey V. Elsukov From imp at bsdimp.com Mon Aug 25 06:26:18 2008 From: imp at bsdimp.com (M. 
Warner Losh) Date: Mon Aug 25 06:26:25 2008 Subject: Code review Message-ID: <20080825.002316.-1548243307.imp@bsdimp.com> I did this a few years ago when trying to track down a problem with some Realtek network chips that I was having problems with at Timing Solutions. I'd like to get this into the tree, since it was helpful then. Comments? Warner -------------- next part -------------- diff -ur src/sys/pci/if_rl.c newcard/src/sys/pci/if_rl.c --- src/sys/pci/if_rl.c 2008-08-23 22:21:15.000000000 -0600 +++ newcard/src/sys/pci/if_rl.c 2008-08-23 22:26:09.000000000 -0600 @@ -1253,18 +1253,120 @@ } static void +rl_twister_update(struct rl_softc *sc) +{ + uint16_t linktest; + static const uint32_t param[4][4] = { + {0xcb39de43, 0xcb39ce43, 0xfb38de03, 0xcb38de43}, + {0xcb39de43, 0xcb39ce43, 0xcb39ce83, 0xcb39ce83}, + {0xcb39de43, 0xcb39ce43, 0xcb39ce83, 0xcb39ce83}, + {0xbb39de43, 0xbb39ce43, 0xbb39ce83, 0xbb39ce83} + }; + + /* + * Tune the so-called twister registers of the RTL8139. These + * are used to compensate for impedance mismatches. The + * method for tuning these registers is undocumented and the + * following procedure is collected from public sources. + */ + switch (sc->rl_twister) + { + case CHK_LINK: + /* + * If we have a sufficient link, then we can proceed in + * the state machine to the next stage. If not, then + * disable further tuning after writing sane defaults. + */ + if (CSR_READ_2(sc, RL_CSCFG) & RL_CSCFG_LINK_OK) { + CSR_WRITE_2(sc, RL_CSCFG, RL_CSCFG_LINK_DOWN_OFF_CMD); + sc->rl_twister = FIND_ROW; + } else { + CSR_WRITE_2(sc, RL_CSCFG, RL_CSCFG_LINK_DOWN_CMD); + CSR_WRITE_4(sc, RL_NWAYTST, RL_NWAYTST_CBL_TEST); + CSR_WRITE_4(sc, RL_PARA78, RL_PARA78_DEF); + CSR_WRITE_4(sc, RL_PARA7C, RL_PARA7C_DEF); + sc->rl_twister = DONE; + } + break; + case FIND_ROW: + /* + * Read how long it took to see the echo to find the tuning + * row to use. 
+ */ + linktest = CSR_READ_2(sc, RL_CSCFG) & RL_CSCFG_STATUS; + if (linktest == RL_CSCFG_ROW3) + sc->rl_twist_row = 3; + else if (linktest == RL_CSCFG_ROW2) + sc->rl_twist_row = 2; + else if (linktest == RL_CSCFG_ROW1) + sc->rl_twist_row = 1; + else + sc->rl_twist_row = 0; + sc->rl_twist_col = 0; + sc->rl_twister = SET_PARAM; + break; + case SET_PARAM: + if (sc->rl_twist_col == 0) + CSR_WRITE_4(sc, RL_NWAYTST, RL_NWAYTST_RESET); + CSR_WRITE_4(sc, RL_PARA7C, + param[sc->rl_twist_row][sc->rl_twist_col]); + if (++sc->rl_twist_col == 4) { + if (sc->rl_twist_row == 3) + sc->rl_twister = RECHK_LONG; + else + sc->rl_twister = DONE; + } + break; + case RECHK_LONG: + /* + * For long cables, we have to double check to make sure we + * don't mistune. + */ + linktest = CSR_READ_2(sc, RL_CSCFG) & RL_CSCFG_STATUS; + if (linktest == RL_CSCFG_ROW3) + sc->rl_twister = DONE; + else { + CSR_WRITE_4(sc, RL_PARA7C, RL_PARA7C_RETUNE); + sc->rl_twister = RETUNE; + } + break; + case RETUNE: + /* Retune for a shorter cable (try column 2) */ + CSR_WRITE_4(sc, RL_NWAYTST, RL_NWAYTST_CBL_TEST); + CSR_WRITE_4(sc, RL_PARA78, RL_PARA78_DEF); + CSR_WRITE_4(sc, RL_PARA7C, RL_PARA7C_DEF); + CSR_WRITE_4(sc, RL_NWAYTST, RL_NWAYTST_RESET); + sc->rl_twist_row--; + sc->rl_twist_col = 0; + sc->rl_twister = SET_PARAM; + break; + + case DONE: + break; + } + +} + +static void rl_tick(void *xsc) { struct rl_softc *sc = xsc; struct mii_data *mii; + int ticks; RL_LOCK_ASSERT(sc); mii = device_get_softc(sc->rl_miibus); mii_tick(mii); + if (sc->rl_twister != DONE) + rl_twister_update(sc); + if (sc->rl_twister != DONE) + ticks = hz / 10; + else + ticks = hz; rl_watchdog(sc); - callout_reset(&sc->rl_stat_callout, hz, rl_tick, sc); + callout_reset(&sc->rl_stat_callout, ticks, rl_tick, sc); } #ifdef DEVICE_POLLING @@ -1490,6 +1592,13 @@ rl_stop(sc); /* + * Reset twister register tuning state. The twister registers + * and their tuning are undocumented, but are necessary to cope + * with bad links. 
rl_twister = DONE here will disable this entirely. + */ + sc->rl_twister = CHK_LINK; + + /* * Init our MAC address. Even though the chipset * documentation doesn't mention it, we need to enter "Config * register write enable" mode to modify the ID registers. diff -ur src/sys/pci/if_rlreg.h newcard/src/sys/pci/if_rlreg.h --- src/sys/pci/if_rlreg.h 2008-08-23 22:21:15.000000000 -0600 +++ newcard/src/sys/pci/if_rlreg.h 2008-08-23 22:26:09.000000000 -0600 @@ -309,6 +309,27 @@ #define RL_CMD_RESET 0x0010 /* + * Twister register values. These are completely undocumented and derived + * from public sources. + */ +#define RL_CSCFG_LINK_OK 0x0400 +#define RL_CSCFG_CHANGE 0x0800 +#define RL_CSCFG_STATUS 0xf000 +#define RL_CSCFG_ROW3 0x7000 +#define RL_CSCFG_ROW2 0x3000 +#define RL_CSCFG_ROW1 0x1000 +#define RL_CSCFG_LINK_DOWN_OFF_CMD 0x03c0 +#define RL_CSCFG_LINK_DOWN_CMD 0xf3c0 + +#define RL_NWAYTST_RESET 0 +#define RL_NWAYTST_CBL_TEST 0x20 + +#define RL_PARA78 0x78 +#define RL_PARA78_DEF 0x78fa8388 +#define RL_PARA7C 0x7C +#define RL_PARA7C_DEF 0xcb38de43 +#define RL_PARA7C_RETUNE 0xfb38de03 +/* * EEPROM control register */ #define RL_EE_DATAOUT 0x01 /* Data out */ @@ -801,6 +822,8 @@ bus_addr_t rl_tx_list_addr; }; +enum rl_twist { DONE, CHK_LINK, FIND_ROW, SET_PARAM, RECHK_LONG, RETUNE }; + struct rl_softc { struct ifnet *rl_ifp; /* interface info */ bus_space_handle_t rl_bhandle; /* bus space handle */ @@ -830,6 +853,9 @@ uint32_t rl_rxlenmask; int rl_testmode; int rl_if_flags; + enum rl_twist rl_twister; + int rl_twist_row; + int rl_twist_col; int suspended; /* 0 = normal 1 = suspended */ #ifdef DEVICE_POLLING int rxcycles; @@ -850,6 +876,8 @@ #define RL_FLAG_LINK 0x8000 }; + + #define RL_LOCK(_sc) mtx_lock(&(_sc)->rl_mtx) #define RL_UNLOCK(_sc) mtx_unlock(&(_sc)->rl_mtx) #define RL_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->rl_mtx, MA_OWNED) From dillon at apollo.backplane.com Mon Aug 25 06:27:06 2008 From: dillon at apollo.backplane.com (Matthew Dillon) Date: Mon Aug 25 
06:27:12 2008 Subject: Magic symlinks redux References: <20080822150020.GA57443@lor.one-eyed-alien.net> <9bbcef730808220802pa84b597u457100a23b03a80c@mail.gmail.com> <20080822153945.GC57443@lor.one-eyed-alien.net> <9bbcef730808220853q22666b44n5ca2b7add991191f@mail.gmail.com> <48B23A0E.1030700@yandex.ru> Message-ID: <200808250616.m7P6GwEa055070@apollo.backplane.com> The only issue we hit with per-process varsyms is that to really be useful the shells need built-ins to set the process-space variables, since doing so as an exec'd subprocess will not effect the shell or its children. We have no plans to allow one process to modify another process's varsyms as that would cause significant security issues. In fact, even the per-user variables might have security issues (e.g. common-run 'nobody' user utilities, and so forth, for which a pseudo-userid has not been created). I'm kinda thinking of removing per-user variables despite the usefulness. There have been various circumstances where we've thought varsyms would be useful, but ended up not needing to use them. Right now we are looking at possibly using them to point /usr/lib and friends to select 32 or 64 bit ABI library paths, and have the kernel automatically set a varsym when exec'ing an ELF program to the program's ABI. Doing this would allow 32 and 64 bit program, library, and package sets to be run and maintained side-by-side. -Matt From imp at bsdimp.com Mon Aug 25 07:13:59 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Mon Aug 25 07:14:04 2008 Subject: Code review: using ${.TARGET} instead of opt_*.h Message-ID: <20080825.011312.-1956307484.imp@bsdimp.com> I was cleaning out my tree, and noticed that I took a stab at solving the problem opt_*.h being explicitly named in rules rather than ${.TARGET} when building modules. Comments? Warner From imp at bsdimp.com Mon Aug 25 07:16:59 2008 From: imp at bsdimp.com (M. 
Warner Losh) Date: Mon Aug 25 07:17:05 2008 Subject: Code review: using ${.TARGET} instead of opt_*.h Message-ID: <20080825.011513.-432836673.imp@bsdimp.com> [[ resent with attachment]] I was cleaning out my tree, and noticed that I took a stab at solving the problem opt_*.h being explicitly named in rules rather than ${.TARGET} when building modules. Comments? Warner -------------- next part -------------- Index: wlan_wep/Makefile =================================================================== --- wlan_wep/Makefile (revision 182150) +++ wlan_wep/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: wlan_amrr/Makefile =================================================================== --- wlan_amrr/Makefile (revision 182150) +++ wlan_amrr/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: digi/digi/Makefile =================================================================== --- digi/digi/Makefile (revision 182150) +++ digi/digi/Makefile (working copy) @@ -9,8 +9,8 @@ .if !defined(KERNBUILDDIR) opt_compat.h: - echo "#define COMPAT_43 1" > opt_compat.h - echo "#define COMPAT_FREEBSD6 1" >> opt_compat.h + echo "#define COMPAT_43 1" > ${.TARGET} + echo "#define COMPAT_FREEBSD6 1" >> ${.TARGET} .endif .include Index: wlan_acl/Makefile =================================================================== --- wlan_acl/Makefile (revision 182150) +++ wlan_acl/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: netgraph/sync_ar/Makefile =================================================================== --- netgraph/sync_ar/Makefile (revision 
182150) +++ netgraph/sync_ar/Makefile (working copy) @@ -1,13 +1,13 @@ # $FreeBSD$ - + .PATH: ${.CURDIR}/../../../dev/ar KMOD = ng_sync_ar SRCS = if_ar.c if_ar_isa.c if_ar_pci.c SRCS += device_if.h bus_if.h pci_if.h isa_if.h opt_netgraph.h - + .if !defined(KERNBUILDDIR) opt_netgraph.h: - echo "#define NETGRAPH" > opt_netgraph.h + echo "#define NETGRAPH" > ${.TARGET} .endif .include Index: netgraph/sync_sr/Makefile =================================================================== --- netgraph/sync_sr/Makefile (revision 182150) +++ netgraph/sync_sr/Makefile (working copy) @@ -1,13 +1,13 @@ # $FreeBSD$ - + .PATH: ${.CURDIR}/../../../dev/sr KMOD = ng_sync_sr SRCS = if_sr.c if_sr_isa.c if_sr_pci.c SRCS += device_if.h bus_if.h pci_if.h isa_if.h opt_netgraph.h - + .if !defined(KERNBUILDDIR) opt_netgraph.h: - echo "#define NETGRAPH" > opt_netgraph.h + echo "#define NETGRAPH" > ${.TARGET} .endif .include Index: aha/Makefile =================================================================== --- aha/Makefile (revision 182150) +++ aha/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_scsi.h: - echo "#define SCSI_DELAY 15000" > opt_scsi.h + echo "#define SCSI_DELAY 15000" > ${.TARGET} .endif .include Index: ahb/Makefile =================================================================== --- ahb/Makefile (revision 182150) +++ ahb/Makefile (working copy) @@ -3,12 +3,11 @@ .PATH: ${.CURDIR}/../../dev/ahb KMOD= ahb -SRCS= ahb.c opt_cam.h device_if.h bus_if.h \ - eisa_if.h opt_scsi.h +SRCS= ahb.c opt_cam.h device_if.h bus_if.h eisa_if.h opt_scsi.h .if !defined(KERNBUILDDIR) opt_scsi.h: - echo "#define SCSI_DELAY 15000" > opt_scsi.h + echo "#define SCSI_DELAY 15000" > ${.TARGET} .endif .include Index: svr4/Makefile =================================================================== --- svr4/Makefile (revision 182150) +++ svr4/Makefile (working copy) @@ -26,11 +26,11 @@ .if !defined(KERNBUILDDIR) opt_compat.h: - echo "#define COMPAT_43 1" > opt_compat.h + echo 
"#define COMPAT_43 1" > ${.TARGET} .if defined(DEBUG) opt_svr4.h: - echo "#define DEBUG_SVR4 1" > opt_svr4.h + echo "#define DEBUG_SVR4 1" > ${.TARGET} .endif .endif Index: patm/Makefile =================================================================== --- patm/Makefile (revision 182150) +++ patm/Makefile (working copy) @@ -14,10 +14,10 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} opt_natm.h: - echo "#define NATM 1" > opt_natm.h + echo "#define NATM 1" > ${.TARGET} .endif .include Index: trm/Makefile =================================================================== --- trm/Makefile (revision 182150) +++ trm/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_scsi.h: - echo "#define SCSI_DELAY 15000" > opt_scsi.h + echo "#define SCSI_DELAY 15000" > ${.TARGET} .endif .include Index: wlan_rssadapt/Makefile =================================================================== --- wlan_rssadapt/Makefile (revision 182150) +++ wlan_rssadapt/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: fatm/Makefile =================================================================== --- fatm/Makefile (revision 182150) +++ fatm/Makefile (working copy) @@ -11,10 +11,10 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} opt_natm.h: - echo "#define NATM 1" > opt_natm.h + echo "#define NATM 1" > ${.TARGET} .endif .include Index: ubsec/Makefile =================================================================== --- ubsec/Makefile (revision 182150) +++ ubsec/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_ubsec.h: - echo "#define UBSEC_DEBUG 1" > opt_ubsec.h + echo "#define UBSEC_DEBUG 1" > ${.TARGET} .endif .include Index: ath_rate_onoe/Makefile 
=================================================================== --- ath_rate_onoe/Makefile (revision 182150) +++ ath_rate_onoe/Makefile (working copy) @@ -61,8 +61,8 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h -# echo > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} +# echo > ${.TARGET} .endif .include Index: pf/Makefile =================================================================== --- pf/Makefile (revision 182150) +++ pf/Makefile (working copy) @@ -15,20 +15,20 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} .if ${MK_INET6_SUPPORT} != "no" opt_inet6.h: - echo "#define INET6 1" > opt_inet6.h + echo "#define INET6 1" > ${.TARGET} .endif opt_bpf.h: - echo "#define DEV_BPF 1" > opt_bpf.h + echo "#define DEV_BPF 1" > ${.TARGET} # pflog can be loaded as a module, have the additional checks turned on opt_pf.h: - echo "#define DEV_PF 1" > opt_pf.h - echo "#define DEF_PFLOG 1" >> opt_pf.h + echo "#define DEV_PF 1" > ${.TARGET} + echo "#define DEF_PFLOG 1" >> ${.TARGET} .endif .include Index: sppp/Makefile =================================================================== --- sppp/Makefile (revision 182150) +++ sppp/Makefile (working copy) @@ -17,13 +17,13 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} opt_inet6.h: - echo "#define INET6 1" > opt_inet6.h + echo "#define INET6 1" > ${.TARGET} opt_ipx.h: - echo "#define IPX 1" > opt_ipx.h + echo "#define IPX 1" > ${.TARGET} .endif .include Index: an/Makefile =================================================================== --- an/Makefile (revision 182150) +++ an/Makefile (working copy) @@ -9,7 +9,7 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} .endif .include Index: ar/Makefile =================================================================== --- 
ar/Makefile (revision 182150) +++ ar/Makefile (working copy) @@ -10,7 +10,7 @@ .if ${NETGRAPH} != 0 opt_netgraph.h: - echo "#define NETGRAPH ${NETGRAPH}" > opt_netgraph.h + echo "#define NETGRAPH ${NETGRAPH}" > ${.TARGET} .endif .endif Index: wlan_xauth/Makefile =================================================================== --- wlan_xauth/Makefile (revision 182150) +++ wlan_xauth/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: ath_rate_sample/Makefile =================================================================== --- ath_rate_sample/Makefile (revision 182150) +++ ath_rate_sample/Makefile (working copy) @@ -61,8 +61,8 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: -# echo "#define IEEE80211_DEBUG 1" > opt_wlan.h - echo > opt_wlan.h +# echo "#define IEEE80211_DEBUG 1" > ${.TARGET} + echo > ${.TARGET} .endif .include Index: rp/Makefile =================================================================== --- rp/Makefile (revision 182150) +++ rp/Makefile (working copy) @@ -7,7 +7,7 @@ .if !defined(KERNBUILDDIR) opt_compat.h: - echo "#define COMPAT_43 1" > opt_compat.h + echo "#define COMPAT_43 1" > ${.TARGET} .endif .include Index: ce/Makefile =================================================================== --- ce/Makefile (revision 182150) +++ ce/Makefile (working copy) @@ -16,12 +16,12 @@ .if ${NETGRAPH} != 0 opt_netgraph.h: - echo "#define NETGRAPH ${NETGRAPH}" > opt_netgraph.h + echo "#define NETGRAPH ${NETGRAPH}" > ${.TARGET} .endif .if ${NG_CRONYX} != 0 opt_ng_cronyx.h: - echo "#define NETGRAPH_CRONYX 1" > opt_ng_cronyx.h + echo "#define NETGRAPH_CRONYX 1" > ${.TARGET} .endif .endif Index: cp/Makefile =================================================================== --- cp/Makefile (revision 182150) +++ cp/Makefile (working copy) @@ -16,12 +16,12 @@ .if ${NETGRAPH} != 0 opt_netgraph.h: - echo "#define 
NETGRAPH ${NETGRAPH}" > opt_netgraph.h + echo "#define NETGRAPH ${NETGRAPH}" > ${.TARGET} .endif .if ${NG_CRONYX} != 0 opt_ng_cronyx.h: - echo "#define NETGRAPH_CRONYX 1" > opt_ng_cronyx.h + echo "#define NETGRAPH_CRONYX 1" > ${.TARGET} .endif .endif Index: pflog/Makefile =================================================================== --- pflog/Makefile (revision 182150) +++ pflog/Makefile (working copy) @@ -12,15 +12,15 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} .if ${MK_INET6_SUPPORT} != "no" opt_inet6.h: - echo "#define INET6 1" > opt_inet6.h + echo "#define INET6 1" > ${.TARGET} .endif opt_bpf.h: - echo "#define DEV_BPF 1" > opt_bpf.h + echo "#define DEV_BPF 1" > ${.TARGET} .endif .include Index: cx/Makefile =================================================================== --- cx/Makefile (revision 182150) +++ cx/Makefile (working copy) @@ -15,12 +15,12 @@ .if ${NETGRAPH} != 0 opt_netgraph.h: - echo "#define NETGRAPH $(NETGRAPH)" > opt_netgraph.h + echo "#define NETGRAPH $(NETGRAPH)" > ${.TARGET} .endif .if ${NG_CRONYX} != 0 opt_ng_cronyx.h: - echo "#define NETGRAPH_CRONYX 1" > opt_ng_cronyx.h + echo "#define NETGRAPH_CRONYX 1" > ${.TARGET} .endif .endif Index: sr/Makefile =================================================================== --- sr/Makefile (revision 182150) +++ sr/Makefile (working copy) @@ -10,7 +10,7 @@ .if ${NETGRAPH} != 0 opt_netgraph.h: - echo "#define NETGRAPH ${NETGRAPH}" > opt_netgraph.h + echo "#define NETGRAPH ${NETGRAPH}" > ${.TARGET} .endif .endif Index: hatm/Makefile =================================================================== --- hatm/Makefile (revision 182150) +++ hatm/Makefile (working copy) @@ -13,10 +13,10 @@ .if !defined(KERNBUILDDIR) opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} opt_natm.h: - echo "#define NATM 1" > opt_natm.h + echo "#define NATM 1" > ${.TARGET} .endif .include Index: wlan/Makefile 
=================================================================== --- wlan/Makefile (revision 182150) +++ wlan/Makefile (working copy) @@ -14,14 +14,14 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h - echo "#define IEEE80211_AMDPU_AGE 1" >> opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} + echo "#define IEEE80211_AMDPU_AGE 1" >> ${.TARGET} opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} opt_ipx.h: - echo "#define IPX 1" > opt_ipx.h + echo "#define IPX 1" > ${.TARGET} .endif .include Index: ath_rate_amrr/Makefile =================================================================== --- ath_rate_amrr/Makefile (revision 182150) +++ ath_rate_amrr/Makefile (working copy) @@ -61,8 +61,8 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: -# echo "#define IEEE80211_DEBUG 1" > opt_wlan.h - echo > opt_wlan.h +# echo "#define IEEE80211_DEBUG 1" > ${.TARGET} + echo > ${.TARGET} .endif .include Index: hifn/Makefile =================================================================== --- hifn/Makefile (revision 182150) +++ hifn/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_hifn.h: - echo "#define HIFN_DEBUG 1" > opt_hifn.h + echo "#define HIFN_DEBUG 1" > ${.TARGET} .endif .include Index: wlan_tkip/Makefile =================================================================== --- wlan_tkip/Makefile (revision 182150) +++ wlan_tkip/Makefile (working copy) @@ -8,7 +8,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: linux/Makefile =================================================================== --- linux/Makefile (revision 182150) +++ linux/Makefile (working copy) @@ -54,7 +54,7 @@ .if !defined(KERNBUILDDIR) opt_inet6.h: - echo "#define INET6 1" > opt_inet6.h + echo "#define INET6 1" > ${.TARGET} .endif .include Index: wlan_ccmp/Makefile 
=================================================================== --- wlan_ccmp/Makefile (revision 182150) +++ wlan_ccmp/Makefile (working copy) @@ -10,7 +10,7 @@ .if !defined(KERNBUILDDIR) opt_wlan.h: - echo "#define IEEE80211_DEBUG 1" > opt_wlan.h + echo "#define IEEE80211_DEBUG 1" > ${.TARGET} .endif .include Index: safe/Makefile =================================================================== --- safe/Makefile (revision 182150) +++ safe/Makefile (working copy) @@ -34,7 +34,7 @@ .if !defined(KERNBUILDDIR) opt_safe.h: - echo "#define SAFE_DEBUG 1" > opt_safe.h + echo "#define SAFE_DEBUG 1" > ${.TARGET} .endif .include Index: if_tap/Makefile =================================================================== --- if_tap/Makefile (revision 182150) +++ if_tap/Makefile (working copy) @@ -12,7 +12,7 @@ echo "#define COMPAT_FREEBSD6 1" > ${.TARGET} opt_inet.h: - echo "#define INET 1" > opt_inet.h + echo "#define INET 1" > ${.TARGET} .endif .include Index: ctau/Makefile =================================================================== --- ctau/Makefile (revision 182150) +++ ctau/Makefile (working copy) @@ -15,12 +15,12 @@ .if ${NETGRAPH} != 0 opt_netgraph.h: - echo "#define NETGRAPH $(NETGRAPH)" > opt_netgraph.h + echo "#define NETGRAPH $(NETGRAPH)" > ${.TARGET} .endif .if ${NG_CRONYX} != 0 opt_ng_cronyx.h: - echo "#define NETGRAPH_CRONYX 1" > opt_ng_cronyx.h + echo "#define NETGRAPH_CRONYX 1" > ${.TARGET} .endif .endif From bugmaster at FreeBSD.org Mon Aug 25 11:06:48 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 25 11:07:21 2008 Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org Message-ID: <200808251106.m7PB6lpq027696@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems Non-critical problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/120749 arch [request] Suggest upping the default kern.ps_arg_cache 1 problem total. From jhb at freebsd.org Mon Aug 25 20:04:44 2008 From: jhb at freebsd.org (John Baldwin) Date: Mon Aug 25 20:04:56 2008 Subject: Code review In-Reply-To: <20080825.002316.-1548243307.imp@bsdimp.com> References: <20080825.002316.-1548243307.imp@bsdimp.com> Message-ID: <200808251037.11126.jhb@freebsd.org> On Monday 25 August 2008 02:23:16 am M. Warner Losh wrote: > I did this a few years ago when trying to track down a problem with > some realtek network chips that I was having problems with at Timing > Solutions. I'd like to get this into the tree, since it was helpful > then. > > Comments? When you are running a faster tick I think want to only call the mii and watchdog stuff once a second still. I know this will break the tx watchdog for example. Since it's kind of tricky to manage that I think you should just use a separate timer for the twister stuff. -- John Baldwin From andrew-freebsd at areilly.bpc-users.org Tue Aug 26 00:00:33 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Tue Aug 26 00:00:40 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080820214627.C30593@desktop> References: <20080819025019.GA27997@duncan.reilly.home> <20080818215813.H952@desktop> <20080819134005.GA85664@duncan.reilly.home> <20080820214627.C30593@desktop> Message-ID: <20080826000009.GA77044@duncan.reilly.home> Hi Jeff, Sorry for the slow follow-up. It's actually quite a pain to tweak the kernel on that machine: it's often in use and it's slow at compiling kernels. Will see if I can get on to it soon. 
On Wed, Aug 20, 2008 at 09:47:01PM -1000, Jeff Roberson wrote:
> On Tue, 19 Aug 2008, Andrew Reilly wrote:
>
> >On Mon, Aug 18, 2008 at 10:00:12PM -1000, Jeff Roberson wrote:
> >>Can you tell me what % cpu the audio application uses while running? Have
> >>you tried nice -20 instead of rtprio?
> >
> >It's currently using about 10%, maybe a bit more. I expect
> >it to get heavier as I add more to it. I have hopes of it
> >continuing to work even at 60 to 80% of CPU.
> >
> >I haven't tried nice -20 because I don't want the priority to
> >drift or change, which is something that I thought the normal
> >levels did. I'll give it a go though, and report back.
>
> With such a low cpu utilization I wouldn't expect it's the scheduling
> algorithm. It may be a difference in preemption settings. Is preemption
> enabled in both kernels?

Yes, all of the preemption and POSIX realtime options (that are usually on, aren't they?) are on in each case. The only difference is the selection of scheduler:

(This is the whole config file)
include GENERIC
ident GURNEY
nocpu I486_CPU
nocpu I586_CPU
nooptions SCHED_ULE
options SCHED_4BSD

Cheers,

Andrew

From imp at bsdimp.com Tue Aug 26 07:25:54 2008
From: imp at bsdimp.com (M. Warner Losh)
Date: Tue Aug 26 07:26:07 2008
Subject: Code review
In-Reply-To: <200808251037.11126.jhb@freebsd.org>
References: <20080825.002316.-1548243307.imp@bsdimp.com> <200808251037.11126.jhb@freebsd.org>
Message-ID: <20080826.012357.1973601375.imp@bsdimp.com>

In message: <200808251037.11126.jhb@freebsd.org>
John Baldwin writes:
: On Monday 25 August 2008 02:23:16 am M. Warner Losh wrote:
: > I did this a few years ago when trying to track down a problem with
: > some realtek network chips that I was having problems with at Timing
: > Solutions. I'd like to get this into the tree, since it was helpful
: > then.
: >
: > Comments?
:
: When you are running a faster tick I think want to only call the mii and
: watchdog stuff once a second still.
I know this will break the tx watchdog : for example. Since it's kind of tricky to manage that I think you should : just use a separate timer for the twister stuff. Is this in general, or do you have a specific problem in mind with the rl change? In general, we're not transmitting during this exercise and it happens only once... Is it worth the extra hair? Warner From andrew-freebsd at areilly.bpc-users.org Tue Aug 26 07:50:26 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Tue Aug 26 07:50:33 2008 Subject: FreeBSD and DEP aka "NX bit"? In-Reply-To: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> Message-ID: <20080826074943.GB85357@duncan.reilly.home> On Sat, Aug 23, 2008 at 05:13:30PM -0700, Matthew Macy wrote: > On Sat, Aug 23, 2008 at 5:04 PM, Ivan Voras wrote: > > I stumbled upon this Wikipedia page: > > http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems#Security_features > > and it mentions NX bit is supported in FreeBSD. Is this true? Is it > > enabled by default? > > Yes. However, it is in the upper word so it only works with PAE or > amd64. "jemalloc" maps the heap NX and thread stacks are mapped NX. > The default process stack currently needs to be executable because > sigcode is placed at the start of the stack at the time of process > creation. Oh, I was looking into this a few months ago, and came to the conclusion that NX wasn't turned on at all. How do applications/languages that use JIT or other run-time code generation get around the non-executable heap? Just not use jemalloc? I've been using 7-STABLE on amd64 for a long time, and haven't noticed any problems with Java or SBCL lisp or PLT-scheme, all of which use JIT code generation (but probably neither use jemalloc?) 
Cheers, -- Andrew From alfred at freebsd.org Tue Aug 26 16:28:08 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Tue Aug 26 16:28:19 2008 Subject: FreeBSD and DEP aka "NX bit"? In-Reply-To: <20080826074943.GB85357@duncan.reilly.home> References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> <20080826074943.GB85357@duncan.reilly.home> Message-ID: <20080826162807.GF16977@elvis.mu.org> * Andrew Reilly [080826 00:51] wrote: > On Sat, Aug 23, 2008 at 05:13:30PM -0700, Matthew Macy wrote: > > On Sat, Aug 23, 2008 at 5:04 PM, Ivan Voras wrote: > > > I stumbled upon this Wikipedia page: > > > http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems#Security_features > > > and it mentions NX bit is supported in FreeBSD. Is this true? Is it > > > enabled by default? > > > > Yes. However, it is in the upper word so it only works with PAE or > > amd64. "jemalloc" maps the heap NX and thread stacks are mapped NX. > > The default process stack currently needs to be executable because > > sigcode is placed at the start of the stack at the time of process > > creation. > > Oh, I was looking into this a few months ago, and came to the > conclusion that NX wasn't turned on at all. > > How do applications/languages that use JIT or other run-time > code generation get around the non-executable heap? Just not > use jemalloc? > > I've been using 7-STABLE on amd64 for a long time, and haven't > noticed any problems with Java or SBCL lisp or PLT-scheme, all > of which use JIT code generation (but probably neither use > jemalloc?) mprotect(2)? 
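For readers following the thread, here is a minimal sketch of what that one-word answer suggests, under the assumption that a run-time code generator allocates its own anonymous mapping rather than relying on malloc'd heap memory. The function name `jit_demo` and the six hard-coded x86 bytes are made up for illustration; they are not taken from Java, SBCL, PLT Scheme or any other project mentioned here.

```c
#define _DEFAULT_SOURCE 1	/* expose MAP_ANON on glibc; ignored elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: put a few bytes of generated code into a
 * writable page, then switch the page to read+execute with
 * mprotect(2) before calling it.
 * Returns 0 on success, -1 on failure.
 */
int
jit_demo(void)
{
	/* x86 encoding of "mov eax, 42; ret" (same bytes on i386 and amd64). */
	static const unsigned char code[] = {
		0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3
	};
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t len;
	void *page;
	int rc = 0;

	if (pagesz <= 0)
		return (-1);
	len = (size_t)pagesz;
	assert(sizeof(code) <= len);

	/* Step 1: map a page writable but not executable. */
	page = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (page == MAP_FAILED)
		return (-1);
	memcpy(page, code, sizeof(code));

	/* Step 2: drop write permission and add execute permission. */
	if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(page, len);
		return (-1);
	}

#if defined(__x86_64__) || defined(__i386__)
	/* Step 3: run the bytes only on the architecture they encode. */
	{
		int (*fn)(void) = (int (*)(void))page;

		if (fn() != 42)
			rc = -1;
	}
#endif
	munmap(page, len);
	return (rc);
}
```

Calling `jit_demo()` should return 0 when the permission change (and, on x86, the call into the generated bytes) succeeds. Whether a given system actually permits the switch to PROT_EXEC depends on its policy, so treat this as an illustration rather than a portability guarantee.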
-- - Alfred Perlstein From jhb at freebsd.org Tue Aug 26 18:02:53 2008 From: jhb at freebsd.org (John Baldwin) Date: Tue Aug 26 18:03:04 2008 Subject: Code review In-Reply-To: <20080826.012357.1973601375.imp@bsdimp.com> References: <20080825.002316.-1548243307.imp@bsdimp.com> <200808251037.11126.jhb@freebsd.org> <20080826.012357.1973601375.imp@bsdimp.com> Message-ID: <200808261033.43091.jhb@freebsd.org> On Tuesday 26 August 2008 03:23:57 am M. Warner Losh wrote: > In message: <200808251037.11126.jhb@freebsd.org> > John Baldwin writes: > : On Monday 25 August 2008 02:23:16 am M. Warner Losh wrote: > : > I did this a few years ago when trying to track down a problem with > : > some realtek network chips that I was having problems with at Timing > : > Solutions. I'd like to get this into the tree, since it was helpful > : > then. > : > > : > Comments? > : > : When you are running a faster tick I think want to only call the mii and > : watchdog stuff once a second still. I know this will break the tx watchdog > : for example. Since it's kind of tricky to manage that I think you should > : just use a separate timer for the twister stuff. > > Is this in general, or do you have a specific problem in mind with the > rl change? In general, we're not transmitting during this exercise > and it happens only once... Is it worth the extra hair? Worried more about the general case. Is mii_tick() going to be ok with being invoked more often? Also, if you are only doing this during attach or interface up, it might be simpler to have a private timer (shoot, if it's during attach the 'struct callout' can be on the stack) just for this bit. -- John Baldwin From andrew-freebsd at areilly.bpc-users.org Wed Aug 27 01:20:24 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Wed Aug 27 01:20:48 2008 Subject: FreeBSD and DEP aka "NX bit"? 
In-Reply-To: <20080826162807.GF16977@elvis.mu.org>
References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> <20080826074943.GB85357@duncan.reilly.home> <20080826162807.GF16977@elvis.mu.org>
Message-ID: <20080827011949.GA98242@duncan.reilly.home>

On Tue, Aug 26, 2008 at 09:28:07AM -0700, Alfred Perlstein wrote:
> * Andrew Reilly [080826 00:51] wrote:
> > I've been using 7-STABLE on amd64 for a long time, and haven't
> > noticed any problems with Java or SBCL lisp or PLT-scheme, all
> > of which use JIT code generation (but probably neither use
> > jemalloc?)
>
> mprotect(2)?

Fair enough. Good to know that it's actually tweaking the NX permissions, I guess. The man page seems a little vague about when it might succeed, and what effect it might have...

Cheers,

--
Andrew

From rwatson at FreeBSD.org Wed Aug 27 09:10:56 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Wed Aug 27 09:11:02 2008
Subject: FreeBSD and DEP aka "NX bit"?
In-Reply-To: <20080827011949.GA98242@duncan.reilly.home>
References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> <20080826074943.GB85357@duncan.reilly.home> <20080826162807.GF16977@elvis.mu.org> <20080827011949.GA98242@duncan.reilly.home>
Message-ID:

On Wed, 27 Aug 2008, Andrew Reilly wrote:

> On Tue, Aug 26, 2008 at 09:28:07AM -0700, Alfred Perlstein wrote:
>> * Andrew Reilly [080826 00:51]
>> wrote:
>>> I've been using 7-STABLE on amd64 for a long time, and haven't noticed any
>>> problems with Java or SBCL lisp or PLT-scheme, all of which use JIT code
>>> generation (but probably neither use jemalloc?)
>>
>> mprotect(2)?
>
> Fair enough. Good to know that it's actually tweaking the NX permissions, I
> guess. The man page seems a little vague about when it might succeed, and
> what effect it might have...

We're behind on the not-mapping-writable stuff, so for better (and worse) quite a few such things in applications have been faulted in by other operating systems already.
That doesn't mean there won't be issues, but it does have the redeeming aspect that things should be less bumpy for us going forward. Hopefully we can start making that progress a bit more quickly...

Robert N M Watson
Computer Laboratory
University of Cambridge

From attilio at freebsd.org Wed Aug 27 16:55:55 2008
From: attilio at freebsd.org (Attilio Rao)
Date: Wed Aug 27 16:56:02 2008
Subject: Kernel decontextualization -- idea and little proof-of-concept
Message-ID: <3bbf2fe10808270955y53b00587m1991e7bf898466e1@mail.gmail.com>

Looking at VFS (though this is not a VFS-only e-mail), what immediately stands out is that the KPI is rather heavy, somewhat complicated, and not very user-friendly (in particular with regard to locking). Some of this heaviness comes from directly tackling some peculiar problems (vnode recycling in primis), but another part is long-standing and stems from wrong (or at any rate confused) assumptions.

One of the latter cases is the widespread presence of "thread" arguments throughout the kernel. Among all the subsystems, the most plagued is the VFS. You can see a lot of useless (or only partially useful, as explained later) thread pointers passed all over the VOP_* functions, their consumers and their callees, and this is reflected in a lot of key structures too (uio, componentname, namei, etc.).

My idea is that we should drop this thread bloat entirely from our subsystems (and in particular from VFS), because it makes no sense and only adds obfuscation and overhead for any consumer of the KPI. This will also prepare better ground for further VFS improvements such as, for example, namei KPI refactoring and reorganization, file access, etc.

Small Q&A about possible concerns:

Q: Sometimes we need to pass a thread in order to get its credentials; how will you handle this?
A: We will simply take the ucred it points to and switch the thread argument to a credential.

Q: curthread accesses are heavy; will this work penalize kernel performance?
A: This work is intended to drastically reduce the movement of thread pointers. This means that, ideally, it will result in far fewer curthread accesses than the old code.

Q: Ideally, you have to complete the whole work quickly while still keeping patches in small pieces that people can review and test, how do you plan to handle this?
A: There is no simple solution for this. I will try to put a lot of effort into having good-sized patches and doing it as fast as I can. For an example, please look at this patch:
http://www.freebsd.org/~attilio/vfsattr.diff
which removes the thread argument from the VOP_GETATTR / VOP_SETATTR pair. There is still room for refinement here, but I need to clean up other VOP_* functions first.

Q: You have been good at avoiding the biggest concern, but here we go! What about MFC troubles coming from a massive KPI breakage?
A: I know that MFCs to 7 and 6 will become a PITA (in particular for VFS consumers), but this is a good moment to decide whether we want to keep catering to a long-standing (bogus) approach like that or whether we want to do a full cleanup. This will also mean working on the variety of consumer filesystems, but with the help of a good line-up of testers I think I can handle this.

Let me know what you think about it.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein

From chuckr at telenix.org Wed Aug 27 17:28:22 2008
From: chuckr at telenix.org (Chuck Robey)
Date: Wed Aug 27 17:28:37 2008
Subject: an argument of make(1)
Message-ID: <48B58846.6040400@telenix.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I am posting this to our arch list; if I'm wrong, we can move this. I want to argue for some changes to our great make(1) tool.

I've long been a fan of make, and in particular, FreeBSD's make.
In my own career, I've often had to do more custom creation of Makefiles, and while there are some folks here who definitely DO know our make better, I think I can honestly say I know it pretty well. In creating tools for customers, I've often been forced, unwillingly, to go to the GNU make tool. The reason is just one of compatibility. There are several different reasons for this, which I want to list:

1) while the GNU make has the -v to allow one to both identify the tool and the version, FreeBSD's make has no such facility, that I'm aware of. You can't test & capture the type of make here, except by the rather inadequate trapping of getting no response at all.

2) while some parts of our make don't advertise it, they can be made compatible with gmake's tools. "include" is a good example (the FreeBSD "include" docs claim that it only works as ".include", but that's a prevarication (and a very good thing it is)). What I'd like to argue for is that some things like "if" have their compatibility with gmake enhanced. No, don't make it a mirror image, just make it possible for a programmer to craft a limited set of tests that will work in both places.

If you give programmers the ability to detect what make they're in, the ability to craft a limited set of compatible tests, and also the other side (endif stuff) then everything else for portability can be done by using those limits.

If something like this were done, then it would allow a programmer, finally, to write a REALLY portable makefile. It would still allow one to make use of all of make(1)'s great command set, but not kill its use in a gmake-only system.

OK, that's the major argument. I'm going to ask for one thing here, but it's truly extra, just my own biases showing thru. I wish that a fair number of the changes that have gone into make be taken back.
I'm not talking about those that enhanced its operation, I'm talking about all of the changes that, while trying to increase the elegance of the code, have really destroyed its portability, in a major way. Make used to depend on a smaller set of libraries, and those libraries were largely those available on other systems, so the task for a programmer, to port our make(1) to a different platform, wasn't all that hard to do. Nowadays, with a great number of the changes having been crafted to change dependencies to FreeBSD-only tools, it's a real bear to port it.

The code involved is nearly all very, very portable; it's the way that the libraries have been constituted that makes porting this a really bad job. If someone would make up a libmake.so.1, which in itself could be made really portable, that would also go a great long way to improving the popularity of our make(1). I'm NOT asking to roll back any of the distinct improvements that have gone in, only the changes that ruined its porting-ability (yea, that's portability, I just wanted to really point that out again).

OK, if someone were to come up with such a set of changes, would they be dead on arrival? I know that no one gets prior approval for FreeBSD (I completely agree with that), just didn't want to be totally at odds with everyone, if I'm the only one who sees it this way.

Thanks for your time.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAki1iEUACgkQz62J6PPcoOkniQCfWg+wlDrQ6rC+g2jGip12Q1VF koQAnRv4Sjs6xnebEEipKcGF1lXYZmRP =6ike -----END PGP SIGNATURE----- From alfred at freebsd.org Wed Aug 27 17:45:31 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Wed Aug 27 17:45:38 2008 Subject: an argument of make(1) In-Reply-To: <48B58846.6040400@telenix.org> References: <48B58846.6040400@telenix.org> Message-ID: <20080827174530.GZ16977@elvis.mu.org> The only excuse I have for top posting here is that I agree with the entirety of this email. -Alfred * Chuck Robey [080827 10:28] wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I am posting this to our arch list; if I'm wrong, we can move this. I want to > argue for some changes to our great make(1) tool. > > I've long been a fan of make, and in particular, FreeBSD's make. In my own > career, I've often had to do more custom creation of Makefiles, and while there > are some folks here who definitely DO know our make better, I think I can > honestly say I know it pretty well. In creating tools for customers, I've > often been forced, unwillingly, to go to the GNU make tool. The reason is just > one of compatibility. There are several different reasons for this, which I > want to list: > > 1) while the GNU make has the -v to allow one to both identify the tool and the > version, FreeBSD's make has no such facility, that I'm aware of. You can't test > & capture the type of make here, except by the rather inadequate trapping of > getting no response at all. > > 2) while some parts of our make don't advertise it, they can be made compatible > to the gmake's tools. "include" is a good example (the FreeBSD "include" docs > claim that it only works as ".include", but that's a prevarication (and a very > good thing it is). 
What I'd like to argue for is that some things like "if" > have their compatibility with gmake enhanced. No, don't make it a mirror image, > just make it possible for a programmer to craft a limited set of tests that will > work in both places. > > If you give programmers the ability to detect what make they're in, the ability > to craft a limited set of compatible tests, and also the other side (endif > stuff) then everything else for portability can be done by using those limits. > > If something like this were done, then it allows a programmer, finally, to write > a REALLY portable makefile. It would still allow one to make use of all of > make(1)'s great command set, but not to kill it's use in a gmake-only system. > > OK, that'st the major argument. I'm going to ask for one thing here, but it's > truly extra, just my own bias's showing thru. I wish that a fair number of the > changes that have gone into make be taken back. I'm not talking about those > that enhanced it's operation, I'm talking about all of the changes that, while > trying to increase the elegance of the code, have really destroyed it's porrting > portability, in a major way. Make used to depend on a smaller set of libraries, > and those libraries were largely those available on other systems, so the task > for a programmer, to port our make(1) to a different platform wasn't all that > hard to do. Nowadays, with a great number of the changes having been crafted to > change dependencies to FreeBSD-only tools, it's a real bear to port it. > > The code involved is nearly all very, very portable; it's the way that the > libraries have been constituted that makes porting this a really bad job. If > someone would make up a libmake.so.1, which in itself could be make really > portable, that would also go a great long way to improving the popularity of our > make(1). 
I'm NOT asking to roll back any of the distinct improvements that have > gone in, only the changes that ruined it's porting-ability (yea, that's > portabilty, I just wanted to really point that out again). > > OK, if someone were to come up with swuch a set of changes, would they be dead > on arrival? I know that no one gets prior approval for FreeBSD (I completely > agree with that), just didn't want to be totally at odds with everyone, if I'm > the only one who sees it this way. > > Thanks for your time. > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.9 (FreeBSD) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAki1iEUACgkQz62J6PPcoOkniQCfWg+wlDrQ6rC+g2jGip12Q1VF > koQAnRv4Sjs6xnebEEipKcGF1lXYZmRP > =6ike > -----END PGP SIGNATURE----- > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -- - Alfred Perlstein From peter at wemm.org Wed Aug 27 20:13:02 2008 From: peter at wemm.org (Peter Wemm) Date: Wed Aug 27 20:13:09 2008 Subject: FreeBSD and DEP aka "NX bit"? In-Reply-To: References: <3c1674c90808231713x47e42de5oa9fc2f2f244d2e74@mail.gmail.com> <20080826074943.GB85357@duncan.reilly.home> <20080826162807.GF16977@elvis.mu.org> <20080827011949.GA98242@duncan.reilly.home> Message-ID: On Wed, Aug 27, 2008 at 2:10 AM, Robert Watson wrote: > On Wed, 27 Aug 2008, Andrew Reilly wrote: > >> On Tue, Aug 26, 2008 at 09:28:07AM -0700, Alfred Perlstein wrote: >>> >>> * Andrew Reilly [080826 00:51] >>> wrote: >>>> >>>> I've been using 7-STABLE on amd64 for a long time, and haven't noticed >>>> any problems with Java or SBCL lisp or PLT-scheme, all of which use JIT code >>>> generation (but probably neither use jemalloc?) >>> >>> mprotect(2)? >> >> Fair enough. Good to know that it's actually tweaking the NX permissions, >> I guess. 
The man page seems a little vague about when it might succeed, and >> what effect it might have... > > We're behind on the not-mapping-writable stuff, so for better (and worse) > quite a few such things in application have been faulted in by other > operating systems already. That doesn't mean there won't be issues, but > does have the redeeming aspect that things should be less bumpy for us going > forward. Hopefully we can start making that progress a bit more quickly... I recall seeing config.h code chunks to turn sections of the stack on/off for execution on (I think) sparc64. It might have been for netbsd. If my memory serves correctly, libgcc grew code to do mprotect(), and the gcc code generator would call it as appropriate when it needed to do its magic. I think this was for an older version of gcc though. -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV "All of this is for nothing if we don't go to the stars" - JMS/B5 "If Java had true garbage collection, most programs would delete themselves upon execution." -- Robert Sewell From andrew-freebsd at areilly.bpc-users.org Wed Aug 27 23:39:02 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Wed Aug 27 23:39:08 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080820214627.C30593@desktop> References: <20080819025019.GA27997@duncan.reilly.home> <20080818215813.H952@desktop> <20080819134005.GA85664@duncan.reilly.home> <20080820214627.C30593@desktop> Message-ID: <20080827233831.GA16705@duncan.reilly.home> On Wed, Aug 20, 2008 at 09:47:01PM -1000, Jeff Roberson wrote: > On Tue, 19 Aug 2008, Andrew Reilly wrote: > >I haven't tried nice -20 because I don't want the priority to > >drift or change, which is something that I thought the normal > >levels did. I'll give it a go though, and report back. > > With such a low cpu utilization I wouldn't expect it's the scheduling > algorithm. 
It may be a difference in preemption settings. Is preemption > enabled in both kernels? I've just done a set of tests with setprio(... -20) vs rtprio(...10), and with SCHED_ULE vs SCHED_4BSD. The results are essentially as I reported before except that regular prio -20 seems to be just as reliable as rtprio 10 under 4BSD and just as unhelpful under _ULE. To summarise: SCHED_ULE: rtprio 10: network activity causes audio underruns SCHED_ULE: setprio -20: network activity causes audio underruns SCHED_4BSD: rtprio 10: no audio underruns SCHED_4BSD: setprio -20: no audio underruns For what it's worth, my audio buffering setup has a fragment size of 0.7ms, but several buffers. How is device driver activity prioritized? Does the scheduler in use effect how device interrupts are handled, as well as user-land tasks? I have kernels built with both schedulers sitting arround on this machine now, so it's easy to switch back and forth if there are some specific tests that I could do or other information that I could provide. Cheers, Andrew From keramida at freebsd.org Thu Aug 28 01:26:11 2008 From: keramida at freebsd.org (Giorgos Keramidas) Date: Thu Aug 28 01:26:18 2008 Subject: an argument of make(1) In-Reply-To: <48B58846.6040400@telenix.org> (Chuck Robey's message of "Wed, 27 Aug 2008 13:00:54 -0400") References: <48B58846.6040400@telenix.org> Message-ID: <8763pmt011.fsf@kobe.laptop> On Wed, 27 Aug 2008 13:00:54 -0400, Chuck Robey wrote: > I am posting this to our arch list; if I'm wrong, we can move this. I > want to argue for some changes to our great make(1) tool. > > I've long been a fan of make, and in particular, FreeBSD's make. In > my own career, I've often had to do more custom creation of Makefiles, > and while there are some folks here who definitely DO know our make > better, I think I can honestly say I know it pretty well. In creating > tools for customers, I've often been forced, unwillingly, to go to the > GNU make tool. 
The reason is just one of compatibility. There are
> several different reasons for this, which I want to list:

I can certainly feel the `pain' of reluctantly choosing GNU make, or even of going the automake/autoconf/libtool way. I have written a fair amount of build glue myself too, and I know that I'd love to have BSD make in other systems too. Since the topic of (re)using FreeBSD make on non-BSD platforms seems to pop up more or less every 4-6 months the last year, I've started at least two attempts at porting 'FreeBSD make' to non BSD systems. The last attempt was aiming for a 'clean' set of minimal changes to the source of make. And it's done. At least the *binary* of make builds and tries to pull bsd.*.mk files on Linux and Solaris 10 here. The actual *source* code changes needed to build a BSD-make executable on Linux are pretty 'small':

    keramida@mithra:/home/keramida/tools$ uname -a
    Linux mithra 2.6.24-19-generic #1 SMP Fri Jul 11 23:41:49 UTC 2008 i686 GNU/Linux
    keramida@mithra:/home/keramida/tools$ hg short -r1
    1:664527963082 | 2008-07-25 16:47 +0300 | keramida: Import FreeBSD make-20080724 snapshot (svn 180782)

The diff from the first snapshot import of FreeBSD make touches only a few files:

    keramida@mithra:/home/keramida/tools$ hg diff -r 1:tip bin/make | diffstat -p1
     bin/make/Makefile    |    8 +++++-
     bin/make/compat.c    |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++
     bin/make/compat.h    |   36 +++++++++++++++++++++++++++
     bin/make/job.c       |    8 +++++-
     bin/make/main.c      |   17 ++++++++++++
     bin/make/pathnames.h |    2 -
     6 files changed, 135 insertions(+), 3 deletions(-)
    keramida@mithra:/home/keramida/tools$

Now I'm looking for some spare time to clean-up the changes a bit and integrate them in `http://hg.hellug.gr/bmake/gker/'. The tricky part is, however, that a lot of the functionality of make(1) depends on the `share/mk/bsd.*.mk' files.
I have finished writing the automake glue that installs these in `$prefix/share/mk' but now I am kind of stuck while thinking about a good way to make the bsd.*.mk files actually usable on, say, FreeBSD, Linux and Solaris.

> 1) while the GNU make has the -v to allow one to both identify the
> tool and the version, FreeBSD's make has no such facility, that I'm
> aware of. You can't test & capture the type of make here, except by
> the rather inadequate trapping of getting no response at all.

Both -v and -V are taken, and I don't really like the idea of adding long options to make(1). Maybe we can partially ``hijack'' the -v option to also display a verbose version string, i.e.:

    % make -v
    FreeBSD make (version 5200408120)
    make: no target to make.
    %

It's probably ok to print a version number when 'being extra verbose' :)

> OK, that'st the major argument. I'm going to ask for one thing here,
> but it's truly extra, just my own bias's showing thru. I wish that a
> fair number of the changes that have gone into make be taken back.
> I'm not talking about those that enhanced it's operation, I'm talking
> about all of the changes that, while trying to increase the elegance
> of the code, have really destroyed it's porrting portability, in a
> major way. Make used to depend on a smaller set of libraries, and
> those libraries were largely those available on other systems, so the
> task for a programmer, to port our make(1) to a different platform
> wasn't all that hard to do. Nowadays, with a great number of the
> changes having been crafted to change dependencies to FreeBSD-only
> tools, it's a real bear to port it.

make(1) is actually pretty 'easy' to compile on, say, Linux, as I wrote above. The only bits that are missing on Linux are:

* An errc() function. I added one in compat.c of my diffstat output above, and this accounts for most of the lines in the patches.

* Our make(1) uses arc4random(). #ifdef HAVE_ARC4RANDOM and a few lines to call srandom() or srandomdev() and random() when it's unavailable are probably `ok' for this.

* FreeBSD make uses getprogname(), but saving argv[0] and re-using that is trivial to add when HAVE_GETPROGNAME is undefined.

* On FreeBSD/pc98 systems make tries to get the value of "machdep.ispc98" by calling sysctlbyname("machdep.ispc98", ...). Since this part of the code exists to be able to build make on pre-7.0 FreeBSD/pc98 systems, it may be ok to #ifdef HAVE_SYSCTLBYNAME and ignore this compatibility code on non-FreeBSD systems.

This pretty much sums up all the source code changes I had to make to build on Linux and Solaris. I'll see how much of the changes I can clean up and post online at `http://hg.hellug.gr/bmake/gker/'. It should be pretty easy, so please ping me in a couple of days if I haven't followed up with an "ACK, all done now".

The 'porting' of share/mk/bsd.*.mk files may be a bit trickier, and it may even turn out to be much more difficult. I can certainly use all the help I can get there...

- Giorgos

From joseph.koshy at gmail.com Thu Aug 28 03:34:10 2008
From: joseph.koshy at gmail.com (Joseph Koshy)
Date: Thu Aug 28 03:34:28 2008
Subject: an argument of make(1)
In-Reply-To: <48B58846.6040400@telenix.org>
References: <48B58846.6040400@telenix.org>
Message-ID: <84dead720808272007l9643d45idcbbafa384185370@mail.gmail.com>

> hard to do. Nowadays, with a great number of the changes having been crafted to
> change dependencies to FreeBSD-only tools, it's a real bear to port it.

If you are looking for a portable BSD-compatible make(1) you could try the "portable" version of NetBSD's make(1). This is available at:

http://www.crufty.net/help/sjg/bmake.html

There are a small number of differences between the languages accepted by this make and FreeBSD make, but for the most part they are compatible.
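For readers following the porting notes earlier in this thread: Giorgos Keramidas mentions adding an errc() replacement in compat.c and falling back to a saved argv[0] where getprogname() is missing, but that file is not shown in the archive. The C below is only a minimal sketch of what such shims might look like, not his code. The names compat_setprogname(), compat_getprogname(), compat_format_errc() and compat_errc() are hypothetical, chosen so they cannot clash with the real errc() and getprogname() on systems that provide them, and the "make" default program name is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for getprogname(): remember the basename of argv[0]. */
static const char *compat_progname = "make";

void
compat_setprogname(const char *argv0)
{
	const char *slash;

	assert(argv0 != NULL);
	slash = strrchr(argv0, '/');
	compat_progname = (slash != NULL) ? slash + 1 : argv0;
}

const char *
compat_getprogname(void)
{
	return (compat_progname);
}

/*
 * Build "progname: message: strerror(code)" in buf, the layout errc(3)
 * prints on the BSDs.  With a NULL fmt only "progname: strerror(code)"
 * is produced.  Returns the snprintf()-style length.
 */
static int
compat_vformat_errc(char *buf, size_t size, int code, const char *fmt,
    va_list ap)
{
	char msg[512];

	assert(buf != NULL && size > 0);
	if (fmt == NULL)
		return (snprintf(buf, size, "%s: %s", compat_getprogname(),
		    strerror(code)));
	vsnprintf(msg, sizeof(msg), fmt, ap);
	return (snprintf(buf, size, "%s: %s: %s", compat_getprogname(), msg,
	    strerror(code)));
}

/* Testable variant: formats the diagnostic without printing or exiting. */
int
compat_format_errc(char *buf, size_t size, int code, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = compat_vformat_errc(buf, size, code, fmt, ap);
	va_end(ap);
	return (n);
}

/* errc()-like shim: print the diagnostic to stderr and exit with eval. */
void
compat_errc(int eval, int code, const char *fmt, ...)
{
	char line[1024];
	va_list ap;

	va_start(ap, fmt);
	compat_vformat_errc(line, sizeof(line), code, fmt, ap);
	va_end(ap);
	fprintf(stderr, "%s\n", line);
	exit(eval);
}
```

A caller would run compat_setprogname(argv[0]) once at the top of main() and then use compat_errc(2, ENOENT, "cannot open %s", path) wherever the BSD code calls errc(). Splitting the formatting from the exit path is a choice made here purely so the message layout can be checked without terminating the process.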
Koshy From gary.jennejohn at freenet.de Thu Aug 28 10:37:14 2008 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Thu Aug 28 10:37:26 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080827233831.GA16705@duncan.reilly.home> References: <20080819025019.GA27997@duncan.reilly.home> <20080818215813.H952@desktop> <20080819134005.GA85664@duncan.reilly.home> <20080820214627.C30593@desktop> <20080827233831.GA16705@duncan.reilly.home> Message-ID: <20080828123708.45964271@peedub.jennejohn.org> On Thu, 28 Aug 2008 09:38:31 +1000 Andrew Reilly wrote: > On Wed, Aug 20, 2008 at 09:47:01PM -1000, Jeff Roberson wrote: > > On Tue, 19 Aug 2008, Andrew Reilly wrote: > > >I haven't tried nice -20 because I don't want the priority to > > >drift or change, which is something that I thought the normal > > >levels did. I'll give it a go though, and report back. > > > > With such a low cpu utilization I wouldn't expect it's the scheduling > > algorithm. It may be a difference in preemption settings. Is preemption > > enabled in both kernels? > > I've just done a set of tests with setprio(... -20) vs > rtprio(...10), and with SCHED_ULE vs SCHED_4BSD. The results > are essentially as I reported before except that regular prio > -20 seems to be just as reliable as rtprio 10 under 4BSD and > just as unhelpful under _ULE. > > To summarise: > > SCHED_ULE: rtprio 10: network activity causes audio underruns > SCHED_ULE: setprio -20: network activity causes audio underruns > SCHED_4BSD: rtprio 10: no audio underruns > SCHED_4BSD: setprio -20: no audio underruns > > For what it's worth, my audio buffering setup has a fragment > size of 0.7ms, but several buffers. How is device driver > activity prioritized? Does the scheduler in use effect how > device interrupts are handled, as well as user-land tasks? 
>
> I have kernels built with both schedulers sitting arround on
> this machine now, so it's easy to switch back and forth if there
> are some specific tests that I could do or other information
> that I could provide.
>

Ah yes, but do you have options PREEMPTION set, which was Jeff's question?

---
Gary Jennejohn

From des at des.no Thu Aug 28 11:03:39 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Thu Aug 28 11:05:07 2008
Subject: an argument of make(1)
In-Reply-To: <48B58846.6040400@telenix.org> (Chuck Robey's message of "Wed, 27 Aug 2008 13:00:54 -0400")
References: <48B58846.6040400@telenix.org>
Message-ID: <86zlmx4d11.fsf@ds4.des.no>

Chuck Robey writes:
> [BSD make should really be GNU make]

cd /usr/ports/devel/automake110
make install clean

DES
-- 
Dag-Erling Smørgrav - des@des.no

From rwatson at FreeBSD.org Thu Aug 28 11:14:58 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Thu Aug 28 11:15:05 2008
Subject: Kernel decontextualization -- idea and little proof-of-concept
In-Reply-To: <3bbf2fe10808270955y53b00587m1991e7bf898466e1@mail.gmail.com>
References: <3bbf2fe10808270955y53b00587m1991e7bf898466e1@mail.gmail.com>
Message-ID:

On Wed, 27 Aug 2008, Attilio Rao wrote:

> Small Q&A about possible concerns:
> Q: Sometimes we need to pass a thread in order to get his credentials,
> how you will handle this?
> A: We will simply get the ucred pointed and will switch the thread
> argument to be a credential

I tend to agree with the approach you are proposing, and have been considering similar changes for the network stack for much the same reasons. Three points:

(1) We may need to explicitly pass one or more credentials in places where we don't currently do so. This is certainly true in the network stack, and similar considerations in VFS wouldn't surprise me. Most frequently, this construct is used when work occurs in an asynchronous context from the requesting thread and use of the original authorizing credential is required.
(2) Keep a careful eye out for cases where an implicit use of the passed thread is to establish context for copyin(9). You might argue that copyin(9) should accept an address space or thread argument and then assert that it is the current one... (3) Take a look at the on-going virtualization work, as we may want to apply the same virtualization techniques to VFS in the future, in which case we'll need to make sure that the approaches are compatible. Robert N M Watson Computer Laboratory University of Cambridge From andrew-freebsd at areilly.bpc-users.org Thu Aug 28 14:28:33 2008 From: andrew-freebsd at areilly.bpc-users.org (Andrew Reilly) Date: Thu Aug 28 14:28:39 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080828123708.45964271@peedub.jennejohn.org> References: <20080819025019.GA27997@duncan.reilly.home> <20080818215813.H952@desktop> <20080819134005.GA85664@duncan.reilly.home> <20080820214627.C30593@desktop> <20080827233831.GA16705@duncan.reilly.home> <20080828123708.45964271@peedub.jennejohn.org> Message-ID: <20080828224129.5fa7c8da@duncan.reilly.home> On Thu, 28 Aug 2008 12:37:08 +0200 Gary Jennejohn wrote: > Ah yes, but do you have options PREEMPTION set, which was Jeff's question? I believe that I answered that question in an earlier post, but for what it's worth, the answer is an emphatic "yes": PREEMPTION is turned on in GENERIC (along with _KPOSIX_PRIORITY_SCHEDULING and SMB), and my (posted) kernel config is essentially include GENERIC, turn off I486_CPU and I586_CPU, and override SCHED_ULE (or not). So unless the config include mechanism is broken, I've got PREEMPTION, (and so has nearly everyone else). 
Cheers, Andrew From areilly at bigpond.net.au Thu Aug 28 22:53:13 2008 From: areilly at bigpond.net.au (Andrew Reilly) Date: Thu Aug 28 22:53:20 2008 Subject: SCHED_ULE problem: slow single processor, realtime prio vs network stack In-Reply-To: <20080828071804.GA54269@duncan.reilly.home> References: <20080827233831.GA16705@duncan.reilly.home> <000c01c908db$f78d9180$01000001@china.huawei.com> <20080828071804.GA54269@duncan.reilly.home> Message-ID: <20080828225300.GA51771@duncan.reilly.home> On Thu, Aug 28, 2008 at 05:18:04PM +1000, Andrew Reilly wrote: > Hi Jian, > > On Thu, Aug 28, 2008 at 03:01:59PM +0800, ?? wrote: > > I found the network interrupt thread might take too long to run if > > net.isr.direct=1 > > > > I suspect your problem might be because the network kernel thread spend so > > long time that the sound interrupt could not find time slot to process. > > That sounds like what I think is happening, but I'm still > curious about why the same network stack manaages to be > interrupted by the audio driver when running the 4BSD scheduler, > but not the ULE sheduler. > > > You might just try to turn netisr off when running ULE > > > > sysctl -w net.isr.direct=0 > > I'll give that a try, as soon as possible. As promised to Jian, here's my report on how or whether that helped: no. If anything, it seemed to make the network-induced breakup of the audio timing a little worse, but I did no measurements to verify that impression. Thanks for the suggestion, though. Cheers, Andrew From chuckr at telenix.org Thu Aug 28 23:53:34 2008 From: chuckr at telenix.org (Chuck Robey) Date: Thu Aug 28 23:53:41 2008 Subject: an argument of make(1) In-Reply-To: <20080827174530.GZ16977@elvis.mu.org> References: <48B58846.6040400@telenix.org> <20080827174530.GZ16977@elvis.mu.org> Message-ID: <48B73A0B.401@telenix.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alfred Perlstein wrote: > The only excuse I have for top posting here is that I agree > with the entirety of this email. 
Well, thanks, I wanted to get a feeling if I was alone in feeling this way; I guess I'm not. With this in mind, I've ported our BSD make tool twice (I lost my first try, felt fairly stupid for having to do it twice). Now I have this much agreement, I feel good enough about this to begin my own effort to make all the changes I asked for. I intend to code up changes in separate patches, so folks won't get an "all or nothing" option from me. My only problem in this is, since my health went bad on me, now I'm kinda more disabled than I used to be, I can't work as many hours as I used to, so it's definitely going to take me longer than most others. On top of this, I honestly don't cooperate all that well. I can take orders fine, but I generally either do it all or let YOU do it all, I'm just not a really great team player. So, if you have the time, leave it to me, I'll get it done, but if you want it soon, well, you might want to do it yourself.

Other than that, I don't much see where this needs more discussion right now, unless someone has some more suggestions about what changes you want. I'm only (myself) interested in code portability, version options, and make conditionals test capabilities. I don't really want to craft a copy of GNU make. We can discuss this more when I have patches to show off. Once again, thanks for the clear, rapid response.

>
> -Alfred
>
> * Chuck Robey [080827 10:28] wrote:
> I am posting this to our arch list; if I'm wrong, we can move this. I want to
> argue for some changes to our great make(1) tool.
>
> I've long been a fan of make, and in particular, FreeBSD's make. In my own
> career, I've often had to do more custom creation of Makefiles, and while there
> are some folks here who definitely DO know our make better, I think I can
> honestly say I know it pretty well. In creating tools for customers, I've
> often been forced, unwillingly, to go to the GNU make tool. The reason is just
> one of compatibility.
There are several different reasons for this, which I > want to list: > > 1) while the GNU make has the -v to allow one to both identify the tool and the > version, FreeBSD's make has no such facility, that I'm aware of. You can't test > & capture the type of make here, except by the rather inadequate trapping of > getting no response at all. > > 2) while some parts of our make don't advertise it, they can be made compatible > to the gmake's tools. "include" is a good example (the FreeBSD "include" docs > claim that it only works as ".include", but that's a prevarication (and a very > good thing it is). What I'd like to argue for is that some things like "if" > have their compatibility with gmake enhanced. No, don't make it a mirror image, > just make it possible for a programmer to craft a limited set of tests that will > work in both places. > > If you give programmers the ability to detect what make they're in, the ability > to craft a limited set of compatible tests, and also the other side (endif > stuff) then everything else for portability can be done by using those limits. > > If something like this were done, then it allows a programmer, finally, to write > a REALLY portable makefile. It would still allow one to make use of all of > make(1)'s great command set, but not to kill it's use in a gmake-only system. > > OK, that'st the major argument. I'm going to ask for one thing here, but it's > truly extra, just my own bias's showing thru. I wish that a fair number of the > changes that have gone into make be taken back. I'm not talking about those > that enhanced it's operation, I'm talking about all of the changes that, while > trying to increase the elegance of the code, have really destroyed it's porrting > portability, in a major way. 
Make used to depend on a smaller set of libraries, > and those libraries were largely those available on other systems, so the task > for a programmer, to port our make(1) to a different platform wasn't all that > hard to do. Nowadays, with a great number of the changes having been crafted to > change dependencies to FreeBSD-only tools, it's a real bear to port it. > > The code involved is nearly all very, very portable; it's the way that the > libraries have been constituted that makes porting this a really bad job. If > someone would make up a libmake.so.1, which in itself could be make really > portable, that would also go a great long way to improving the popularity of our > make(1). I'm NOT asking to roll back any of the distinct improvements that have > gone in, only the changes that ruined it's porting-ability (yea, that's > portabilty, I just wanted to really point that out again). > > OK, if someone were to come up with swuch a set of changes, would they be dead > on arrival? I know that no one gets prior approval for FreeBSD (I completely > agree with that), just didn't want to be totally at odds with everyone, if I'm > the only one who sees it this way. > > Thanks for your time. _______________________________________________ freebsd-arch@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-arch To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAki3OgsACgkQz62J6PPcoOl00gCgjo1YlwlDSrcahO8DK+VijWx/ w90An028vyzNklvjS97D/yv5qO+IDquP =ADNQ -----END PGP SIGNATURE----- From info at seminarscompany.com Sun Aug 31 16:46:43 2008 From: info at seminarscompany.com (Seminars Company) Date: Sun Aug 31 16:47:08 2008 Subject: Direcci =?utf-8?b?w7M=?= n de Organizaciones de la Sociedad Civil:. 
Message-ID: <20080831161638.EC6E039819B@ws07.host4g.com>

If you cannot view this message correctly, click here